nnet3bin/nnet3-xvector-compute.cc
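Propagates features through an xvector neural network model and writes the output vectors. "Xvector" is the term for the output vector, or embedding, of a particular neural network architecture used in speaker recognition. The network consists of several feed-forward layers that operate on frames, a statistics pooling layer that aggregates over the frame-level representations, and additional layers that operate at the segment level. The xvector is usually extracted from an output layer after the statistics pooling layer. By default, one xvector is generated per utterance. Optionally, several xvectors can be extracted from chunks of the input and averaged to produce a single vector.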
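To make the statistics pooling step concrete, here is a minimal sketch in plain C++. It assumes pooling means concatenating the per-dimension mean and standard deviation over all frames. `StatsPool` is a hypothetical helper written for this illustration; it is not a Kaldi API, and Kaldi's own nnet3 components may differ in details such as variance flooring.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Kaldi API): pools frame-level activations into
// one segment-level vector laid out as [mean ; standard deviation].
std::vector<float> StatsPool(const std::vector<std::vector<float>> &frames) {
  assert(!frames.empty());
  const std::size_t dim = frames[0].size();
  const double n = static_cast<double>(frames.size());
  std::vector<double> sum(dim, 0.0), sum_sq(dim, 0.0);
  for (const auto &f : frames) {
    assert(f.size() == dim);
    for (std::size_t d = 0; d < dim; d++) {
      sum[d] += f[d];
      sum_sq[d] += static_cast<double>(f[d]) * f[d];
    }
  }
  std::vector<float> pooled(2 * dim);
  for (std::size_t d = 0; d < dim; d++) {
    const double mean = sum[d] / n;
    // Population variance; clamped at zero to guard against rounding error.
    const double var = std::max(0.0, sum_sq[d] / n - mean * mean);
    pooled[d] = static_cast<float>(mean);
    pooled[dim + d] = static_cast<float>(std::sqrt(var));
  }
  return pooled;
}
```

Whatever the number of input frames, the pooled vector has a fixed size of twice the frame-level dimension, which is what lets the segment-level layers produce a single embedding per chunk.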
Usage: nnet3-xvector-compute [options] <raw-nnet-in> <features-rspecifier> <vector-wspecifier>
e.g.: nnet3-xvector-compute final.raw scp:feats.scp ark:nnet_prediction.ark
Generate one xvector for a single chunk of speech features:
static void RunNnetComputation(const MatrixBase<BaseFloat> &features,
    const Nnet &nnet, CachingOptimizingCompiler *compiler,
    Vector<BaseFloat> *xvector) {
  ComputationRequest request;
  request.need_model_derivative = false;
  request.store_component_stats = false;
  request.inputs.push_back(
      IoSpecification("input", 0, features.NumRows()));
  IoSpecification output_spec;
  output_spec.name = "output";
  output_spec.has_deriv = false;
  // Limit the number of output Cindexes requested from the output node to 1,
  // so that one chunk (segment) yields exactly one result: the xvector.
  output_spec.indexes.resize(1);
  request.outputs.resize(1);
  request.outputs[0].Swap(&output_spec);
  std::shared_ptr<const NnetComputation> computation(
      std::move(compiler->Compile(request)));
  Nnet *nnet_to_update = NULL;  // we're not doing any update.
  NnetComputer computer(NnetComputeOptions(), *computation,
                        nnet, nnet_to_update);
  CuMatrix<BaseFloat> input_feats_cu(features);
  computer.AcceptInput("input", &input_feats_cu);
  computer.Run();
  CuMatrix<BaseFloat> cu_output;
  // cu_output is a matrix with a single row.
  computer.GetOutputDestructive("output", &cu_output);
  xvector->Resize(cu_output.NumCols());
  // Take the first row of the output matrix as the xvector.
  xvector->CopyFromVec(cu_output.Row(0));
}

ParseOptions po(usage);
Timer timer;

NnetSimpleComputationOptions opts;
CachingOptimizingCompilerOptions compiler_config;

opts.acoustic_scale = 1.0;  // by default do no scaling in this recipe.

std::string use_gpu = "no";
int32 chunk_size = -1,
    min_chunk_size = 100;
// If there are too few frames to fill a chunk (fewer than min-chunk-size),
// pad the input on the left and right.
bool pad_input = true;

opts.Register(&po);
compiler_config.Register(&po);

po.Register("use-gpu", &use_gpu,
    "yes|no|optional|wait, only has effect if compiled with CUDA");
po.Register("chunk-size", &chunk_size,
    "If set, extracts xectors from specified chunk-size, and averages.  "
    "If not set, extracts an xvector from all available features.");
po.Register("min-chunk-size", &min_chunk_size,
    "Minimum chunk-size allowed when extracting xvectors.");
po.Register("pad-input", &pad_input, "If true, duplicate the first and "
    "last frames of the input features as required to equal min-chunk-size.");

po.Read(argc, argv);

if (po.NumArgs() != 3) {
  po.PrintUsage();
  exit(1);
}

#if HAVE_CUDA==1
CuDevice::Instantiate().SelectGpuId(use_gpu);
#endif

std::string nnet_rxfilename = po.GetArg(1),
    feature_rspecifier = po.GetArg(2),
    vector_wspecifier = po.GetArg(3);

Nnet nnet;
ReadKaldiObject(nnet_rxfilename, &nnet);
SetBatchnormTestMode(true, &nnet);
SetDropoutTestMode(true, &nnet);
CollapseModel(CollapseModelConfig(), &nnet);

CachingOptimizingCompiler compiler(nnet, opts.optimize_config, compiler_config);

BaseFloatVectorWriter vector_writer(vector_wspecifier);

int32 num_success = 0, num_fail = 0;
int64 frame_count = 0;
int32 xvector_dim = nnet.OutputDim("output");

SequentialBaseFloatMatrixReader feature_reader(feature_rspecifier);

for (; !feature_reader.Done(); feature_reader.Next()) {
  std::string utt = feature_reader.Key();
  const Matrix<BaseFloat> &features (feature_reader.Value());
  if (features.NumRows() == 0) {
    KALDI_WARN << "Zero-length utterance: " << utt;
    num_fail++;
    continue;
  }
  int32 num_rows = features.NumRows(),
      feat_dim = features.NumCols(),
      this_chunk_size = chunk_size;
  if (!pad_input && num_rows < min_chunk_size) {
    KALDI_WARN << "Minimum chunk size of " << min_chunk_size
               << " is greater than the number of rows "
               << "in utterance: " << utt;
    num_fail++;
    continue;
  } else if (num_rows < chunk_size) {
    KALDI_LOG << "Chunk size of " << chunk_size << " is greater than "
              << "the number of rows in utterance: " << utt
              << ", using chunk size  of " << num_rows;
    this_chunk_size = num_rows;
  } else if (chunk_size == -1) {
    this_chunk_size = num_rows;
  }

  // With the default chunk_size of -1, this_chunk_size == num_rows,
  // so num_chunks == 1.
  int32 num_chunks = ceil(
      num_rows / static_cast<BaseFloat>(this_chunk_size));
  Vector<BaseFloat> xvector_avg(xvector_dim, kSetZero);
  BaseFloat tot_weight = 0.0;

  // Iterate over the feature chunks.
  for (int32 chunk_indx = 0; chunk_indx < num_chunks; chunk_indx++) {
    // Near the end of the input, check whether the remaining frames are
    // enough to fill a whole chunk.
    int32 offset = std::min(
        this_chunk_size, num_rows - chunk_indx * this_chunk_size);
    if (!pad_input && offset < min_chunk_size)
      continue;
    SubMatrix<BaseFloat> sub_features(
        features, chunk_indx * this_chunk_size, offset, 0, feat_dim);
    Vector<BaseFloat> xvector;
    tot_weight += offset;

    // Pad input if the offset is less than the minimum chunk size
    if (pad_input && offset < min_chunk_size) {
      Matrix<BaseFloat> padded_features(min_chunk_size, feat_dim);
      int32 left_context = (min_chunk_size - offset) / 2;
      int32 right_context = min_chunk_size - offset - left_context;
      for (int32 i = 0; i < left_context; i++) {
        padded_features.Row(i).CopyFromVec(sub_features.Row(0));
      }
      for (int32 i = 0; i < right_context; i++) {
        padded_features.Row(min_chunk_size - i - 1).CopyFromVec(
            sub_features.Row(offset - 1));
      }
      padded_features.Range(left_context, offset, 0, feat_dim).CopyFromMat(
          sub_features);
      // One chunk yields one xvector.
      RunNnetComputation(padded_features, nnet, &compiler, &xvector);
    } else {
      RunNnetComputation(sub_features, nnet, &compiler, &xvector);
    }
    // Accumulate the xvectors of all chunks, each weighted by its frame count.
    xvector_avg.AddVec(offset, xvector);
  }
  // Average the xvectors over all chunks.
  xvector_avg.Scale(1.0 / tot_weight);
  vector_writer.Write(utt, xvector_avg);

  frame_count += features.NumRows();
  num_success++;
}
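The chunking, padding, and averaging arithmetic in the loop above can be isolated from Kaldi and checked on small numbers. The following is a sketch under stated assumptions: `Chunk`, `PlanChunks`, `PaddedRowMap`, and `WeightedAverage` are hypothetical names invented for this illustration, plain `std::vector` stands in for Kaldi's matrix and vector types, and the neural network forward pass is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for illustration; not part of Kaldi.
struct Chunk {
  int start;   // index of the first frame in the chunk
  int length;  // number of real frames (called `offset` in the listing)
};

// Splits num_rows frames into chunks following the same rules as the loop.
// chunk_size == -1 means "use the whole utterance as one chunk".
std::vector<Chunk> PlanChunks(int num_rows, int chunk_size,
                              int min_chunk_size, bool pad_input) {
  assert(num_rows > 0 && (chunk_size == -1 || chunk_size > 0));
  std::vector<Chunk> chunks;
  // Without padding, an utterance shorter than min_chunk_size is skipped.
  if (!pad_input && num_rows < min_chunk_size) return chunks;
  int this_chunk_size = chunk_size;
  if (chunk_size == -1 || num_rows < chunk_size) this_chunk_size = num_rows;
  // Integer form of ceil(num_rows / this_chunk_size).
  const int num_chunks = (num_rows + this_chunk_size - 1) / this_chunk_size;
  for (int i = 0; i < num_chunks; i++) {
    const int length =
        std::min(this_chunk_size, num_rows - i * this_chunk_size);
    // Without padding, a too-short trailing chunk is dropped.
    if (!pad_input && length < min_chunk_size) continue;
    chunks.push_back(Chunk{i * this_chunk_size, length});
  }
  return chunks;
}

// For each row of the padded chunk, returns the source row inside the chunk.
// Short chunks are centred; the first and last frames are duplicated.
std::vector<int> PaddedRowMap(int length, int min_chunk_size) {
  assert(length > 0);
  const int padded = std::max(length, min_chunk_size);
  const int left = (padded - length) / 2;
  std::vector<int> row_map(padded);
  for (int r = 0; r < padded; r++)
    row_map[r] = std::min(std::max(r - left, 0), length - 1);
  return row_map;
}

// Averages per-chunk xvectors, weighting each by its number of real frames.
std::vector<float> WeightedAverage(
    const std::vector<std::vector<float>> &xvectors,
    const std::vector<Chunk> &chunks) {
  assert(!chunks.empty() && xvectors.size() == chunks.size());
  std::vector<float> avg(xvectors[0].size(), 0.0f);
  float tot_weight = 0.0f;
  for (std::size_t c = 0; c < chunks.size(); c++) {
    assert(xvectors[c].size() == avg.size());
    for (std::size_t d = 0; d < avg.size(); d++)
      avg[d] += chunks[c].length * xvectors[c][d];
    tot_weight += chunks[c].length;
  }
  for (float &v : avg) v /= tot_weight;
  return avg;
}
```

For example, a 250-frame utterance with `chunk_size` 100 and `min_chunk_size` 100 gives chunks of 100, 100, and 50 frames when padding is enabled. The last chunk is padded to 100 rows but still contributes to the average with weight 50, because the weight is the count of real frames rather than the padded length.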