Reference: complete Kaldi notes, v0.4

The cmd.sh script is:

You can see clearly that there are three configurations, labeled a, b, and c. Options a and b are for running on a cluster; c is the one we need, since we run on a local virtual machine. Modify the script accordingly:

# "queue.pl" uses qsub. The options to it are
# options to qsub. If you have GridEngine installed,
# change this to a queue you have access to.
# Otherwise, use "run.pl", which will run jobs locally
# (make sure your --num-jobs options are no more than
# the number of cpus on your machine.

#a) JHU cluster options
#export train_cmd="queue.pl -l arch=*64"
#export decode_cmd="queue.pl -l arch=*64,mem_free=2G,ram_free=2G"
#export mkgraph_cmd="queue.pl -l arch=*64,ram_free=4G,mem_free=4G"
#export cuda_cmd=run.pl

#b) BUT cluster options
#export train_cmd="queue.pl -q all.q@@blade -l ram_free=1200M,mem_free=1200M"
#export decode_cmd="queue.pl -q all.q@@blade -l ram_free=1700M,mem_free=1700M"
#export decodebig_cmd="queue.pl -q all.q@@blade -l ram_free=4G,mem_free=4G"
#export cuda_cmd="queue.pl -q long.q@@pco203 -l gpu=1"
#export cuda_cmd="queue.pl -q long.q@pcspeech-gpu"
#export mkgraph_cmd="queue.pl -q all.q@@servers -l ram_free=4G,mem_free=4G"

#c) run it locally...
export train_cmd=run.pl
export decode_cmd=run.pl
export cuda_cmd=run.pl
export mkgraph_cmd=run.pl
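With run.pl every job runs on the local machine, so the --nj values passed to the training and decoding scripts must not exceed the CPU count. A minimal sketch of clamping the job count (the requested_nj value here is just a made-up example):

```shell
# Clamp a requested job count to the number of local CPUs, since run.pl
# executes all jobs on this machine. requested_nj is a hypothetical value.
requested_nj=16
ncpu=$(nproc)
nj=$(( requested_nj < ncpu ? requested_nj : ncpu ))
echo "using nj=$nj"
```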

Contents of path.sh:

Usually the only change needed here is export KALDI_ROOT=`pwd`/../../.. — point it at the directory where you installed Kaldi. Sometimes no change is needed at all; it depends on your setup.

export KALDI_ROOT=`pwd`/../../..
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/irstlm/bin/:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C
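After editing KALDI_ROOT it is worth checking that the directories path.sh puts on PATH actually exist. A small sketch (the $HOME/kaldi fallback is only an assumed example location, not part of the recipe):

```shell
# Verify the tool directories that path.sh prepends to PATH.
# The $HOME/kaldi fallback is only an example location.
KALDI_ROOT=${KALDI_ROOT:-$HOME/kaldi}
for d in "$KALDI_ROOT/tools/openfst/bin" "$KALDI_ROOT/tools/config"; do
  [ -d "$d" ] && echo "ok: $d" || echo "missing: $d"
done
```

If anything prints "missing", the KALDI_ROOT path is wrong or the tools have not been built yet.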

run.sh:

You need to tell it where your data lives; the only change required is here:

#timit=/export/corpora5/LDC/LDC93S1/timit/TIMIT # @JHU
timit=/mnt/matylda2/data/TIMIT/timit # @BUT

Change this to the path of your own TIMIT copy. Other corpora work the same way.
In addition, the voxforge, vystadial_cz, and vystadial_en corpora are freely downloadable; if you do not have a licensed corpus, you can experiment with those.
Finally, let's walk through the run.sh script, using the s5 recipe under timit as the example:

位置: /home/dahu/myfile/my_git/kaldi/egs/timit/s5

#!/bin/bash

#
# Copyright Bagher BabaAli,
# - Brno University of Technology (Author: Karel Vesely)
#
# TIMIT, description of the database:
# http://perso.limsi.fr/lamel/TIMIT_NISTIR4930.pdf
#
# Hon and Lee paper on TIMIT, introduces mapping to training phonemes,
# then re-mapping to phonemes for scoring:
# http://repository.cmu.edu/cgi/viewcontent.cgi?article=2768&context=compsci
#

. ./cmd.sh
[ -f path.sh ] && . ./path.sh # make sure the KALDI_ROOT path in path.sh is correct
set -e

# Acoustic model parameters (leave the defaults alone for now)
# tree leaves / Gaussians per stage (values from the stock timit/s5 recipe)
numLeavesTri1=2500
numGaussTri1=15000
numLeavesMLLT=2500
numGaussMLLT=15000
numLeavesSAT=2500
numGaussSAT=15000
numGaussUBM=400
numLeavesSGMM=7000
numGaussSGMM=9000

feats_nj=10
train_nj=30
decode_nj=5 # nj is the number of parallel jobs; keep it no larger than your CPU count

echo ============================================================================
echo " Data & Lexicon & Language Preparation "
echo ============================================================================

#timit=/export/corpora5/LDC/LDC93S1/timit/TIMIT # @JHU
timit=/mnt/matylda2/data/TIMIT/timit # @BUT, change this to your own TIMIT path

local/timit_data_prep.sh $timit || exit 1

local/timit_prepare_dict.sh

# Caution below: we remove optional silence by setting "--sil-prob 0.0",
# in TIMIT the silence appears also as a word in the dictionary and is scored.
utils/prepare_lang.sh --sil-prob 0.0 --position-dependent-phones false --num-sil-states 3 \
  data/local/dict "sil" data/local/lang_tmp data/lang

local/timit_format_data.sh

echo ============================================================================
echo " MFCC Feature Extraction & CMVN for Training and Test set "
echo ============================================================================

# Now make MFCC features (this is the feature-extraction part).
mfccdir=mfcc

for x in train dev test; do
  steps/make_mfcc.sh --cmd "$train_cmd" --nj $feats_nj data/$x exp/make_mfcc/$x $mfccdir
  steps/compute_cmvn_stats.sh data/$x exp/make_mfcc/$x $mfccdir
done

echo ============================================================================
echo " MonoPhone Training & Decoding "
echo ============================================================================

# Monophone training and decoding, the most basic part of the recipe; study it carefully.
steps/train_mono.sh --nj "$train_nj" --cmd "$train_cmd" data/train data/lang exp/mono

utils/mkgraph.sh data/lang_test_bg exp/mono exp/mono/graph

steps/decode.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  exp/mono/graph data/dev exp/mono/decode_dev

steps/decode.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  exp/mono/graph data/test exp/mono/decode_test

echo ============================================================================
echo " tri1 : Deltas + Delta-Deltas Training & Decoding "
echo ============================================================================

# Triphone training and decoding.
steps/align_si.sh --boost-silence 1.25 --nj "$train_nj" --cmd "$train_cmd" \
  data/train data/lang exp/mono exp/mono_ali

# Train tri1, which is deltas + delta-deltas, on train data.
steps/train_deltas.sh --cmd "$train_cmd" \
  $numLeavesTri1 $numGaussTri1 data/train data/lang exp/mono_ali exp/tri1

utils/mkgraph.sh data/lang_test_bg exp/tri1 exp/tri1/graph

steps/decode.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  exp/tri1/graph data/dev exp/tri1/decode_dev

steps/decode.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  exp/tri1/graph data/test exp/tri1/decode_test

echo ============================================================================
echo " tri2 : LDA + MLLT Training & Decoding "
echo ============================================================================

# LDA + MLLT on top of the triphone model.
steps/align_si.sh --nj "$train_nj" --cmd "$train_cmd" \
  data/train data/lang exp/tri1 exp/tri1_ali

steps/train_lda_mllt.sh --cmd "$train_cmd" \
  --splice-opts "--left-context=3 --right-context=3" \
  $numLeavesMLLT $numGaussMLLT data/train data/lang exp/tri1_ali exp/tri2

utils/mkgraph.sh data/lang_test_bg exp/tri2 exp/tri2/graph

steps/decode.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  exp/tri2/graph data/dev exp/tri2/decode_dev

steps/decode.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  exp/tri2/graph data/test exp/tri2/decode_test

echo ============================================================================
echo " tri3 : LDA + MLLT + SAT Training & Decoding "
echo ============================================================================

# LDA + MLLT + SAT on top of the triphone model.
# Align tri2 system with train data.
steps/align_si.sh --nj "$train_nj" --cmd "$train_cmd" \
  --use-graphs true data/train data/lang exp/tri2 exp/tri2_ali

# From tri2 system, train tri3 which is LDA + MLLT + SAT.
steps/train_sat.sh --cmd "$train_cmd" \
  $numLeavesSAT $numGaussSAT data/train data/lang exp/tri2_ali exp/tri3

utils/mkgraph.sh data/lang_test_bg exp/tri3 exp/tri3/graph

steps/decode_fmllr.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  exp/tri3/graph data/dev exp/tri3/decode_dev

steps/decode_fmllr.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  exp/tri3/graph data/test exp/tri3/decode_test

echo ============================================================================
echo " SGMM2 Training & Decoding "
echo ============================================================================

# SGMM2 on top of the SAT triphone model.
steps/align_fmllr.sh --nj "$train_nj" --cmd "$train_cmd" \
  data/train data/lang exp/tri3 exp/tri3_ali

exit # early exit; remove this line to run the remaining stages

# From this point you can run Karel's DNN : local/nnet/run_dnn.sh

steps/train_ubm.sh --cmd "$train_cmd" \
  $numGaussUBM data/train data/lang exp/tri3_ali exp/ubm4

steps/train_sgmm2.sh --cmd "$train_cmd" $numLeavesSGMM $numGaussSGMM \
  data/train data/lang exp/tri3_ali exp/ubm4/final.ubm exp/sgmm2_4

utils/mkgraph.sh data/lang_test_bg exp/sgmm2_4 exp/sgmm2_4/graph

steps/decode_sgmm2.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  --transform-dir exp/tri3/decode_dev exp/sgmm2_4/graph data/dev \
  exp/sgmm2_4/decode_dev

steps/decode_sgmm2.sh --nj "$decode_nj" --cmd "$decode_cmd" \
  --transform-dir exp/tri3/decode_test exp/sgmm2_4/graph data/test \
  exp/sgmm2_4/decode_test

echo ============================================================================
echo " MMI + SGMM2 Training & Decoding "
echo ============================================================================

# MMI + SGMM2 on top of the SAT triphone model.
steps/align_sgmm2.sh --nj "$train_nj" --cmd "$train_cmd" \
  --transform-dir exp/tri3_ali --use-graphs true --use-gselect true \
  data/train data/lang exp/sgmm2_4 exp/sgmm2_4_ali

steps/make_denlats_sgmm2.sh --nj "$train_nj" --sub-split "$train_nj" \
  --acwt 0.2 --lattice-beam 10.0 --beam 18.0 \
  --cmd "$decode_cmd" --transform-dir exp/tri3_ali \
  data/train data/lang exp/sgmm2_4_ali exp/sgmm2_4_denlats

steps/train_mmi_sgmm2.sh --acwt 0.2 --cmd "$decode_cmd" \
  --transform-dir exp/tri3_ali --boost 0.1 --drop-frames true \
  data/train data/lang exp/sgmm2_4_ali exp/sgmm2_4_denlats exp/sgmm2_4_mmi_b0.1

for iter in 1 2 3 4; do
  steps/decode_sgmm2_rescore.sh --cmd "$decode_cmd" --iter $iter \
    --transform-dir exp/tri3/decode_dev data/lang_test_bg data/dev \
    exp/sgmm2_4/decode_dev exp/sgmm2_4_mmi_b0.1/decode_dev_it$iter

  steps/decode_sgmm2_rescore.sh --cmd "$decode_cmd" --iter $iter \
    --transform-dir exp/tri3/decode_test data/lang_test_bg data/test \
    exp/sgmm2_4/decode_test exp/sgmm2_4_mmi_b0.1/decode_test_it$iter
done

echo ============================================================================
echo " DNN Hybrid Training & Decoding "
echo ============================================================================

# Dan Povey's nnet2 DNN recipe (the tutorial does not recommend starting here).

# DNN hybrid system training parameters
dnn_mem_reqs="--mem 1G"
dnn_extra_opts="--num_epochs 20 --num-epochs-extra 10 --add-layers-period 1 --shrink-interval 3"

steps/nnet2/train_tanh.sh --mix-up 5000 --initial-learning-rate 0.015 \
  --final-learning-rate 0.002 --num-hidden-layers 2 \
  --num-jobs-nnet "$train_nj" --cmd "$train_cmd" "${dnn_train_extra_opts[@]}" \
  data/train data/lang exp/tri3_ali exp/tri4_nnet

[ ! -d exp/tri4_nnet/decode_dev ] && mkdir -p exp/tri4_nnet/decode_dev
decode_extra_opts=(--num-threads 6)

steps/nnet2/decode.sh --cmd "$decode_cmd" --nj "$decode_nj" "${decode_extra_opts[@]}" \
  --transform-dir exp/tri3/decode_dev exp/tri3/graph data/dev \
  exp/tri4_nnet/decode_dev | tee exp/tri4_nnet/decode_dev/decode.log

[ ! -d exp/tri4_nnet/decode_test ] && mkdir -p exp/tri4_nnet/decode_test

steps/nnet2/decode.sh --cmd "$decode_cmd" --nj "$decode_nj" "${decode_extra_opts[@]}" \
  --transform-dir exp/tri3/decode_test exp/tri3/graph data/test \
  exp/tri4_nnet/decode_test | tee exp/tri4_nnet/decode_test/decode.log

echo ============================================================================
echo " System Combination (DNN+SGMM) "
echo ============================================================================

# Combine the DNN and SGMM systems.
for iter in 1 2 3 4; do
  local/score_combine.sh --cmd "$decode_cmd" \
    data/dev data/lang_test_bg exp/tri4_nnet/decode_dev \
    exp/sgmm2_4_mmi_b0.1/decode_dev_it$iter exp/combine_2/decode_dev_it$iter

  local/score_combine.sh --cmd "$decode_cmd" \
    data/test data/lang_test_bg exp/tri4_nnet/decode_test \
    exp/sgmm2_4_mmi_b0.1/decode_test_it$iter exp/combine_2/decode_test_it$iter
done

echo ============================================================================
echo " DNN Hybrid Training & Decoding (Karel's recipe) "
echo ============================================================================

# Karel Vesely's nnet1 DNN recipe, a general deep-learning framework.
local/nnet/run_dnn.sh
#local/nnet/run_autoencoder.sh : an example, not used to build any system,

echo ============================================================================
echo " Getting Results [see RESULTS file] "
echo ============================================================================

# Print the recognition results for all of the systems above.
bash RESULTS dev
bash RESULTS test

echo ============================================================================
echo "Finished successfully on" `date`
echo ============================================================================

exit 0
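The RESULTS script works by collecting the %WER lines that the scoring scripts write into each decode directory. A minimal sketch of the same idea, assuming the standard exp/<model>/decode_*/wer_<LMWT> layout produced by the Kaldi scoring scripts:

```shell
# Report the best (lowest) WER found in each decode directory.
# Assumes the usual exp/<model>/decode_*/wer_<LMWT> files.
for dir in exp/*/decode*; do
  [ -d "$dir" ] || continue
  grep -h WER "$dir"/wer_* 2>/dev/null | sort -k2 -n | head -n1 | sed "s|^|$dir: |"
done
```

This is handy for a quick look before the RESULTS script formats everything properly.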

Having read through these three basic scripts and gotten a rough idea of what each one does, I am now downloading the TIMIT data and will run the recipe next.

TIMIT dataset download: see "kaldi timit 实例运行全过程" (a full walk-through of running the TIMIT example).
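A full TIMIT run takes a while, so it is convenient to capture everything the recipe prints. One simple way (any of the standard shell logging idioms works just as well):

```shell
# Run the recipe from egs/timit/s5 and keep a complete log of the run.
bash run.sh 2>&1 | tee run.log
```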
