The TIMIT speech corpus is a phone-level-annotated corpus produced jointly by TI (Texas Instruments) and MIT for the development and evaluation of automatic speech recognition systems. It covers American English speakers from 8 dialect regions, 630 speakers in total.

Each speaker reads 10 sentences, and every utterance is transcribed at both the phone level and the word level, recorded at 16 kHz, 16 bit.

Note: do not use the TIMIT setup as a generic example of how to run Kaldi, because its structure is not very standard.

Several of the other setups are better suited for that purpose.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

librispeech/s5 is a good choice because the data is free.

yesno is very lightweight, runs quickly, and is also free.

wsj/s5 contains some example scripts that are not found in the other setups, which can be confusing.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

s5: monophone and triphone GMM/HMM systems trained with maximum likelihood (ML), followed by SGMM and DNN recipes.

Training is carried out on a 48-phone set, following "Speaker-Independent Phone Recognition Using Hidden Markov Models".
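The exact contents of run.sh vary between Kaldi versions, but a rough sketch of the ML-trained GMM/HMM stage order (monophone first, then successively refined triphone systems built on the previous stage's alignments) is shown below; the exp/mono, exp/tri1, ... directory names and the 2500/15000 leaf and Gaussian counts are conventional defaults and should be read as assumptions, not as the literal script:

# Sketch of the GMM/HMM stages; the SGMM2 and DNN recipes follow
# on top of the final triphone alignments.
steps/train_mono.sh     --nj "$train_nj" --cmd "$train_cmd" data/train data/lang exp/mono
steps/align_si.sh       --nj "$train_nj" --cmd "$train_cmd" data/train data/lang exp/mono exp/mono_ali
steps/train_deltas.sh   --cmd "$train_cmd" 2500 15000 data/train data/lang exp/mono_ali exp/tri1
steps/align_si.sh       --nj "$train_nj" --cmd "$train_cmd" data/train data/lang exp/tri1 exp/tri1_ali
steps/train_lda_mllt.sh --cmd "$train_cmd" 2500 15000 data/train data/lang exp/tri1_ali exp/tri2
steps/align_si.sh       --nj "$train_nj" --cmd "$train_cmd" data/train data/lang exp/tri2 exp/tri2_ali
steps/train_sat.sh      --cmd "$train_cmd" 2500 15000 data/train data/lang exp/tri2_ali exp/tri3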

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

To run it: edit the TIMIT corpus path in run.sh; in cmd.sh, change the job-dispatch script from queue.pl to run.pl so that jobs run locally; download and install SRILM via tools/extras/install_srilm.sh; copy the irstlm directory into the tools directory; and finally run run.sh.
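A minimal command sketch of those preparation steps follows. It assumes run.sh stores the corpus location in a variable named timit, and /path/to/TIMIT is a placeholder for your local copy, so treat it as a sketch rather than the exact procedure:

cd egs/timit/s5

# Point run.sh at the local TIMIT corpus (placeholder path; the variable
# name "timit" is an assumption about the script's layout).
sed -i 's|^timit=.*|timit=/path/to/TIMIT|' run.sh

# Run jobs locally: replace queue.pl with run.pl in cmd.sh.
sed -i 's|queue\.pl|run.pl|g' cmd.sh

# Install SRILM (the installer expects the SRILM package to have been
# downloaded into tools/ beforehand), and make IRSTLM available under
# tools/ as described above.
(cd ../../../tools && extras/install_srilm.sh)

# Finally, run the whole recipe.
./run.sh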

When the run finishes, the console output looks like the following:

============================================================================
Data & Lexicon & Language Preparation
============================================================================
wav-to-duration --read-entire-file=true scp:train_wav.scp ark,t:train_dur.ark
LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:92) Printed duration for 3696 audio files.
LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:94) Mean duration was 3.06336, min and max durations were 0.91525, 7.78881
wav-to-duration --read-entire-file=true scp:dev_wav.scp ark,t:dev_dur.ark
LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:92) Printed duration for 400 audio files.
LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:94) Mean duration was 3.08212, min and max durations were 1.09444, 7.43681
wav-to-duration --read-entire-file=true scp:test_wav.scp ark,t:test_dur.ark
LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:92) Printed duration for 192 audio files.
LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:94) Mean duration was 3.03646, min and max durations were 1.30562, 6.21444
Data preparation succeeded
LOGFILE:/dev/null
$bin/ngt -i="$inpfile" -n=$order -gooout=y -o="$gzip -c > $tmpdir/ngram.${sdict}.gz" -fd="$tmpdir/$sdict" $dictionary $additional_parameters >> $logfile 2>&1
$scr/build-sublm.pl $verbose $prune $prune_thr_str $smoothing "$additional_smoothing_parameters" --size $order --ngrams "$gunzip -c $tmpdir/ngram.${sdict}.gz" -sublm $tmpdir/lm.$sdict $additional_parameters >> $logfile 2>&1
inpfile: data/local/lm_tmp/lm_phone_bg.ilm.gz
outfile: /dev/stdout
loading up to the LM level 1000 (if any)
dub: 10000000
OOV code is 50
OOV code is 50
Saving in txt format to /dev/stdout
Dictionary & language model preparation succeeded
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> data/local/dict/silence_phones.txt is OK Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> data/local/dict/optional_silence.txt is OK Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> data/local/dict/nonsilence_phones.txt is OK Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK. Checking data/local/dict/lexicon.txt
--> reading data/local/dict/lexicon.txt
--> data/local/dict/lexicon.txt is OK Checking data/local/dict/extra_questions.txt ...
--> reading data/local/dict/extra_questions.txt
--> data/local/dict/extra_questions.txt is OK
--> SUCCESS [validating dictionary directory data/local/dict] **Creating data/local/dict/lexiconp.txt from data/local/dict/lexicon.txt
fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int
prepare_lang.sh: validating output directory
utils/validate_lang.pl data/lang
Checking data/lang/phones.txt ...
--> data/lang/phones.txt is OK Checking words.txt: #0 ...
--> data/lang/words.txt is OK Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> summation property is OK Checking data/lang/phones/context_indep.{txt, int, csl} ...
--> 1 entry/entries in data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.{txt, int, csl} are OK Checking data/lang/phones/nonsilence.{txt, int, csl} ...
--> 47 entry/entries in data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.{txt, int, csl} are OK Checking data/lang/phones/silence.{txt, int, csl} ...
--> 1 entry/entries in data/lang/phones/silence.txt
--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.{txt, int, csl} are OK Checking data/lang/phones/optional_silence.{txt, int, csl} ...
--> 1 entry/entries in data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.{txt, int, csl} are OK Checking data/lang/phones/disambig.{txt, int, csl} ...
--> 2 entry/entries in data/lang/phones/disambig.txt
--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.{txt, int, csl} are OK Checking data/lang/phones/roots.{txt, int} ...
--> 48 entry/entries in data/lang/phones/roots.txt
--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
--> data/lang/phones/roots.{txt, int} are OK Checking data/lang/phones/sets.{txt, int} ...
--> 48 entry/entries in data/lang/phones/sets.txt
--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
--> data/lang/phones/sets.{txt, int} are OK Checking data/lang/phones/extra_questions.{txt, int} ...
--> 2 entry/entries in data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.{txt, int} are OK Checking optional_silence.txt ...
--> reading data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.txt is OK Checking disambiguation symbols: #0 and #1
--> data/lang/phones/disambig.txt has "#0" and "#1"
--> data/lang/phones/disambig.txt is OK Checking topo ... Checking word-level disambiguation symbols...
--> data/lang/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking data/lang/oov.{txt, int} ...
--> 1 entry/entries in data/lang/oov.txt
--> data/lang/oov.int corresponds to data/lang/oov.txt
--> data/lang/oov.{txt, int} are OK --> data/lang/L.fst is olabel sorted
--> data/lang/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang]
Preparing train, dev and test data
utils/validate_data_dir.sh: Successfully validated data-directory data/train
utils/validate_data_dir.sh: Successfully validated data-directory data/dev
utils/validate_data_dir.sh: Successfully validated data-directory data/test
Preparing language models for test
arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang_test_bg/words.txt - data/lang_test_bg/G.fst
LOG (arpa2fst[5.2.124~1396-70748]:Read():arpa-file-parser.cc:98) Reading \data\ section.
LOG (arpa2fst[5.2.124~1396-70748]:Read():arpa-file-parser.cc:153) Reading \1-grams: section.
LOG (arpa2fst[5.2.124~1396-70748]:Read():arpa-file-parser.cc:153) Reading \2-grams: section.
WARNING (arpa2fst[5.2.124~1396-70748]:ConsumeNGram():arpa-lm-compiler.cc:313) line 60 [-3.26717<s> <s>] skipped: n-gram has invalid BOS/EOS placement
LOG (arpa2fst[5.2.124~1396-70748]:RemoveRedundantStates():arpa-lm-compiler.cc:359) Reduced num-states from 50 to 50
fstisstochastic data/lang_test_bg/G.fst
0.000510126 -0.0763018
utils/validate_lang.pl data/lang_test_bg
Checking data/lang_test_bg/phones.txt ...
--> data/lang_test_bg/phones.txt is OK Checking words.txt: #0 ...
--> data/lang_test_bg/words.txt is OK Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> summation property is OK Checking data/lang_test_bg/phones/context_indep.{txt, int, csl} ...
--> 1 entry/entries in data/lang_test_bg/phones/context_indep.txt
--> data/lang_test_bg/phones/context_indep.int corresponds to data/lang_test_bg/phones/context_indep.txt
--> data/lang_test_bg/phones/context_indep.csl corresponds to data/lang_test_bg/phones/context_indep.txt
--> data/lang_test_bg/phones/context_indep.{txt, int, csl} are OK Checking data/lang_test_bg/phones/nonsilence.{txt, int, csl} ...
--> 47 entry/entries in data/lang_test_bg/phones/nonsilence.txt
--> data/lang_test_bg/phones/nonsilence.int corresponds to data/lang_test_bg/phones/nonsilence.txt
--> data/lang_test_bg/phones/nonsilence.csl corresponds to data/lang_test_bg/phones/nonsilence.txt
--> data/lang_test_bg/phones/nonsilence.{txt, int, csl} are OK Checking data/lang_test_bg/phones/silence.{txt, int, csl} ...
--> 1 entry/entries in data/lang_test_bg/phones/silence.txt
--> data/lang_test_bg/phones/silence.int corresponds to data/lang_test_bg/phones/silence.txt
--> data/lang_test_bg/phones/silence.csl corresponds to data/lang_test_bg/phones/silence.txt
--> data/lang_test_bg/phones/silence.{txt, int, csl} are OK Checking data/lang_test_bg/phones/optional_silence.{txt, int, csl} ...
--> 1 entry/entries in data/lang_test_bg/phones/optional_silence.txt
--> data/lang_test_bg/phones/optional_silence.int corresponds to data/lang_test_bg/phones/optional_silence.txt
--> data/lang_test_bg/phones/optional_silence.csl corresponds to data/lang_test_bg/phones/optional_silence.txt
--> data/lang_test_bg/phones/optional_silence.{txt, int, csl} are OK Checking data/lang_test_bg/phones/disambig.{txt, int, csl} ...
--> 2 entry/entries in data/lang_test_bg/phones/disambig.txt
--> data/lang_test_bg/phones/disambig.int corresponds to data/lang_test_bg/phones/disambig.txt
--> data/lang_test_bg/phones/disambig.csl corresponds to data/lang_test_bg/phones/disambig.txt
--> data/lang_test_bg/phones/disambig.{txt, int, csl} are OK Checking data/lang_test_bg/phones/roots.{txt, int} ...
--> 48 entry/entries in data/lang_test_bg/phones/roots.txt
--> data/lang_test_bg/phones/roots.int corresponds to data/lang_test_bg/phones/roots.txt
--> data/lang_test_bg/phones/roots.{txt, int} are OK Checking data/lang_test_bg/phones/sets.{txt, int} ...
--> 48 entry/entries in data/lang_test_bg/phones/sets.txt
--> data/lang_test_bg/phones/sets.int corresponds to data/lang_test_bg/phones/sets.txt
--> data/lang_test_bg/phones/sets.{txt, int} are OK Checking data/lang_test_bg/phones/extra_questions.{txt, int} ...
--> 2 entry/entries in data/lang_test_bg/phones/extra_questions.txt
--> data/lang_test_bg/phones/extra_questions.int corresponds to data/lang_test_bg/phones/extra_questions.txt
--> data/lang_test_bg/phones/extra_questions.{txt, int} are OK Checking optional_silence.txt ...
--> reading data/lang_test_bg/phones/optional_silence.txt
--> data/lang_test_bg/phones/optional_silence.txt is OK Checking disambiguation symbols: #0 and #1
--> data/lang_test_bg/phones/disambig.txt has "#0" and "#1"
--> data/lang_test_bg/phones/disambig.txt is OK Checking topo ... Checking word-level disambiguation symbols...
--> data/lang_test_bg/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking data/lang_test_bg/oov.{txt, int} ...
--> 1 entry/entries in data/lang_test_bg/oov.txt
--> data/lang_test_bg/oov.int corresponds to data/lang_test_bg/oov.txt
--> data/lang_test_bg/oov.{txt, int} are OK --> data/lang_test_bg/L.fst is olabel sorted
--> data/lang_test_bg/L_disambig.fst is olabel sorted
--> data/lang_test_bg/G.fst is ilabel sorted
--> data/lang_test_bg/G.fst has 50 states
fstdeterminizestar data/lang_test_bg/G.fst /dev/null
--> data/lang_test_bg/G.fst is determinizable
--> utils/lang/check_g_properties.pl successfully validated data/lang_test_bg/G.fst
--> utils/lang/check_g_properties.pl succeeded.
--> Testing determinizability of L_disambig . G
fstdeterminizestar
fsttablecompose data/lang_test_bg/L_disambig.fst data/lang_test_bg/G.fst
--> L_disambig . G is determinizable
--> SUCCESS [validating lang directory data/lang_test_bg]
Succeeded in formatting data.
============================================================================
============================================================================
MFCC Feature Extration & CMVN for Training and Test set
============================================================================
steps/make_mfcc.sh --cmd run.pl --mem 4G --nj 10 data/train exp/make_mfcc/train mfcc
utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC features for train
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
Succeeded creating CMVN stats for train
steps/make_mfcc.sh --cmd run.pl --mem 4G --nj 10 data/dev exp/make_mfcc/dev mfcc
utils/validate_data_dir.sh: Successfully validated data-directory data/dev
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC features for dev
steps/compute_cmvn_stats.sh data/dev exp/make_mfcc/dev mfcc
Succeeded creating CMVN stats for dev
steps/make_mfcc.sh --cmd run.pl --mem 4G --nj 10 data/test exp/make_mfcc/test mfcc
utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC features for test
steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
Succeeded creating CMVN stats for test
============================================================================
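At this point MFCC features and CMVN statistics for the train, dev, and test sets are on disk. A quick sanity check (a sketch, assuming the default data/train layout written by the scripts above) could look like:

# Dimensionality of the extracted MFCCs (typically 13 with the default mfcc.conf).
feat-to-dim scp:data/train/feats.scp -

# Peek at the first utterance's feature matrix in text form.
copy-feats scp:data/train/feats.scp ark,t:- | head -n 5

# Per-speaker CMVN statistics written by compute_cmvn_stats.sh.
copy-matrix scp:data/train/cmvn.scp ark,t:- | head -n 5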
