A roundup of some of the tools used when running speech recognition with nnet3 in Kaldi
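The previous article gave a first look at Kaldi ASR. This one shows how to use Kaldi's neural network model, nnet3, to recognize speech from wav files.
Download the pretrained Chinese model: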
[houwenbin@localhost ~]$ cd ~/kaldi-master/egs
[houwenbin@localhost egs]$ wget -T 10 -t 3 http://kaldi-asr.org/models/0002_cvte_chain_model.tar.gz
[houwenbin@localhost egs]$ tar xzf 0002_cvte_chain_model.tar.gz
Extraction produces a cvte directory. Create two symbolic links inside its s5 subdirectory:
[houwenbin@localhost egs]$ ln -s ~/kaldi-master/egs/wsj/s5/steps ~/kaldi-master/egs/cvte/s5/steps
[houwenbin@localhost egs]$ ln -s ~/kaldi-master/egs/wsj/s5/utils ~/kaldi-master/egs/cvte/s5/utils
With that done, we can run run.sh:
#!/bin/bash
. ./cmd.sh
. ./path.sh
# step 1: generate fbank features
obj_dir=data/fbank
for x in test; do
# rm fbank/$x
mkdir -p fbank/$x
# compute fbank without pitch
steps/make_fbank.sh --nj 1 --cmd "run.pl" $obj_dir/$x exp/make_fbank/$x fbank/$x || exit 1;
# compute cmvn
steps/compute_cmvn_stats.sh $obj_dir/$x exp/fbank_cmvn/$x fbank/$x || exit 1;
done
# #step 2: offline-decoding
test_data=data/fbank/test
dir=exp/chain/tdnn
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
--nj 1 --num-threads 1 \
--cmd "$decode_cmd" --iter final \
--frames-per-chunk 50 \
$dir/graph $test_data $dir/decode_test
# # note: the model was trained using "apply-cmvn-online",
# # so you can modify the corresponding code in steps/nnet3/decode.sh to obtain the best performance.
# # If you run steps/nnet3/decode.sh directly,
# # the performance is still good, but slightly worse than with the "apply-cmvn-online" method.
If nothing goes wrong, you will get the recognition results:
# nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/chain/tdnn/graph/words.txt exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |" "ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn/decode_test/lat.1.gz"
# Started at Fri Jun 16 15:38:02 CST 2017
#
nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/chain/tdnn/graph/words.txt exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn/decode_test/lat.1.gz'
lattice-scale --acoustic-scale=10.0 ark:- ark:-
apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:-
LOG (nnet3-latgen-faster[5.1]:CheckAndFixConfigs():nnet-am-decodable-simple.cc:303) Increasing --frames-per-chunk from 50 to 51 to make it a multiple of --frame-subsampling-factor=3
CVTE201703_00030_165722_1175 据 楼主 老婆 说 楼主 昨天 家族 聚会 喝 多 了 回家 路上 大脑 和面 跟 电线杆 表白 了 一个 多 小时
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165722_1175 is 1.90676 over 452 frames.
CVTE201703_00030_165740_2562 因为 没 捞 了 不少 我家 里 经常 来往 的 人 也 都是 搞 煤矿 的 基本上 现在 都 转行 了
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165740_2562 is 1.99356 over 379 frames.
CVTE201703_00030_165754_5069 为啥 叫 皇上 呢 因为 那时候 凡是 公司 聚餐 行政 都 要 问 我 想 吃 什么
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165754_5069 is 2.00562 over 298 frames.
CVTE201703_00030_165809_2685 一旦 有 什么 问题 手机 马上 就会 报警 然后 系统 自动 停机 等 解决 故障 之后 再开 机
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165809_2685 is 2.31544 over 303 frames.
CVTE201703_00030_165830_5107 首先 你 说 沈 大人 是 这个 就 不符合 按 答 组 的 情况 只能 去 做 天 猫
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165830_5107 is 1.93123 over 260 frames.
CVTE201703_00030_165847_5561 还有 就是 几年 同学 不 联系 微信 问 在 不在 就让 你 帮忙 刷 好评
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165847_5561 is 2.14821 over 247 frames.
CVTE201703_00030_165907_3088 读 硕 一般 只要 有 学校 录取通知书 签证 肯定 下来 申请 学校 还是 得 靠 你 自己 啊
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165907_3088 is 2.08663 over 307 frames.
CVTE201703_00030_165916_7980 我 认识 一个 叔叔 辈 从前 都是 老实巴交 的 好好先生
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165916_7980 is 1.94317 over 183 frames.
CVTE201703_00030_165929_3456 这样 即使 有事 故 发生 冷却 系统 停止 工作 断电 这里 仍然 会 保持 在 零下
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165929_3456 is 2.17643 over 290 frames.
LOG (apply-cmvn[5.1]:main():apply-cmvn.cc:146) Applied cepstral mean normalization to 10 utterances, errors on 0
CVTE201703_00030_165942_5013 关于 还款 汇率 希望 大家 不要 被 误导 当然 这个 火鸡 的 答案 并不 对
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165942_5013 is 2.18779 over 226 frames.
LOG (nnet3-latgen-faster[5.1]:main():nnet3-latgen-faster.cc:256) Time taken 50.7098s: real-time factor assuming 100 frames/sec is 0.573965
LOG (nnet3-latgen-faster[5.1]:main():nnet3-latgen-faster.cc:259) Done 10 utterances, failed for 0
LOG (nnet3-latgen-faster[5.1]:main():nnet3-latgen-faster.cc:261) Overall log-likelihood per frame is 2.06153 over 2945 frames.
LOG (nnet3-latgen-faster[5.1]:~CachingOptimizingCompiler():nnet-optimize.cc:659) 0.0935 seconds taken in nnet3 compilation total (breakdown: 0.0446 compilation, 0.0358 optimization, 0 shortcut expansion, 0.00728 checking, 6.7e-05 computing indexes, 0.00579 misc.)
LOG (lattice-scale[5.1]:main():lattice-scale.cc:90) Done 10 lattices.
# Accounting: time=159 threads=1
# Ended (code 0) at Fri Jun 16 15:40:41 CST 2017, elapsed time 159 seconds
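The log above interleaves command echoes, LOG lines, and the recognized text, so the hypotheses are easy to lose. A minimal sketch for pulling them out, assuming a hypothesis line is any line whose first field is an utterance ID listed in the data directory's utt2spk. The helper name `extract_hyps` is made up for illustration, and the speaker column in the sample utt2spk is a placeholder.

```bash
#!/usr/bin/env bash
# extract_hyps is a hypothetical helper, not a Kaldi tool.
# It prints the log lines whose first field is an utterance ID found in utt2spk.
extract_hyps() {
  awk 'NR == FNR { ids[$1] = 1; next } ($1 in ids)' "$1" "$2"
}

# Tiny stand-ins for data/fbank/test/utt2spk and the decode log.
# The speaker column below is a placeholder, not taken from the real file.
tmp=$(mktemp -d)
cat > "$tmp/utt2spk" <<'EOF'
CVTE201703_00030_165916_7980 CVTE201703_00030_165916_7980
EOF
cat > "$tmp/decode.log" <<'EOF'
# Started at Fri Jun 16 15:38:02 CST 2017
lattice-scale --acoustic-scale=10.0 ark:- ark:-
CVTE201703_00030_165916_7980 我 认识 一个 叔叔 辈 从前 都是 老实巴交 的 好好先生
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165916_7980 is 1.94317 over 183 frames.
EOF

# Only the hypothesis line survives; "#", command echo, and LOG lines are dropped.
extract_hyps "$tmp/utt2spk" "$tmp/decode.log"
```

Matching on known utterance IDs avoids guessing at every possible log prefix, at the cost of needing the utt2spk file next to the log.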
-------------------------------------------------------------Analyzing the log shows which commands were invoked-------------------------------------------------------------
# nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/chain/tdnn/graph/words.txt exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn/decode_test/lat.1.gz'
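Two numbers in the decode log can be reproduced from the flags in this command. The sketch below is plain arithmetic, not Kaldi code, and both function names are made up. The first rounds --frames-per-chunk up to a multiple of --frame-subsampling-factor, which matches the "from 50 to 51" message. The second recomputes the real-time factor, assuming the 2945 frames reported in the log are counted after the 3x subsampling. That assumption is what makes the figure match.

```bash
#!/usr/bin/env bash
# Illustrative helpers, not part of Kaldi.

# Round a chunk size up to the next multiple of the subsampling factor.
round_up_chunk() {
  echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Real-time factor = elapsed seconds / audio seconds, at 100 input frames per second.
# $1 = elapsed seconds, $2 = decoded (subsampled) frames, $3 = subsampling factor.
real_time_factor() {
  awk -v t="$1" -v n="$2" -v f="$3" 'BEGIN { printf "%.6f\n", t * 100 / (n * f) }'
}

round_up_chunk 50 3                # prints 51
real_time_factor 50.7098 2945 3    # prints 0.573965
```

A real-time factor below 1 means decoding ran faster than the audio's duration.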
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-latgen-faster -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-latgen-faster -h
Generate lattices using GMM-based model.
Usage: gmm-latgen-faster [options] model-in (fst-in|fsts-rspecifier) features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ]
Options:
--acoustic-scale : Scaling factor for acoustic likelihoods (float, default = 0.1)
--allow-partial : If true, produce output even if end state was not reached. (bool, default = false)
--beam : Decoding beam. Larger->slower, more accurate. (float, default = 16)
--beam-delta : Increment used in decoding-- this parameter is obscure and relates to a speedup in the way the max-active constraint is applied. Larger is more accurate. (float, default = 0.5)
--delta : Tolerance used in determinization (float, default = 0.000976562)
--determinize-lattice : If true, determinize the lattice (lattice-determinization, keeping only best pdf-sequence for each word-sequence). (bool, default = true)
--hash-ratio : Setting used in decoder to control hash behavior (float, default = 2)
--lattice-beam : Lattice generation beam. Larger->slower, and deeper lattices (float, default = 10)
--max-active : Decoder max active states. Larger->slower; more accurate (int, default = 2147483647)
--max-mem : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
--min-active : Decoder minimum #active states. (int, default = 200)
--minimize : If true, push and minimize after determinization. (bool, default = false)
--phone-determinize : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
--prune-interval : Interval (in frames) at which to prune tokens (int, default = 25)
--word-determinize : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)
--word-symbol-table : Symbol table for words [for debug output] (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# lattice-scale --acoustic-scale=10.0 ark:- ark:-
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/latbin/lattice-scale -h
/home/houwenbin/kaldi-master/src/latbin/lattice-scale -h
Apply scaling to lattice weights
Usage: lattice-scale [options] lattice-rspecifier lattice-wspecifier
e.g.: lattice-scale --lm-scale=0.0 ark:1.lats ark:scaled.lats
Options:
--acoustic-scale : Scaling factor for acoustic likelihoods (float, default = 1)
--acoustic2lm-scale : Add this times original acoustic costs to LM costs (float, default = 0)
--inv-acoustic-scale : An alternative way of setting the acoustic scale: you can set its inverse. (float, default = 1)
--lm-scale : Scaling factor for graph/lm costs (float, default = 1)
--lm2acoustic-scale : Add this times original LM costs to acoustic costs (float, default = 0)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
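Going by the option descriptions above, the four scale options act as a linear map on each arc's pair of costs (graph/LM cost, acoustic cost). A small sketch of that arithmetic on plain text pairs follows. The input format and the name `scale_costs` are invented for illustration, and real lattices are not stored this way.

```bash
#!/usr/bin/env bash
# scale_costs LM_SCALE ACOUSTIC_SCALE [ACOUSTIC2LM_SCALE] [LM2ACOUSTIC_SCALE]
# Reads "graph_cost acoustic_cost" pairs on stdin and prints the scaled pairs.
# Hypothetical helper; it mimics the arithmetic only, not Kaldi's lattice format.
scale_costs() {
  awk -v lm="$1" -v ac="$2" -v a2l="${3:-0}" -v l2a="${4:-0}" '{
    g = $1; a = $2
    printf "%g %g\n", lm * g + a2l * a, l2a * g + ac * a
  }'
}

# The decode above used --acoustic-scale=10.0 with every other option at its default.
echo "2.5 1.2" | scale_costs 1 10    # prints: 2.5 12
# The help text's own example, --lm-scale=0.0, zeroes the graph cost.
echo "2.5 1.2" | scale_costs 0 1     # prints: 0 1.2
```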
Below are some of the tools used during training:
I. Mono stage:
1. Initialize the monophone GMM
# gmm-init-mono --shared-phones=data/lang/phones/sets.int "--train-feats=ark,s,cs:apply-cmvn --utt2spk=ark:data/mfcc/train/split8/1/utt2spk scp:data/mfcc/train/split8/1/cmvn.scp scp:data/mfcc/train/split8/1/feats.scp ark:- | add-deltas ark:- ark:- | subset-feats --n=10 ark:- ark:-|" data/lang/topo 39 exp/mono/0.mdl exp/mono/tree
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-init-mono -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-init-mono -h
Initialize monophone GMM.
Usage: gmm-init-mono <topology-in> <dim> <model-out> <tree-out>
e.g.:
gmm-init-mono topo 39 mono.mdl mono.tree
Options:
--binary : Write output in binary mode (bool, default = true)
--perturb-factor : Perturb the means using this fraction of standard deviation. (float, default = 0)
--shared-phones : rxfilename containing, on each line, a list of phones whose pdfs should be shared. (string, default = "")
--train-feats : rspecifier for training features [used to set mean and variance] (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
2. Compile the training graphs
# compile-train-graphs --read-disambig-syms=data/lang/phones/disambig.int exp/mono/tree exp/mono/0.mdl data/lang/L.fst "ark:sym2int.pl --map-oov 2 -f 2- data/lang/words.txt < data/mfcc/train/split8/2/text|" "ark:|gzip -c >exp/mono/fsts.2.gz"
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/compile-train-graphs -h
/home/houwenbin/kaldi-master/src/bin/compile-train-graphs -h
Creates training graphs (without transition-probabilities, by default)
Usage: compile-train-graphs [options] <tree-in> <model-in> <lexicon-fst-in> <transcriptions-rspecifier> <graphs-wspecifier>
e.g.:
compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' ark:graphs.fsts
Options:
--batch-size : Number of FSTs to compile at a time (more -> faster but uses more memory. E.g. 500 (int, default = 250)
--read-disambig-syms : File containing list of disambiguation symbols in phone symbol table (string, default = "")
--reorder : Reorder transition ids for greater decoding efficiency. (bool, default = true)
--rm-eps : Remove [most] epsilons before minimization (only applicable if disambig symbols present) (bool, default = false)
--self-loop-scale : Scale of self-loop vs. non-self-loop probability mass (float, default = 0)
--transition-scale : Scale of transition probabilities (excluding self-loops) (float, default = 0)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
3. Alignment
# align-equal-compiled "ark:gunzip -c exp/mono/fsts.3.gz|" "ark,s,cs:apply-cmvn --utt2spk=ark:data/mfcc/train/split8/3/utt2spk scp:data/mfcc/train/split8/3/cmvn.scp scp:data/mfcc/train/split8/3/feats.scp ark:- | add-deltas ark:- ark:- |" ark,t:- | gmm-acc-stats-ali --binary=true exp/mono/0.mdl "ark,s,cs:apply-cmvn --utt2spk=ark:data/mfcc/train/split8/3/utt2spk scp:data/mfcc/train/split8/3/cmvn.scp scp:data/mfcc/train/split8/3/feats.scp ark:- | add-deltas ark:- ark:- |" ark:- exp/mono/0.3.acc
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/align-equal-compiled -h
/home/houwenbin/kaldi-master/src/bin/align-equal-compiled -h
Write an equally spaced alignment (for getting training started)
Usage: align-equal-compiled <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier>
e.g.:
align-equal-compiled 1.fsts scp:train.scp ark:equal.ali
Options:
--binary : Write output in binary mode (bool, default = true)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
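The idea of an "equally spaced alignment" can be shown with a toy example: with no trained model yet, the frames of an utterance are divided as evenly as possible among the units in its transcript. This is only a simplified sketch. The real tool works on the compiled training graph and emits transition-ids, and the function name and labels below are made up.

```bash
#!/usr/bin/env bash
# equal_align NUM_FRAMES LABEL...
# Prints one label per frame, spreading the frames evenly over the labels in order.
# Simplified illustration only; not how Kaldi represents alignments.
equal_align() {
  frames=$1; shift
  printf '%s\n' "$*" | awk -v n="$frames" '{
    for (t = 0; t < n; t++) {
      idx = int(t * NF / n) + 1          # awk fields are 1-based
      printf "%s%s", $idx, (t < n - 1 ? " " : "\n")
    }
  }'
}

equal_align 10 sil a b    # prints: sil sil sil sil a a a b b b
```

Such a flat start is crude, but it gives the first round of statistics accumulation something to work with. Later passes replace it with alignments from the trained model.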
# gmm-boost-silence --boost=1.25 1 exp/mono/1.mdl -
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-boost-silence -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-boost-silence -h
Modify GMM-based model to boost (by a certain factor) all
probabilities associated with the specified phones (could be
all silence phones, or just the ones used for optional silence).
Note: this is done by modifying the GMM weights. If the silence
model shares a GMM with other models, then it will modify the GMM
weights for all models that may correspond to silence.
Usage: gmm-boost-silence [options] <silence-phones-list> <model-in> <model-out>
e.g.: gmm-boost-silence --boost=1.5 1:2:3 1.mdl 1_boostsil.mdl
Options:
--binary : Write output in binary mode (bool, default = true)
--boost : Factor by which to boost silence probs (float, default = 1.5)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
4. Update
# gmm-est --min-gaussian-occupancy=3 --mix-up=656 --power=0.25 exp/mono/0.mdl 'gmm-sum-accs - exp/mono/0.*.acc|' exp/mono/1.mdl
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-est -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-est -h
Do Maximum Likelihood re-estimation of GMM-based acoustic model
Usage: gmm-est [options] <model-in> <stats-in> <model-out>
e.g.: gmm-est 1.mdl 1.acc 2.mdl
Options:
--binary : Write output in binary mode (bool, default = true)
--min-count : Minimum per-Gaussian count enforced while mixing up and down. (float, default = 20)
--min-gaussian-occupancy : MleDiagGmmOptions: Minimum occupancy to update a Gaussian. (float, default = 10)
--min-gaussian-weight : MleDiagGmmOptions: Min Gaussian weight before we remove it. (float, default = 1e-05)
--min-variance : MleDiagGmmOptions: Variance floor (absolute variance). (double, default = 0.001)
--mix-down : If nonzero, merge mixture components to this target. (int, default = 0)
--mix-up : Increase number of mixture components to this overall target. (int, default = 0)
--perturb-factor : While mixing up, perturb means by standard deviation times this factor. (float, default = 0.01)
--power : If mixing up, power to allocate Gaussians to states. (float, default = 0.2)
--remove-low-count-gaussians : MleDiagGmmOptions: If true, remove Gaussians that fall below the floors. (bool, default = true)
--share-for-pdfs : If true, share all transition parameters where the states have the same pdf. (bool, default = false)
--transition-floor : Floor for transition probabilities (float, default = 0.01)
--transition-min-count : Minimum count required to update transitions from a state (float, default = 5)
--update-flags : Which GMM parameters to update: subset of mvwt. (string, default = "mvwt")
--write-occs : File to write pdf occupation counts to. (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# gmm-sum-accs - exp/mono/0.1.acc exp/mono/0.2.acc exp/mono/0.3.acc exp/mono/0.4.acc exp/mono/0.5.acc exp/mono/0.6.acc exp/mono/0.7.acc exp/mono/0.8.acc
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-sum-accs -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-sum-accs -h
Sum multiple accumulated stats files for GMM training.
Usage: gmm-sum-accs [options] <stats-out> <stats-in1> <stats-in2> ...
E.g.: gmm-sum-accs 1.acc 1.1.acc 1.2.acc
Options:
--binary : Write output in binary mode (bool, default = true)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
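The update step is an accumulate, sum, estimate pattern: each parallel job writes its own statistics, gmm-sum-accs adds them up, and gmm-est turns the totals into new parameters. The sketch below imitates that flow for one-dimensional Gaussians using an invented text format (`id count sum sum_of_squares`). Real .acc files hold far more, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Toy accumulator format, one Gaussian per line: id count sum sum_of_squares
# Hypothetical helpers that only illustrate the idea behind gmm-sum-accs and gmm-est.

# Add up the per-job statistics for each Gaussian id.
sum_accs() {
  cat "$@" | awk '{ c[$1] += $2; s[$1] += $3; q[$1] += $4 }
                  END { for (id in c) print id, c[id], s[id], q[id] }' | sort -n
}

# Maximum-likelihood mean and variance from the summed statistics.
ml_estimate() {
  awk '{ m = $3 / $2; printf "%s mean=%g var=%g\n", $1, m, $4 / $2 - m * m }'
}

tmp=$(mktemp -d)
printf '0 2 4 10\n1 1 4 16\n'  > "$tmp/0.1.acc"   # job 1
printf '0 2 8 34\n1 3 12 50\n' > "$tmp/0.2.acc"   # job 2

sum_accs "$tmp"/0.*.acc | ml_estimate
# prints:
# 0 mean=3 var=2
# 1 mean=4 var=0.5
```

Because the statistics are plain sums, splitting the data over eight jobs and adding the results gives the same totals as a single pass over all the data.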
# gmm-acc-stats-ali --binary=true exp/mono/0.mdl 'ark,s,cs:apply-cmvn --utt2spk=ark:data/mfcc/train/split8/5/utt2spk scp:data/mfcc/train/split8/5/cmvn.scp scp:data/mfcc/train/split8/5/feats.scp ark:- | add-deltas ark:- ark:- |' ark:- exp/mono/0.5.acc
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-acc-stats-ali -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-acc-stats-ali -h
Accumulate stats for GMM training.
Usage: gmm-acc-stats-ali [options] <model-in> <feature-rspecifier> <alignments-rspecifier> <stats-out>
e.g.:
gmm-acc-stats-ali 1.mdl scp:train.scp ark:1.ali 1.acc
Options:
--binary : Write output in binary mode (bool, default = true)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# add-deltas ark:- ark:-
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/add-deltas -h
/home/houwenbin/kaldi-master/src/featbin/add-deltas -h
Add deltas (typically to raw mfcc or plp features
Usage: add-deltas [options] in-rspecifier out-wspecifier
Options:
--delta-order : Order of delta computation (int, default = 2)
--delta-window : Parameter controlling window for delta computation (actual window size for each delta order is 1 + 2*delta-window-size) (int, default = 2)
--truncate : If nonzero, first truncate features to this dimension. (int, default = 0)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
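The 39 passed to gmm-init-mono earlier is consistent with these add-deltas defaults: with --delta-order=2 the output holds the static features plus first and second deltas. A quick check of the arithmetic, assuming 13-dimensional MFCC input. That input size is an assumption, not something shown in the logs, and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Illustrative arithmetic only.

# Output dimension: static features plus one block per delta order.
delta_dim() {
  echo $(( $1 * ($2 + 1) ))
}

# Window size per delta order, as given in the help text: 1 + 2 * delta-window.
delta_window_size() {
  echo $(( 1 + 2 * $1 ))
}

delta_dim 13 2         # prints 39
delta_window_size 2    # prints 5
```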
# apply-cmvn --utt2spk=ark:data/mfcc/train/split8/5/utt2spk scp:data/mfcc/train/split8/5/cmvn.scp scp:data/mfcc/train/split8/5/feats.scp ark:-
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/apply-cmvn -h
/home/houwenbin/kaldi-master/src/featbin/apply-cmvn -h
Apply cepstral mean and (optionally) variance normalization
Per-utterance by default, or per-speaker if utt2spk option provided
Usage: apply-cmvn [options] (<cmvn-stats-rspecifier>|<cmvn-stats-rxfilename>) <feats-rspecifier> <feats-wspecifier>
e.g.: apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp scp:data/train/feats.scp ark:-
See also: modify-cmvn-stats, matrix-sum, compute-cmvn-stats
Options:
--norm-means : You can set this to false to turn off mean normalization. Note, the same can be achieved by using 'fake' CMVN stats; see the --fake option to compute_cmvn_stats.sh (bool, default = true)
--norm-vars : If true, normalize variances. (bool, default = false)
--reverse : If true, apply CMVN in a reverse sense, so as to transform zero-mean, unit-variance input into data with the given mean and variance. (bool, default = false)
--skip-dims : Dimensions for which to skip normalization: colon-separated list of integers, e.g. 13:14:15) (string, default = "")
--utt2spk : rspecifier for utterance to speaker map (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
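With the defaults shown above (--norm-means=true, --norm-vars=false), normalization amounts to subtracting a per-dimension mean from every frame. The sketch below does this on a small text matrix with one frame per line, taking the means from the matrix itself. In Kaldi the statistics come from the cmvn.scp file and may be per speaker. The function name is made up.

```bash
#!/usr/bin/env bash
# mean_normalize: read a text matrix (one frame per line) on stdin and
# subtract each column's mean. Hypothetical helper; mean normalization only.
mean_normalize() {
  awk '{
         rows = NR; cols = NF
         for (j = 1; j <= NF; j++) { x[NR, j] = $j; sum[j] += $j }
       }
       END {
         for (i = 1; i <= rows; i++) {
           line = ""
           for (j = 1; j <= cols; j++) {
             sep = (j > 1 ? " " : "")
             line = line sep sprintf("%g", x[i, j] - sum[j] / rows)
           }
           print line
         }
       }'
}

printf '1 10\n3 20\n5 30\n' | mean_normalize
# prints:
# -2 -10
# 0 0
# 2 10
```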
5. analyze_alignments
# ali-to-phones --write-lengths=true exp/mono/final.mdl 'ark:gunzip -c exp/mono/ali.1.gz|' ark,t:-
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/ali-to-phones -h
/home/houwenbin/kaldi-master/src/bin/ali-to-phones -h
Convert model-level alignments to phone-sequences (in integer, not text, form)
Usage: ali-to-phones [options] <model> <alignments-rspecifier> <phone-transcript-wspecifier|ctm-wxfilename>
e.g.:
ali-to-phones 1.mdl ark:1.ali ark:-
or:
ali-to-phones --ctm-output 1.mdl ark:1.ali 1.ctm
See also: show-alignments lattice-align-phones
Options:
--ctm-output : If true, output the alignments in ctm format (the confidences will be set to 1) (bool, default = false)
--frame-shift : frame shift used to control the times of the ctm output (float, default = 0.01)
--per-frame : If true, write out the frame-level phone alignment (else phone sequence) (bool, default = false)
--write-lengths : If true, write the #frames for each phone (different format) (bool, default = false)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
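A sketch of the kind of summary that can be built on the --write-lengths output. It assumes each line has the form `utt-id phone length ; phone length ; ...` with integer phone IDs. Treat that format as an assumption and check it against your own output. The function name is invented for illustration.

```bash
#!/usr/bin/env bash
# phone_mean_lengths: read lines of the assumed form
#   utt-id phone length ; phone length ; ...
# and print "phone count mean_length" per phone ID.
# Hypothetical helper, not a Kaldi script.
phone_mean_lengths() {
  awk '{
         # Field 1 is the utterance ID; after that come (phone, length, ";") triples.
         for (i = 2; i <= NF; i += 3) { n[$i]++; tot[$i] += $(i + 1) }
       }
       END { for (p in n) printf "%s %d %.2f\n", p, n[p], tot[p] / n[p] }' | sort -n
}

printf 'utt1 1 10 ; 5 4 ; 1 6\nutt2 1 8 ; 5 6\n' | phone_mean_lengths
# prints:
# 1 3 8.00
# 5 2 5.00
```

Unusually long or short average durations for a phone are a quick hint that the alignments deserve a closer look.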
# sum-tree-stats exp/tri1/treeacc exp/tri1/1.treeacc exp/tri1/2.treeacc exp/tri1/3.treeacc exp/tri1/4.treeacc exp/tri1/5.treeacc exp/tri1/6.treeacc exp/tri1/7.treeacc exp/tri1/8.treeacc
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/sum-tree-stats -h
/home/houwenbin/kaldi-master/src/bin/sum-tree-stats -h
Sum statistics for phonetic-context tree building.
Usage: sum-tree-stats [options] tree-accs-out tree-accs-in1 tree-accs-in2 ...
e.g.:
sum-tree-stats treeacc 1.treeacc 2.treeacc 3.treeacc
Options:
--binary : Write output in binary mode (bool, default = true)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# cluster-phones exp/tri1/treeacc data/lang/phones/sets.int exp/tri1/questions.int
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/cluster-phones -h
/home/houwenbin/kaldi-master/src/bin/cluster-phones -h
Cluster phones (or sets of phones) into sets for various purposes
Usage: cluster-phones [options] <tree-stats-in> <phone-sets-in> <clustered-phones-out>
e.g.:
cluster-phones 1.tacc phonesets.txt questions.txt
Options:
--central-position : Central position in context window [must match acc-tree-stats] (int, default = 1)
--context-width : Does not have any effect-- included for scripting convenience. (int, default = 3)
--mode : Mode of operation: "questions"->sets suitable for decision trees; "k-means"->k-means algorithm, output k classes (set num-classes options)
(string, default = "questions")
--num-classes : For k-means mode, number of classes. (int, default = -1)
--pdf-class-list : Colon-separated list of HMM positions to consider [Default = 1: just central position for 3-state models]. (string, default = "1")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# gmm-mixup --mix-up=2000 exp/tri1/1.mdl exp/tri1/1.occs exp/tri1/1.mdl
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-mixup -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-mixup -h
Does GMM mixing up (and Gaussian merging)
Usage: gmm-mixup [options] <model-in> <state-occs-in> <model-out>
e.g. of mixing up:
gmm-mixup --mix-up=4000 1.mdl 1.occs 2.mdl
e.g. of merging:
gmm-mixup --merge=2000 1.mdl 1.occs 2.mdl
Options:
--binary : Write output in binary mode (bool, default = true)
--min-count : Minimum count enforced while mixing up. (float, default = 20)
--mix-down : If nonzero, merge mixture components to this target. (int, default = 0)
--mix-up : Increase number of mixture components to this overall target. (int, default = 0)
--perturb-factor : While mixing up, perturb means by standard deviation times this factor. (float, default = 0.01)
--power : If mixing up, power to allocate Gaussians to states. (float, default = 0.2)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# gmm-init-model --write-occs=exp/tri1/1.occs exp/tri1/tree exp/tri1/treeacc data/lang/topo exp/tri1/1.mdl
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-init-model -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-init-model -h
Initialize GMM from decision tree and tree stats
Usage: gmm-init-model [options] <tree-in> <tree-stats-in> <topo-file> <model-out> [<old-tree> <old-model>]
e.g.:
gmm-init-model tree treeacc topo 1.mdl
or (initializing GMMs with old model):
gmm-init-model tree treeacc topo 1.mdl prev/tree prev/30.mdl
Options:
--binary : Write output in binary mode (bool, default = true)
--var-floor : Variance floor used while initializing Gaussians (double, default = 0.01)
--write-occs : File to write state occupancies to. (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# convert-ali exp/mono_ali/final.mdl exp/tri1/1.mdl exp/tri1/tree "ark:gunzip -c exp/mono_ali/ali.2.gz|" "ark:|gzip -c >exp/tri1/ali.2.gz"
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/convert-ali -h
/home/houwenbin/kaldi-master/src/bin/convert-ali -h
Convert alignments from one decision-tree/model to another
Usage: convert-ali [options] <old-model> <new-model> <new-tree> <old-alignments-rspecifier> <new-alignments-wspecifier>
e.g.:
convert-ali old/final.mdl new/0.mdl new/tree ark:old/ali.1 ark:new/ali.1
Options:
--frame-subsampling-factor : Can be used in converting alignments to reduced frame rates. (int, default = 1)
--phone-map : File name containing old->new phone mapping (each line is: old-integer-id new-integer-id) (string, default = "")
--reorder : True if you want the converted alignments to be 'reordered' versus the way they appear in the HmmTopology object (bool, default = true)
--repeat-frames : Only relevant when frame-subsampling-factor != 1. If true, repeat frames of alignment by 'frame-subsampling-factor' after alignment conversion, to keep the alignment the same length as the input alignment. (bool, default = false)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# compile-questions data/lang/topo exp/tri1/questions.int exp/tri1/questions.qst
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/compile-questions -h
/home/houwenbin/kaldi-master/src/bin/compile-questions -h
Compile questions
Usage: compile-questions [options] <topo> <questions-text-file> <questions-out>
e.g.:
compile-questions questions.txt questions.qst
Options:
--binary : Write output in binary mode (bool, default = true)
--central-position : Central position in phone context window [must match acc-tree-stats] (int, default = 1)
--context-width : Context window size [must match acc-tree-stats]. (int, default = 3)
--num-iters-refine : Number of iters of refining questions at each node. >0 --> questions not refined (int, default = 0)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# build-tree --verbose=1 --max-leaves=2000 --cluster-thresh=-1 exp/tri1/treeacc data/lang/phones/roots.int exp/tri1/questions.qst data/lang/topo exp/tri1/tree
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/build-tree -h
/home/houwenbin/kaldi-master/src/bin/build-tree -h
Train decision tree
Usage: build-tree [options] <tree-stats-in> <roots-file> <questions-file> <topo-file> <tree-out>
e.g.:
build-tree treeacc roots.txt 1.qst topo tree
Options:
--binary : Write output in binary mode (bool, default = true)
--central-position : Central position in context window [must match acc-tree-stats] (int, default = 1)
--cluster-thresh : Log-likelihood change threshold for clustering after tree-building. 0 means no clustering; -1 means use as a clustering threshold the likelihood change of the final split. (float, default = -1)
--context-width : Context window size [must match acc-tree-stats] (int, default = 3)
--max-leaves : Maximum number of leaves to be used in tree-buliding (if positive) (int, default = 0)
--thresh : Log-likelihood change threshold for tree-building (float, default = 300)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
II. tri2b
# transform-feats exp/tri2b/0.mat ark:- ark:-
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/transform-feats -h
/home/houwenbin/kaldi-master/src/featbin/transform-feats -h
Apply transform (e.g. LDA; HLDA; fMLLR/CMLLR; MLLT/STC)
Linear transform if transform-num-cols == feature-dim, affine if
transform-num-cols == feature-dim+1 (->append 1.0 to features)
Per-utterance by default, or per-speaker if utt2spk option provided
Global if transform-rxfilename provided.
Usage: transform-feats [options] (<transform-rspecifier>|<transform-rxfilename>) <feats-rspecifier> <feats-wspecifier>
See also: transform-vec, copy-feats, compose-transforms
Options:
--utt2spk : rspecifier for utterance to speaker map (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# splice-feats --left-context=3 --right-context=3 ark:- ark:-
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/splice-feats -h
/home/houwenbin/kaldi-master/src/featbin/splice-feats -h
Splice features with left and right context (e.g. prior to LDA)
Usage: splice-feats [options] <feature-rspecifier> <feature-wspecifier>
e.g.: splice-feats scp:feats.scp ark:-
Options:
--left-context : Number of frames of left context (int, default = 4)
--right-context : Number of frames of right context (int, default = 4)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
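To make the splicing concrete, here is a small awk sketch of what splice-feats does to a feature matrix, working on plain text rows (one frame per line) rather than on Kaldi archives. splice_frames is a made-up helper, and the edge handling (repeating the first and last frame) reflects my understanding of Kaldi's behaviour rather than something stated in the help text:

```shell
# splice_frames LEFT RIGHT: read frames from stdin (one per line) and print,
# for each frame t, the concatenation of frames t-LEFT .. t+RIGHT.
splice_frames() {
  local left=$1 right=$2
  awk -v L="$left" -v R="$right" '
    { frame[NR] = $0 }                       # store every frame
    END {
      for (t = 1; t <= NR; t++) {
        out = ""
        for (k = t - L; k <= t + R; k++) {
          j = k
          if (j < 1) j = 1                   # before the start: reuse first frame
          if (j > NR) j = NR                 # past the end: reuse last frame
          out = out (out == "" ? "" : " ") frame[j]
        }
        print out
      }
    }'
}

# Three toy 2-dimensional frames, one frame of context on each side.
printf '%s\n' '1 10' '2 20' '3 30' | splice_frames 1 1
```

With the --left-context=3 --right-context=3 used in the tri2b commands, each output frame holds 7 input frames, so the feature dimension grows sevenfold before est-lda projects it back down (--dim=40 in the command shown in this post).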
# LDA
# weight-silence-post 0.0 1 exp/tri1_ali/final.mdl ark:- ark:-
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/weight-silence-post -h
/home/houwenbin/kaldi-master/src/bin/weight-silence-post -h
Apply weight to silences in posts
Usage: weight-silence-post [options] <silence-weight> <silence-phones> <model> <posteriors-rspecifier> <posteriors-wspecifier>
e.g.:
weight-silence-post 0.0 1:2:3 1.mdl ark:1.post ark:nosil.post
Options:
--distribute : If true, rather than weighting the individual posteriors, apply the weighting to the whole frame: i.e. on time t, scale all posterior entries by p(sil)*silence-weight + p(non-sil)*1.0 (bool, default = false)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
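The --distribute description above comes down to one line of arithmetic per frame. A minimal sketch, assuming the total silence and non-silence posterior mass on a frame is already known; frame_scale is a made-up helper, not a Kaldi tool:

```shell
# frame_scale SIL_WEIGHT P_SIL P_NONSIL
# Prints the factor applied to every posterior entry on the frame when
# --distribute=true: p(sil) * silence-weight + p(non-sil) * 1.0
frame_scale() {
  awk -v w="$1" -v ps="$2" -v pn="$3" 'BEGIN { printf "%.2f\n", ps * w + pn * 1.0 }'
}

frame_scale 0.0 0.3 0.7   # silence weight 0.0, as in the command above
frame_scale 0.5 0.3 0.7   # a softer silence weight
```

With the default --distribute=false, only the entries that belong to silence phones are scaled, so a weight of 0.0 simply removes silence from the statistics.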
# ali-to-post 'ark:gunzip -c exp/tri1_ali/ali.7.gz|' ark:-
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/ali-to-post -h
/home/houwenbin/kaldi-master/src/bin/ali-to-post -h
Convert alignments to posteriors. This is simply a format change
from integer vectors to Posteriors, which are vectors of lists of
pairs (int, float) where the float represents the posterior. The
floats would all be 1.0 in this case.
The posteriors will still be in terms of whatever integer index
the input contained, which will be transition-ids if they came
directly from decoding, or pdf-ids if they were processed by
ali-to-post.
Usage: ali-to-post [options] <alignments-rspecifier> <posteriors-wspecifier>
e.g.:
ali-to-post ark:1.ali ark:1.post
See also: ali-to-pdf, ali-to-phones, show-alignments, post-to-weights
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
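Since ali-to-post is only a format change, it can be imitated on text data with a few lines of awk. The sketch below assumes the text layout "utt-id id id ..." for alignments and "utt-id [ id weight ] [ id weight ] ..." for posteriors, which is how I understand Kaldi's text mode; ali_to_post_text is a made-up helper:

```shell
# ali_to_post_text: turn each integer in an alignment line into a
# one-entry posterior "[ id 1 ]", keeping the utterance id in front.
ali_to_post_text() {
  awk '{
    out = $1                                   # utterance id
    for (i = 2; i <= NF; i++) out = out " [ " $i " 1 ]"
    print out
  }'
}

# Toy alignment: five frames, two distinct ids (invented data).
echo 'utt1 4 4 7 7 7' | ali_to_post_text
```

Every frame gets exactly one entry with weight 1, matching the "floats would all be 1.0" remark in the help text.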
# acc-lda --rand-prune=4.0 exp/tri1_ali/final.mdl 'ark,s,cs:apply-cmvn --utt2spk=ark:data/mfcc/train/split8/7/utt2spk scp:data/mfcc/train/split8/7/cmvn.scp scp:data/mfcc/train/split8/7/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- |' ark,s,cs:- exp/tri2b/lda.7.acc
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/acc-lda -h
/home/houwenbin/kaldi-master/src/bin/acc-lda -h
Accumulate LDA statistics based on pdf-ids.
Usage: acc-lda [options] <transition-gmm/model> <features-rspecifier> <posteriors-rspecifier> <lda-acc-out>
Typical usage:
ali-to-post ark:1.ali ark:- | lda-acc 1.mdl "ark:splice-feats scp:train.scp|" ark:- ldaacc.1
Options:
--binary : Write accumulators in binary mode. (bool, default = true)
--rand-prune : Randomized pruning threshold for posteriors (float, default = 0)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# est-lda --write-full-matrix=exp/tri2b/full.mat --dim=40 exp/tri2b/0.mat exp/tri2b/lda.1.acc exp/tri2b/lda.2.acc exp/tri2b/lda.3.acc exp/tri2b/lda.4.acc exp/tri2b/lda.5.acc exp/tri2b/lda.6.acc exp/tri2b/lda.7.acc exp/tri2b/lda.8.acc
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/est-lda -h
/home/houwenbin/kaldi-master/src/bin/est-lda -h
Estimate LDA transform using stats obtained with acc-lda.
Usage: est-lda [options] <lda-matrix-out> <lda-acc-1> <lda-acc-2> ...
Options:
--allow-large-dim : If true, allow an LDA dimension larger than the number of classes. (bool, default = false)
--binary : Write matrix in binary mode. (bool, default = true)
--dim : Dimension to project to with LDA (int, default = 40)
--remove-offset : If true, output an affine transform that makes the projected data mean equal to zero. (bool, default = false)
--within-class-factor : (Deprecated) If 1.0, do conventional LDA where the within-class variance will be unit in the projected space. May be set to less than 1.0, which scales the features to have less variance, particularly for dimensions where between-class variance is small; this is a feature being experimented with for neural-net input. (float, default = 1)
--write-full-matrix : Write full LDA matrix to this location. (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# gmm-acc-mllt --rand-prune=4.0 exp/tri2b/2.mdl 'ark,s,cs:apply-cmvn --utt2spk=ark:data/mfcc/train/split8/1/utt2spk scp:data/mfcc/train/split8/1/cmvn.scp scp:data/mfcc/train/split8/1/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats exp/tri2b/0.mat ark:- ark:- |' ark:- exp/tri2b/2.1.macc
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-acc-mllt -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-acc-mllt -h
Accumulate MLLT (global STC) statistics
Usage: gmm-acc-mllt [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out>
e.g.:
gmm-acc-mllt 1.mdl scp:train.scp ark:1.post 1.macc
Options:
--binary : Write output in binary mode (bool, default = true)
--rand-prune : Randomized pruning parameter to speed up accumulation (larger -> more pruning. May exceed one). (float, default = 0.25)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# gmm-transform-means exp/tri2b/2.mat.new exp/tri2b/2.mdl exp/tri2b/2.mdl
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-transform-means -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-transform-means -h
Transform GMM means with linear or affine transform
Usage: gmm-transform-means <transform-matrix> <model-in> <model-out>
e.g.: gmm-transform-means 2.mat 2.mdl 3.mdl
Options:
--binary : Write output in binary mode (bool, default = true)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# gmm-est-fmllr --fmllr-update-type=full --spk2utt=ark:data/mfcc/train/split8/8/spk2utt exp/tri3b/12.mdl 'ark,s,cs:apply-cmvn --utt2spk=ark:data/mfcc/train/split8/8/utt2spk scp:data/mfcc/train/split8/8/cmvn.scp scp:data/mfcc/train/split8/8/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats exp/tri2b_ali/final.mat ark:- ark:- | transform-feats --utt2spk=ark:data/mfcc/train/split8/8/utt2spk ark:exp/tri3b/trans.8 ark:- ark:- |' ark:- ark:exp/tri3b/tmp_trans.8
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-est-fmllr -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-est-fmllr -h
Estimate global fMLLR transforms, either per utterance or for the supplied
set of speakers (spk2utt option). Reads posteriors (on transition-ids). Writes
to a table of matrices.
Usage: gmm-est-fmllr [options] <model-in> <feature-rspecifier> <post-rspecifier> <transform-wspecifier>
Options:
--fmllr-min-count : Minimum count required to update fMLLR (float, default = 500)
--fmllr-num-iters : Number of iterations in fMLLR update phase. (int, default = 40)
--fmllr-update-type : Update type for fMLLR ("full"|"diag"|"offset"|"none") (string, default = "full")
--spk2utt : rspecifier for speaker to utterance-list map (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# compose-transforms --b-is-affine=true ark:exp/tri3b/tmp_trans.8 ark:exp/tri3b/trans.8 ark:exp/tri3b/composed_trans.8
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/compose-transforms -h
/home/houwenbin/kaldi-master/src/featbin/compose-transforms -h
Compose (affine or linear) feature transforms
Usage: compose-transforms [options] (<transform-A-rspecifier>|<transform-A-rxfilename>) (<transform-B-rspecifier>|<transform-B-rxfilename>) (<transform-out-wspecifier>|<transform-out-wxfilename>)
Note: it does matrix multiplication (A B) so B is the transform that gets applied
to the features first. If b-is-affine = true, then assume last column of b corresponds to offset
e.g.: compose-transforms 1.mat 2.mat 3.mat
compose-transforms 1.mat ark:2.trans ark:3.trans
compose-transforms ark:1.trans ark:2.trans ark:3.trans
See also: transform-feats, transform-vec, extend-transform-dim, est-lda, est-pca
Options:
--b-is-affine : If true, treat last column of transform b as an offset term (only relevant if a is affine) (bool, default = false)
--binary : Write in binary mode (only relevant if output is a wxfilename) (bool, default = true)
--utt2spk : rspecifier for utterance to speaker map (if mixing utterance and speaker ids) (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
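The compose-transforms command in this post multiplies the freshly estimated fMLLR transform (A, tmp_trans.8) with the previous one (B, trans.8) and passes --b-is-affine=true. The sketch below reproduces that product on plain text matrices, assuming both are d x (d+1) affine transforms and that the affine case is handled by appending the row [0 ... 0 1] to B before multiplying (my understanding, not something the help text spells out). compose_affine and the toy matrices are made up for illustration:

```shell
# compose_affine A_FILE B_FILE: print A * [B ; 0 ... 0 1], where both files
# hold d rows of d+1 whitespace-separated numbers (last column = offset).
compose_affine() {
  awk 'NR == FNR { for (j = 1; j <= NF; j++) A[FNR, j] = $j; d = FNR; next }
       { for (j = 1; j <= NF; j++) B[FNR, j] = $j }
       END {
         for (j = 1; j <= d; j++) B[d + 1, j] = 0    # extra row: zeros ...
         B[d + 1, d + 1] = 1                         # ... and a final 1
         for (i = 1; i <= d; i++) {
           row = ""
           for (j = 1; j <= d + 1; j++) {
             s = 0
             for (k = 1; k <= d + 1; k++) s += A[i, k] * B[k, j]
             row = row (j > 1 ? " " : "") s
           }
           print row
         }
       }' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' '2 0 1' '0 2 1' > "$tmp/A.txt"   # scale by 2, then add (1, 1)
printf '%s\n' '1 0 3' '0 1 4' > "$tmp/B.txt"   # add (3, 4)
compose_affine "$tmp/A.txt" "$tmp/B.txt"
```

B is applied to the features first, so the toy result maps x to 2(x + (3, 4)) + (1, 1) = 2x + (7, 9), which is what the printed rows "2 0 7" and "0 2 9" encode.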
# gmm-acc-stats-twofeats exp/tri3b/35.mdl 'ark,s,cs:apply-cmvn --utt2spk=ark:data/mfcc/train/split8/8/utt2spk scp:data/mfcc/train/split8/8/cmvn.scp scp:data/mfcc/train/split8/8/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats exp/tri2b_ali/final.mat ark:- ark:- | transform-feats --utt2spk=ark:data/mfcc/train/split8/8/utt2spk ark:exp/tri3b/trans.8 ark:- ark:- |' 'ark,s,cs:apply-cmvn --utt2spk=ark:data/mfcc/train/split8/8/utt2spk scp:data/mfcc/train/split8/8/cmvn.scp scp:data/mfcc/train/split8/8/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats exp/tri2b_ali/final.mat ark:- ark:- |' ark,s,cs:- exp/tri3b/35.8.acc
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-acc-stats-twofeats -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-acc-stats-twofeats -h
Accumulate stats for GMM training, computing posteriors with one set of features
but accumulating statistics with another.
First features are used to get posteriors, second to accumulate stats
Usage: gmm-acc-stats-twofeats [options] <model-in> <feature1-rspecifier> <feature2-rspecifier> <posteriors-rspecifier> <stats-out>
e.g.:
gmm-acc-stats-twofeats 1.mdl 1.ali scp:train.scp scp:train_new.scp ark:1.ali 1.acc
Options:
--binary : Write output in binary mode (bool, default = true)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# gmm-mixup --mix-down=20000 --mix-up=20000 exp/tri4b/tmp.mdl exp/tri4b/1.occs exp/tri4b/1.mdl
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-mixup -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-mixup -h
Does GMM mixing up (and Gaussian merging)
Usage: gmm-mixup [options] <model-in> <state-occs-in> <model-out>
e.g. of mixing up:
gmm-mixup --mix-up=4000 1.mdl 1.occs 2.mdl
e.g. of merging:
gmm-mixup --merge=2000 1.mdl 1.occs 2.mdl
Options:
--binary : Write output in binary mode (bool, default = true)
--min-count : Minimum count enforced while mixing up. (float, default = 20)
--mix-down : If nonzero, merge mixture components to this target. (int, default = 0)
--mix-up : Increase number of mixture components to this overall target. (int, default = 0)
--perturb-factor : While mixing up, perturb means by standard deviation times this factor. (float, default = 0.01)
--power : If mixing up, power to allocate Gaussians to states. (float, default = 0.2)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
#
#
#
III. Decoding:
# nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/chain/tdnn/graph/words.txt exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn/decode_test/lat.1.gz'
Note: the help output below comes from gmm-latgen-faster, the GMM counterpart of the nnet3-latgen-faster command above.
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-latgen-faster -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-latgen-faster -h
Generate lattices using GMM-based model.
Usage: gmm-latgen-faster [options] model-in (fst-in|fsts-rspecifier) features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ]
Options:
--acoustic-scale : Scaling factor for acoustic likelihoods (float, default = 0.1)
--allow-partial : If true, produce output even if end state was not reached. (bool, default = false)
--beam : Decoding beam. Larger->slower, more accurate. (float, default = 16)
--beam-delta : Increment used in decoding-- this parameter is obscure and relates to a speedup in the way the max-active constraint is applied. Larger is more accurate. (float, default = 0.5)
--delta : Tolerance used in determinization (float, default = 0.000976562)
--determinize-lattice : If true, determinize the lattice (lattice-determinization, keeping only best pdf-sequence for each word-sequence). (bool, default = true)
--hash-ratio : Setting used in decoder to control hash behavior (float, default = 2)
--lattice-beam : Lattice generation beam. Larger->slower, and deeper lattices (float, default = 10)
--max-active : Decoder max active states. Larger->slower; more accurate (int, default = 2147483647)
--max-mem : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
--min-active : Decoder minimum #active states. (int, default = 200)
--minimize : If true, push and minimize after determinization. (bool, default = false)
--phone-determinize : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
--prune-interval : Interval (in frames) at which to prune tokens (int, default = 25)
--word-determinize : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)
--word-symbol-table : Symbol table for words [for debug output] (string, default = "")
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
# lattice-scale --acoustic-scale=10.0 ark:- ark:-
[houwenbin@localhost online_demo]$
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/latbin/lattice-scale -h
/home/houwenbin/kaldi-master/src/latbin/lattice-scale -h
Apply scaling to lattice weights
Usage: lattice-scale [options] lattice-rspecifier lattice-wspecifier
e.g.: lattice-scale --lm-scale=0.0 ark:1.lats ark:scaled.lats
Options:
--acoustic-scale : Scaling factor for acoustic likelihoods (float, default = 1)
--acoustic2lm-scale : Add this times original acoustic costs to LM costs (float, default = 0)
--inv-acoustic-scale : An alternative way of setting the acoustic scale: you can set its inverse. (float, default = 1)
--lm-scale : Scaling factor for graph/lm costs (float, default = 1)
--lm2acoustic-scale : Add this times original LM costs to acoustic costs (float, default = 0)
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
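The decode script in this post runs with --acwt 1.0 --post-decode-acwt 10.0, which is where the lattice-scale --acoustic-scale=10.0 call comes from. As far as I understand, the point is to bring the acoustic costs of a chain model back into the range that the usual LM-weight settings in the scoring scripts expect. A minimal sketch of the two main scale options applied to a single pair of costs; scale_costs and the numbers are made up for illustration:

```shell
# scale_costs GRAPH_COST ACOUSTIC_COST [ACOUSTIC_SCALE] [LM_SCALE]
# Prints the scaled graph and acoustic costs; both scales default to 1,
# like the --acoustic-scale and --lm-scale options.
scale_costs() {
  awk -v g="$1" -v a="$2" -v asc="${3:-1}" -v lsc="${4:-1}" \
      'BEGIN { printf "%.1f %.1f\n", lsc * g, asc * a }'
}

scale_costs 12.5 3.2 10.0        # acoustic cost multiplied by 10
scale_costs 12.5 3.2 1.0 0.0     # --lm-scale=0.0 as in the help example
```

The cross terms (--acoustic2lm-scale and --lm2acoustic-scale) are left out here; both default to 0 in the help text above.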
That's all for now; I'll add more as I think of it!