Applying the HMM Algorithm to Speech Recognition: Algorithm Study Notes

yzbx · Published 2016-06-15 14:04:37

Overall framework

Input

Treat the acoustic input $O$ as a sequence of individual observations:

$O=o_1,o_2,...,o_t$

Output

Define a sentence as a sequence of words:

$W=w_1,w_2,...,w_n$

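Decision rule

Maximum posterior probability: $\hat{W}=\mathop{\arg\max}_{W \in L}{\ P(W|O)}$
Bayes' rule: $\hat{W}=\mathop{\arg\max}_{W \in L}{\ \frac {P(O|W)P(W)} {P(O)}}$
Simplification: since $P(O)$ is the same for every $W$, it can be dropped: $\hat{W}=\mathop{\arg\max}_{W \in L}{\ {P(O|W)P(W)} }$

Here $L$ is the set of candidate sentences, $P(O|W)$ is the acoustic likelihood, and $P(W)$ is the language-model prior.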
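
The simplification can be checked numerically. The following is a minimal sketch with made-up numbers: the candidate sentences, the likelihood values standing in for $P(O|W)$, and the prior values standing in for $P(W)$ are all hypothetical, and $P(O)$ is computed by assuming the three candidates are the only possible sentences. It only illustrates that dividing by $P(O)$ does not change which $W$ wins.

```python
import math

# Hypothetical candidates with invented scores (illustration only).
# likelihood[W] stands for P(O|W), prior[W] stands for P(W).
likelihood = {
    "recognize speech": 2e-5,
    "wreck a nice beach": 3e-5,
    "recognise peach": 1e-5,
}
prior = {
    "recognize speech": 6e-3,
    "wreck a nice beach": 1e-3,
    "recognise peach": 5e-4,
}

def decode(candidates):
    """Return argmax_W P(O|W) P(W), scored in log space to avoid underflow."""
    return max(candidates,
               key=lambda w: math.log(likelihood[w]) + math.log(prior[w]))

def decode_with_evidence(candidates):
    """Same decision, but using the full posterior P(O|W) P(W) / P(O)."""
    # Assumption: the candidate set is exhaustive, so P(O) is this sum.
    p_o = sum(likelihood[w] * prior[w] for w in candidates)
    return max(candidates, key=lambda w: likelihood[w] * prior[w] / p_o)

best = decode(likelihood)
same = decode_with_evidence(likelihood) == best
```

Note that the candidate with the highest acoustic likelihood alone ("wreck a nice beach" in this toy data) is not the winner once the prior is taken into account.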

Model

Feature extraction: 39 MFCC features
Acoustic model: Gaussians for computing $p(o|q)$, the likelihood of an observation $o$ given an HMM state $q$
Lexicon/pronunciation model: an HMM that specifies which phones can follow each other
Language model: N-grams for computing $p(w_i|w_{i-1})$
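
As an illustration of the language-model component, here is a minimal sketch of estimating bigram probabilities $p(w_i|w_{i-1})$ by counting. The function name `bigram_probs`, the `<s>` start marker, and the toy corpus are assumptions made for this example; a practical language model would also need smoothing for unseen word pairs, which is omitted here.

```python
from collections import Counter

def bigram_probs(sentences):
    """Maximum-likelihood estimate of p(w_i | w_{i-1}) from tokenized sentences."""
    context_counts = Counter()  # how often each word appears as w_{i-1}
    pair_counts = Counter()     # how often each (w_{i-1}, w_i) pair appears
    for words in sentences:
        tokens = ["<s>"] + words  # "<s>" marks the start of a sentence
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, cur)] += 1
    # p(cur | prev) = count(prev, cur) / count(prev)
    return {pair: n / context_counts[pair[0]] for pair, n in pair_counts.items()}

# Toy corpus, invented for illustration.
corpus = [["i", "want", "food"], ["i", "want", "tea"], ["i", "like", "tea"]]
probs = bigram_probs(corpus)
```

For this corpus, `probs[("i", "want")]` is 2/3 because "i" is followed by "want" in two of its three occurrences.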

Markov chain

states: $Q = q_1,q_2,...,q_N$ is the set of $N$ states; $q_t$ denotes the state at time $t$.
transition probability matrix: $A=[a_{11},a_{12},...,a_{NN}]$
- $a_{ij}$ is the probability of a transition from state $i$ to state $j$.
- $a_{ij}=P(q_t=j|q_{t-1}=i)$
- $\sum_{j=1}^N {a_{ij}} = 1$ , for $1 \le i \le N$
Markov assumption: the next state depends only on the current state.
- $P(q_i|q_1q_2...q_{i-1})=P(q_i|q_{i-1})$
initial state distribution:
- $\pi _i =P(q_1=i)$
- $\sum_{j=1}^N {\pi _j}=1$
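
The definitions above can be sketched in a few lines of Python. The 3-state chain and all of its numbers are invented for illustration, states are indexed from 0 rather than 1, and the helper names `is_stochastic` and `sequence_probability` are hypothetical. Under the Markov assumption, the probability of a state sequence is the initial probability of its first state multiplied by the transition probability of each consecutive pair.

```python
import math

# Hypothetical 3-state chain (states 0, 1, 2); the numbers are invented.
pi = [0.5, 0.3, 0.2]   # pi[i]: probability that the chain starts in state i
A = [                  # A[i][j]: probability of moving from state i to state j
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]

def is_stochastic(pi, A, tol=1e-9):
    """Check that pi and every row of A sum to 1."""
    return (math.isclose(sum(pi), 1.0, abs_tol=tol)
            and all(math.isclose(sum(row), 1.0, abs_tol=tol) for row in A))

def sequence_probability(states, pi, A):
    """P(q_1, ..., q_T) = pi[q_1] * product over t of A[q_{t-1}][q_t]."""
    prob = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= A[prev][cur]
    return prob

valid = is_stochastic(pi, A)
p = sequence_probability([0, 0, 1, 2], pi, A)  # 0.5 * 0.6 * 0.3 * 0.3
```

For long sequences the product underflows quickly, so summing log probabilities is the usual alternative.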