CNTK API Documentation Translation (23): Training Acoustic Models with the CTC Criterion
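This tutorial assumes that readers have completed the first 10 tutorials and have a basic understanding of how data is represented in acoustic modeling. It introduces the CNTK modules that can be used to train deep neural networks for speech recognition, using the CTC (Connectionist Temporal Classification) training criterion as the example.
Introduction
CNTK's implementation of CTC is based on the paper "Connectionist temporal classification: labeling unsegmented sequence data with recurrent neural networks" by A. Graves et al. CTC is a widely used training criterion for sequence learning tasks such as speech or handwriting recognition. It requires neither segmentation of the training data nor post-processing of the network outputs to convert them into labels. As a result, CTC significantly simplifies training and decoding while still achieving state-of-the-art accuracy.
CTC training processes several sequences in parallel on either the GPU or the CPU to make full use of the hardware.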
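To make the role of the blank label concrete, here is a minimal sketch of how a per-frame label path is usually mapped to a label sequence under CTC: repeated labels are merged, then blanks are dropped. The function names `ctc_collapse` and `greedy_decode` are illustrative helpers, not part of CNTK, and the toy label set is an assumption made for the example.

```python
def ctc_collapse(frame_labels, blank_id):
    """Collapse a per-frame label path into a label sequence (CTC rule)."""
    collapsed = []
    previous = None
    for label in frame_labels:
        # Keep a label only if it differs from the previous frame and is not blank.
        if label != previous and label != blank_id:
            collapsed.append(label)
        previous = label
    return collapsed


def greedy_decode(frame_scores, blank_id):
    """Pick the best-scoring label in every frame, then collapse the path."""
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    return ctc_collapse(best_path, blank_id)


# Toy example with 3 labels; label 2 plays the role of the blank.
path = [0, 0, 2, 0, 1, 1, 2]
print(ctc_collapse(path, blank_id=2))  # prints [0, 0, 1]
```

Note that the blank between the two runs of label 0 is what lets the output contain the same label twice in a row.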
Imports and hardware setup
import os
import cntk as C
import numpy as np
# Select the right target device
import cntk.tests.test_utils
cntk.tests.test_utils.set_device_from_pytest_env() # (only needed for our build system)
data_dir = os.path.join("..", "Tests", "EndToEndTests", "Speech", "Data")
print("Current directory {0}".format(os.getcwd()))
if os.path.exists(data_dir):
    if os.path.realpath(data_dir) != os.path.realpath(os.getcwd()):
        os.chdir(data_dir)
        print("Changed to data directory {0}".format(data_dir))
else:
    print("Data directory not available locally. Downloading data.")
    try:
        from urllib.request import urlretrieve
    except ImportError:
        from urllib import urlretrieve
    for dir in ['GlobalStats', 'Features']:
        if not os.path.exists(dir):
            os.mkdir(dir)
    for file in ['glob_0000.scp', 'glob_0000.write.scp', 'glob_0000.mlf', 'state_ctc.list', 'GlobalStats/mean.363', 'GlobalStats/var.363', 'Features/000000000.chunk']:
        if os.path.exists(file):
            print('Already downloaded %s' % file)
        else:
            print('Downloading %s' % file)
            urlretrieve('https://github.com/Microsoft/CNTK/raw/release/2.1/Tests/EndToEndTests/Speech/Data/%s' % file, file)
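Data preparation
CNTK consumes acoustic model training data in HTK/MLF format, which typically requires three input files.
- SCP file with the features: maps each utterance ID to its corresponding feature file.
- MLF file with the labels: an MLF (master label file) is a traditional format for representing how a transcription aligns to the features. Although the referenced MLF file contains label boundaries, they are not needed for CTC training. For more information, see the HTK documentation (http://www1.icsi.berkeley.edu/Speech/docs/HTKBook3.2/).
- Label list file: lists every label in the training set. The blank label required by CTC is the last entry in the file, at index 132 (counting from 0).
CNTK provides robust and efficient readers for acoustic features and labels, HTKFeatureDeserializer and HTKMLFDeserializer. These readers follow the convention-over-configuration principle, which greatly simplifies the training procedure. They also take care of optimizations such as reading from disk and asynchronous CPU/GPU prefetching, which speeds up training considerably.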
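The sketch below illustrates what the SCP mapping and the label list carry. It assumes the common SCP entry layout `<utterance id>=<archive path>[<first frame>,<last frame>]`; the sample entry and the helper names `parse_scp_line` and `blank_index` are hypothetical and are not taken from glob_0000.scp or from CNTK.

```python
import re

# Assumed SCP entry layout: <utterance id>=<archive path>[<first frame>,<last frame>]
SCP_ENTRY = re.compile(
    r"^(?P<utt_id>[^=]+)=(?P<path>.+)\[(?P<start>\d+),(?P<end>\d+)\]$")


def parse_scp_line(line):
    """Split one SCP entry into utterance ID, feature file path and frame range."""
    match = SCP_ENTRY.match(line.strip())
    if match is None:
        raise ValueError("unrecognized SCP entry: %r" % line)
    return {
        "utt_id": match.group("utt_id"),
        "path": match.group("path"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
    }


def blank_index(labels):
    """Return the zero-based index of the last label, where the blank is expected."""
    return len(labels) - 1


# Hypothetical entry, written for illustration only.
entry = parse_scp_line("utt_0001.mfc=Features/000000000.chunk[0,367]")
print(entry["utt_id"], entry["path"], entry["start"], entry["end"])

# A label list with 133 entries puts the blank at index 132.
labels = ["label_%d" % i for i in range(133)]
print(blank_index(labels))  # prints 132
```

In the tutorial itself none of this parsing is needed, because the deserializers read the files directly; the sketch only shows what information each file contributes.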
# Type of features/labels and dimensions are application specific
# Here we use rather small dimensional feature and the label set for the sake of keeping the train set compact.
feature_dimension = 33
feature = C.sequence.input((feature_dimension))
label_dimension = 133
label = C.sequence.input((label_dimension))
train_feature_filepath = "glob_0000.scp"
train_label_filepath = "glob_0000.mlf"
mapping_filepath = "state_ctc.list"
try:
    train_feature_stream = C.io.HTKFeatureDeserializer(
        C.io.StreamDefs(speech_feature = C.io.StreamDef(shape = feature_dimension, scp = train_feature_filepath)))
    train_label_stream = C.io.HTKMLFDeserializer(
        mapping_filepath, C.io.StreamDefs(speech_label = C.io.StreamDef(shape = label_dimension, mlf = train_label_filepath)), True)
    train_data_reader = C.io.MinibatchSource([train_feature_stream, train_label_stream], frame_mode = False)
    train_input_map = {feature: train_data_reader.streams.speech_feature, label: train_data_reader.streams.speech_label}
except RuntimeError:
    print ("ERROR: not able to read features or labels")
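Normalizing the features and defining the network with LSTM layers
We normalize the input features to zero mean and unit variance by subtracting a mean vector and multiplying by the inverse standard deviation; both vectors are stored in separate files.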
feature_mean = np.fromfile(os.path.join("GlobalStats", "mean.363"), dtype=float, count=feature_dimension)
feature_inverse_stddev = np.fromfile(os.path.join("GlobalStats", "var.363"), dtype=float, count=feature_dimension)
feature_normalized = (feature - feature_mean) * feature_inverse_stddev
with C.default_options(activation=C.sigmoid):
    z = C.layers.Sequential([
        C.layers.For(range(3), lambda: C.layers.Recurrence(C.layers.LSTM(1024))),
        C.layers.Dense(label_dimension)
    ])(feature_normalized)
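As a small check of the normalization formula `(feature - feature_mean) * feature_inverse_stddev` used above, the following sketch applies it to toy data in plain Python. The data and the helper name `normalize` are made up for illustration; here the statistics are computed on the spot, whereas the tutorial loads precomputed vectors from files.

```python
import math


def normalize(frame, mean, inverse_stddev):
    """Apply (x - mean) * inverse_stddev element-wise to one feature frame."""
    return [(x - m) * s for x, m, s in zip(frame, mean, inverse_stddev)]


# Toy data: 4 frames with 2 features each (the tutorial uses 33 features).
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
dims = len(frames[0])

# Per-dimension statistics, computed here instead of being read from disk.
mean = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
variance = [sum((f[d] - mean[d]) ** 2 for f in frames) / len(frames)
            for d in range(dims)]
inverse_stddev = [1.0 / math.sqrt(v) for v in variance]

normalized = [normalize(f, mean, inverse_stddev) for f in frames]

# After normalization each dimension has mean 0 and variance 1 (up to rounding).
new_mean = [sum(f[d] for f in normalized) / len(normalized) for d in range(dims)]
new_var = [sum(f[d] ** 2 for f in normalized) / len(normalized) for d in range(dims)]
print(new_mean, new_var)
```

Multiplying by a stored inverse standard deviation rather than dividing by the standard deviation gives the same result and avoids a division per frame.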
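Defining the training parameters, criterion function, and error function
The CTC criterion function is implemented by combining the labels_to_graph and forward_backward functions. These functions are designed to generalize the forward-backward, Viterbi-like procedures that are common in sequence modeling problems. labels_to_graph converts the input label sequence into a graph representation suitable for a particular forward-backward procedure, and forward_backward performs the procedure itself. Currently, both functions support only CTC, and their default configuration is set accordingly.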
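To give a feel for what a forward-backward procedure computes, here is a minimal sketch of the textbook CTC forward recursion from the Graves et al. paper, written in plain Python. It is not CNTK's implementation: it works on plain probabilities (so it would underflow on long utterances), it ignores any delay constraint, and the names `expand_with_blanks` and `ctc_negative_log_likelihood` are hypothetical helpers. Expanding the labels with blanks plays a role loosely analogous to building the label graph.

```python
import math


def expand_with_blanks(labels, blank_id):
    """Interleave blanks: [a, b] -> [blank, a, blank, b, blank]."""
    expanded = [blank_id]
    for label in labels:
        expanded.extend([label, blank_id])
    return expanded


def ctc_negative_log_likelihood(probs, labels, blank_id):
    """Forward pass of CTC over per-frame posteriors probs[t][k]."""
    states = expand_with_blanks(labels, blank_id)
    num_states = len(states)

    # alpha[s]: total probability of all partial paths that end in state s.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][states[0]]
    if num_states > 1:
        alpha[1] = probs[0][states[1]]

    for frame in probs[1:]:
        new_alpha = [0.0] * num_states
        for s, state in enumerate(states):
            total = alpha[s]                  # stay in the same state
            if s >= 1:
                total += alpha[s - 1]         # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and state != blank_id and state != states[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame[state]
        alpha = new_alpha

    # Valid paths end in the last label or in the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
    if likelihood == 0.0:
        return float("inf")                   # no valid alignment exists
    return -math.log(likelihood)


# Two frames, two symbols (label 0 and blank 1), uniform posteriors.
# Paths (0,0), (0,blank) and (blank,0) all collapse to [0]: 3 * 0.25 = 0.75.
uniform = [[0.5, 0.5], [0.5, 0.5]]
nll = ctc_negative_log_likelihood(uniform, labels=[0], blank_id=1)
print(round(math.exp(-nll), 6))  # prints 0.75
```

The criterion sums over every alignment that collapses to the target labels, which is why no frame-level segmentation of the training data is required.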
mbsize = 1024
mbs_per_epoch = 10
max_epochs = 5
criteria = C.forward_backward(C.labels_to_graph(label), z, blankTokenId=132, delayConstraint=3)
err = C.edit_distance_error(z, label, squashInputs=True, tokensToIgnore=[132])
# Learning rate parameter schedule per sample:
# Use 0.01 for the first 3 epochs, followed by 0.001 for the remaining
lr = C.learning_rate_schedule([(3, .01), (1,.001)], C.UnitType.sample)
mm = C.momentum_schedule([(1000, 0.9), (0, 0.99)], mbsize)
learner = C.momentum_sgd(z.parameters, lr, mm)
trainer = C.Trainer(z, (criteria, err), learner)
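The error metric above is configured with squashInputs=True and tokensToIgnore=[132]. The sketch below shows the idea on plain label IDs: merge runs of identical tokens, drop the ignored (blank) token, then compute an edit distance. It is an illustration only; the unit costs, the normalization by reference length, and the helper names are assumptions, not a description of how `C.edit_distance_error` is implemented.

```python
def squash_and_filter(tokens, ignore):
    """Merge runs of identical tokens, then drop tokens listed in `ignore`."""
    squashed = [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]
    return [t for t in squashed if t not in ignore]


def edit_distance(hypothesis, reference):
    """Levenshtein distance with unit cost for substitution, insertion, deletion."""
    previous = list(range(len(reference) + 1))
    for i, hyp_token in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_token in enumerate(reference, start=1):
            substitution = previous[j - 1] + (hyp_token != ref_token)
            current.append(min(substitution, previous[j] + 1, current[j - 1] + 1))
        previous = current
    return previous[-1]


def token_error_rate(hyp_frames, ref_frames, blank_id):
    """Edit distance after squashing, divided by reference length (an assumption)."""
    hyp = squash_and_filter(hyp_frames, {blank_id})
    ref = squash_and_filter(ref_frames, {blank_id})
    return edit_distance(hyp, ref) / max(len(ref), 1)


# Frame-level label IDs; 132 is the blank, as in the tutorial.
hyp_frames = [5, 5, 132, 7, 7, 132, 9]   # collapses to [5, 7, 9]
ref_frames = [5, 132, 8, 8, 132, 9, 9]   # collapses to [5, 8, 9]
print(round(token_error_rate(hyp_frames, ref_frames, blank_id=132), 3))  # prints 0.333
```

Squashing before filtering matters: a blank between two identical labels keeps them as two separate tokens, consistent with the CTC collapsing rule.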
Training and saving the model
C.logging.log_number_of_parameters(z)
progress_printer = C.logging.progress_print.ProgressPrinter(tag='Training', num_epochs = max_epochs)
for epoch in range(max_epochs):
    for mb in range(mbs_per_epoch):
        minibatch = train_data_reader.next_minibatch(mbsize, input_map = train_input_map)
        trainer.train_minibatch(minibatch)
        progress_printer.update_with_trainer(trainer, with_metric = True)
    print('Trained on a total of ' + str(trainer.total_number_of_samples_seen) + ' frames')
    progress_printer.epoch_summary(with_metric = True)
# Uncomment to save the model
# z.save('CTC_' + str(max_epochs) + 'epochs_' + str(mbsize) + 'mbsize_' + str(mbs_per_epoch) + 'mbs.model')
Evaluating the model
test_feature_filepath = "glob_0000.write.scp"
test_feature_stream = C.io.HTKFeatureDeserializer(
C.io.StreamDefs(speech_feature = C.io.StreamDef(shape = feature_dimension, scp = test_feature_filepath)))
test_data_reader = C.io.MinibatchSource([test_feature_stream, train_label_stream], frame_mode = False)
test_input_map = {feature: test_data_reader.streams.speech_feature, label: test_data_reader.streams.speech_label}
num_test_minibatches = 2
test_result = 0.0
for i in range(num_test_minibatches):
    test_minibatch = test_data_reader.next_minibatch(mbsize, input_map = test_input_map)
    eval_error = trainer.test_minibatch(test_minibatch)
    test_result = test_result + eval_error
# Average of evaluation errors of all test minibatches
round(test_result / num_test_minibatches,2)