TensorFlow Beginner Tutorial (20): Freezing the Speech Recognition Model and Using It

__Fang Wei__ · Published 2019-05-01 00:25:51

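#
#Author: 韦访 (Fang Wei)
#Blog: https://blog.csdn.net/rookie_wei
#WeChat: 1007895847
#When adding me on WeChat, please mention that you came from CSDN
#Everyone is welcome to learn together
#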

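1. Overview

In the previous three installments we trained the speech recognition model. A trained model is only worth something once it is put to use, so in this installment we freeze the model and run it on an audio file.

Environment:

Operating system: Windows 10, 64-bit

GPU: GTX 1080 Ti

Python: Python 3.7

TensorFlow: 1.15.0

2. Freezing the model

Freezing the model is simple: we only need to convert the ckpt (checkpoint) model into pb format.

Reference blog post:

https://blog.csdn.net/rookie_wei/article/details/90546290

Here is the code: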

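import os
import sys

import tensorflow as tf
from tensorflow.python.framework import graph_util


def main(argv=None):
    # Check that the checkpoint's .meta file exists at the given path; exit if not.
    # The 'cpkt' spelling in the file name is kept exactly as in the original.
    save_path = 'model'
    save_file = os.path.join(save_path, 'birnn_speech_recognition.cpkt-170.meta')
    if not os.path.exists(save_file):
        print('Not found ckpt file!')
        sys.exit(1)

    # File name of the pb model we are going to save
    savePbFile = os.path.join(save_path, 'birnn_speech_recognition.pb')

    with tf.Session() as sess:
        # Load the graph structure
        saver = tf.train.import_meta_graph(save_file)

        # Restore the weights from the most recently saved checkpoint
        saver.restore(sess, tf.train.latest_checkpoint(save_path))

        # Names of the nodes to keep in the frozen graph
        output_graph_def = graph_util.convert_variables_to_constants(
            sess=sess,
            input_graph_def=sess.graph_def,
            output_node_names=['input', 'seq_length', 'keep_dropout', 'pred']
        )

        # Write the frozen graph to disk
        with tf.gfile.GFile(savePbFile, 'wb') as fd:
            fd.write(output_graph_def.SerializeToString())


if __name__ == '__main__':
    main()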

After running the code above, if it succeeds, the file birnn_speech_recognition.pb is generated in the model folder.

That completes the freezing step.
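
The freezing script hard-codes the step number (`-170`) in the `.meta` file name. As a minimal sketch, assuming checkpoints follow the `<prefix>-<step>.meta` naming pattern, the helper below picks the `.meta` file with the highest step. `find_latest_meta` is a hypothetical name and is not part of the original tutorial.

```python
import os
import re


def find_latest_meta(save_path):
    """Return the path of the '<prefix>-<step>.meta' file with the highest step.

    Hypothetical helper, not part of the original tutorial. Returns None when
    the directory does not exist or contains no matching file.
    """
    if not os.path.isdir(save_path):
        return None

    pattern = re.compile(r'-(\d+)\.meta$')
    best_step, best_file = -1, None
    for name in os.listdir(save_path):
        match = pattern.search(name)
        if match and int(match.group(1)) > best_step:
            best_step, best_file = int(match.group(1)), name

    return os.path.join(save_path, best_file) if best_file else None
```

A call such as `find_latest_meta('model')` could then stand in for the hard-coded `save_file` path, with a `None` result handled the same way as the missing-file case.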

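3. Application

Now we use the frozen model. Because the previous installment stored every character in characters.txt, we no longer need the multi-gigabyte dataset; loading the characters from characters.txt into a list is enough. The code is as follows: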

words, _ = load_words_table_()
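
The body of `load_words_table_` is not shown here. As a rough sketch of what such a loader might look like, the function below assumes that characters.txt is UTF-8 and holds one character per line; both the file layout and the name `load_words_table_sketch` are assumptions, not the author's implementation. It returns two values, matching the way the call above unpacks its result.

```python
def load_words_table_sketch(path='characters.txt'):
    """Read one character per line and return (words, word_to_index).

    Hypothetical stand-in for load_words_table_; the one-character-per-line
    file layout is an assumption.
    """
    with open(path, 'r', encoding='utf-8') as fd:
        # Strip only the newline so that a space character stays a valid entry
        words = [line.rstrip('\n') for line in fd]

    # Drop empty lines
    words = [w for w in words if w != '']

    # Reverse lookup: character -> index in the list
    word_to_index = {w: i for i, w in enumerate(words)}
    return words, word_to_index
```

The list index of each character is what ties the model's output ids back to text, so the order of the file has to match the order used during training.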

Next, import the pb model and look up the tensors we kept when freezing:

# Open the pb model file (assumes `sess` is an open tf.Session and
# `save_pb_file` is the path to the frozen .pb file)
with tf.gfile.GFile(save_pb_file, 'rb') as fd:
    # Parse the serialized graph
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(fd.read())

# Import the graph into the session's graph
with sess.graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Look up the tensors by name
input = sess.graph.get_tensor_by_name('input:0')
seq_length = sess.graph.get_tensor_by_name('seq_length:0')
dropout = sess.graph.get_tensor_by_name('keep_dropout:0')
pred = sess.graph.get_tensor_by_name('pred:0')

Then apply the CTC decoder:

# Apply the CTC beam search decoder
decoder, _ = ctc_ops.ctc_beam_search_decoder(pred, seq_length, merge_repeated=False)
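
To make the decoding step less of a black box: the network emits one score vector per time step, and CTC decoding turns that sequence into labels by dropping the blank class and collapsing repeated frames. The pure-Python sketch below shows greedy (best-path) decoding, which is a simpler technique than the beam search used above and is shown only for illustration. It assumes the blank is the last class index, as in TensorFlow's CTC ops; `greedy_ctc_decode` is an illustrative name.

```python
def greedy_ctc_decode(frame_scores):
    """Best-path CTC decoding for one utterance.

    frame_scores: list of per-time-step score lists; the last class is
    assumed to be the CTC blank.
    """
    if not frame_scores:
        return []

    blank = len(frame_scores[0]) - 1
    decoded = []
    previous = None
    for scores in frame_scores:
        # Index of the highest-scoring class at this time step
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a label only if it is not blank and differs from the previous frame
        if best != blank and best != previous:
            decoded.append(best)
        previous = best
    return decoded
```

A blank between two identical labels is what allows a genuinely doubled character to survive the collapsing step.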

Next, convert the sparse tensor to a dense one:

# Convert the sparse tensor to a dense tensor
dense_decoder = tf.sparse_tensor_to_dense(decoder[0], default_value=0)
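
The decoder returns its result as a list of `SparseTensor` objects, each described by `indices`, `values`, and `dense_shape`. The pure-Python sketch below illustrates the idea behind the conversion for the 2-D case; `sparse_to_dense` here is an illustrative function, not the TensorFlow op.

```python
def sparse_to_dense(indices, values, dense_shape, default_value=0):
    """Build a dense 2-D list from sparse (row, col) indices and their values."""
    rows, cols = dense_shape
    # Start with every cell set to the default value
    dense = [[default_value] * cols for _ in range(rows)]
    # Overwrite only the cells that the sparse representation lists
    for (row, col), value in zip(indices, values):
        dense[row][col] = value
    return dense
```

Note that with `default_value=0`, padded cells cannot be told apart from a real label 0 once the tensor is dense.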

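Extract the MFCC features of the audio file to be recognized:

# Get the MFCC features of the audio file to recognize
source, source_lengths = get_mfcc(argv[1])

Everything is ready, so run the computation:

# Run the graph; the dropout keep probability is 1.0 at inference time
dense_decoded = sess.run(dense_decoder, feed_dict={input: source, seq_length: source_lengths, dropout: 1.0})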

Finally, print the recognition result:

# Print the recognition result
dense_decoded = np.asarray(dense_decoded, dtype=np.int32)
if len(dense_decoded) > 0:
    decoded_str = dense_to_text(dense_decoded[0], words)
    print('Decoded:  {}'.format(decoded_str))
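
The helper `dense_to_text` is not shown in this installment. One plausible shape for it, given purely as a sketch, is a lookup of each label id in the `words` list; the function below is a hypothetical stand-in and may differ from the author's version.

```python
def dense_to_text_sketch(label_ids, words):
    """Map each label id to its character, skipping ids outside the table.

    Hypothetical stand-in for dense_to_text.
    """
    return ''.join(words[i] for i in label_ids if 0 <= i < len(words))
```

For example, with `words = ['绿', '是', '阳']`, the ids `[0, 1, 2, 1]` map to the string `绿是阳是`.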

Run the following command to execute the code above and recognize the file A11_100.wav: