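I probably won't be working with speech recognition much in the future, so for this assignment I kept things simple: I implemented the trigger word detection model and the data augmentation step that inserts audio clips at random positions.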

First, load the positive examples, the negative examples, and the background audio:

import os
import numpy as np
import matplotlib.pyplot as plt  # needed for the plt.show() call further down
import td_utils
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # suppress TensorFlow info and warning logs

rate, data = td_utils.get_wav_info("audio_examples/example_train.wav")
np.random.seed(5)
# print(data.shape)
# x = td_utils.graph_spectrogram("audio_examples/example_train.wav")
# print(x.shape)
# n_freq, Tx = x.shape  # frequencies per spectrogram time step, and number of time steps fed to the model
# print(n_freq, Tx)
Ty = 1375  # number of time steps in the label vector
activates, negatives, backgrounds = td_utils.load_raw_audio()
# print(len(activates))
# print(len(negatives))
# print(len(backgrounds))

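Inserting audio clips at random positions takes four steps:

1. get_random_time_segment(segment_ms): pick a random time segment within the background audio.

2. is_overlapping(segment_time, existing_segments): check whether a time segment overlaps any existing segment.

3. insert_audio_clip(background, audio_clip, previous_segments): overlay an audio clip at a random position in the background, using get_random_time_segment and is_overlapping.

4. insert_ones(y, segment_end_ms): set the entries of the label vector y to 1 right after the word "activate" ends.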

def get_random_time_segment(segment_ms):
    segment_start = np.random.randint(low=0, high=10000 - segment_ms)
    segment_end = segment_start + segment_ms - 1
    return (segment_start, segment_end)


def is_overlapping(segment_time, existing_segments):
    overlapped = False
    segment_start, segment_end = segment_time
    for existing_segment_start, existing_segment_end in existing_segments:
        if segment_start <= existing_segment_end and segment_end >= existing_segment_start:
            overlapped = True
            break
    return overlapped


def insert_audio_clip(background, audio_clip, previous_segments):
    """
    background -- the 10-second background recording.
    audio_clip -- the audio clip to insert/overlay.
    previous_segments -- time segments of the clips that have already been placed
    """
    # Keep drawing random segments until one does not overlap an existing clip
    segment_time = get_random_time_segment(len(audio_clip))
    while is_overlapping(segment_time, previous_segments):
        segment_time = get_random_time_segment(len(audio_clip))

    # Step 3: append the new segment_time to previous_segments (an easy step to forget)
    previous_segments.append(segment_time)

    new_background = background.overlay(audio_clip, position=segment_time[0])
    return new_background, previous_segments, segment_time


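def insert_ones(y, segment_end_ms):  # the model should fire after the trigger word is spoken, so label the steps after the clip ends
    """
    y -- numpy array of shape (1, Ty), the labels of the training example
    The label vector has Ty steps while the clip is measured in ms over 10000 ms,
    so convert with segment_end_y = int(segment_end_ms * Ty / 10000)
    segment_end_ms -- end time of the segment in ms
    Returns:
    y -- the updated labels"""
    end_time = int(segment_end_ms * Ty / 10000)
    if end_time + 51 < Ty:
        y[0, end_time + 1:end_time + 51] = 1
    elif end_time + 1 < Ty:
        # Fewer than 50 steps remain, so label up to the end of the vector
        y[0, end_time + 1:Ty] = 1

    return y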
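
To make the millisecond-to-label conversion concrete, here is a small standalone sketch of the same arithmetic. The names label_after, CLIP_MS, and LABEL_SPAN are made up for this illustration, and the single clipped slice is just a compact way to express the two branches above, not a replacement for them.

```python
import numpy as np

TY = 1375          # number of label time steps, as in the post
CLIP_MS = 10000    # background length in milliseconds
LABEL_SPAN = 50    # how many steps after the clip end are set to 1


def label_after(y, segment_end_ms):
    """Set up to LABEL_SPAN steps after the clip end to 1, clipped to TY."""
    end_step = int(segment_end_ms * TY / CLIP_MS)
    # The upper bound is clipped so the slice never runs past the label vector.
    y[0, end_step + 1:min(end_step + 1 + LABEL_SPAN, TY)] = 1
    return y


y_mid = label_after(np.zeros((1, TY)), 4000)   # clip ends at 4.0 s -> step 550
y_late = label_after(np.zeros((1, TY)), 9700)  # clip ends at 9.7 s -> step 1333
print(int(y_mid.sum()), int(y_late.sum()))     # 50 41
```

A clip ending at 4000 ms maps to step 550, so steps 551 to 600 are labelled. A clip ending at 9700 ms maps to step 1333, and only the 41 remaining steps (1334 to 1374) can be labelled.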

Insert 0 to 4 positive examples and 0 to 2 negative examples at random positions in the background, then export the result.

def create_training_example(background, activates, negatives):
    y = np.zeros((1, Ty))
    previous_segments = []
    activates_times = np.random.randint(0, 5)  # number of positive clips to insert (0 to 4)
    negatives_times = np.random.randint(0, 3)  # number of negative clips to insert (0 to 2)
    # print(negatives_times)
    activates_index = np.random.randint(0, len(activates), activates_times)
    negatives_index = np.random.randint(0, len(negatives), negatives_times)
    for i in range(activates_times):
        background, previous_segments, segment_time = insert_audio_clip(background, activates[activates_index[i]], previous_segments)
        segment_start, segment_end = segment_time
        insert_ones(y, segment_end)  # only positive clips are labelled with 1
    for j in range(negatives_times):
        # Negative clips are overlaid but not labelled: y stays 0 after them
        background, previous_segments, segment_time = insert_audio_clip(background, negatives[negatives_index[j]], previous_segments)
    background.export("train.wav", format="wav")
    x = td_utils.graph_spectrogram("train.wav")
    return x, y

x,y=create_training_example(backgrounds[0],activates,negatives)
plt.show()
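
One thing to watch in the placement loop is that it retries until it finds a free slot, so it would never return if the clips could not fit. The sketch below reproduces the placement logic on plain integers, with no audio involved, and adds a retry cap. The helper names overlaps and place_segment, the max_tries parameter, and the clip lengths are all hypothetical and not part of td_utils or the assignment.

```python
import numpy as np

CLIP_MS = 10000  # background length in milliseconds, as in the post


def overlaps(segment, existing):
    """Closed-interval overlap test, the same rule as is_overlapping."""
    start, end = segment
    return any(start <= e_end and end >= e_start for e_start, e_end in existing)


def place_segment(length_ms, existing, rng, max_tries=1000):
    """Return a free (start, end) slot, or None if none is found in max_tries."""
    for _ in range(max_tries):
        start = int(rng.integers(0, CLIP_MS - length_ms))
        candidate = (start, start + length_ms - 1)
        if not overlaps(candidate, existing):
            existing.append(candidate)
            return candidate
    return None  # give up instead of looping forever


rng = np.random.default_rng(5)
placed = []
for length in (900, 1200, 700, 1500):  # hypothetical clip lengths in ms
    place_segment(length, placed, rng)
print(sorted(placed))
```

With at most six short clips in a 10-second background the original loop terminates quickly in practice, so the cap only matters if clips get longer or more numerous.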


The model architecture is shown in the figure below.

Build the model following the figure:

import os

# Set before importing Keras so the TensorFlow backend picks it up
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # suppress TensorFlow info and warning logs

import keras
import numpy as np

from td_utils import *

X = np.load("./XY_train/X.npy")
Y = np.load("./XY_train/Y.npy")
X_dev = np.load("./XY_dev/X_dev.npy")
Y_dev = np.load("./XY_dev/Y_dev.npy")
print(X.shape, Y.shape, X_dev.shape, Y_dev.shape)


def build_model(x):
    # Input shape is (time steps, frequency bins), taken from the training data
    x_input = keras.layers.Input(shape=(x.shape[1], x.shape[2]))

    # Conv1D shortens the time axis before the recurrent layers
    conv1 = keras.layers.Conv1D(filters=196, kernel_size=15, strides=4)(x_input)
    bn1 = keras.layers.BatchNormalization()(conv1)
    a1 = keras.layers.Activation(activation="relu")(bn1)
    # Note: Keras Dropout takes the drop rate, so 0.8 drops 80% of activations
    dp1 = keras.layers.Dropout(0.8)(a1)

    # First GRU layer, returning the full sequence
    gru1 = keras.layers.GRU(128, return_sequences=True)(dp1)
    dp2 = keras.layers.Dropout(0.8)(gru1)
    bn2 = keras.layers.BatchNormalization()(dp2)

    # Second GRU layer
    gru2 = keras.layers.GRU(128, return_sequences=True)(bn2)
    dp3 = keras.layers.Dropout(0.8)(gru2)
    bn3 = keras.layers.BatchNormalization()(dp3)
    dp3_2 = keras.layers.Dropout(0.8)(bn3)

    # Dense acts on the last axis, giving one sigmoid output per time step
    fc1 = keras.layers.Dense(1)(dp3_2)
    y_hat = keras.layers.Activation(activation="sigmoid")(fc1)

    return keras.Model(x_input, y_hat)


model = build_model(X)
model.summary()
model.compile(optimizer=keras.optimizers.Adam(), loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, Y, epochs=500)
model.save("week3_model.h5")
model.save_weights("week3_weight.h5")

Output:

Epoch 1/500

26/26 [==============================] - 5s 187ms/step - loss: 1.2840 - acc: 0.4984

Epoch 2/500

26/26 [==============================] - 2s 73ms/step - loss: 1.2109 - acc: 0.5087
...


Epoch 500/500

26/26 [==============================] - 2s 78ms/step - loss: 0.1161 - acc: 0.9687
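
As a side check on the architecture, the value Ty = 1375 is what the Conv1D layer produces from the spectrogram length. The sketch below computes the output length of a 1-D convolution with the default "valid" padding. The spectrogram length of 5511 time steps is an assumption here, since the post never prints Tx, and conv1d_output_length is a made-up helper name.

```python
def conv1d_output_length(input_length, kernel_size, stride):
    """Output length of a 1-D convolution with 'valid' (no) padding."""
    return (input_length - kernel_size) // stride + 1


TX = 5511  # assumed number of spectrogram time steps for a 10 s clip
print(conv1d_output_length(TX, kernel_size=15, stride=4))  # 1375
```

If the spectrogram length differs, the label length Ty has to change with it, otherwise the shapes of the model output and Y will not match during training.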


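After a little over a month, I have finished all the videos and assignments of Andrew Ng's deep learning course. Through this period of study I gained a first understanding of the commonly used deep learning algorithms. In the time left before the semester starts, I plan to finish reading Learning OpenCV and Statistical Learning Methods (《统计学习方法》) to learn about traditional image processing methods and the mathematics behind deep learning.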
