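Background:

Customer-service recordings: two channels (the customer and the engineer each occupy one channel), 8 kHz sampling rate, WAV format. The ASR engine requires a single channel, 16 kHz, PCM format.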
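Before converting anything, it can help to confirm that a recording really matches the description above. The following is a minimal sketch using only Python's standard `wave` module; the helper name `describe_wav` is made up for illustration and is not part of the original workflow. It assumes an uncompressed PCM WAV file.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, sample_width_bytes, n_frames) of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getnchannels(),   # 2 for the two-channel customer-service recordings
            wf.getframerate(),   # 8000 for the source data, 16000 for the ASR input
            wf.getsampwidth(),   # bytes per sample, 2 for 16-bit audio
            wf.getnframes(),     # number of sample frames in the file
        )
```

For the source recordings described here, the first two values would be expected to be 2 and 8000.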

Three problems to solve:

1. Channel separation (two channels -> single channel)

Option 1: Python (scipy)

#!/usr/bin/env python
# -*- coding: utf-8 -*-

'''
@Author: weifg
@Create date: 2019.05.31
@Description:
    Split a two-channel WAV file into two single-channel WAV files.
'''
import sys

from scipy.io import wavfile


def split_channel(wav_path, left_wav_path, right_wav_path):
    try:
        # For multi-channel files, wavData has shape (n_samples, n_channels)
        sampleRate, wavData = wavfile.read(wav_path)
        if wavData.ndim != 2 or wavData.shape[1] < 2:
            raise ValueError('%s is not a two-channel file' % wav_path)
        # Column slicing keeps the original dtype and avoids a per-sample Python loop
        wavfile.write(left_wav_path, sampleRate, wavData[:, 0].copy())
        wavfile.write(right_wav_path, sampleRate, wavData[:, 1].copy())
    except IOError as e:
        print('error is %s' % str(e))
    except Exception:
        print('other error', sys.exc_info())


split_channel('wavfile1.wav', 'wavfile2.wav', 'wavfile3.wav')

Option 2: ffmpeg command (recommended)

ffmpeg -i wavfile.wav -map_channel 0.0.0 left.wav -map_channel 0.0.1 right.wav

Reference: https://superuser.com/questions/1063185/how-to-select-the-left-audio-channel-with-ffmpeg-and-downmix-to-mono

2. Resampling (here, upsampling from 8 kHz to 16 kHz)

Option 1: the ffmpeg tool (suited to offline audio files). For usage details, see (https://cloud.baidu.com/doc/SPEECH/ASR-Tool.html#.E8.BD.AC.E6.8D.A2.E5.91.BD.E4.BB.A4.E7.A4.BA.E4.BE.8B)

ffmpeg -i wavfile_8.wav -ac 1 -ar 16000 wavfile_16.wav    (single-channel output, upsampled to 16 kHz)

Option 2: the sox command (a Linux command-line tool)

sox wav_file_8.wav -r 16000 wav_file_16.wav

Option 3: Python libraries (suited to resampling audio streams)

Library 1: scipy.signal.resample
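The original lists `scipy.signal.resample` without an example, so here is a minimal sketch of how it might be used for the 8 kHz to 16 kHz case. The function name `upsample_8k_to_16k` and its defaults are assumptions for illustration. Note that `scipy.signal.resample` works in the frequency domain and treats its input as periodic, so calling it chunk by chunk on a live stream can introduce artifacts at chunk edges; `scipy.signal.resample_poly` is a commonly used alternative in that situation.

```python
import numpy as np
from scipy import signal


def upsample_8k_to_16k(samples, orig_sr=8000, target_sr=16000):
    """Resample a 1-D array of samples with the FFT-based scipy.signal.resample."""
    # Number of output samples that covers the same duration at the new rate
    n_out = int(round(len(samples) * target_sr / orig_sr))
    resampled = signal.resample(samples.astype(np.float64), n_out)
    # Integer input (e.g. int16 PCM) is rounded and clipped back to its original dtype
    if np.issubdtype(samples.dtype, np.integer):
        info = np.iinfo(samples.dtype)
        resampled = np.clip(np.round(resampled), info.min, info.max).astype(samples.dtype)
    return resampled
```

The input could come from `scipy.io.wavfile.read` on one of the single-channel files produced in step 1, and the result could be written back with `scipy.io.wavfile.write` at 16000 Hz.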

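Library 2: librosa

import librosa
import soundfile as sf

filename = 'wav_file_8.wav'
newFilename = 'wav_file_16.wav'

# Load at the file's native 8 kHz rate (librosa returns mono float data by default)
y, sr = librosa.load(filename, sr=8000)
# orig_sr and target_sr must be passed as keywords in recent librosa releases
y_16 = librosa.resample(y, orig_sr=sr, target_sr=16000)

# librosa.output.write_wav was removed in librosa 0.8; soundfile writes the file instead
sf.write(newFilename, y_16, 16000, subtype='PCM_16')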

Library 3: pandas resample

Option 4: C++ code, adapted from https://www.cnblogs.com/cpuimage/p/9270739.html

3. Audio format conversion (wav -> pcm)

Solutions:

Option 1: the ffmpeg tool

ffmpeg -i wav_file_16.wav -f s16le -ac 1 -ar 16000 pcm_file_16.pcm
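For comparison, a raw `.pcm` file is essentially the WAV sample data without the header, so the same conversion can be sketched in Python with the standard `wave` module. This is not from the original post: the helper name `wav_to_pcm` is hypothetical, and the sketch assumes the input is already an uncompressed 16-bit, single-channel, 16 kHz WAV file (it does no resampling or downmixing). WAV stores 16-bit samples as signed little-endian, which corresponds to the `s16le` format in the ffmpeg command above.

```python
import wave


def wav_to_pcm(wav_path, pcm_path):
    """Copy the raw sample bytes of a PCM WAV file into a headerless .pcm file."""
    with wave.open(wav_path, "rb") as wf:
        # Returned so the caller can verify the ASR requirements (1 channel, 16000 Hz, 2 bytes)
        params = (wf.getnchannels(), wf.getframerate(), wf.getsampwidth())
        frames = wf.readframes(wf.getnframes())
    with open(pcm_path, "wb") as out:
        out.write(frames)
    return params
```

A call such as `wav_to_pcm('wav_file_16.wav', 'pcm_file_16.pcm')` would be expected to return `(1, 16000, 2)` for a file that already meets the ASR engine's requirements.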

Option 2: C++
