[Column Picks] Hands-On: Baidu Speech Recognition

大智_洪流学堂 · Published 2019-05-13 11:30:30

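This article is excerpted from 《大话Unity2019》, a technical column on the 洪流学堂 WeChat official account. Do not repost without permission.

Reply "语音识别" (speech recognition) to the 洪流学堂 official account to get the source project.

洪流学堂: stay a few steps ahead. Hello, I'm Zheng Hongzhi (郑洪智).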

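Dazhi: "I've got something exciting for you today."
Xiaoxin, blushing: "What kind of exciting? I'm still just a kid."
Dazhi: "I'm taking you into real action!"
Xiaoxin: "Huh? Dazhi, you've changed!"
Dazhi: "Not a chance. I'm taking you into real action with Baidu speech recognition."
Xiaoxin: "Oh, could you not pause mid-sentence like that? I thought you meant something else."
Dazhi: "I have no idea what 'something else' you mean. Come on, off to battle."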

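HTTP in Practice: Baidu Speech Recognition

Dazhi: "Baidu speech recognition is an especially good example for learning HTTP communication."

First, some preparation:

Go to http://ai.baidu.com/. You need a Baidu account first.
After logging in, click 百度语音 (Baidu Speech) in the left sidebar of https://console.bce.baidu.com/.

Create a new application. There is no need to set a speech package name, because we will use the HTTP API rather than the SDK.

The application details page shows the API Key and Secret Key, which we will need shortly.

Read the docs! Read the docs! Read the docs!
Speech recognition: https://ai.baidu.com/docs#/ASR-API/top
Speech synthesis: https://ai.baidu.com/docs#/TTS-API/top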

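Speech Recognition

Dazhi: "Have you finished reading the docs?"
Xiaoxin: "Yes."
Dazhi: "Then let's begin."

Speech recognition involves two main steps:

Authentication: obtain a token from Baidu. Every request must carry this token; otherwise it is treated as an illegal request.
Record audio in Unity and call the speech recognition API.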

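Authentication

The authentication flow is described in detail in the Baidu speech recognition documentation. Here is a recap:

Note down your API Key and Secret Key, shown on the application details page as mentioned above.
Build the request URL as the documentation requires: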

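To obtain an Access Token with Client Credentials, the application must send a request from its server (POST is recommended) to the Baidu OAuth 2.0 authorization service at "https://openapi.baidu.com/oauth/2.0/token", with the following parameters:

grant_type: required; fixed value "client_credentials";

client_id: required; the application's API Key;

client_secret: required; the application's Secret Key.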

For example:
https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=Va5yQRHl********LT0vuXV4&client_secret=0rDSjzQ20XUj5i********PQSzr5pVw2

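So what are parameters?

The parameter part of a URL:
The parameter part runs from the "?" up to the "#" (or to the end of the URL if there is no "#"). It is also called the search part or query part. In this example, the parameter part is grant_type=client_credentials&client_id=Va5yQRHl********LT0vuXV4&client_secret=0rDSjzQ20XUj5i********PQSzr5pVw2. A URL can carry multiple parameters, separated from one another by "&".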
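To make the parameter format concrete, here is a minimal sketch in plain C# (no Unity required) that joins key/value pairs into a query string. The names QueryStringSketch, BuildQuery, and BuildUrl are made up for illustration and are not part of the original project. Percent-encoding with Uri.EscapeDataString is an added precaution for values that contain characters such as "&" or spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original project.
public static class QueryStringSketch
{
    // Joins key/value pairs into "k1=v1&k2=v2", percent-encoding each key and value.
    public static string BuildQuery(IEnumerable<KeyValuePair<string, string>> parameters)
    {
        return string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
    }

    // Appends the query to a base URL after the "?" separator.
    public static string BuildUrl(string baseUrl, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        return baseUrl + "?" + BuildQuery(parameters);
    }
}
```

For instance, passing grant_type=client_credentials and a client_id of "my key" to BuildUrl with the token address produces a URL ending in ?grant_type=client_credentials&client_id=my%20key.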

So how do we write this in Unity?

using System;
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class BaiduAsr : MonoBehaviour
{
    public string APIKey;
    public string SecretKey;
    private string Token;

    // Used to parse the returned JSON
    [Serializable]
    class TokenResponse
    {
        public string access_token = null;
    }

    IEnumerator Start()
    {
        // Build the request URL using C# string interpolation
        var uri = $"https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id={APIKey}&client_secret={SecretKey}";
        var www = UnityWebRequest.Get(uri);
        yield return www.SendWebRequest();

        // Parse the token only when the request succeeded
        if (!www.isHttpError && !www.isNetworkError)
        {
            Debug.Log("[BaiduAip]" + www.downloadHandler.text);
            var result = JsonUtility.FromJson<TokenResponse>(www.downloadHandler.text);
            Token = result.access_token;
            Debug.Log("[WitBaiduAip]Token has been fetched successfully");
        }
        else
        {
            Debug.LogError("[BaiduAip]" + www.error);
            Debug.LogError("[BaiduAip]Failed to fetch the token. Please check your APIKey and SecretKey");
        }
    }
}

Once the response arrives, JsonUtility parses the JSON string and we save the Token for later use. That completes authentication.

Recording Audio

The core code for recording audio is shown below.
For the method parameters, see the documentation: https://docs.unity3d.com/ScriptReference/Microphone.html

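// Start recording from the default microphone (null): no looping, up to 30 seconds, 16000 Hz
_clipRecord = Microphone.Start(null, false, 30, 16000);
// Stop recording on the default microphone
Microphone.End(null);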
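Since the recording above is started with a fixed length of 30 seconds, stopping early should leave unrecorded samples at the end of the clip. One possible refinement, sketched here under assumptions, is to read Microphone.GetPosition(null) before calling Microphone.End and keep only the samples captured so far. The helper name TrimToRecorded is hypothetical, and the sketch assumes the position counts sample frames per channel. It is plain C# so it can be tried outside Unity.

```csharp
using System;

// Hypothetical helper, not part of the original project.
public static class RecordingTrimSketch
{
    // Returns only the leading samples that were actually recorded.
    // recordedFrames: the value read from Microphone.GetPosition before Microphone.End
    // (assumed to count sample frames per channel).
    public static float[] TrimToRecorded(float[] samples, int recordedFrames, int channels)
    {
        if (samples == null) throw new ArgumentNullException(nameof(samples));
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));

        // Never copy more than the buffer holds, and never a negative count.
        int count = Math.Max(0, Math.Min(samples.Length, recordedFrames * channels));
        var trimmed = new float[count];
        Array.Copy(samples, trimmed, count);
        return trimmed;
    }
}
```

In a Unity script, the samples array would come from AudioClip.GetData, and the trimmed result would be converted to PCM before upload, which keeps the request body proportional to how long the key was held.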

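Requesting Speech Recognition

Baidu speech recognition supports two request styles:

JSON: put all the data into a JSON payload and upload it
Raw: put the audio data directly in the POST body

To get more practice with HTTP requests, we use the Raw style here.

The Raw style requires the following:

The header must carry the format and rate parameters, for example Content-Type: audio/pcm;rate=16000; see the Baidu speech documentation for details
The URL parameters must include at least the required ones: cuid (user identifier) and token (the token obtained during authentication)
Put the audio bytes in the request body; PCM is recommended. In Unity, the WWWForm class can be used to put the bytes into the body.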
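The first two requirements can be sketched as a small plain C# helper that builds the Content-Type value and the request URL. The names AsrRawRequestSketch, BuildContentType, and BuildUrl are hypothetical; the endpoint is the one used in this article's request code, and escaping the values with Uri.EscapeDataString is an added precaution rather than something the documentation is quoted as requiring.

```csharp
using System;

// Hypothetical helper, not part of the original project.
public static class AsrRawRequestSketch
{
    // Endpoint taken from the request URL used in this article.
    public const string Endpoint = "https://vop.baidu.com/server_api";

    // Builds the Content-Type value, e.g. "audio/pcm;rate=16000".
    public static string BuildContentType(string format, int rate)
    {
        return "audio/" + format + ";rate=" + rate;
    }

    // Builds the URL with the two required parameters: cuid and token.
    public static string BuildUrl(string cuid, string token)
    {
        return Endpoint
            + "?cuid=" + Uri.EscapeDataString(cuid)
            + "&token=" + Uri.EscapeDataString(token);
    }
}
```

For the third requirement, note that WWWForm.AddBinaryData encodes its data as multipart/form-data. If that wrapper around the audio bytes is a concern, UnityWebRequest's UploadHandlerRaw is another way to send a byte array as the unwrapped request body.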

The full code is as follows:

using System;
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class BaiduAsr : MonoBehaviour
{
    public string APIKey;
    public string SecretKey;
    private string Token;
    private AudioClip _clipRecord;

    // Used to parse the returned JSON
    [Serializable]
    class TokenResponse
    {
        public string access_token = null;
    }

    [Serializable]
    public class AsrResponse
    {
        public int err_no;
        public string err_msg;
        public string sn;
        public string[] result;
    }

    IEnumerator Start()
    {
        // Build the request URL
        var uri = $"https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id={APIKey}&client_secret={SecretKey}";
        var www = UnityWebRequest.Get(uri);
        yield return www.SendWebRequest();

        // Parse the token only when the request succeeded
        if (!www.isHttpError && !www.isNetworkError)
        {
            Debug.Log("[BaiduAip]" + www.downloadHandler.text);
            var result = JsonUtility.FromJson<TokenResponse>(www.downloadHandler.text);
            Token = result.access_token;
            Debug.Log("[WitBaiduAip]Token has been fetched successfully");
        }
        else
        {
            Debug.LogError("[BaiduAip]" + www.error);
            Debug.LogError("[BaiduAip]Failed to fetch the token. Please check your APIKey and SecretKey");
        }
    }

    void Update()
    {
        if (Input.GetKeyDown(KeyCode.A))
        {
            _clipRecord = Microphone.Start(null, false, 30, 16000);
        }

        if (Input.GetKeyUp(KeyCode.A))
        {
            Microphone.End(null);
            Debug.Log("[WitBaiduAip demo]end record");
            var data = ConvertAudioClipToPCM16(_clipRecord);
            StartCoroutine(Recognize(data, s =>
            {
                var text = s.result != null && s.result.Length > 0 ? s.result[0] : "No speech recognized";

                Debug.Log(text);
            }));
        }
    }

    public IEnumerator Recognize(byte[] data, Action<AsrResponse> callback)
    {
        var uri = $"https://vop.baidu.com/server_api?lan=zh&cuid={SystemInfo.deviceUniqueIdentifier}&token={Token}";

        var form = new WWWForm();
        form.AddBinaryData("audio", data);
        var www = UnityWebRequest.Post(uri, form);
        www.SetRequestHeader("Content-Type", "audio/pcm;rate=16000");
        yield return www.SendWebRequest();

        if (string.IsNullOrEmpty(www.error))
        {
            Debug.Log("[WitBaiduAip]" + www.downloadHandler.text);
            callback(JsonUtility.FromJson<AsrResponse>(www.downloadHandler.text));
        }
        else
            Debug.LogError(www.error);
    }

    /// <summary>
    /// Converts Unity AudioClip data to 16-bit PCM data
    /// </summary>
    /// <param name="clip">The recorded AudioClip to convert</param>
    /// <returns>The clip's samples as 16-bit PCM bytes</returns>
    public static byte[] ConvertAudioClipToPCM16(AudioClip clip)
    {
        var samples = new float[clip.samples * clip.channels];
        clip.GetData(samples, 0);
        var samples_int16 = new short[samples.Length];

        for (var index = 0; index < samples.Length; index++)
        {
            var f = samples[index];
            samples_int16[index] = (short)(f * short.MaxValue);
        }

        var byteArray = new byte[samples_int16.Length * 2];
        Buffer.BlockCopy(samples_int16, 0, byteArray, 0, byteArray.Length);

        return byteArray;
    }
}

The code above includes a method that converts AudioClip data to PCM16 data.
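To show what that conversion does to each sample, here is a standalone sketch in plain C# that works on a float array instead of an AudioClip. It is a variation rather than the article's method: it clamps each sample to [-1, 1] before scaling, and it writes the two bytes explicitly in little-endian order, on the assumption that the service expects the usual little-endian 16-bit PCM layout. The names Pcm16Sketch and FloatToPcm16 are hypothetical.

```csharp
using System;

// Hypothetical helper, not part of the original project.
public static class Pcm16Sketch
{
    // Converts float samples in [-1, 1] to 16-bit PCM bytes (little-endian).
    public static byte[] FloatToPcm16(float[] samples)
    {
        if (samples == null) throw new ArgumentNullException(nameof(samples));

        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp first so out-of-range input cannot overflow the short cast.
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            short value = (short)(clamped * short.MaxValue);

            bytes[2 * i] = (byte)(value & 0xFF);            // low byte
            bytes[2 * i + 1] = (byte)((value >> 8) & 0xFF); // high byte
        }
        return bytes;
    }
}
```

A sample of 1.0 becomes 32767 (bytes FF 7F) and a sample of -1.0 becomes -32767 (bytes 01 80), so every float sample turns into exactly two bytes.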

To try it out, hold down the A key to start recording, then release it to start recognition and log the result.