Offline Hotword Wake-Up with Snowboy

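Speech recognition is now used in a very wide range of scenarios, such as voice assistants on phones and smart speakers (XiaoAI, DingDong, Tmall Genie, ...).

Speech recognition generally involves three stages: hotword wake-up, voice capture, and recognition plus logic control.

Hotword wake-up means waking the device so that it parses what you say next. The device is normally recording the surrounding sound all the time, but does not react to it. Once it is woken by a wake word such as "Hi, Siri", it starts processing the audio that follows. Hotword wake-up is where speech recognition begins.

Snowboy is a fairly popular hotword wake-up framework and has since been acquired by Baidu. Snowboy supports Chinese well and is simpler to configure and use than Pocketsphinx, so it is the recommended choice.

Official Snowboy documentation (in English): http://docs.kitt.ai/snowboy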

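Installation

1. Get the source code and build it

Install dependencies

The Raspberry Pi's built-in audio device does not support audio input (it cannot record), so you need to buy a driver-free USB sound card online. Most of them work as soon as they are plugged in.

Installing pulseaudio is recommended, since it cuts down the audio configuration steps: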

$ sudo apt-get install pulseaudio

Install sox to test recording and playback:

$ sudo apt-get install sox

Once it is installed, run sox -d -d, speak into the microphone, and confirm that you can hear your own voice.

Install the other software dependencies:

Install PyAudio: $ sudo apt-get install python3-pyaudio
Install SWIG (>3.0.10): $ sudo apt-get install swig
Install ATLAS: $ sudo apt-get install libatlas-base-dev
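
The SWIG version requirement is easy to miss, because apt may install an older package on some distributions. The following is a minimal sketch of a check, assuming that `swig -version` prints a line of the form "SWIG Version X.Y.Z". The helper names parse_swig_version and swig_is_new_enough are made up for this illustration.

```python
import re
import subprocess

# Minimum taken from the dependency list above (strictly greater than 3.0.10)
MIN_SWIG = (3, 0, 10)


def parse_swig_version(text):
    """Extract (major, minor, patch) from `swig -version` output, or None."""
    match = re.search(r"SWIG Version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def swig_is_new_enough(text, minimum=MIN_SWIG):
    """True when a version was found and it is newer than `minimum`."""
    version = parse_swig_version(text)
    return version is not None and version > minimum


if __name__ == "__main__":
    try:
        output = subprocess.run(
            ["swig", "-version"], capture_output=True, text=True
        ).stdout
    except FileNotFoundError:
        output = ""  # swig is not installed at all
    print("SWIG OK" if swig_is_new_enough(output) else "SWIG missing or too old")
```

If the check fails, SWIG probably has to be built from source before running make in snowboy/swig/Python3.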

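Build the source code

Get the source code: $ git clone https://github.com/Kitt-AI/snowboy.git

Build the Python3 bindings: $ cd snowboy/swig/Python3 && make

Test:

If you are using a Raspberry Pi, you also need to change the sound card settings in ~/.asoundrc: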

pcm.!default {
  type asym
   playback.pcm {
     type plug
     slave.pcm "hw:0,0"
   }
   capture.pcm {
     type plug
     slave.pcm "hw:1,0"
   }
}
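
The card numbers in this file depend on the machine: "hw:0,0" is the playback device and "hw:1,0" the capture device, and the real numbers can be read from aplay -l and arecord -l. As a small sketch, the file can be generated from those numbers. The function names render_asoundrc and write_asoundrc are hypothetical, and the template assumes the usual pcm.!default wrapper around the asym block.

```python
from pathlib import Path

# Doubled braces are literal braces in str.format templates
ASOUNDRC_TEMPLATE = """pcm.!default {{
  type asym
  playback.pcm {{
    type plug
    slave.pcm "hw:{playback_card},{playback_device}"
  }}
  capture.pcm {{
    type plug
    slave.pcm "hw:{capture_card},{capture_device}"
  }}
}}
"""


def render_asoundrc(playback_card=0, capture_card=1,
                    playback_device=0, capture_device=0):
    """Return the .asoundrc text for the given ALSA card/device numbers."""
    return ASOUNDRC_TEMPLATE.format(
        playback_card=playback_card,
        playback_device=playback_device,
        capture_card=capture_card,
        capture_device=capture_device,
    )


def write_asoundrc(path, **kwargs):
    """Write the rendered configuration to `path` (for example ~/.asoundrc)."""
    Path(path).expanduser().write_text(render_asoundrc(**kwargs))


if __name__ == "__main__":
    # Print instead of writing, so nothing is overwritten by accident
    print(render_asoundrc(playback_card=0, capture_card=1))
```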

Go to the official example directory snowboy/examples/Python3 and run the following command:

$ python3 demo.py resources/models/snowboy.umdl

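(The snowboy.umdl file in the command is the speech recognition model.)

Then say "snowboy" clearly into the microphone. If you hear a beep, the installation and configuration succeeded.

PS: the official source code raises an error when tested with Python3. In testing, the fix was to edit the snowboydecoder.py file in the snowboy/examples/Python3 directory.

Change the code on line 5 from "from . import snowboydetect" to "import snowboydetect", and it runs directly.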

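Quick start

GitHub has a fairly detailed demo, and reading it first is strongly recommended. Start by creating a HotwordDetect class that holds parameters such as the wake-up model, the audio gain and the sensitivity. Then initialize the Detector object; Snowboy's Detector class lives in the downloaded source code. The trained model can be passed as a single model or as a list.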

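from .. import snowboydetect

class HotwordDetect(object):
    def __init__(self, decoder_model,
                 resource,
                 sensitivity=0.38,
                 audio_gain=1):
        """Create the Snowboy detector from a resource file and a model."""
        self.detector = snowboydetect.SnowboyDetect(
            resource_filename=resource.encode(),
            model_str=decoder_model.encode())
        self.detector.SetAudioGain(audio_gain)
        # Apply the sensitivity, which would otherwise be accepted but ignored
        # (assumes the model contains a single hotword)
        self.detector.SetSensitivity(str(sensitivity).encode())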
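
The constructor above takes a single model path, while the text says a list of models is also allowed. Snowboy takes both the models and the sensitivities as comma-separated strings, so a single value and a list can be normalized into the same form first. The sketch below shows that step on its own. build_detector_args is a hypothetical helper, and it assumes one hotword per model file (a universal model holding several hotwords would need one sensitivity per hotword instead).

```python
def build_detector_args(models, sensitivities):
    """Normalize model paths and sensitivities into comma-separated strings."""
    if not isinstance(models, (list, tuple)):
        models = [models]
    if not isinstance(sensitivities, (list, tuple)):
        sensitivities = [sensitivities]
    # A single sensitivity is applied to every model
    if len(sensitivities) == 1 and len(models) > 1:
        sensitivities = list(sensitivities) * len(models)
    if len(models) != len(sensitivities):
        raise ValueError(
            "got %d models but %d sensitivities"
            % (len(models), len(sensitivities))
        )
    model_str = ",".join(models)
    sensitivity_str = ",".join(str(s) for s in sensitivities)
    return model_str, sensitivity_str


if __name__ == "__main__":
    print(build_detector_args(["siri.pmdl", "xiaowei.pmdl"], [0.4, 0.6]))
```

The two returned strings would then be encoded and handed to the detector in place of decoder_model and sensitivity.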

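After initialization you can write a start method. A start method usually takes a wake-up callback, which is the "ding" you may hear after saying "Hi, Siri". It can also take a recording callback, which is whatever you want to do with the audio once the device is awake: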

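class HotwordDetect(object):
    ...

    def listen(self, detected_callback,
               interrupt_check=lambda: False,
               audio_recorder_callback=None):
        """Begin listening for the hotword."""
        ...
        state = "PASSIVE"
        while True:
            # `data` is the latest audio chunk, read in the code elided here
            status = self.detector.RunDetection(data)
            ...
            if state == "PASSIVE":
                # status > 0 is the 1-based index of the detected hotword
                if status > 0:
                    detected_callback()
                    state = "ACTIVE"
                continue
            elif state == "ACTIVE":
                if audio_recorder_callback is not None:
                    audio_recorder_callback()
                # Go back to waiting for the next hotword
                state = "PASSIVE"
                continue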

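You can define this logic however you like; it mainly switches between two states. When the device hears a wake word, status gives the index of the wake word that was recognized. For example, if you defined two wake words, "Siri" and "Xiaowei", a status of 1 means Siri was triggered and a status of 2 means Xiaowei was triggered. The state then changes to active, at which point audio_recorder_callback runs; once it finishes, the state switches back to the passive state to wait for the next wake word.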
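
The two-state logic can be tried out without a microphone by feeding it a list of made-up status values. The sketch below is a simulation, not Snowboy code: run_state_machine is a hypothetical function, and the status values stand in for what the detector would return (assumed here: values of 0 or below mean no hotword, and a positive value is the 1-based hotword index).

```python
def run_state_machine(statuses, hotwords, on_wake, on_record):
    """Replay simulated detector statuses through the PASSIVE/ACTIVE loop."""
    state = "PASSIVE"
    for status in statuses:
        if state == "PASSIVE":
            if status > 0:
                # status is 1-based, so subtract 1 to index the list
                on_wake(hotwords[status - 1])
                state = "ACTIVE"
        else:
            # ACTIVE: hand the audio to the recording callback, then
            # return to waiting for the next hotword
            on_record()
            state = "PASSIVE"
    return state


if __name__ == "__main__":
    events = []
    final_state = run_state_machine(
        [0, -2, 1, 0, 2, 0],
        ["Siri", "Xiaowei"],
        on_wake=lambda name: events.append(("wake", name)),
        on_record=lambda: events.append(("record",)),
    )
    print(events, final_state)
```

Keeping the loop separate from the audio source like this also makes it straightforward to test a custom callback before wiring it to the real detector.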

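Online speech recognition

Once the device is awake, you can take the recorded audio and do anything you like with it, including calling a speech recognition API such as Baidu's. All of this logic lives in the audio_recorder_callback callback. Note that Snowboy currently supports only a 16000 Hz recording sample rate; audio recorded at any other rate cannot be used. There are two ways around this:

Use a sound card that supports a 16000 Hz sample rate
Convert the sample rate of the recorded audio

Products from the two larger sound card chip makers, C-Media and RealTek, mostly run at 48 kHz or above. Chips that support 16 kHz tend to cost more, possibly around 60 yuan. UGREEN (绿联) has two products that support it; when buying, check the product specifications against the chip maker's part number to confirm that 16 kHz sampling is supported.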
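
For the second option, sample rate conversion, 48 kHz happens to be exactly three times 16 kHz, so a rough conversion can average every three samples into one. The sketch below does this for 16-bit mono PCM using only the standard library. downsample_int16 is a hypothetical helper, it assumes the samples are in the machine's native byte order (little-endian on a Raspberry Pi), and the plain averaging is a crude filter; a dedicated resampler such as sox will likely sound better.

```python
from array import array


def downsample_int16(pcm_bytes, factor=3):
    """Reduce the sample rate of 16-bit mono PCM by an integer factor.

    Each output sample is the average of `factor` input samples.
    Trailing samples that do not fill a whole group are dropped.
    """
    samples = array("h")          # "h" = signed 16-bit integers
    samples.frombytes(pcm_bytes)  # length must be a multiple of 2 bytes
    usable = len(samples) - len(samples) % factor
    averaged = array(
        "h",
        (sum(samples[i:i + factor]) // factor for i in range(0, usable, factor)),
    )
    return averaged.tobytes()


if __name__ == "__main__":
    one_second_48k = bytes(2 * 48000)  # one second of silence at 48 kHz
    converted = downsample_int16(one_second_48k)
    print(len(converted) // 2, "samples after conversion")
```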

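Training a voice model

Snowboy officially offers two ways to create a personalized voice model:

website. If you have a GitHub, Google or Facebook account, log in and record your voice to complete the training.
train-api. Pass the parameters specified in the documentation to complete the training; the API returns the acoustic model data.

Both methods produce a personal voice model, delivered as a .pmdl file. General-purpose universal models are not provided; for those you need to contact the vendor about a commercial arrangement. The more people test a model, the more accurate it becomes, so you can invite more people to test yours to improve accuracy. The type of microphone also affects accuracy: training the model on the same device it will run on improves accuracy. Speech recognition is a fairly sophisticated technology with many pitfalls, as ChenGuo put it: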

Speech Recognition is not that easy.

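Using it in your own project

Copy the following files into your own project directory:

The downloaded model.pmdl model file
The compiled _snowboydetect.so library from the snowboy/swig/Python3 directory
The demo.py, snowboydecoder.py and snowboydetect.py files and the resources directory from the snowboy/examples/Python3 directory
Run $ python3 demo.py model.pmdl in the project directory and test it with your own wake word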
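
A missing file in this list typically shows up as an import error at startup, so it can help to check the project directory before running the demo. This is a minimal sketch under the assumption that the file names are exactly those listed above; missing_files is a hypothetical helper.

```python
from pathlib import Path

# Files and directories copied from the Snowboy source tree
REQUIRED = [
    "_snowboydetect.so",
    "snowboydecoder.py",
    "snowboydetect.py",
    "demo.py",
    "resources",
]


def missing_files(project_dir, model="model.pmdl"):
    """Return the required names that do not exist under `project_dir`."""
    root = Path(project_dir)
    return [name for name in REQUIRED + [model] if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All files are in place; try: python3 demo.py model.pmdl")
```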

On an Orange Pi, speech recognition is used here to switch a light on and off by voice. This requires a network connection.

gpio.py

#!/usr/bin/env python
# encoding: utf-8
#
# GPIO control for the Orange Pi (orangepi); see the earlier posts for details.
#
"""
@version: ??
@author: lvusyy
@license: Apache Licence
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: gpio.py
@time: 2018/3/13 18:45
"""
import wiringpi as wp

class GPIO():
    def __init__(self):
        self.wp=wp
        wp.wiringPiSetupGpio()
        #wp.pinMode(18, 1)
        #wp.pinMode(23, 0)

    def setPinMode(self,pin,mode):
        self.wp.pinMode(pin,mode)

    def setV(self,pin,v):
        self.wp.digitalWrite(pin,v)

    def getV(self,pin):
        return self.wp.digitalRead(pin)
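
Because wiringpi only works on the board itself, the on/off logic is awkward to try on a development machine. One option, sketched here as an assumption rather than part of the original project, is an in-memory stand-in with the same three methods as the GPIO wrapper. FakeGPIO is a hypothetical class that only remembers what was written to each pin.

```python
class FakeGPIO:
    """In-memory stand-in for the GPIO wrapper, for use off the board."""

    def __init__(self):
        self.modes = {}   # pin -> mode (1 = OUTPUT, 0 = INPUT)
        self.values = {}  # pin -> last value written

    def setPinMode(self, pin, mode):
        self.modes[pin] = mode

    def setV(self, pin, v):
        self.values[pin] = v

    def getV(self, pin):
        # Pins that were never written read as 0 (off)
        return self.values.get(pin, 0)


if __name__ == "__main__":
    gpio = FakeGPIO()
    gpio.setPinMode(18, 1)
    gpio.setV(18, 1)
    print("pin 18 is", "on" if gpio.getV(18) else "off")
```

Any code that accepts a GPIO object could be handed a FakeGPIO instead while the voice logic is being developed.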

The earlier example, slightly modified: control.py

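#!/usr/bin/env python
# encoding: utf-8
#
# After the hotword wakes the device, use the Baidu speech recognition API to recognize the voice command, then match it to an action such as turning the light off or on.
### Using several Snowboy hotwords instead would work better and would not need a network. To be tested when time allows.
"""
@version: ??
@author: lvusyy
@license: Apache Licence
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: control.py
@time: 2018/3/13 17:30
"""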

import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import time
import pyaudio
import wave
import pygame
import snowboydecoder
import signal
from gpio import GPIO
from aip import AipSpeech

APP_ID = '109472xxx'
API_KEY = 'd3zd5wuaMrL21IusNqdQxxxx'
SECRET_KEY = '84e98541331eb1736ad80457b4faxxxx'
APIClient = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
interrupted = False

# Parameters for the recorded audio file
CHUNK = 1024
FORMAT = pyaudio.paInt16 # 16-bit samples
CHANNELS = 1             # mono
RATE = 16000             # sample rate
RECORD_SECONDS = 5       # recording length: 5 seconds
WAVE_OUTPUT_FILENAME = "./myvoice.pcm"  # where the recorded audio is stored

class Light():
    def __init__(self):
        self.pin=18
        self.mode=1 # 1 is on, 0 is off
        self.mgpio=GPIO()
        self.mgpio.setPinMode(pin=self.pin,mode=1) # 1 = OUTPUT, 0 = INPUT

    def on(self):
        """Drive the pin high to switch the light on."""
        self.mgpio.setV(self.pin,self.mode)

    def off(self):
        """Drive the pin low to switch the light off."""
        self.mgpio.setV(self.pin,0)

    def status(self):
        # 0 is off, 1 is on
        return self.mgpio.getV(self.pin)

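def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

def word_to_voice(text):
    """Synthesize `text` with Baidu TTS and play it through pygame."""
    result = APIClient.synthesis(text, 'zh', 1, {
        'vol': 5, 'spd': 3, 'per': 3})
    if isinstance(result, dict):
        # On failure the API returns a dict describing the error instead of audio bytes
        print('speech synthesis failed:', result)
        return
    with open('./audio.mp3', 'wb') as f:
        f.write(result)
    time.sleep(.2)
    pygame.mixer.music.load('./audio.mp3')  # speech audio synthesized from the text
    pygame.mixer.music.play()
    # Block until playback has finished
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)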

def get_mic_voice_file(p):
    # Spoken prompt: "Please say 'turn on the light' or 'turn off the light'."
    word_to_voice('请说开灯或关灯.')
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("* recording")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("* done recording")
    stream.stop_stream()
    stream.close()
    # p.terminate() is deliberately not called here; otherwise p = pyaudio.PyAudio() becomes invalid and has to be initialized again.
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
    print('recording finished')

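def baidu_get_words(client):
    """Send the recording to Baidu speech recognition and return the first transcript."""
    results = client.asr(get_file_content(WAVE_OUTPUT_FILENAME), 'pcm', 16000, { 'dev_pid': 1536, })
    # print(results['result'])
    # A failed request has no usable 'result' entry; return an empty string in that case
    if not results.get('result'):
        return ''
    words=results['result'][0]
    return words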

#_*_ coding:UTF-8 _*_
# @author: zdl
# Offline hotword wake-up plus speech recognition, used for simple voice-controlled interaction
# Import packages

def signal_handler(signal, frame):
    global interrupted
    interrupted = True

def interrupt_callback():
    global interrupted
    return interrupted

# Callback function: the speech recognition happens here
def callbacks():
    global detector
    # After the hotword is detected, play a "ding" as the prompt
    # snowboydecoder.play_audio_file()
    pygame.mixer.music.load('./resources/ding.wav')  # wake-up prompt sound
    pygame.mixer.music.play()
    # Block until the prompt has finished playing
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)
    #snowboydecoder.play_audio_file()
    # Stop Snowboy
    detector.terminate()
    # Start speech recognition
    get_mic_voice_file(p)
    rText=baidu_get_words(client=APIClient)
    if rText.find("开灯")!=-1:    # "turn on the light"
        light.on()
    elif rText.find("关灯")!=-1:  # "turn off the light"
        light.off()
    # Restart Snowboy
    wake_up()    # wake_up -> callbacks -> wake_up, a recursive call

# Hotword wake-up
def wake_up():
    global detector
    model = './resources/models/snowboy.umdl'  # the wake word is "SnowBoy"
    # capture SIGINT signal, e.g., Ctrl+C
    signal.signal(signal.SIGINT, signal_handler)
    # Hotword detector; adjust the sensitivity parameter to change how readily the wake word is detected
    detector = snowboydecoder.HotwordDetector(model, sensitivity=0.5)
    print('Listening... please say wake-up word:SnowBoy')
    # main loop
    # The default callback is detected_callback=snowboydecoder.play_audio_file
    # Swapping in a different callback lets us implement the behavior we want
    detector.start(detected_callback=callbacks,      # custom callback
                   interrupt_check=interrupt_callback,
                   sleep_time=0.03)
    # Release resources
    detector.terminate()

if __name__ == '__main__':
    # Initialize the pygame mixer so the synthesized audio files can be played later
    pygame.mixer.init()
    p = pyaudio.PyAudio()
    light=Light()
    wake_up()
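
The recognition and matching steps in callbacks() are easier to test when they are separated from the audio and GPIO code. The sketch below is one possible way to factor them out, not the original author's code. extract_text and match_command are hypothetical names, and the response layout (an "err_no" field that is 0 on success and a "result" list of transcripts) is an assumption about the Baidu speech recognition reply.

```python
def extract_text(results):
    """Return the first transcript from a recognition reply, or ''."""
    if not isinstance(results, dict) or results.get("err_no", 0) != 0:
        return ""
    candidates = results.get("result") or []
    return candidates[0] if candidates else ""


def match_command(text):
    """Map recognized text to 'on', 'off' or None."""
    if "开灯" in text:   # "turn on the light"
        return "on"
    if "关灯" in text:   # "turn off the light"
        return "off"
    return None


if __name__ == "__main__":
    reply = {"err_no": 0, "result": ["请开灯。"]}
    print(match_command(extract_text(reply)))
```

With these two functions, callbacks() would only need to call light.on() or light.off() depending on the returned action, and a failed recognition simply results in no action.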