Hands-on with the Speech To Text APIs of the major vendors

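I recently found that audiobooks help me fall asleep a lot, but every episode starts with an intro that I want to cut off. There are several different intros, so I need speech recognition to tell which one an episode starts with before cutting it.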
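The identify-then-cut workflow described above can be sketched with the standard library plus the ffmpeg command-line tool. This is a minimal sketch under stated assumptions, not the author's script: `identify_intro`, `ffmpeg_cut_command`, the `min_ratio` cutoff and the intro lengths are hypothetical, and `difflib` similarity is simply one easy way to compare a noisy transcript against a handful of known intro texts.

```python
import difflib


def identify_intro(transcript, known_intros, min_ratio=0.6):
    """Return the name of the known intro most similar to `transcript`, or None.

    transcript:   text recognized from the first few seconds of an episode
    known_intros: dict of hypothetical intro name -> reference text of that intro
    min_ratio:    hypothetical similarity cutoff in [0, 1]; tune it on real clips
    """
    # Recognizers often insert spaces between Chinese words; drop them before comparing
    text = transcript.replace(' ', '')
    best_name, best_ratio = None, 0.0
    for name, reference in known_intros.items():
        # SequenceMatcher.ratio() = 2 * matched chars / total chars of both strings
        ratio = difflib.SequenceMatcher(None, text, reference).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= min_ratio else None


def ffmpeg_cut_command(src, dst, intro_seconds):
    """Build an ffmpeg argument list that drops the first `intro_seconds` of `src`."""
    # -ss before -i seeks in the input; -c copy copies the stream without re-encoding
    return ['ffmpeg', '-y', '-ss', str(intro_seconds), '-i', src, '-c', 'copy', dst]


if __name__ == '__main__':
    # Reference texts taken from the intro wording quoted in the IBM script's keywords comment
    known = {
        'intro_a': '大家好欢迎大家订阅历史未解之谜全记录',
        'intro_b': '由喜马拉雅联合大理石独家推出',
    }
    print(identify_intro('大家好 欢迎 大家 订阅 历史 未解之谜 全记录', known))  # prints intro_a
```

To actually cut, something like `subprocess.run(ffmpeg_cut_command('ep1.mp3', 'ep1_cut.mp3', 12.5), check=True)` would work, assuming ffmpeg is on PATH and 12.5 s is the measured length of that particular intro (both hypothetical). With `-c copy` the cut point is only approximately at the requested time, which is usually fine for trimming an intro.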

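A couple of years ago I set up a local recognition environment, but its accuracy was poor, so I turned to APIs instead; I'll revisit the local approach when I have time. Below are the services from several major vendors. From my own use, I'd rank them iFlytek > Google > IBM, and on Chinese accuracy specifically, iFlytek is the strongest by far.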

Oracle

Its Always Free plan won me over, but its transcription service doesn't support Chinese, so I passed.

IBM

Pros: an ongoing free quota

Cons: accuracy isn't good enough, and the official website is a bit slow to load

A quick-and-dirty example:

#coding:utf-8
'''
@version: python3.8
@author: 'eric'
@license: Apache Licence
@contact: steinven@qq.com
@software: PyCharm
@file: ibm.py
@time: 2021/6/16 23:05
'''
from __future__ import print_function
import os
import traceback

# Older IBM Watson SDK (pip package watson-developer-cloud); newer releases moved to the ibm_watson package
from watson_developer_cloud import SpeechToTextV1

apikey = ''  # your IBM Cloud API key
url = ''     # your Speech to Text service URL

service = SpeechToTextV1(
    iam_apikey=apikey,
    url=url)

# Root directory of the downloaded resources
base_dir = r'36041981'
# Subdirectory holding the .x2m files (Ximalaya's Android cache files) already clipped to 5 s.
# I suspect .x2m is just a common audio format with the extension renamed.
clip_dir = os.path.join(base_dir, 'clip')

for each in os.listdir(clip_dir):
    # Only process files with the .x2m extension
    if not each.endswith('.x2m'):
        continue
    try:
        with open(os.path.join(clip_dir, each), 'rb') as audio_file:
            recognize_result = service.recognize(
                audio=audio_file,
                content_type='audio/mp3',
                timestamps=False,
                # Chinese models; zh-CN_BroadbandModel is a bit more accurate
                model='zh-CN_NarrowbandModel',
                # model='zh-CN_BroadbandModel',
                # These two parameters should bias the output toward the supplied text, but in my tests
                # they made no difference. keywords appears to drive keyword spotting (matches reported
                # in a separate keywords_result) rather than change the transcript itself.
                # keywords=list(set([x for x in '曲曲于山川历史为解之谜拓展人生的长度广度人生的长度广度和深度由喜马拉雅联合大理石独家推出探秘类大家好欢迎大家订阅历史未解之谜全记录'])),
                # keywords_threshold=0.1,
                word_confidence=True).get_result()
        if len(recognize_result['results']) == 0:
            final_result = '-'  # nothing recognized
        else:
            # The transcript contains spaces between words; remove them
            final_result = recognize_result['results'][0]['alternatives'][0]['transcript'].replace(' ', '')
        with open('result-1.txt', 'a', encoding='utf-8') as f:
            f.write('%s-%s\n' % (each, final_result))
    except Exception:
        traceback.print_exc()
        print(each)
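Two small helpers for the script above, as a hedged sketch. `is_mp3_header` checks whether a clip really starts like an MP3 (an ID3v2 tag or an MPEG audio frame-sync header), which is worth verifying because the script assumes `.x2m` is MP3 when it passes `content_type='audio/mp3'`. `ibm_transcript` joins every result in the response instead of only `results[0]`, since IBM may split one file into several results (for example at pauses). Both function names are hypothetical, and the magic-byte check is a heuristic, not a full format validation.

```python
def is_mp3_header(head):
    """Heuristic check on a file's first bytes: ID3v2 tag or MPEG audio frame sync."""
    if head[:3] == b'ID3':
        return True
    # An MPEG audio frame header begins with 11 set bits: 0xFF, then the top 3 bits of byte 2
    return len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0


def looks_like_mp3(path):
    """Read the first 3 bytes of `path` and apply is_mp3_header."""
    with open(path, 'rb') as f:
        return is_mp3_header(f.read(3))


def ibm_transcript(recognize_result):
    """Join the top alternative of every result in an IBM recognize() response dict.

    Returns '-' when nothing was recognized, the same placeholder the script writes.
    """
    results = recognize_result.get('results', [])
    parts = [r['alternatives'][0]['transcript'] for r in results if r.get('alternatives')]
    if not parts:
        return '-'
    # The transcript contains spaces between words, so strip them as the script does
    return ''.join(parts).replace(' ', '')
```

In the loop, `if not looks_like_mp3(path): continue` would skip clips that are not plain MP3, and `final_result = ibm_transcript(recognize_result)` would replace the if/else block.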

Google

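Pros: fast recognition

Cons: needs a proxy to access, and it's paid

Docs: "Quickstart: Using client libraries". For local audio files, don't use the code from the docs; use mine below as a reference instead.

A quick-and-dirty example: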

# coding:utf-8
from os import path

# Local audio file next to this script
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "268675557.mp3")


def transcribe_file(speech_file):
    """Transcribe the given audio file."""
    from google.cloud import speech

    # Uses Application Default Credentials, e.g. the GOOGLE_APPLICATION_CREDENTIALS environment variable
    client = speech.SpeechClient()

    with open(speech_file, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED,
        sample_rate_hertz=16000,
        language_code="zh-CN",
    )

    # Synchronous recognition: only suitable for short audio (about one minute at most)
    response = client.recognize(config=config, audio=audio)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print("Transcript: {}".format(result.alternatives[0].transcript))


if __name__ == '__main__':
    transcribe_file(AUDIO_FILE)
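The loop above prints one line per result. To get a single string for the whole file (for example to compare it against the known intros), the results can be concatenated instead. A minimal sketch: `full_transcript` is a hypothetical helper that relies only on the `results[i].alternatives[0].transcript` shape used above, so it can be tried with stand-in objects without calling the API.

```python
from types import SimpleNamespace


def full_transcript(results, sep=''):
    """Concatenate the top alternative of each result into one string.

    `results` is anything shaped like response.results: an iterable of objects
    whose .alternatives is a sequence of objects with a .transcript str.
    """
    parts = []
    for result in results:
        if result.alternatives:  # skip results that carry no alternatives
            parts.append(result.alternatives[0].transcript)
    # Chinese text needs no separator; pass sep=' ' for space-delimited languages
    return sep.join(parts)


if __name__ == '__main__':
    # Stand-in objects mimicking response.results, so the sketch runs without the API
    fake = [
        SimpleNamespace(alternatives=[SimpleNamespace(transcript='大家好')]),
        SimpleNamespace(alternatives=[]),
        SimpleNamespace(alternatives=[SimpleNamespace(transcript='欢迎收听')]),
    ]
    print(full_transcript(fake))  # prints 大家好欢迎收听
```

With a real response, `full_transcript(response.results)` would take the place of the print loop.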

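iFlytek

Pros: a time-limited free quota, fast recognition, the most accurate Chinese recognition, a domestic vendor, and very easy for developers to get started with

Cons: slow recognition, paid, and fairly expensive

I won't paste code here; the demos are easy to find on the official site.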
