Python real-time speech recognition


I have a Python script that uses the speech_recognition package to recognize speech and return the text of what was said. However, the transcription lags by several seconds. Is there another way to write this script so that it returns each word as it is spoken? I have another script that does this using the pysphinx package, but the results are very inaccurate.

Install the dependencies:

pip install SpeechRecognition
pip install pocketsphinx
Script 1 - delayed speech-to-text:

import speech_recognition as sr  

# obtain audio from the microphone  
r = sr.Recognizer()  
with sr.Microphone() as source:  
    print("Please wait. Calibrating microphone...")  
    # listen for 5 seconds and create the ambient noise energy level  
    r.adjust_for_ambient_noise(source, duration=5)  
    print("Say something!")  
    audio = r.listen(source)  

    # recognize speech using Sphinx  
    try:  
        print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")  
    except sr.UnknownValueError:  
        print("Sphinx could not understand audio")  
    except sr.RequestError as e:  
        print("Sphinx error; {0}".format(e))
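The lag in Script 1 comes largely from endpointing: `listen()` only returns once the input energy has stayed below the calibrated threshold for `pause_threshold` seconds (0.8 by default in speech_recognition), so the whole utterance plus the trailing pause is buffered before recognition even starts. The sketch below is a simplified, self-contained model of that endpointing logic, not speech_recognition's actual code; the frame energies and threshold are made up for illustration:

```python
def endpoint(energies, threshold, pause_frames=3):
    """Return (start, end) frame indices of the first utterance.

    Speech starts at the first frame whose energy exceeds the threshold
    and ends after pause_frames consecutive frames back below it.
    """
    start = None
    quiet = 0
    for i, e in enumerate(energies):
        if start is None:
            if e > threshold:
                start = i
        elif e <= threshold:
            quiet += 1
            if quiet >= pause_frames:
                # cut the utterance where the pause began
                return start, i - pause_frames + 1
        else:
            quiet = 0
    return start, len(energies)

# energy rises at frame 2 and falls at frame 6; with pause_frames=3
# the recognizer only "returns" three quiet frames later
levels = [10, 12, 300, 280, 310, 290, 9, 8, 7, 6]
print(endpoint(levels, threshold=100))  # → (2, 6)
```

This is why shortening the pause (or capping utterance length, e.g. via `listen()`'s `phrase_time_limit` argument) reduces the perceived delay at the cost of occasionally splitting phrases.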
Script 2 - live but inaccurate speech-to-text:

import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()
speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)
for phrase in speech:
    print(phrase)
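If the goal is per-word output rather than whole phrases, the `LiveSpeech` phrases expose word-level segments: in the old pocketsphinx Python package, `phrase.segments(detailed=True)` yields `(word, prob, start_frame, end_frame)` tuples. The decoder emits 100 frames per second with the default settings, so converting a segment to wall-clock time is a single division. A small sketch, with a made-up segment list standing in for a real decoder result:

```python
def frames_to_seconds(frame, frame_rate=100):
    """Convert a Sphinx decoder frame index to seconds.

    Pocketsphinx produces 100 frames per second by default (10 ms hop).
    """
    return frame / frame_rate

# hypothetical output of phrase.segments(detailed=True)
segments = [("hello", -3404, 10, 45), ("world", -2512, 50, 82)]

for word, prob, start_f, end_f in segments:
    print(f"{word}: {frames_to_seconds(start_f):.2f}s - {frames_to_seconds(end_f):.2f}s")
# hello: 0.10s - 0.45s
# world: 0.50s - 0.82s
```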

If you happen to have a CUDA-enabled GPU, you can try Mozilla's DeepSpeech GPU library. They also have a CPU build in case you do not have a CUDA-enabled GPU. On CPU, DeepSpeech transcribes an audio file at about 1.3x real time, while on GPU it runs at about 0.3x, i.e. it transcribes 1 second of audio in roughly 0.33 seconds. Quick start:

# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
source $HOME/tmp/deepspeech-gpu-venv/bin/activate

# Install DeepSpeech CUDA enabled package
pip3 install deepspeech-gpu

# Transcribe an audio file.
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm \
    --lm deepspeech-0.6.1-models/lm.binary \
    --trie deepspeech-0.6.1-models/trie \
    --audio audio/2830-3980-0043.wav
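The 1.3x and 0.3x figures quoted above are real-time factors (RTF): transcription time divided by audio duration, where anything below 1.0 is faster than real time. A trivial helper to make the arithmetic explicit:

```python
def real_time_factor(transcription_seconds, audio_seconds):
    """RTF = time to transcribe / audio duration; < 1.0 is faster than real time."""
    return transcription_seconds / audio_seconds

print(real_time_factor(0.33, 1.0))  # GPU figure from above: 0.33
print(real_time_factor(1.3, 1.0))   # CPU figure from above: 1.3
```

An RTF above 1.0, as on CPU here, means a live stream would fall steadily behind, which is why the GPU build matters for real-time use.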

A few important notes: deepspeech-gpu has several dependencies, such as TensorFlow, CUDA, cuDNN, etc. Check their GitHub repo for more details.

Most likely you are running this on something like a Raspberry Pi, which is not powerful enough to run large-vocabulary continuous speech recognition with a large dictionary.

What if you listen for 1 second at a time and then print the words? There may be some loss, but every word gets returned. Would that work?

Are you sure both systems use the same language model? What about the hardware-independent settings?

@Damian TeodorBeleș can you elaborate? I don't know what you are asking.

What if this doesn't hold: "If you happen to have a CUDA-enabled GPU, then you can try Mozilla's DeepSpeech"?

DeepSpeech runs on the CPU as well; inference is simply faster on a GPU. Apart from that, everything is the same.

OK, I see, thanks. But then what is the "live" part in "audio file"?
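One comment above suggests listening for one second at a time and printing whatever was recognized in each chunk. The buffering side of that idea is just fixed-size slicing of the sample stream; each chunk would then be fed to the recognizer, trading some accuracy at chunk boundaries for per-second output. A minimal sketch (the chunker is generic; the 16 kHz rate matches the `sampling_rate=16000` used in Script 2):

```python
def one_second_chunks(samples, rate=16000):
    """Split a flat sample buffer into consecutive 1-second chunks;
    the last chunk may be shorter than one second."""
    return [samples[i:i + rate] for i in range(0, len(samples), rate)]

# 2.5 s of (fake) 16 kHz audio -> two full chunks and one half chunk
buf = [0] * 40000
chunks = one_second_chunks(buf)
print([len(c) for c in chunks])  # [16000, 16000, 8000]
```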