How to set single_utterance for the Google Cloud Speech-to-Text API using Python SpeechRecognition


python google-api speech-recognition


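I am writing a simple calculator with the Google API and need to store the text recognized from streaming audio input so that I can process it further. The API has a single_utterance configuration option that looks very useful, but I cannot find a way to set it.
The Cloud Speech-to-Text documentation says:
Set the single_utterance field to true in the StreamingRecognitionConfig object.
Both the SpeechRecognition example code and the Recognizer class from the same library are reproduced at the end of this post.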

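The recognize_google() function in the Recognizer class does not seem to accept any parameter that would set the single_utterance field to true.
Apart from GitHub, the Google documentation does not actually cover the SpeechRecognition library.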
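To see why there is no obvious place for the flag, it may help to look at what recognize_google() actually sends. A minimal sketch, assuming the endpoint and the four query parameters that appear in the library source quoted in this post; build_recognize_url and the placeholder key are hypothetical names used only for illustration:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint string copied from the recognize_google() source quoted in this post.
V2_ENDPOINT = "http://www.google.com/speech-api/v2/recognize"


def build_recognize_url(language="en-US", key="YOUR_API_KEY", pfilter=0):
    """Hypothetical helper: rebuild the URL the same way recognize_google() does."""
    query = urlencode({
        "client": "chromium",
        "lang": language,
        "key": key,
        "pFilter": pfilter,
    })
    return "{}?{}".format(V2_ENDPOINT, query)


url = build_recognize_url()
# Collect the names of the query parameters that end up in the URL.
sent_params = sorted(parse_qs(urlsplit(url).query))
print(sent_params)
print("single_utterance" in sent_params)
```

If that reading is right, the legacy endpoint only receives client, lang, key and pFilter, so nothing in the query string maps to single_utterance.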

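I tried changing the key so that I connect with my own API and can watch the requests from the console, but I ran into problems there as well.
@Belegnar it is here on GitHub; this is part of the recognize_google() function: url = "http://www.google.com/speech-api/v2/recognize?{}".format(urlencode({"client": "chromium", "lang": language, "key": key, "pFilter": pfilter})). In the documentation I found StreamingRecognitionConfig, which includes single_utterance. It seems that both the URL and the configuration in the library would need to change. Questions: what does the URL encoding look like for streaming recognition and single_utterance? Do I really need to change the library code?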
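One possible way to avoid patching the library, sketched under assumptions: the official google-cloud-speech client (a separate package from SpeechRecognition) exposes StreamingRecognitionConfig with a single_utterance field, so the flag is set on a config object rather than URL-encoded. The helper names chunk_audio and transcribe_single_utterance are made up for this sketch, and it assumes 16-bit mono PCM audio plus configured Google Cloud credentials. Whether this fits the calculator use case has not been verified here.

```python
def chunk_audio(raw_pcm, chunk_size=4096):
    """Split raw PCM bytes into pieces small enough to send as a stream."""
    for start in range(0, len(raw_pcm), chunk_size):
        yield raw_pcm[start:start + chunk_size]


def transcribe_single_utterance(raw_pcm, sample_rate, language="en-US"):
    """Stream audio with single_utterance=True and return the final transcripts.

    Assumes the google-cloud-speech package is installed and application
    default credentials are configured; neither comes with SpeechRecognition.
    """
    from google.cloud import speech  # imported lazily: third-party dependency

    client = speech.SpeechClient()
    streaming_config = speech.StreamingRecognitionConfig(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate,
            language_code=language,
        ),
        single_utterance=True,  # ask the service to stop after one utterance
    )
    # One request per audio chunk; the client sends the config first.
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in chunk_audio(raw_pcm)
    )
    transcripts = []
    for response in client.streaming_recognize(config=streaming_config, requests=requests):
        for result in response.results:
            if result.is_final and result.alternatives:
                transcripts.append(result.alternatives[0].transcript)
    return transcripts
```

With SpeechRecognition, the raw bytes could come from audio.get_raw_data(convert_rate=16000, convert_width=2) on the AudioData returned by r.listen(source), passed together with sample_rate=16000.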
SpeechRecognition example code:

#!/usr/bin/env python3

# NOTE: this example requires PyAudio because it uses the Microphone class

from threading import Thread
try:
    from queue import Queue  # Python 3 import
except ImportError:
    from Queue import Queue  # Python 2 import

import speech_recognition as sr


r = sr.Recognizer()
audio_queue = Queue()


def recognize_worker():
    # this runs in a background thread
    while True:
        audio = audio_queue.get()  # retrieve the next audio processing job from the main thread
        if audio is None:
            break  # stop processing if the main thread is done

        # received audio data, now we'll recognize it using Google Speech Recognition
        try:
            # for testing purposes, we're just using the default API key
            # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
            # instead of `r.recognize_google(audio)`
            print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))

        audio_queue.task_done()  # mark the audio processing job as completed in the queue


# start a new thread to recognize audio, while this thread focuses on listening
recognize_thread = Thread(target=recognize_worker)
recognize_thread.daemon = True
recognize_thread.start()
with sr.Microphone() as source:
    try:
        while True:  # repeatedly listen for phrases and put the resulting audio on the audio processing job queue
            audio_queue.put(r.listen(source))
    except KeyboardInterrupt:  # allow Ctrl + C to shut down the program
        pass

audio_queue.join()  # block until all current audio processing jobs are done
audio_queue.put(None)  # tell the recognize_thread to stop
recognize_thread.join()  # wait for the recognize_thread to actually stop
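Since the goal is to keep the recognized text for further processing rather than print it, the worker above could append each transcript to a list. A minimal sketch of that variation: the recognizer is passed in as a callable so that r.recognize_google can be swapped for the stand-in fake_recognize, a hypothetical function that exists only to make the sketch runnable without a microphone.

```python
from queue import Queue
from threading import Thread


def recognize_worker(audio_queue, transcripts, recognize):
    """Consume audio jobs and store each transcript instead of printing it."""
    while True:
        audio = audio_queue.get()
        if audio is None:  # sentinel from the main thread: no more work
            audio_queue.task_done()
            break
        try:
            transcripts.append(recognize(audio))
        except Exception as error:  # e.g. sr.UnknownValueError, sr.RequestError
            print("recognition failed: {}".format(error))
        finally:
            audio_queue.task_done()  # mark this job as completed


def fake_recognize(audio):
    """Stand-in for r.recognize_google; decodes bytes instead of calling an API."""
    return audio.decode("utf-8")


audio_queue = Queue()
transcripts = []  # only the worker appends; the main thread reads after join()
worker = Thread(
    target=recognize_worker,
    args=(audio_queue, transcripts, fake_recognize),
    daemon=True,
)
worker.start()

# In the real program these would be AudioData objects from r.listen(source).
for phrase in (b"two plus two", b"three times four"):
    audio_queue.put(phrase)
audio_queue.put(None)  # tell the worker to stop
audio_queue.join()     # wait until every queued job is marked done
worker.join()

print(transcripts)
```

Passing the recognizer as an argument is a design choice for this sketch; it keeps the queue logic testable and leaves the choice of recognition backend open.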

Recognizer class from the same library:

def recognize_google(self, audio_data, key=None, language="en-US", pfilter=0, show_all=False):
    """
    Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API.

    The Google Speech Recognition API key is specified by ``key``. If not specified, it uses a generic key that works out of the box. This should generally be used for personal or testing purposes only, as it **may be revoked by Google at any time**.

    To obtain your own API key, simply following the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API".

    The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language tags can be found in this `StackOverflow answer <http://stackoverflow.com/a/14302134>`__.

    The profanity filter level can be adjusted with ``pfilter``: 0 - No filter, 1 - Only shows the first character and replaces the rest with asterisks. The default is level 0.

    Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.

    Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
    """
    assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data"
    assert key is None or isinstance(key, str), "``key`` must be ``None`` or a string"
    assert isinstance(language, str), "``language`` must be a string"

    flac_data = audio_data.get_flac_data(
        convert_rate=None if audio_data.sample_rate >= 8000 else 8000,  # audio samples must be at least 8 kHz
        convert_width=2  # audio samples must be 16-bit
    )
    if key is None:
        key = "AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw"
    url = "http://www.google.com/speech-api/v2/recognize?{}".format(urlencode({
        "client": "chromium",
        "lang": language,
        "key": key,
        "pFilter": pfilter
    }))
    request = Request(url, data=flac_data, headers={"Content-Type": "audio/x-flac; rate={}".format(audio_data.sample_rate)})

    # obtain audio transcription results
    try:
        response = urlopen(request, timeout=self.operation_timeout)
    except HTTPError as e:
        raise RequestError("recognition request failed: {}".format(e.reason))
    except URLError as e:
        raise RequestError("recognition connection failed: {}".format(e.reason))
    response_text = response.read().decode("utf-8")

    # ignore any blank blocks
    actual_result = []
    for line in response_text.split("\n"):
        if not line:
            continue
        result = json.loads(line)["result"]
        if len(result) != 0:
            actual_result = result[0]
            break

    # return results
    if show_all:
        return actual_result
    if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0:
        raise UnknownValueError()

    if "confidence" in actual_result["alternative"]:
        # return alternative with highest confidence score
        best_hypothesis = max(actual_result["alternative"], key=lambda alternative: alternative["confidence"])
    else:
        # when there is no confidence available, we arbitrarily choose the first hypothesis.
        best_hypothesis = actual_result["alternative"][0]
    if "transcript" not in best_hypothesis:
        raise UnknownValueError()
    return best_hypothesis["transcript"]