Forced alignment using the Google Speech-to-Text API?

google-cloud-platform speech-recognition

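I'm working with some recorded audio files, and I have a transcript of what was said. The problem is that the audio is in Arabic (Egyptian), so recognition accuracy is not very high. What I need is to give the API the correct text and have it force-align the speech to that text. In other words, I want a timestamp for each word in the transcript.

So is there a way to do this?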

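Speech-to-text conversion is based on machine learning algorithms, and their training depends on the amount of data available to them. As a result, some languages may be recognized more accurately than others. If you are working with Arabic, you should try using the API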

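In addition, if you want timestamps, the API has an option for that: enable the "enableWordTimeOffsets" parameter in the request configuration, and it will return a "startTime" and "endTime" for each word along with the full transcript. Note that these offsets describe the words the API recognized, not a transcript you supply. The API returns a response like the following: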

{
  "name": "7612202767953098924",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2017-07-20T16:36:55.033650Z",
    "lastUpdateTime": "2017-07-20T16:37:17.158630Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "okay so what am I doing here...(etc)...",
            "confidence": 0.96596134,
            "words": [
              {
                "startTime": "1.400s",
                "endTime": "1.800s",
                "word": "okay"
              },
              {
                "startTime": "1.800s",
                "endTime": "2.300s",
                "word": "so"
              },
              {
                "startTime": "2.300s",
                "endTime": "2.400s",
                "word": "what"
              },
              {
                "startTime": "2.400s",
                "endTime": "2.600s",
                "word": "am"
              },
              {
                "startTime": "2.600s",
                "endTime": "2.600s",
                "word": "I"
              },
              {
                "startTime": "2.600s",
                "endTime": "2.700s",
                "word": "doing"
              },
              {
                "startTime": "2.700s",
                "endTime": "3s",
                "word": "here"
              },
              {
                "startTime": "3s",
                "endTime": "3.300s",
                "word": "why"
              },
              {
                "startTime": "3.300s",
                "endTime": "3.400s",
                "word": "am"
              },
              {
                "startTime": "3.400s",
                "endTime": "3.500s",
                "word": "I"
              },
              {
                "startTime": "3.500s",
                "endTime": "3.500s",
                "word": "here"
              },
              ...
            ]
          }
        ]
      },
      {
        "alternatives": [
          {
            "transcript": "so so what am I doing here...(etc)...",
            "confidence": 0.9642093
          }
        ]
      }
    ]
  }
}
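
As a rough illustration of the option described above, here is a minimal Python sketch that builds a request body with "enableWordTimeOffsets" turned on and then pulls (word, start, end) triples out of a response shaped like the sample. It uses only the standard library and does not call the API. The helper names (`build_request`, `parse_offset`, `word_timestamps`), the "ar-EG" language code, and the `gs://my-bucket/audio.flac` URI are assumptions made for this example, and the field names in the request body should be checked against the current API reference before use.

```python
import json


def build_request(gcs_uri: str, language_code: str = "ar-EG") -> dict:
    """Build a JSON request body with per-word time offsets enabled.

    Assumption: the field names follow the v1 REST request shape; some
    audio formats also need "encoding" and "sampleRateHertz" in "config".
    """
    return {
        "config": {
            "languageCode": language_code,
            "enableWordTimeOffsets": True,
        },
        "audio": {"uri": gcs_uri},
    }


def parse_offset(value: str) -> float:
    """Convert a duration string such as "1.400s" or "3s" to seconds."""
    return float(value.rstrip("s"))


def word_timestamps(operation: dict) -> list[tuple[str, float, float]]:
    """Collect (word, start, end) from the first alternative of each result."""
    timestamps = []
    for result in operation.get("response", {}).get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Only alternatives that carry a "words" list contribute entries.
        for entry in alternatives[0].get("words", []):
            timestamps.append(
                (
                    entry["word"],
                    parse_offset(entry["startTime"]),
                    parse_offset(entry["endTime"]),
                )
            )
    return timestamps


if __name__ == "__main__":
    # Hypothetical bucket and file name, shown only to print the body shape.
    print(json.dumps(build_request("gs://my-bucket/audio.flac"), indent=2))

    # A trimmed-down response in the same shape as the sample above.
    sample = {
        "response": {
            "results": [
                {
                    "alternatives": [
                        {
                            "words": [
                                {"startTime": "1.400s", "endTime": "1.800s", "word": "okay"},
                                {"startTime": "2.700s", "endTime": "3s", "word": "here"},
                            ]
                        }
                    ]
                }
            ]
        }
    }
    for word, start, end in word_timestamps(sample):
        print(f"{start:.1f}-{end:.1f} {word}")
```

The offsets come back as strings with a trailing "s", so they have to be converted before any arithmetic. Results whose first alternative has no "words" list, like the second result in the sample, simply contribute nothing.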