Google cloud platform 谷歌云语音到文本的转换非常不准确,最后一个结果包含所有其他结果,而speakerTag仅显示最后一个结果

Google cloud platform 谷歌云语音到文本的转换非常不准确,最后一个结果包含所有其他结果,而speakerTag仅显示最后一个结果,google-cloud-platform,gcloud,google-speech-api,google-speech-to-text-api,Google Cloud Platform,Gcloud,Google Speech Api,Google Speech To Text Api,我正在使用谷歌语音通过命令行发送文本,结果很奇怪 这是我的命令 gcloud beta ml speech recognize-long-running gs://my_bucket_name/call0.mp3 --language-code=en-US --async --include-word-time-offsets --enable-speaker-diarization --diarization-speaker-count=2 这是音频文件: 问题是: 结果是非常糟糕和不

我正在使用谷歌语音通过命令行发送文本,结果很奇怪

这是我的命令

gcloud beta ml speech recognize-long-running gs://my_bucket_name/call0.mp3 
--language-code=en-US --async --include-word-time-offsets --enable-speaker-diarization 
--diarization-speaker-count=2
这是音频文件:

问题是:

  • 结果是非常糟糕和不准确的
  • 最后一个结果包含所有其他结果的组合
  • speakerTag仅在最后一个结果中显示
  • 我只为1号演讲者准备了录音
  • 结果如下:

    {
      "done": true,
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.speech.v1p1beta1.LongRunningRecognizeMetadata",
        "lastUpdateTime": "2020-07-13T18:56:33.689140Z",
        "progressPercent": 100,
        "startTime": "2020-07-13T18:27:45.757871Z",
        "uri": "gs://deepagent-db032.appspot.com/conmagi/call1.mp3"
      },
      "name": "398565854464473919",
      "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1p1beta1.LongRunningRecognizeResponse",
        "results": [
          {
            "alternatives": [
              {
                "confidence": 0.87135065,
                "transcript": "love",
                "words": [
                  {
                    "endTime": "11.300s",
                    "startTime": "10.400s",
                    "word": "love"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.48216835,
                "transcript": "you are",
                "words": [
                  {
                    "endTime": "425.100s",
                    "startTime": "424.500s",
                    "word": "you"
                  },
                  {
                    "endTime": "425.400s",
                    "startTime": "425.100s",
                    "word": "are"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.9194219,
                "transcript": "how far is it from",
                "words": [
                  {
                    "endTime": "475.200s",
                    "startTime": "473.800s",
                    "word": "how"
                  },
                  {
                    "endTime": "475.500s",
                    "startTime": "475.200s",
                    "word": "far"
                  },
                  {
                    "endTime": "475.700s",
                    "startTime": "475.500s",
                    "word": "is"
                  },
                  {
                    "endTime": "475.800s",
                    "startTime": "475.700s",
                    "word": "it"
                  },
                  {
                    "endTime": "476.100s",
                    "startTime": "475.800s",
                    "word": "from"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.823343,
                "transcript": "I want",
                "words": [
                  {
                    "endTime": "629.200s",
                    "startTime": "626.700s",
                    "word": "I"
                  },
                  {
                    "endTime": "629.800s",
                    "startTime": "629.200s",
                    "word": "want"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.56559134,
                "transcript": "Blue Ivy",
                "words": [
                  {
                    "endTime": "990.100s",
                    "startTime": "989.500s",
                    "word": "Blue"
                  },
                  {
                    "endTime": "991.100s",
                    "startTime": "990.100s",
                    "word": "Ivy"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.78465956,
                "transcript": "how old is Wawa",
                "words": [
                  {
                    "endTime": "1599.700s",
                    "startTime": "1598.500s",
                    "word": "how"
                  },
                  {
                    "endTime": "1600.100s",
                    "startTime": "1599.700s",
                    "word": "old"
                  },
                  {
                    "endTime": "1600.200s",
                    "startTime": "1600.100s",
                    "word": "is"
                  },
                  {
                    "endTime": "1600.600s",
                    "startTime": "1600.200s",
                    "word": "Wawa"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.9475956,
                "transcript": "how are you",
                "words": [
                  {
                    "endTime": "2022.400s",
                    "startTime": "2020s",
                    "word": "how"
                  },
                  {
                    "endTime": "2022.500s",
                    "startTime": "2022.400s",
                    "word": "are"
                  },
                  {
                    "endTime": "2022.600s",
                    "startTime": "2022.500s",
                    "word": "you"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.7494768,
                "transcript": "New York mall",
                "words": [
                  {
                    "endTime": "2066.200s",
                    "startTime": "2065.800s",
                    "word": "New"
                  },
                  {
                    "endTime": "2066.500s",
                    "startTime": "2066.200s",
                    "word": "York"
                  },
                  {
                    "endTime": "2067s",
                    "startTime": "2066.500s",
                    "word": "mall"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.6706576,
                "transcript": "call",
                "words": [
                  {
                    "endTime": "2255.600s",
                    "startTime": "2254.500s",
                    "word": "call"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.87819797,
                "transcript": "call Paul Wall",
                "words": [
                  {
                    "endTime": "3041.500s",
                    "startTime": "3040.300s",
                    "word": "call"
                  },
                  {
                    "endTime": "3041.800s",
                    "startTime": "3041.500s",
                    "word": "Paul"
                  },
                  {
                    "endTime": "3042.300s",
                    "startTime": "3041.800s",
                    "word": "Wall"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.8331511,
                "transcript": "no",
                "words": [
                  {
                    "endTime": "3101.300s",
                    "startTime": "3100.800s",
                    "word": "no"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.62488914,
                "transcript": "call Jeff",
                "words": [
                  {
                    "endTime": "3473.100s",
                    "startTime": "3470.300s",
                    "word": "call"
                  },
                  {
                    "endTime": "3473.500s",
                    "startTime": "3473.100s",
                    "word": "Jeff"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.9074697,
                "transcript": "call home",
                "words": [
                  {
                    "endTime": "4166.100s",
                    "startTime": "4162.400s",
                    "word": "call"
                  },
                  {
                    "endTime": "4166.400s",
                    "startTime": "4166.100s",
                    "word": "home"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.7917781,
                "transcript": "how old are you",
                "words": [
                  {
                    "endTime": "4231.800s",
                    "startTime": "4231.300s",
                    "word": "how"
                  },
                  {
                    "endTime": "4232.200s",
                    "startTime": "4231.800s",
                    "word": "old"
                  },
                  {
                    "endTime": "4232.300s",
                    "startTime": "4232.200s",
                    "word": "are"
                  },
                  {
                    "endTime": "4232.400s",
                    "startTime": "4232.300s",
                    "word": "you"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.70297575,
                "transcript": " Europe",
                "words": [
                  {
                    "endTime": "4244.200s",
                    "startTime": "4243s",
                    "word": "Europe"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.84273374,
                "transcript": " how are you",
                "words": [
                  {
                    "endTime": "5121.500s",
                    "startTime": "5115.300s",
                    "word": "how"
                  },
                  {
                    "endTime": "5122.100s",
                    "startTime": "5121.500s",
                    "word": "are"
                  },
                  {
                    "endTime": "5122.300s",
                    "startTime": "5122.100s",
                    "word": "you"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.7561751,
                "transcript": " the only one",
                "words": [
                  {
                    "endTime": "6199.900s",
                    "startTime": "6199.600s",
                    "word": "the"
                  },
                  {
                    "endTime": "6200.400s",
                    "startTime": "6199.900s",
                    "word": "only"
                  },
                  {
                    "endTime": "6200.800s",
                    "startTime": "6200.400s",
                    "word": "one"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.6547922,
                "transcript": " call",
                "words": [
                  {
                    "endTime": "6258.800s",
                    "startTime": "6256.800s",
                    "word": "call"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.9402823,
                "transcript": " Walgreens",
                "words": [
                  {
                    "endTime": "6925s",
                    "startTime": "6912.300s",
                    "word": "Walgreens"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.5217668,
                "transcript": " we want to watch",
                "words": [
                  {
                    "endTime": "7155.900s",
                    "startTime": "7155.500s",
                    "word": "we"
                  },
                  {
                    "endTime": "7156.500s",
                    "startTime": "7155.900s",
                    "word": "want"
                  },
                  {
                    "endTime": "7156.600s",
                    "startTime": "7156.500s",
                    "word": "to"
                  },
                  {
                    "endTime": "7156.700s",
                    "startTime": "7156.600s",
                    "word": "watch"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.7971729,
                "transcript": " I love you",
                "words": [
                  {
                    "endTime": "7199.900s",
                    "startTime": "7199.200s",
                    "word": "I"
                  },
                  {
                    "endTime": "7202.900s",
                    "startTime": "7199.900s",
                    "word": "love"
                  },
                  {
                    "endTime": "7203.100s",
                    "startTime": "7202.900s",
                    "word": "you"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "confidence": 0.8566783,
                "transcript": " how old is Moana",
                "words": [
                  {
                    "endTime": "7483.800s",
                    "startTime": "7481.300s",
                    "word": "how"
                  },
                  {
                    "endTime": "7484s",
                    "startTime": "7483.800s",
                    "word": "old"
                  },
                  {
                    "endTime": "7484.200s",
                    "startTime": "7484s",
                    "word": "is"
                  },
                  {
                    "endTime": "7484.300s",
                    "startTime": "7484.200s",
                    "word": "Moana"
                  }
                ]
              }
            ],
            "languageCode": "en-us"
          },
          {
            "alternatives": [
              {
                "words": [
                  {
                    "endTime": "11.300s",
                    "speakerTag": 1,
                    "startTime": "10.400s",
                    "word": "love"
                  },
                  {
                    "endTime": "425.100s",
                    "speakerTag": 1,
                    "startTime": "424.500s",
                    "word": "you"
                  },
                  {
                    "endTime": "425.400s",
                    "speakerTag": 1,
                    "startTime": "425.100s",
                    "word": "are"
                  },
                  {
                    "endTime": "475.200s",
                    "speakerTag": 1,
                    "startTime": "473.800s",
                    "word": "how"
                  },
                  {
                    "endTime": "475.500s",
                    "speakerTag": 1,
                    "startTime": "475.200s",
                    "word": "far"
                  },
                  {
                    "endTime": "475.700s",
                    "speakerTag": 1,
                    "startTime": "475.500s",
                    "word": "is"
                  },
                  {
                    "endTime": "475.800s",
                    "speakerTag": 1,
                    "startTime": "475.700s",
                    "word": "it"
                  },
                  {
                    "endTime": "476.100s",
                    "speakerTag": 1,
                    "startTime": "475.800s",
                    "word": "from"
                  },
                  {
                    "endTime": "629.200s",
                    "speakerTag": 1,
                    "startTime": "626.700s",
                    "word": "I"
                  },
                  {
                    "endTime": "629.800s",
                    "speakerTag": 1,
                    "startTime": "629.200s",
                    "word": "want"
                  },
                  {
                    "endTime": "990.100s",
                    "speakerTag": 1,
                    "startTime": "989.500s",
                    "word": "Blue"
                  },
                  {
                    "endTime": "991.100s",
                    "speakerTag": 1,
                    "startTime": "990.100s",
                    "word": "Ivy"
                  },
                  {
                    "endTime": "1599.700s",
                    "speakerTag": 1,
                    "startTime": "1598.500s",
                    "word": "how"
                  },
                  {
                    "endTime": "1600.100s",
                    "speakerTag": 1,
                    "startTime": "1599.700s",
                    "word": "old"
                  },
                  {
                    "endTime": "1600.200s",
                    "speakerTag": 1,
                    "startTime": "1600.100s",
                    "word": "is"
                  },
                  {
                    "endTime": "1600.600s",
                    "speakerTag": 1,
                    "startTime": "1600.200s",
                    "word": "Wawa"
                  },
                  {
                    "endTime": "2022.400s",
                    "speakerTag": 1,
                    "startTime": "2020s",
                    "word": "how"
                  },
                  {
                    "endTime": "2022.500s",
                    "speakerTag": 1,
                    "startTime": "2022.400s",
                    "word": "are"
                  },
                  {
                    "endTime": "2022.600s",
                    "speakerTag": 1,
                    "startTime": "2022.500s",
                    "word": "you"
                  },
                  {
                    "endTime": "2066.200s",
                    "speakerTag": 1,
                    "startTime": "2065.800s",
                    "word": "New"
                  },
                  {
                    "endTime": "2066.500s",
                    "speakerTag": 1,
                    "startTime": "2066.200s",
                    "word": "York"
                  },
                  {
                    "endTime": "2067s",
                    "speakerTag": 1,
                    "startTime": "2066.500s",
                    "word": "mall"
                  },
                  {
                    "endTime": "2255.600s",
                    "speakerTag": 1,
                    "startTime": "2254.500s",
                    "word": "call"
                  },
                  {
                    "endTime": "3041.500s",
                    "speakerTag": 1,
                    "startTime": "3040.300s",
                    "word": "call"
                  },
                  {
                    "endTime": "3041.800s",
                    "speakerTag": 1,
                    "startTime": "3041.500s",
                    "word": "Paul"
                  },
                  {
                    "endTime": "3042.300s",
                    "speakerTag": 1,
                    "startTime": "3041.800s",
                    "word": "Wall"
                  },
                  {
                    "endTime": "3101.300s",
                    "speakerTag": 1,
                    "startTime": "3100.800s",
                    "word": "no"
                  },
                  {
                    "endTime": "3473.100s",
                    "speakerTag": 1,
                    "startTime": "3470.300s",
                    "word": "call"
                  },
                  {
                    "endTime": "3473.500s",
                    "speakerTag": 1,
                    "startTime": "3473.100s",
                    "word": "Jeff"
                  },
                  {
                    "endTime": "4166.100s",
                    "speakerTag": 1,
                    "startTime": "4162.400s",
                    "word": "call"
                  },
                  {
                    "endTime": "4166.400s",
                    "speakerTag": 1,
                    "startTime": "4166.100s",
                    "word": "home"
                  },
                  {
                    "endTime": "4231.800s",
                    "speakerTag": 1,
                    "startTime": "4231.300s",
                    "word": "how"
                  },
                  {
                    "endTime": "4232.200s",
                    "speakerTag": 1,
                    "startTime": "4231.800s",
                    "word": "old"
                  },
                  {
                    "endTime": "4232.300s",
                    "speakerTag": 1,
                    "startTime": "4232.200s",
                    "word": "are"
                  },
                  {
                    "endTime": "4232.400s",
                    "speakerTag": 1,
                    "startTime": "4232.300s",
                    "word": "you"
                  },
                  {
                    "endTime": "4244.200s",
                    "speakerTag": 1,
                    "startTime": "4243s",
                    "word": "Europe"
                  },
                  {
                    "endTime": "5121.500s",
                    "speakerTag": 1,
                    "startTime": "5115.300s",
                    "word": "how"
                  },
                  {
                    "endTime": "5122.100s",
                    "speakerTag": 1,
                    "startTime": "5121.500s",
                    "word": "are"
                  },
                  {
                    "endTime": "5122.300s",
                    "speakerTag": 1,
                    "startTime": "5122.100s",
                    "word": "you"
                  },
                  {
                    "endTime": "6199.900s",
                    "speakerTag": 1,
                    "startTime": "6199.600s",
                    "word": "the"
                  },
                  {
                    "endTime": "6200.400s",
                    "speakerTag": 1,
                    "startTime": "6199.900s",
                    "word": "only"
                  },
                  {
                    "endTime": "6200.800s",
                    "speakerTag": 1,
                    "startTime": "6200.400s",
                    "word": "one"
                  },
                  {
                    "endTime": "6258.800s",
                    "speakerTag": 1,
                    "startTime": "6256.800s",
                    "word": "call"
                  },
                  {
                    "endTime": "6925s",
                    "speakerTag": 1,
                    "startTime": "6912.300s",
                    "word": "Walgreens"
                  },
                  {
                    "endTime": "7155.900s",
                    "speakerTag": 1,
                    "startTime": "7155.500s",
                    "word": "we"
                  },
                  {
                    "endTime": "7156.500s",
                    "speakerTag": 1,
                    "startTime": "7155.900s",
                    "word": "want"
                  },
                  {
                    "endTime": "7156.600s",
                    "speakerTag": 1,
                    "startTime": "7156.500s",
                    "word": "to"
                  },
                  {
                    "endTime": "7156.700s",
                    "speakerTag": 1,
                    "startTime": "7156.600s",
                    "word": "watch"
                  },
                  {
                    "endTime": "7199.900s",
                    "speakerTag": 1,
                    "startTime": "7199.200s",
                    "word": "I"
                  },
                  {
                    "endTime": "7202.900s",
                    "speakerTag": 1,
                    "startTime": "7199.900s",
                    "word": "love"
                  },
                  {
                    "endTime": "7203.100s",
                    "speakerTag": 1,
                    "startTime": "7202.900s",
                    "word": "you"
                  },
                  {
                    "endTime": "7483.800s",
                    "speakerTag": 1,
                    "startTime": "7481.300s",
                    "word": "how"
                  },
                  {
                    "endTime": "7484s",
                    "speakerTag": 1,
                    "startTime": "7483.800s",
                    "word": "old"
                  },
                  {
                    "endTime": "7484.200s",
                    "speakerTag": 1,
                    "startTime": "7484s",
                    "word": "is"
                  },
                  {
                    "endTime": "7484.300s",
                    "speakerTag": 1,
                    "startTime": "7484.200s",
                    "word": "Moana"
                  }
                ]
              }
            ]
          }
        ]
      }
    }
    
    

    Speaker Tag在这个API中被弃用,speakerTag很难给出准确的结果,我建议您使用ChannelTag代替speakerTag,比如result.ChannelTag,您可能会得到更好的结果。

    我遇到了同样的问题,特别是在性能不好的情况下,与日记相关的问题。 我也曾尝试从AWS获取我的脚本,但我发现单词错误率较高,但更好地识别人与人之间的转换

    正如您所知,这是一个测试版功能,在这个阶段,他们没有SLA(服务级别协议)来完成。 我向谷歌团队报告了这个bug,他们回答说:

    测试版中没有SLA或技术支持义务 除非产品条款[…]中另有规定。平均β 该阶段持续约六个月

    所以我相信团队需要一段时间才能正式发布这个特性


    如果没有分离的频道结果,我的音频文件中没有分离的频道。ChannelTag将返回至少1,因此,您将为所有单词获得一个ChannelTag,如果有多个频道,您可以为其启用扬声器日记,这可以分离频道,通过ChannelTag您可以看到哪个频道频道词是可以的,谢谢,但我需要同样的扬声器日记功能channel@AvielNiego您可以在识别配置或日记配置中启用说话人日记,并且可以使用channelTag而不是speakerTag,因为speakerTag在大多数情况下已被弃用且不准确,因为你没有分开的频道,你看不出有什么不同,但是如果有多个频道,你可以通过在你的单词中使用channelTag,开始时间和结束时间迭代循环,得到正在讲话的频道……你在世界上任何地方提过这个问题吗?我也遇到同样的问题,tks。