Python: Google Speech API transcription response is repeated multiple times
I am using the latest Python library for google-cloud-speech (0.35.0), and I get the results shown below: the words from the first transcription result are repeated in the second result, and so on until the last one. This was not the case with the previous version (0.34.0).

Source code:
config = speech.types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=48000,
    language_code='en-US',
    alternative_language_codes={'en-IN'},
    # max_alternatives=10,
    profanity_filter=True,
    enable_word_time_offsets=True,
    enable_word_confidence=True,
    enable_automatic_punctuation=True,
    enable_speaker_diarization=True,
    diarization_speaker_count=5,
    # model="video",
    use_enhanced=True)

Results:

results {
  alternatives {
    transcript: "start"
    confidence: 0.632519185543
    words {
      start_time {}
      end_time {
        seconds: 5
        nanos: 900000000
      }
      word: "start"
      confidence: 0.655210196972
      speaker_tag: 1
    }
  }
}
.....
.....
.....
results {
  alternatives {
    transcript: "end"
    confidence: 0.632519185543
    words {
      start_time {}
      end_time {
        seconds: 5
        nanos: 900000000
      }
      word: "start"
      confidence: 0.655210196972
      speaker_tag: 1
    }
    words {
      start_time {
        seconds: 129
        nanos: 300000000
      }
      end_time {
        seconds: 130
        nanos: 400000000
      }
      word: "end"
      confidence: 0.624447464943
      speaker_tag: 1
    }
  }
}

Problem:
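Google seems to have documented similar behavior in their docs:

Note: When this is true, we send all the words from the beginning of the audio for the top alternative in every consecutive response. This is done in order to improve our speaker tags as our models learn to identify the speakers in the conversation over time.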
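
If the documented note applies here, one possible workaround is to read the word list from the last result only, since that result would already contain every word from the start of the audio. The sketch below is a minimal illustration under that assumption. The helper name `words_from_final_result` and the stand-in objects built with `SimpleNamespace` are hypothetical and only mimic the shape of the output shown in the question; they are not part of the google-cloud-speech library.

```python
from types import SimpleNamespace


def words_from_final_result(results):
    """Return (speaker_tag, word) pairs from the last result only.

    Assumption: with speaker diarization enabled, the top alternative of the
    last result carries the cumulative word list, so reading only that result
    avoids collecting earlier words more than once.
    """
    if not results:
        return []
    alternatives = results[-1].alternatives
    if not alternatives:
        return []
    return [(w.speaker_tag, w.word) for w in alternatives[0].words]


# Hypothetical stand-ins shaped like the output in the question
# (plain objects, not real API response types).
def _word(text, tag):
    return SimpleNamespace(word=text, speaker_tag=tag)


def _result(transcript, words):
    return SimpleNamespace(
        alternatives=[SimpleNamespace(transcript=transcript, words=words)]
    )


sample_results = [
    _result("start", [_word("start", 1)]),
    _result("end", [_word("start", 1), _word("end", 1)]),
]

print(words_from_final_result(sample_results))
```

With a real response, the same helper could presumably be called as `words_from_final_result(response.results)`, provided the response objects expose `alternatives` and `words` as in the printed output above.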

Comment: Could you include the part of your code that prints the response results?