Python: Google Speech API transcription responses are repeated multiple times

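I am using the latest Python library for Google Cloud Speech (0.35.0), and I get results like the ones below: the words from the first transcription result are repeated in the second result, and so on until the last one. This was not the case with the previous version (0.34.0).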

Source code:

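# Imports assumed: the diarization and alternative-language fields
# are part of the v1p1beta1 API surface.
from google.cloud import speech_v1p1beta1 as speech
from google.cloud.speech_v1p1beta1 import enums

config = speech.types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=48000,
    language_code='en-US',
    alternative_language_codes=['en-IN'],  # repeated field: pass a list
    # max_alternatives=10,
    profanity_filter=True,
    enable_word_time_offsets=True,
    enable_word_confidence=True,
    enable_automatic_punctuation=True,
    enable_speaker_diarization=True,  # tags each word with a speaker_tag
    diarization_speaker_count=5,
    # model="video",
    use_enhanced=True)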

Result:

results {
    alternatives {
        transcript: "start"
        confidence: 0.632519185543
        words {
            start_time {}
            end_time {
                seconds: 5
                nanos: 900000000
            }
            word: "start"
            confidence: 0.655210196972
            speaker_tag: 1
        }
    }
}

.....
.....
.....

results {
    alternatives {
        transcript: "end"
        confidence: 0.632519185543
        words {
            start_time {}
            end_time {
                seconds: 5
                nanos: 900000000
            }
            word: "start"
            confidence: 0.655210196972
            speaker_tag: 1
        }
        words {
            start_time {
                seconds: 129
                nanos: 300000000
            }
            end_time {
                seconds: 130
                nanos: 400000000
            }
            word: "end"
            confidence: 0.624447464943
            speaker_tag: 1
        }

    }
}

Issues:

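Why am I getting multiple results in the response?

What is the reason for the words being repeated across all result sets? Previously, each result set contained only the words spoken within its own time range.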

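Google appears to document this behavior for the enable_speaker_diarization option:

Note: When this is true, we send all the words from the beginning of the audio for the top alternative in every consecutive response. This is done in order to improve our speaker tags as our models learn to identify the speakers in the conversation over time.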
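If that note applies here, the last result's top alternative already holds the full word list with the most recent speaker tags, so the earlier, partial lists can be skipped. The following is only a sketch under that assumption. The helper names `final_words` and `words_by_speaker` are made up for illustration, and `demo` is a stand-in object that mimics the shape of the response shown in the question rather than a real API response.

```python
from collections import defaultdict
from types import SimpleNamespace


def final_words(response):
    """Return the words of the last result's top alternative.

    Assumes the last result carries the cumulative word list,
    as the documentation note describes for diarization.
    """
    if not response.results:
        return []
    last = response.results[-1]
    if not last.alternatives:
        return []
    return list(last.alternatives[0].words)


def words_by_speaker(response):
    """Group the final word list by speaker_tag, keeping word order."""
    grouped = defaultdict(list)
    for info in final_words(response):
        grouped[info.speaker_tag].append(info.word)
    return dict(grouped)


# Stand-in for the response in the question (hypothetical test data).
_start = SimpleNamespace(word="start", speaker_tag=1)
_end = SimpleNamespace(word="end", speaker_tag=1)
demo = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript="start", words=[_start])]),
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript="end", words=[_start, _end])]),
])

print(words_by_speaker(demo))
```

The per-result `transcript` fields are still not cumulative in the output above, so the full text can be built by joining the transcripts of all results, while the speaker-tagged words come from the last result only.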

Could you include the part of your code that prints the response results?