IBM Cloud Watson Speech to Text: word timestamps out of sync with the audio

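I am using Speech to Text with the following parameters:

timestamps=true&max_alternatives=1&model=en-US_NarrowbandModel&smart_formatting=true

Headers:

'Content-Type' => 'audio/flac', 'Transfer-Encoding' => 'chunked'

I am providing an audio/flac file for processing, but the word time boundaries returned are not in sync with the audio.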
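
For reference, the request described above can be written out as a small Python sketch. This is not taken from the original code: the service URL is a hypothetical placeholder, the helper name build_recognize_url is made up for illustration, and the /v1/recognize path is assumed to be the endpoint in use. It only builds the URL and headers, so the exact parameter spellings can be checked before sending anything.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the service URL from your own credentials.
SERVICE_URL = "https://example.invalid/speech-to-text/api"

# Headers as listed in the question.
HEADERS = {"Content-Type": "audio/flac", "Transfer-Encoding": "chunked"}


def build_recognize_url(base_url: str) -> str:
    """Build the recognize URL with the query parameters from the question.

    The /v1/recognize path is an assumption about the endpoint being called.
    """
    params = {
        "timestamps": "true",
        "max_alternatives": 1,
        "model": "en-US_NarrowbandModel",
        "smart_formatting": "true",
    }
    return f"{base_url}/v1/recognize?{urlencode(params)}"


if __name__ == "__main__":
    print(build_recognize_url(SERVICE_URL))
    print(HEADERS)
```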

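For example, the response is:

take a morning I have 2 questions please %HESITATION first how how much of the ability

and the timestamps look like this: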

[
    ["take", 1409.48, 1409.62],
    ["a", 1409.62, 1409.67],
    ["morning", 1409.67, 1410.03],
    ["I", 1410.06, 1410.17],
    ["have", 1410.17, 1410.38],
    ["two", 1410.41, 1410.58],
    ["questions", 1410.58, 1411.05],
    ["please", 1411.05, 1411.42],
    ["%HESITATION", 1411.42, 1411.65],
    ["first", 1411.65, 1412.17],
    ["how", 1412.33, 1412.62],
    ["how", 1412.65, 1412.77],
    ["much", 1412.77, 1413],
    ["of", 1413, 1413.1],
    ["the", 1413.1, 1413.37],
    ["ability", 1413.37, 1413.82]
]
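
To compare these values against an audio player, it can help to convert the second offsets into minutes and seconds. The following is a minimal Python sketch, assuming each entry is a [word, start, end] triple in seconds as shown above; the helper name to_clock is made up for illustration and only the first few entries are repeated here.

```python
def to_clock(seconds: float) -> str:
    """Format an offset in seconds as MM:SS.ss."""
    # Work in whole centiseconds to avoid floating-point drift in the remainder.
    total_cs = round(seconds * 100)
    minutes, cs = divmod(total_cs, 6000)
    return f"{minutes:02d}:{cs / 100:05.2f}"


# First few [word, start, end] entries from the response above.
timestamps = [
    ["take", 1409.48, 1409.62],
    ["a", 1409.62, 1409.67],
    ["morning", 1409.67, 1410.03],
]

if __name__ == "__main__":
    for word, start, end in timestamps:
        print(f"{to_clock(start)} - {to_clock(end)}  {word}")
```

Seeking to the printed position in a player makes it easier to measure how far off each word is, and whether the offset is constant or grows over the length of the file.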

But in the actual audio, these words occur at different times (off by a few seconds).

Any suggestions?