How to get "pause" markers for a given audio file using Google Speech to Text?


Below is the output from Google Speech to Text for an audio file:

results {
  alternatives {
    transcript: "extremely grateful for the "
    confidence: 0.911402702331543
    words {
      start_time {
      }
      end_time {
        nanos: 600000000
      }
      word: "extremely"
    }
    words {
      start_time {
        nanos: 600000000
      }
      end_time {
        nanos: 900000000
      }
      word: "grateful"
    }
    words {
      start_time {
        nanos: 900000000
      }
      end_time {
        seconds: 1
        nanos: 100000000
      }
      word: "for"
    }
    words {
      start_time {
        seconds: 1
        nanos: 100000000
      }
      end_time {
        seconds: 1
        nanos: 300000000
      }
      word: "the"
    }
    words {
      start_time {
        seconds: 1
        nanos: 300000000
      }
}
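I want to know the pauses between words. In the output above, the word "extremely" starts at 0 and ends at nanos: 600000000. The next word, "grateful", starts at nanos: 600000000, exactly where the previous word ends. But when we speak, there are gaps between words. I want timestamps for the duration of the gaps between words.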
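To make the problem concrete, here is a minimal sketch that computes the gap between consecutive words from the offsets shown above. The word list is copied by hand from the output; the helper names (`to_nanos`, `gaps_between_words`) are my own and not part of any Google client library.

```python
from typing import List, Tuple

NANOS_PER_SECOND = 1_000_000_000


def to_nanos(seconds: int = 0, nanos: int = 0) -> int:
    """Convert a (seconds, nanos) offset into integer nanoseconds."""
    return seconds * NANOS_PER_SECOND + nanos


# (word, start, end) copied by hand from the complete words blocks above.
WORDS: List[Tuple[str, int, int]] = [
    ("extremely", to_nanos(0, 0), to_nanos(0, 600_000_000)),
    ("grateful", to_nanos(0, 600_000_000), to_nanos(0, 900_000_000)),
    ("for", to_nanos(0, 900_000_000), to_nanos(1, 100_000_000)),
    ("the", to_nanos(1, 100_000_000), to_nanos(1, 300_000_000)),
]


def gaps_between_words(words: List[Tuple[str, int, int]]) -> List[Tuple[str, str, int]]:
    """Return (previous word, next word, gap in nanoseconds) for each adjacent pair."""
    gaps = []
    for (prev_word, _, prev_end), (next_word, next_start, _) in zip(words, words[1:]):
        gaps.append((prev_word, next_word, next_start - prev_end))
    return gaps


if __name__ == "__main__":
    for prev_word, next_word, gap in gaps_between_words(WORDS):
        print(f"{prev_word} -> {next_word}: {gap / NANOS_PER_SECOND:.1f} s")
```

For this output every gap comes out as zero, because each word's start time equals the previous word's end time, so the pauses cannot be recovered from these offsets alone.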

Is there a way to get this information from Google Speech to Text?
If not, please suggest some alternatives for achieving the same goal.