Python: How do I reconstruct a conversation from Watson Speech to Text output?


I have JSON output from the Watson Speech to Text service that I converted to a list and then to a DataFrame.

I'm trying to work out how to reconstruct the conversation (including timings), similar to the following:

Speaker 0: said this [00.01-00.12]

Speaker 1: said that [00.12-00.22]

Speaker 0: said something else [00.22-00.56]
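A minimal sketch of rendering one turn in the target format above, assuming each turn is already available as a speaker tag, a list of words, and start/end times in seconds. `format_turn` is a hypothetical helper, not part of any Watson SDK; the `05.2f` format spec zero-pads the times to match the `00.01` style.

```python
# Hypothetical helper: render one speaker turn in the requested text format.
def format_turn(speaker, words, start, end):
    """Return e.g. 'Speaker 0: said this [00.01-00.12]'."""
    # 05.2f = two decimals, zero-padded to width 5, so 0.01 -> '00.01'
    return f"Speaker {speaker}: {' '.join(words)} [{start:05.2f}-{end:05.2f}]"


# Example turns taken from the question (speaker, words, start, end).
turns = [
    (0, ["said", "this"], 0.01, 0.12),
    (1, ["said", "that"], 0.12, 0.22),
    (0, ["said", "something", "else"], 0.22, 0.56),
]
for turn in turns:
    print(format_turn(*turn))
```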

My DataFrame has one row per word, with columns for the word, its start/end times, and the speaker tag (0 or 1).

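Ideally, I'd like to produce the following, grouping together the words spoken by the same speaker and breaking whenever the next speaker comes in: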

```python
grouped_words = [[['said', 'this'], 0.01, 0.12, 0],
                 [['said', 'that'], 0.12, 0.22, 1],
                 [['said', 'something', 'else'], 0.22, 0.56, 0]]
```
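One way to build `grouped_words` from per-word rows is `itertools.groupby`, which merges only adjacent items with the same key, so a speaker's separate turns stay separate. This is a sketch assuming each row is a `(word, start, end, speaker)` tuple; with the DataFrame you could pass `df.itertuples(index=False)` if the columns are in that order. `group_turns` and `rows` are hypothetical names.

```python
from itertools import groupby

# One tuple per word: (word, start, end, speaker), mirroring the per-word rows.
rows = [
    ("said", 0.01, 0.06, 0), ("this", 0.06, 0.12, 0),
    ("said", 0.12, 0.15, 1), ("that", 0.15, 0.22, 1),
    ("said", 0.22, 0.31, 0), ("something", 0.31, 0.45, 0), ("else", 0.45, 0.56, 0),
]


def group_turns(rows):
    """Collapse consecutive words by the same speaker into [[words], start, end, speaker]."""
    grouped = []
    # groupby starts a new group whenever the key changes, so it splits on speaker changes
    for speaker, turn in groupby(rows, key=lambda r: r[3]):
        turn = list(turn)
        words = [r[0] for r in turn]
        # start of the first word, end of the last word in this turn
        grouped.append([words, turn[0][1], turn[-1][2], speaker])
    return grouped


grouped_words = group_turns(rows)
print(grouped_words)
```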

Update: As requested, here is a link to an example of the JSON file I obtained:

Load the speaker labels into a pandas DataFrame for an easy visual overview, then identify the speaker changes:

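```python
# One row per word: start time, speaker tag, end time
speakers = pd.DataFrame(jsonconvo['speaker_labels']).loc[:, ['from', 'speaker', 'to']]
# Word timestamps: columns 0 (word), 1 (start), 2 (end)
convo = pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers = speakers.join(convo)
```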
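`join` lines the two tables up by index, which here means by row position, so it assumes `speaker_labels` and `timestamps` have exactly one entry per word in the same order. Below is a sketch of an alternative that matches on time instead, assuming each label's `from` equals the corresponding word's start time (true in the sample JSON in this thread). The sample dict is trimmed, and the column names `word`/`start`/`end` are an assumption for readability. Exact float matching only works because both lists carry identical values; if the times could differ slightly, `pd.merge_asof` is the pandas function for nearest-key joins. Like the answer, this only reads `results[0]`.

```python
import pandas as pd

# Trimmed sample in the same shape as the Watson JSON in this thread.
jsonconvo = {
    "results": [{"alternatives": [{"timestamps": [
        ["said", 0.01, 0.06], ["this", 0.06, 0.12], ["said", 0.12, 0.15],
    ]}]}],
    "speaker_labels": [
        {"from": 0.01, "to": 0.06, "speaker": 0},
        {"from": 0.06, "to": 0.12, "speaker": 0},
        {"from": 0.12, "to": 0.15, "speaker": 1},
    ],
}

speakers = pd.DataFrame(jsonconvo["speaker_labels"])[["from", "speaker", "to"]]
# Name the timestamp columns instead of relying on the default 0/1/2.
words = pd.DataFrame(
    jsonconvo["results"][0]["alternatives"][0]["timestamps"],
    columns=["word", "start", "end"],
)
# Match each speaker label to the word starting at the same time,
# rather than assuming both lists line up row for row.
merged = speakers.merge(words, left_on="from", right_on="start", how="left")
print(merged[["from", "to", "speaker", "word"]])
```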
Output:

```
   from  speaker    to          0     1     2
0  0.01        0  0.06       said  0.01  0.06
1  0.06        0  0.12       this  0.06  0.12
2  0.12        1  0.15       said  0.12  0.15
3  0.15        1  0.22       that  0.15  0.22
4  0.22        0  0.31       said  0.22  0.31
5  0.31        0  0.45  something  0.31  0.45
6  0.45        0  0.56       else  0.45  0.56
```
From there, you can identify the speaker changes and collapse the DataFrame with a quick loop:

```python
# Index labels of the rows where the speaker differs from the previous row
ChangeSpeaker = speakers.loc[speakers['speaker'].shift() != speakers['speaker']].index
```
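The same shift comparison can also label every word with a turn number by taking a running sum of the change points. This is a common pandas idiom rather than something from the answer, sketched here on a cut-down frame; the `word` and `turn` column names are hypothetical.

```python
import pandas as pd

speakers = pd.DataFrame({
    "word": ["said", "this", "said", "that", "said", "something", "else"],
    "speaker": [0, 0, 1, 1, 0, 0, 0],
})
# True wherever the speaker differs from the previous row
# (the first row is compared against NaN, so it is always True).
changed = speakers["speaker"].shift() != speakers["speaker"]
print(changed[changed].index.tolist())  # first row of each turn: [0, 2, 4]

# Running total of change points gives each word its turn number.
speakers["turn"] = changed.cumsum()
print(speakers["turn"].tolist())  # [1, 1, 2, 2, 3, 3, 3]
```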

```python
# Collect one row per speaker turn, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0).
rows = []
for counter in range(len(ChangeSpeaker)):
    currentindex = ChangeSpeaker[counter]
    try:
        # .loc slicing is inclusive, so stop one row before the next turn starts
        nextIndex = ChangeSpeaker[counter + 1] - 1
        temp = speakers.loc[currentindex:nextIndex, :]
    except IndexError:
        # Last turn: there is no next change point, so take everything to the end
        temp = speakers.loc[currentindex:, :]
    rows.append([temp.head(1)['from'].values[0], temp.tail(1)['to'].values[0],
                 temp.head(1)['speaker'].values[0], temp[0].tolist()])
Transcript = pd.DataFrame(rows, columns=['from', 'to', 'speaker', 'transcript'])
```
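Take the start time from the first row of the temporary DataFrame (hence `head`) and the end time from its last row (hence `tail`). To handle the last speaker's turn, where looking up the next change point would otherwise raise an index-out-of-range error, wrap that lookup in a try/except.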
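A variant of the loop that avoids the try/except: pair each turn's first row with the next turn's first row, and let the last turn end at `len(speakers)`. This sketch slices by position (`iloc`, end-exclusive) using positions from `np.flatnonzero`, so it does not depend on the index labels; naming the word column `word` instead of the default `0` is an assumption.

```python
import numpy as np
import pandas as pd

speakers = pd.DataFrame({
    "from": [0.01, 0.06, 0.12, 0.15, 0.22, 0.31, 0.45],
    "to": [0.06, 0.12, 0.15, 0.22, 0.31, 0.45, 0.56],
    "speaker": [0, 0, 1, 1, 0, 0, 0],
    "word": ["said", "this", "said", "that", "said", "something", "else"],
})

# Positions (not labels) of the first row of each speaker turn.
changed = speakers["speaker"].ne(speakers["speaker"].shift())
starts = np.flatnonzero(changed.to_numpy()).tolist()
# Each turn ends where the next begins; the last one runs to the end of the frame.
ends = starts[1:] + [len(speakers)]

rows = []
for start, end in zip(starts, ends):
    temp = speakers.iloc[start:end]  # end-exclusive positional slice
    rows.append([temp["from"].iloc[0], temp["to"].iloc[-1],
                 temp["speaker"].iloc[0], temp["word"].tolist()])
transcript = pd.DataFrame(rows, columns=["from", "to", "speaker", "transcript"])
print(transcript)
```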

Output:

```
   from    to speaker               transcript
0  0.01  0.12       0             [said, this]
0  0.12  0.22       1             [said, that]
0  0.22  0.56       0  [said, something, else]
```
Full code:

```python
import json
import pandas as pd

jsonconvo = json.loads("""{
   "results": [
      {
         "alternatives": [
            {
               "timestamps": [
                  ["said", 0.01, 0.06],
                  ["this", 0.06, 0.12],
                  ["said", 0.12, 0.15],
                  ["that", 0.15, 0.22],
                  ["said", 0.22, 0.31],
                  ["something", 0.31, 0.45],
                  ["else", 0.45, 0.56]
               ],
               "confidence": 0.85,
               "transcript": "said this said that said something else "
            }
         ],
         "final": true
      }
   ],
   "result_index": 0,
   "speaker_labels": [
      {"from": 0.01, "to": 0.06, "speaker": 0, "confidence": 0.55, "final": false},
      {"from": 0.06, "to": 0.12, "speaker": 0, "confidence": 0.55, "final": false},
      {"from": 0.12, "to": 0.15, "speaker": 1, "confidence": 0.55, "final": false},
      {"from": 0.15, "to": 0.22, "speaker": 1, "confidence": 0.55, "final": false},
      {"from": 0.22, "to": 0.31, "speaker": 0, "confidence": 0.55, "final": false},
      {"from": 0.31, "to": 0.45, "speaker": 0, "confidence": 0.55, "final": false},
      {"from": 0.45, "to": 0.56, "speaker": 0, "confidence": 0.54, "final": false}
   ]
}""")

# One row per word: start time, speaker tag, end time
speakers = pd.DataFrame(jsonconvo['speaker_labels']).loc[:, ['from', 'speaker', 'to']]
# Word timestamps: columns 0 (word), 1 (start), 2 (end)
convo = pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
# join aligns on the shared default index, i.e. by row position:
# assumes both lists have one entry per word, in the same order
speakers = speakers.join(convo)
print(speakers)

# Index labels of the rows where the speaker differs from the previous row
ChangeSpeaker = speakers.loc[speakers['speaker'].shift() != speakers['speaker']].index

# Collect one row per speaker turn, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0).
rows = []
for counter in range(len(ChangeSpeaker)):
    currentindex = ChangeSpeaker[counter]
    try:
        # .loc slicing is inclusive, so stop one row before the next turn starts
        nextIndex = ChangeSpeaker[counter + 1] - 1
        temp = speakers.loc[currentindex:nextIndex, :]
    except IndexError:
        # Last turn: there is no next change point, so take everything to the end
        temp = speakers.loc[currentindex:, :]
    rows.append([temp.head(1)['from'].values[0], temp.tail(1)['to'].values[0],
                 temp.head(1)['speaker'].values[0], temp[0].tolist()])
Transcript = pd.DataFrame(rows, columns=['from', 'to', 'speaker', 'transcript'])
print(Transcript)
```
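For longer transcripts, the loop can be replaced by a single `groupby` on a turn number (the running sum of speaker changes), aggregating the first start time, last end time, speaker, and word list per turn. This is a sketch on the same sample data, with the word column named `word` as an assumption; passing the built-in `list` as an aggregation collects each turn's words.

```python
import pandas as pd

# Same per-word data as the joined `speakers` frame, with the word column named.
speakers = pd.DataFrame({
    "from": [0.01, 0.06, 0.12, 0.15, 0.22, 0.31, 0.45],
    "to": [0.06, 0.12, 0.15, 0.22, 0.31, 0.45, 0.56],
    "speaker": [0, 0, 1, 1, 0, 0, 0],
    "word": ["said", "this", "said", "that", "said", "something", "else"],
})

# Turn number that increments each time the speaker changes.
turn = (speakers["speaker"] != speakers["speaker"].shift()).cumsum().rename("turn")

transcript = (
    speakers.groupby(turn)
    .agg({"from": "first", "to": "last", "speaker": "first", "word": list})
    .rename(columns={"word": "transcript"})
    .reset_index(drop=True)
)
print(transcript)
```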

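Can you show me the JSON output from Watson?

Thanks for the reply; I've added a link to the file in the question.

I realize my answer isn't exactly in the format you asked for, but it's still a solution.

You, my friend, are a scholar and a gentleman! Thanks so much. :)

No problem! The issue you'll run into is memory (DataFrames are very memory-expensive). If you start working with longer transcripts, you'll probably want to replace the item-by-item appending to the transcript with pd.concat on a list, or something similar.

Thanks for the heads-up. Watson Lite access limits audio files to 100 MB, so I'll probably get by as things are. Still, it's good to know the alternative.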