Python: combining data from two files between two timestamps

python file sorting text

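I have two files to compare. The words in one file are split into several segments in the other file, and I need a way to map those segments back to the original word or phrase.

For each Chinese word in the reference file, I take its start time and end time, find the phones in the segment file that fall inside that time span, and print those phones next to the word.

The files I am using are:
Reference file:
Segment file:

The code I have tried so far is: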

    outfile=open("lexlog",'w')

    phoneme=[]
    with open("ref.txt"+file,'r') as sylfile:
        for lines in sylfile:
            start,end,syl=lines.split()
            #print "from syl "+start,end
            with open("hyp.txt", 'r') as phnfile:
                for line in phnfile:
                    startphn, endphn, sylphn = line.split()
                    if (startphn>=start) and (endphn<=end) and (startphn<endphn):
                        phoneme.append(sylphn)
                        print sylphn
                        outfile.write(startphn+" "+start+" "+endphn+" "+end)
                print file,syl,' '.join(phoneme)
                outfile.write(file+" "+syl+" "+' '.join(phoneme)+"\n")
                phoneme=[]

But the result is:

ref.txt！SIL-SIL
ref.txt非 费伊
ref.txt生 吴世华
ref.txt物 U
ref.txt物 U
ref.txt體 不是吗
ref.txt也 我爱你
ref.txt會 胡慧
ref.txt有 我要把它放在一个单独的地方
ref.txt一 我
ref.txt種 吴志雄
ref.txt被 B EI
ref.txt稱 吴振华
ref.txt作 ZU O
ref.txt自 Z IH
ref.txt殺 什叶派
ref.txt的 啊
ref.txt設 嘘啊
ref.txt計 J I
ref.txt！SIL-SIL
ref.txt例 我
ref.txt如 RU
ref.txt！SIL-SIL
ref.txt人 瑞安
ref.txt工 吴国强
ref.txt智 ZH-IH
ref.txt慧 胡慧
ref.txt！SIL-SIL
ref.txt在 慈爱
ref.txt被 B EI
ref.txt電 迪恩
ref.txt腦 纽奥
ref.txt病 B I NG
ref.txt毒 杜
ref.txt入 RU
ref.txt侵 Q I N
ref.txt的 啊
ref.txt情 吴清辉
ref.txt況 吴国安
ref.txt下 X I A
ref.txt！SIL-SIL
ref.txt會 胡慧
ref.txt啟 问题一
ref.txt動 德昂
ref.txt殺  
ref.txt毒 杜
ref.txt程 吴振华
ref.txt系 XI
ref.txt！SIL-SIL
ref.txt同 汤东
ref.txt時 SH IH
ref.txt刪 上海
ref.txt除 楚
ref.txt自 Z IH
ref.txt己 J I
ref.txt體 不是吗
ref.txt內 奈伊
ref.txt的 啊
ref.txt檔 D A NG
ref.txt案 A N
ref.txt！SIL-SIL

Somehow the eighth line of the output is different from what I expect. Any help is much appreciated.

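It is a little hard to tell exactly what output you want, but it looks like you are also trying to write out the time values.

You are reading the data from each file as strings. The first two columns of every line need to be converted to floats; otherwise the comparisons are made on string values rather than numeric values. Here I convert the first two values to floats: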

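    import csv

    file = ''  # suffix appended to "ref.txt"; left empty here

    # On Python 3, open the csv output in text mode with newline=''
    # (on Python 2 the equivalent was mode 'wb').
    with open("lexlog.txt", 'w', newline='') as outfile:
        csv_log = csv.writer(outfile, delimiter=' ')
        phoneme = []

        with open("ref.txt" + file, 'r') as sylfile:
            for lines in sylfile:
                row_sylfile = lines.split()
                # Convert both time columns to floats so they compare numerically.
                start, end, syl = float(row_sylfile[0]), float(row_sylfile[1]), row_sylfile[2]

                # Rescan the phone file for every word in the reference file.
                with open("hyp.txt", 'r') as phnfile:
                    data = []
                    for line in phnfile:
                        row_phnfile = line.split()
                        startphn, endphn, sylphn = float(row_phnfile[0]), float(row_phnfile[1]), row_phnfile[2]

                        # Keep phones that lie entirely inside the word's time span.
                        if (startphn >= start) and (endphn <= end) and (startphn < endphn):
                            phoneme.append(sylphn)
                            data.extend([startphn, start, endphn, end])

                    # One row per word: matched times, then the word and its phones.
                    csv_log.writerow(data + [file, syl] + phoneme)
                    phoneme = []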
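
To make the string-versus-number point concrete, here is a small sketch written for Python 3. The timestamps are made up for illustration and are not taken from the files in the question, and the two helper names are hypothetical. It runs the same three-part condition once on the raw strings and once after converting to floats.

```python
def inside_as_strings(start, end, startphn, endphn):
    # Same condition as in the question, applied to the raw strings.
    # Strings compare character by character, so "9.80" < "10.00" is False.
    return startphn >= start and endphn <= end and startphn < endphn


def inside_as_floats(start, end, startphn, endphn):
    # Convert everything to float first, then apply the same condition.
    start, end, startphn, endphn = map(float, (start, end, startphn, endphn))
    return startphn >= start and endphn <= end and startphn < endphn


# Hypothetical values: a word spanning 9.50-10.20 and a phone spanning 9.80-10.00.
word = ("9.50", "10.20")
phone = ("9.80", "10.00")

string_result = inside_as_strings(*word, *phone)
float_result = inside_as_floats(*word, *phone)
print(string_result, float_result)  # False True
```

The phone clearly lies inside the word, yet the string version rejects it because the character "9" sorts after "1". Any word or phone whose times cross a change in the number of digits can be affected in this way.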

Part of the problem is that you are comparing strings. The first two columns need to be converted to floats, e.g.

    row = line.split()

and

    startphn, endphn, sylphn = float(row[0]), float(row[1]), row[2]
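
As a possible variation, the sketch below parses each file only once instead of reopening hyp.txt for every line of ref.txt. It is written for Python 3 and assumes every line has exactly three whitespace-separated columns (start, end, label); lines that do not fit are skipped. The function names `parse_rows` and `group_phones` and the sample data are hypothetical, chosen only for illustration.

```python
import io


def parse_rows(lines):
    """Turn 'start end label' lines into (float, float, str) tuples."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines (an assumption of this sketch)
        rows.append((float(parts[0]), float(parts[1]), parts[2]))
    return rows


def group_phones(words, phones):
    """Pair each word with the labels of the phones lying inside its time span."""
    result = []
    for start, end, word in words:
        inside = [label for s, e, label in phones
                  if s >= start and e <= end and s < e]
        result.append((word, inside))
    return result


# Made-up sample data standing in for ref.txt and hyp.txt.
ref_sample = io.StringIO("0.00 0.50 A\n0.50 1.20 B\n")
hyp_sample = io.StringIO("0.00 0.20 a1\n0.20 0.50 a2\n0.50 1.20 b1\n")

words = parse_rows(ref_sample)
phones = parse_rows(hyp_sample)
grouped = group_phones(words, phones)
for word, labels in grouped:
    print(word, " ".join(labels))
```

With real files, the `io.StringIO` objects would be replaced by the opened ref.txt and hyp.txt. The comparison logic is the same as in the answer above; only the file handling differs.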

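It works like a charm. Very useful, thank you for your help, and thanks for letting me know about the mistake I made when formatting the question.

You're welcome! Don't forget to click the grey tick under the up/down arrows to mark the answer as the accepted solution.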