How do I join vertically listed text from a ctl file into horizontal lines and save it to a new ctl file using Python?
I have an mlt.ctl file in which the text is arranged as follows:
znrmi_001/znrmi_001_001
znrmi_001/znrmi_001_002
znrmi_001/znrmi_001_003
zntoy_001/zntoy_001_001
zntoy_001/zntoy_001_002
zntoy_001/zntoy_001_003
zntoy_001/zntoy_001_004
.......................
zntoy_001/zntoy_001_160
....................
zntoy_002/zntoy_002_001
zntoy_002/zntoy_002_002
.......................
zntoy_002/zntoy_002_149
I need to save it to a newmlt.ctl file in the required format, which looks like this:
znrmi_001 znrmi_001_001 znrmi_001_002 znrmi_001_003
zntoy_001 zntoy_001_001 zntoy_001_002..................zntoy_001_160
zntoy_002 zntoy_002_001 zntoy_002_002..................zntoy_002_149
....................................................................
I'm trying to learn Python, but I get errors every time:
#!/usr/bin/env python
fi= open("mlt.ctl","r")
y_list = []
for line in fi.readlines():
    a1 = line[0:9]
    a2 = line[10:19]
    a3 = line[20:23]
    if a3 in xrange(1,500):
        y = a1+ " ".join(line[20:23].split())
        print(y)
    elif int(a3) < 2:
        fo.write(lines+ "\n")
    else:
        stop
    y_list.append(y)
    print(y)
fi.close()
fo = open ("newmlt.ctl", "w")
for lines in y_list:
    fo.write(lines+ "\n")
fo.close()
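I'm getting errors and the code doesn't work properly. Please provide some input.
This may be unrelated, but it looks like you forgot a ")" on line 11:
y = a1+ " ".join(line[20:23].split()
should be
y = a1+ " ".join(line[20:23].split())
as well as the ":" after else on line 14 and the ":" on line 20.
Also, on line 12 you may be comparing a string with an integer.
You could use a regular expression and save the matches in a dictionary: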
import re

# Each line looks like "group/value" with no leading sequence number:
# group 1 is the part before "/", group 2 is the part after it.
REGEX = r"^([^/\s]+)/(\S+)$"
finder = re.compile(REGEX, re.MULTILINE)  # MULTILINE lets ^ and $ match at every line

with open('mlt.ctl', 'r') as f:
    data = f.read()  # read the entire file into data

d = {}
indices = []  # group names in order of first appearance
for match in finder.finditer(data):  # one match per line
    key = match.group(1)  # the group name
    val = match.group(2)  # the value
    if key in d:  # the key has already been seen, just append the value to its list
        d[key].append(val)
    else:  # the key is new; create a dict entry and remember its position
        d[key] = [val]
        indices.append(key)

with open("newmlt.ctl", "w") as out:
    for idx in indices:
        vals = " ".join(d[idx])  # join the values into a space-delimited string
        to_string = "{} {}\n".format(idx, vals)
        out.write(to_string)
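To check what a regular expression captures before running it on the real file, it can help to try it on a few lines held in memory. The sketch below is only an illustration: the line-anchored pattern and the names `sample`, `pattern`, and `pairs` are assumptions, and the sample lines are copied from the question.

```python
import re

# Three lines in the same "group/value" layout as mlt.ctl.
sample = (
    "znrmi_001/znrmi_001_001\n"
    "znrmi_001/znrmi_001_002\n"
    "zntoy_001/zntoy_001_001\n"
)

# Assumed pattern: everything before "/" is the group, everything after is the value.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
pattern = re.compile(r"^([^/\s]+)/(\S+)$", re.MULTILINE)

pairs = [(m.group(1), m.group(2)) for m in pattern.finditer(sample)]
for group, value in pairs:
    print(group, value)
```

If a line is missing from `pairs`, the pattern is not matching every line and needs adjusting before it is used on the whole file.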
A slightly more Pythonic version:
from collections import defaultdict
d = defaultdict(list)
with open('mlt.ctl') as f:
    for line in f:
        grp, val = line.strip().split('/')
        d[grp].append(val)
with open('newmlt.ctl','w') as f:
    for k in sorted(d):
        oline = ' '.join([k]+d[k])+'\n'
        f.write(oline)
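Have you tried it?
@Nikolay Shmyrev, please give your input on the code above.
It gives the correct output, but it also adds a sequence number starting from zero, like 0 znrmi_001 znrmi_001 znrmi_001_002 znrmi_001_003. How do I remove the sequence number at the start of the string?
I edited the answer. You only need to change the to_string line to to_string = "{} {}\n".format(idx, vals) so that it does not show the index at the start.
Traceback (most recent call last): line 28, to_string = "{}.{}\n".format(idx,vals) IndexError: tuple index out of range
The correct code is in the answer. Please take a look.
It works, but the order has changed.
@Andy I assumed the file was sorted. If you want to preserve the key order, prepare a list keys = [] and collect the keys in the input loop: if grp not in keys: keys.append(grp). In the output loop, iterate over the keys with for k in keys: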
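The order-preserving variant described in the last comment could look roughly like the sketch below. It works on a list of lines so it can be tried without a file. The function name `group_lines`, the sample data, and the decision to skip blank or malformed lines are assumptions made for illustration.

```python
def group_lines(lines):
    """Group 'prefix/value' lines by prefix, keeping first-seen prefix order."""
    keys = []    # prefixes in the order they first appear
    groups = {}  # prefix -> list of values
    for line in lines:
        line = line.strip()
        if not line or '/' not in line:
            continue  # assumption: silently skip blank or malformed lines
        grp, val = line.split('/', 1)
        if grp not in groups:
            keys.append(grp)
            groups[grp] = []
        groups[grp].append(val)
    # One output line per prefix: the prefix followed by all of its values.
    return [' '.join([k] + groups[k]) for k in keys]


# Deliberately unsorted sample to show that input order is kept.
sample = [
    "zntoy_001/zntoy_001_001\n",
    "zntoy_001/zntoy_001_002\n",
    "znrmi_001/znrmi_001_001\n",
]
result = group_lines(sample)
print('\n'.join(result))
```

To use it on the real files, pass the open input file as `lines` and write each returned string followed by "\n" to newmlt.ctl. In Python 3.7 and later a plain dict already keeps insertion order, so the separate `keys` list is mainly useful on older versions.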