Python: read multiple rows from a TSV file and append data by column, separated by commas
How can I parse data from a TSV file based on column index? Once the data has been read from the file, I need to compare column 0 of row 1 with column 0 of row 2; if they match, I take the column 1 value of row 1 and append every matching entry to it. For example, given the file SystemType.tsv:
Actrius 1990s drama films
Actrius Catalan language films
Actrius Spanish films
Actrius Barcelona in fiction
Actrius Films directed by Ventura Pons
Actrius 1996 films
An_American_in_Paris Compositions by George Gershwin
An_American_in_Paris Symphonic poems
An_American_in_Paris Grammy Hall of Fame Award recipients
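Row 1 has "Actrius" in column 0, so we need to compare it against column 0 of every row and collect the column 1 values of the matching rows into a single comma-separated value, as follows.
Output: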
Actrius 1990s drama films,Catalan language films,Spanish films,Barcelona in fiction,Films directed by Ventura Pons,1996 films
An_American_in_Paris Compositions by George Gershwin,Symphonic poems,Grammy Hall of Fame Award recipients
I tried this, but it didn't work for me.
Here's what I came up with (Python 3, but I think the only difference should be my print function. If you want to use it under Python 2, including to write to an output file, you can do from __future__ import print_function):

import collections

# I used the variable "input" to hold the string from your example .tsv contents;
# you'd really want to read it in from a file.
# OrderedDict keeps the keys in the order they first appear.
D = collections.OrderedDict()
for line in input.splitlines():
    key, value = line.split('\t')
    if key not in D:
        D[key] = []
    D[key].append(value.strip())

# Print one row per key: the key, a tab, then its values joined by commas.
for key, values in D.items():
    print(key, ','.join(values), sep='\t')

My output is:
Actrius 1990s drama films,Catalan language films,Spanish films,Barcelona in fiction,Films directed by Ventura Pons,1996 films
An_American_in_Paris Compositions by George Gershwin,Symphonic poems,Grammy Hall of Fame Award recipients
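The answer holds the sample in a string and notes that you would really read it from a file. As a minimal sketch of that file-based variant, assuming Python 3.7+ (where dicts keep insertion order) and a two-column, tab-separated input: the function names group_lines and merge_tsv are my own, not from the answer, and skipping blank or tab-less lines is an assumption about how you would want malformed rows handled.

```python
from collections import defaultdict


def group_lines(lines):
    """Map each column-0 key to the list of its column-1 values."""
    grouped = defaultdict(list)  # dicts keep insertion order on Python 3.7+
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        # Split on the first tab only, so the value keeps any later tabs.
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # assumption: lines without a tab are skipped
        grouped[key].append(value.strip())
    return grouped


def merge_tsv(in_path, out_path):
    """Write one row per key, with that key's values joined by commas."""
    with open(in_path, encoding="utf-8") as src:
        grouped = group_lines(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        for key, values in grouped.items():
            dst.write(key + "\t" + ",".join(values) + "\n")
```

It could be called as merge_tsv("SystemType.tsv", "merged.tsv"), where merged.tsv is a hypothetical output name. Because the whole mapping is built before anything is written, rows with the same key do not have to be adjacent in the input.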