Python: Read multiple lines from a TSV file and join data with commas based on a column (Python, Parsing, Python 2.7, Text Parsing, TSV) - Fatal编程技术网

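How can I parse data based on column indexes in a TSV file? Once the data has been read from the file, I need to compare the column-0 value of row 1 with the column-0 value of row 2. If they match, I take the column-1 value of row 1 and append every matching entry to it, so that column 1 of row 1 ends up holding all the matches.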

For example, the SystemType.tsv file:

Actrius  1990s drama films 
Actrius  Catalan language films 
Actrius  Spanish films 
Actrius  Barcelona in fiction 
Actrius  Films directed by Ventura Pons 
Actrius  1996 films 
An_American_in_Paris     Compositions by George Gershwin 
An_American_in_Paris     Symphonic poems 
An_American_in_Paris     Grammy Hall of Fame Award recipients 
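Row 1 has "Actrius" in column 0, so we need to compare column 0 across all rows and collect the column-1 values of the matching rows into a single comma-separated value, as follows: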

Expected output:

Actrius   1990s drama films,Catalan language films,Spanish films,Barcelona in fiction,Films directed by Ventura Pons,1996 films
An_American_in_Paris    Compositions by George Gershwin,Symphonic poems,Grammy Hall of Fame Award recipients
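The comparison described above (row 1 against row 2, and so on) only merges neighbouring rows, which works here because all rows for one key are adjacent in the sample file. Under that assumption, the standard library's itertools.groupby performs exactly this consecutive comparison. A minimal sketch, where join_consecutive and sample are hypothetical names and the input is assumed to be key<TAB>value lines:

```python
import itertools

def join_consecutive(lines):
    """Collapse runs of adjacent rows that share the same column-0 key.

    Assumes each line is 'key<TAB>value'. Rows with the same key that are
    NOT adjacent stay separate, because groupby only groups consecutive items.
    """
    # Split each non-blank line on its first tab into [key, value].
    pairs = [line.split('\t', 1) for line in lines if line.strip()]
    out = []
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        # Collect the column-1 values of this run, trimming stray whitespace.
        values = [value.strip() for _, value in group]
        out.append(key + '\t' + ','.join(values))
    return out

sample = [
    "Actrius\t1990s drama films",
    "Actrius\tCatalan language films",
    "An_American_in_Paris\tSymphonic poems",
]
for row in join_consecutive(sample):
    print(row)
```

If the same key can reappear later in the file, either sort the rows by column 0 first or group them with a dictionary instead.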
I tried this, but it doesn't work for me.


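Here's what I came up with (Python 3, but I think the only difference should be my print function; on Python 2.7, for example if you want to use it to write to an output file, you can add from __future__ import print_function):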

from __future__ import print_function  # only needed on Python 2.7; harmless on Python 3
import collections

# Group the column-1 values by their column-0 key, keeping keys in first-seen order.
D = collections.OrderedDict()
with open('SystemType.tsv') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split('\t', 1)  # split on the first tab only
        D.setdefault(key, []).append(value.strip())

# Print each key once, followed by its values joined with commas.
for key, values in D.items():
    print(key, ','.join(values), sep='\t')

My output is:

Actrius 1990s drama films,Catalan language films,Spanish films,Barcelona in fiction,Films directed by Ventura Pons,1996 films
An_American_in_Paris    Compositions by George Gershwin,Symphonic poems,Grammy Hall of Fame Award recipients
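The answer mentions writing the result to an output file. A minimal sketch of that, assuming Python 3 and using the standard csv module with a tab delimiter for both reading and writing, could look like the following; the function name group_tsv is hypothetical:

```python
import collections
import csv

def group_tsv(in_path, out_path):
    """Read key<TAB>value rows from in_path and write one row per key to out_path.

    Values sharing a key are joined with commas; keys keep first-seen order.
    Assumes Python 3 (newline='' is the csv module's recommended open mode).
    """
    grouped = collections.OrderedDict()
    with open(in_path, newline='') as f:
        for row in csv.reader(f, delimiter='\t'):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            grouped.setdefault(row[0], []).append(row[1].strip())

    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        for key, values in grouped.items():
            # Commas are not special when the delimiter is a tab, so no quoting is added.
            writer.writerow([key, ','.join(values)])
```

Calling group_tsv('SystemType.tsv', 'SystemType_grouped.tsv') (the output file name is an arbitrary choice) would write the grouped rows shown above. Note that csv.reader treats double quotes as quoting characters by default; if the values may contain literal quotes, pass quoting=csv.QUOTE_NONE to the reader.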