Python: print the columns whose header values match

I have two CSV files.

File 1:

id,site,longitude,latitude             
**9936**,north,18.2,62.8              
5856,north,17.4914,63.0167             
**1298**,north,18.177,62.877   
File 2:

chr,loc,4678,**1298**,2295,**9936**,7354             
chr1,849,0,0,0,0,0,             
chr1,3481,1,1,0,1,1                             
chr1,3491,0,2,0,2,0,             
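I want to match the IDs in column 1 of file 1 against the header row of file 2 (highlighted with **) and, for every match, print that header together with the corresponding values from the rows below it.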

Output:

chr,loc,**1298**,**9936**            
chr1,849,0,0             
chr1,3481,1,1                             
chr1,3491,0,2
I have tried this in Python:

import csv

f1 = file('inFile.csv', 'rb')                 
f2 = file('inFile2.csv', 'rb')               
f3 = file('outFile.csv', 'wb')                           
c1 = csv.reader(f1)            
c2 = csv.reader(f2)                 
c3 = csv.writer(f3)              

matched_rows = [ row for row in c2 if row[2:6] in c1]           
for row in matched_rows:                                                  
    c3writerow[matched_rows]

Unfortunately, it doesn't work.

You need to load the column from file 1 first and store it in a format that allows efficient value lookups. A set will do here:

import csv

with open('inFile.csv', 'rb') as ids_file:
    reader = csv.reader(ids_file)
    next(reader, None)  # skip the header row
    ids = {r[0] for r in reader}  # set of first-column values for fast lookups
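The snippet above targets Python 2, where CSV files are opened in binary mode. As a minimal sketch of the same step on Python 3 (an assumption; the answer itself does not cover it), the file would instead be opened in text mode with `newline=''`. The helper name `load_ids` is hypothetical, and an in-memory buffer stands in for `inFile.csv`.

```python
import csv
import io


def load_ids(lines):
    """Return the set of first-column values, skipping the header row.

    `lines` is any iterable of CSV text lines, for example a file opened
    with open(path, newline='') on Python 3.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    # csv.reader yields blank lines as empty lists, so skip those
    return {row[0] for row in reader if row}


# In-memory stand-in for inFile.csv, using the sample data from the question.
sample = io.StringIO(
    "id,site,longitude,latitude\n"
    "9936,north,18.2,62.8\n"
    "5856,north,17.4914,63.0167\n"
    "1298,north,18.177,62.877\n"
)
ids = load_ids(sample)
print(sorted(ids))  # ['1298', '5856', '9936']
```

Membership tests such as `'1298' in ids` take constant time on average with a set, which is why it suits the header lookup in the next step.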
Now you can test for matching columns:

from operator import itemgetter

with open('inFile2.csv', 'rb') as f2, open('outFile.csv', 'wb') as outf:
    reader = csv.reader(f2)
    writer = csv.writer(outf)

    headers = next(reader, [])
    # produce indices for what headers are present in the ids set
    matching_indices = [i for i, header in enumerate(headers[2:], 2) if header in ids]
    selector = itemgetter(0, 1, *matching_indices)
    # write selected columns to output file
    writer.writerow(selector(headers))
    writer.writerows(selector(row) for row in reader)
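For readers on Python 3, the whole column-selection step can be sketched end to end as below. This is an illustration under stated assumptions rather than the answer's own code: the function name `select_matching_columns` is hypothetical, in-memory buffers replace the real files, and the sample rows are those from the question with the trailing commas removed.

```python
import csv
import io
from operator import itemgetter


def select_matching_columns(ids, in_lines, out_file):
    """Copy columns 0 and 1 plus every column whose header is in `ids`."""
    reader = csv.reader(in_lines)
    writer = csv.writer(out_file, lineterminator="\n")

    headers = next(reader, [])
    # indices (from column 2 onwards) whose header appears in the ids set
    matching = [i for i, h in enumerate(headers[2:], 2) if h in ids]
    selector = itemgetter(0, 1, *matching)

    writer.writerow(selector(headers))
    writer.writerows(selector(row) for row in reader)


ids = {"9936", "5856", "1298"}
file2 = io.StringIO(
    "chr,loc,4678,1298,2295,9936,7354\n"
    "chr1,849,0,0,0,0,0\n"
    "chr1,3481,1,1,0,1,1\n"
    "chr1,3491,0,2,0,2,0\n"
)
out = io.StringIO()
select_matching_columns(ids, file2, out)
print(out.getvalue())
```

With real files on Python 3, both would be opened in text mode, for example `open('inFile2.csv', newline='')` for reading and `open('outFile.csv', 'w', newline='')` for writing.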
Demo with your sample data:

First, produce the set of first-column values:

>>> ids_file = '''\
... id,site,longitude,latitude
... 9936,north,18.2,62.8
... 5856,north,17.4914,63.0167
... 1298,north,18.177,62.877
... '''.splitlines()
>>> reader = csv.reader(ids_file)
>>> next(reader, None)
['id', 'site', 'longitude', 'latitude']
>>> ids = {r[0] for r in reader}
>>> ids
set(['5856', '9936', '1298'])
Then use that data to build the selector with operator.itemgetter(), applied to the header row of file 2.
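The selector step can be sketched as follows, using the header row of file 2 from the question and the `ids` set built in the previous step. When `operator.itemgetter` is called with several indices, it returns a callable that picks those positions out of any sequence and returns them as a tuple. This is an illustrative reconstruction written for Python 3, not the answer's original interpreter session.

```python
from operator import itemgetter

ids = {"9936", "5856", "1298"}
headers = ["chr", "loc", "4678", "1298", "2295", "9936", "7354"]

# positions, from column 2 onwards, whose header is one of the ids
matching_indices = [i for i, header in enumerate(headers[2:], 2) if header in ids]

# always keep columns 0 and 1, then add the matching columns
selector = itemgetter(0, 1, *matching_indices)

print(matching_indices)   # [3, 5]
print(selector(headers))  # ('chr', 'loc', '1298', '9936')
```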

Now you can use that object to select only the columns you want to write to the output CSV file:

>>> selector(headers)
('chr', 'loc', '1298', '9936')
>>> selector(next(reader))
('chr1', '849', '0', '0')
>>> selector(next(reader))
('chr1', '3481', '1', '1')
>>> selector(next(reader))
('chr1', '3491', '2', '2')

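"It doesn't work" is not a very useful problem description. What exactly is the problem? My guess is a `SyntaxError`.

It's not clear to me how the output is produced. For example, what rule governs the output for rows 2, 3 and 4?

No error message appears. I am trying to have it look at rows 2:6 in file 2, check whether they match a value in column 1 of file 1, and then print the matching rows.

@user3816990: Are you perhaps talking about columns here? The headers of columns 3 and 5 match, so you include columns 0, 1, 3 and 5 in the output?

@user3816990: If so, your output doesn't match the sample input; in that case the last line should be `chr1,3491,2,2`.

Thank you very much. What exactly do you mean by a set? I tried to follow the demo, but I get a syntax error near unexpected token `(` when generating the set with `ids = {r[0] for r in reader}`, and also for `with open('inFile2.csv', 'rb') as f2, file('outFile.csv', 'wb') as outf:`.

@user3816990: Are you using Python 2.6? Then use `ids = set(r[0] for r in reader)`.

@user3816990: In that case you need to nest the `with` statements, so put them on separate lines, one nested inside the other.

Thank you Martijn, it works very well, and you explained very clearly how the code works, so that I could understand and learn!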
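The two workarounds from the comments can be sketched together as below: a `set()` call around a generator expression replaces the set comprehension, and two nested `with` statements replace the single statement with two context managers, both of which are syntax that Python 2.6 lacks. The sketch is written so that it also runs on Python 3; the temporary directory, the shortened sample data and the names `id_rows` and `keep` are assumptions made for illustration.

```python
import csv
import os
import tempfile

# Hypothetical working files in a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "inFile2.csv")
out_path = os.path.join(workdir, "outFile.csv")
with open(in_path, "w") as handle:
    handle.write("chr,loc,4678,1298\nchr1,849,0,0\n")

# set() around a generator expression instead of a set comprehension
id_rows = [["9936", "north"], ["1298", "north"]]
ids = set(row[0] for row in id_rows)

# one context manager per with statement, nested
with open(in_path) as f2:
    with open(out_path, "w") as outf:
        reader = csv.reader(f2)
        writer = csv.writer(outf, lineterminator="\n")
        headers = next(reader)
        # columns 0 and 1, plus later columns whose header is in ids
        keep = [0, 1] + [i for i, h in enumerate(headers) if i >= 2 and h in ids]
        writer.writerow([headers[i] for i in keep])
        for row in reader:
            writer.writerow([row[i] for i in keep])

with open(out_path) as result:
    result_text = result.read()
print(result_text)
```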