Python: comparing data in 2 files


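I'm just starting out, so apologies for any confusion.

I have two files. File A contains a list of the sample names I'm interested in. File B contains the data for all samples.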

File A (no headers)

sample_A
sample_XA
sample_12754
samples_75t

File B

name                  description      etc .....
sample_JA                mm           0.01         0.1     1.2      0.018  etc
sample_A                 mm           0.001        1.2     0.8      1.4    etc
sample_XA                hu           0.4          0.021   0.14     2.34   etc
samples_YYYY             RN           0.0001       3.435   1.1      0.01   etc
sample_12754             mm           0.1          0.1     0.87     0.54   etc
sample_2248333           hu           0.43         0.01    0.11     2.32   etc
samples_75t              mm           0.3          0.02    0.14     2.34   etc

I want to compare file A against file B and output the data from B, but only for the sample names listed in A.

I tried this:

#!/usr/bin/env python2

import csv

count = 0

import collections
samples = collections.defaultdict(list)
with open('FILEA.txt') as f:
    sites = [l.strip() for l in f if l.strip()]

###This gives me the correct list of samples for file A.

with open('FILEB','r') as inF:
   for line in inF:
       elements = line.split()
       if sites.intersection(elements):
          count += 1

          print (elements)

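## Here I get the names of all the samples in file B, and only the names. I want the data from file B, but only for the samples in A.

Then I tried using intersection: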

#!/usr/bin/env python2

import sys
import csv
import collections

samples = collections.defaultdict(list)
with open('FILEA.txt','r') as f:
    nsamples = [l.strip() for l in f if l.strip()]

print (nsamples)

with open ('FILEB','r') as inF:
    for row in inF:
        elements = row.split()
        if nsamples.intersection(elements):
            print(row[0,:])

It still doesn't work.

What do I have to do to get the output data as follows:
name                  description      etc .....
sample_A                 mm           0.001        1.2     0.8       1.4   etc
sample_XA                hu           0.4          0.021   0.14      2.34  etc
sample_12754             mm           0.1          0.1     0.87      0.54  etc
samples_75t              mm           0.3          0.02    0.14      2.34  etc

Any ideas would be greatly appreciated. Thanks.
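
A likely reason both attempts fail, assuming the code runs as posted: the list comprehension produces a list, and a Python list has no intersection method; only set and frozenset do. A minimal sketch of the difference, using the sample names from file A and one row of file B as inline data (the variable names here are illustrative):

```python
# Names from file A, built the same way as in the attempts above (a list).
names_list = ["sample_A", "sample_XA", "sample_12754", "samples_75t"]

# One whitespace-split row of file B.
elements = "sample_A mm 0.001 1.2 0.8 1.4".split()

# A list has no .intersection method, so the call raises AttributeError.
try:
    names_list.intersection(elements)
    list_call_failed = False
except AttributeError:
    list_call_failed = True

# Converting the list to a set makes the call valid.
names_set = set(names_list)
common = names_set.intersection(elements)

print(list_call_failed)  # True
print(common)            # {'sample_A'}
```

Separately, row[0,:] in the second attempt is NumPy-style indexing; row is a plain str there, so that expression raises TypeError. Printing the row itself is enough.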

Create a set of the lines in filea, then split each line of fileb once and check whether its first element is in the set built from filea:

with open("filea") as f, open("fileb") as f2:
    # make a set of lines, stripping trailing whitespace and newlines
    # so we can compare properly later, i.e. "foo\n" != "foo"
    st  = set(map(str.rstrip, f)) # on Python 2, itertools.imap avoids building a list
    for line in f2:
        # split once and extract first element to compare
        if line.strip() and line.split(None, 1)[0] in st:
            print(line.rstrip())

Output:

sample_A                 mm           0.001        1.2     0.8      1.4    etc
sample_XA                hu           0.4          0.021   0.14     2.34   etc
sample_12754             mm           0.1          0.1     0.87     0.54   etc
samples_75t              mm           0.3          0.02    0.14     2.34   etc
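
The output above drops the header line, which the desired output in the question includes. One way to keep it is sketched below, under the assumption that the first line of file B is always the header. The function name filter_rows and its parameters are hypothetical and not part of the original answer; the inline lists stand in for the two files.

```python
def filter_rows(names_lines, data_lines, keep_header=True):
    """Return rows of data_lines whose first field is in names_lines."""
    # Strip whitespace and skip blank lines so "foo\n" matches "foo".
    wanted = {name.strip() for name in names_lines if name.strip()}
    kept = []
    for index, line in enumerate(data_lines):
        if not line.strip():
            continue  # ignore blank lines
        # Assumption: line 0 of the data file is the header.
        if index == 0 and keep_header:
            kept.append(line.rstrip())
            continue
        # Split once; only the first field (the sample name) is compared.
        if line.split(None, 1)[0] in wanted:
            kept.append(line.rstrip())
    return kept


# Inline stand-ins for file A and file B.
file_a = ["sample_A\n", "sample_XA\n", "\n"]
file_b = [
    "name  description  value\n",
    "sample_JA  mm  0.01\n",
    "sample_A   mm  0.001\n",
    "sample_XA  hu  0.4\n",
]
result = filter_rows(file_a, file_b)
print("\n".join(result))
```

With real files, the two open file objects can be passed in directly, since both arguments are only iterated line by line.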

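@user5511186 If you found a solution that works for you, please don't forget to click the grey check mark to the left of the answer to mark it as accepted. Thanks.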