Python: compare columns in two files and return merged output

python loops for-loop

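I have a seemingly simple problem, but I have been stuck on it for far too long. I want to compare two files (format shown below).

I want to compare columns 0 and 3 of file1 with columns 2 and 3 of file2 and, where they match, output the remaining information from the matching rows of both files, like this: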

> desired output
20  246057  0.28    68363   0   A   200192457   W
20  246058  0.28    68396   T   C   138777928   Y

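Here is the code I have so far. I have tried several variations of it, and many of the suggestions on this site, but I am still stuck on how to get the corresponding information from file1. Most of the approaches I tried repeat the last row of file1 for every match.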

#!/usr/bin/python
import csv

data2 = []
output = open("output.txt","w")

with open("file1.txt", "rb") as in_file1, open("file2.txt","rb") as in_file2:
    reader1 = csv.reader((in_file1), delimiter="\t")
    for row1 in reader1:
        y1 = row1[0], row1[3]
        data2.append(tuple(y1))
        y = row1
    reader2 = csv.reader((in_file2), delimiter="\t")
    for row2 in reader2:
        z = row2[-1], row2[2]
        if tuple(z) in data2:
            out = "\t".join(row2)
            output.write(out+"\n")

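The part I am struggling with is getting the file1 data into the output after parsing. What I currently get is shown below, but I also need the corresponding information from file1 for these rows: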

> current output
200192457   W   68363   20
138777928   Y   68396   20

Any help or advice is much appreciated! Thank you very much. (I am using Python 2.7.)

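This is a good use case for join and awk.

Output:

20 246057 0.28 68363 0 A 200192457 W
20 246058 0.28 68396 T C 138777928 Y

Explanation:

Join the two files file1 and file2 on the first field of file1 (-1 1) and the fourth field of file2 (-2 4).
Keep only the rows whose fourth and ninth fields are equal ($4==$9) and print those rows ({}).
From those rows, print only fields 1 through 8 (-f1-8).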
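
The three steps in the explanation above (join, filter, keep fields 1 to 8) can be mirrored in plain Python. The following is only a sketch, not the answerer's command: the function name `join_filter_cut` is hypothetical, the sample rows are reconstructed from the desired and current outputs shown in the question, and it assumes Python 3 rather than the asker's 2.7.

```python
def join_filter_cut(rows1, rows2):
    """Join on file1 field 1 / file2 field 4, keep rows whose joined
    field 4 equals field 9, then keep only fields 1-8."""
    merged = []
    for r1 in rows1:
        for r2 in rows2:
            if r1[0] != r2[3]:
                continue  # join condition: file1 field 1 == file2 field 4
            # Joined row layout: key, remaining file1 fields, remaining file2 fields.
            joined = [r1[0]] + r1[1:] + r2[:3]
            if joined[3] == joined[8]:     # filter: field 4 == field 9
                merged.append(joined[:8])  # trim: fields 1-8
    return merged


# Sample rows reconstructed from the outputs in the question (an assumption).
file1_rows = [
    ["20", "246057", "0.28", "68363", "0", "A"],
    ["20", "246058", "0.28", "68396", "T", "C"],
]
file2_rows = [
    ["200192457", "W", "68363", "20"],
    ["138777928", "Y", "68396", "20"],
]

for row in join_filter_cut(file1_rows, file2_rows):
    print("\t".join(row))
```

Because it compares every row of one file with every row of the other, this version is quadratic in the number of rows; it is meant to make the field numbering in the explanation concrete, not to be fast.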

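Try changing your code to the following. What you actually need is to keep hold of the file1 row (row1) while you look for its match in file2:

# csv and output are the import and the output file from the question's script
with open("file1.txt", "rb") as in_file1, open("file2.txt", "rb") as in_file2:
    reader1 = csv.reader(in_file1, delimiter="\t")
    for row1 in reader1:
        y1 = (row1[0], row1[3])
        in_file2.seek(0)  # rewind file2, otherwise it is exhausted after the first row1
        reader2 = csv.reader(in_file2, delimiter="\t")
        for row2 in reader2:
            z = (row2[-1], row2[2])
            if z == y1:
                # write the matching row from each file, one per line
                output.write("\t".join(row1) + "\n")
                output.write("\t".join(row2) + "\n")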

Here is a solution I wrote from scratch:

f1 = file("file1.txt")
f2 = file("file2.txt")
d = {}
while True:
  line = f1.readline()
  if not line:
    break
  c0,c1,c2,c3,c4,c5 = line.split()
  d[(c0,c3)] = (c0,c1,c2,c3,c4,c5)
while True:
  line = f2.readline()
  if not line:
    break
  c0,c1,c2,c3 = line.split()
  if (c3,c2) in d:
    vals = d[(c3,c2)]
    print c3,vals[1],vals[2],vals[3],vals[4],vals[5],c0,c1

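It reads the first file and stores the values in a dict, using a tuple as the key. It then reads the second file and checks whether the corresponding tuple key exists in the dict. If it does, it prints all the data.

Note that in the final working version of your program you must also remember to close the files. I left out the lines that close them for brevity.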
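
For readers on Python 3, the same dict-with-tuple-keys idea can be sketched as below. This is an illustration under assumptions rather than the answer's own code: the function name `merge_matching_rows` is hypothetical, the files are assumed to be tab-separated as in the question's script, and `with` blocks are used so the files are closed automatically.

```python
import csv


def merge_matching_rows(path1, path2, out_path):
    """Write merged rows where file1 (col 0, col 3) equals file2 (col 3, col 2)."""
    lookup = {}
    with open(path1, newline="") as f1:
        for row in csv.reader(f1, delimiter="\t"):
            if len(row) < 4:
                continue  # skip blank or short lines
            # Tuple key -> full file1 row. A repeated key overwrites the earlier row.
            lookup[(row[0], row[3])] = row

    with open(path2, newline="") as f2, open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for row in csv.reader(f2, delimiter="\t"):
            if len(row) < 4:
                continue
            match = lookup.get((row[3], row[2]))
            if match is not None:
                # file1 row first, then the first two file2 columns
                writer.writerow(match + row[:2])


# Example call (file names as used in the question):
# merge_matching_rows("file1.txt", "file2.txt", "output.txt")
```

Each file is read once and every lookup is a single dict access, so the work grows roughly linearly with the size of the two files instead of with their product.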

Use a dict with tuple keys?

Could you give me an example? Sorry, I'm very new to programming.

Yes, I gave you an example. In fact, I gave you a complete program as an example.

Sorry, my connection is slow. I'll check right away. Thanks :)