Python: eliminating lines whose nth column value doesn't exist in the mth column of another file


I have two tab-delimited txt files, encoded as UTF-8 without a BOM.

1.txt

2.txt

The command

python eliminate_lines.py 2.txt 2 1.txt 1 output.txt

would mean: if an element in column 2 of 2.txt does not exist in column 1 of 1.txt, delete that element's line.

So output.txt will be

2.txt

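I have been doing this by sorting the files on the relevant columns in Excel, but the files quickly became too large for that.

Honestly, I'm a complete newcomer to Python, so the only code I can see that I need are these "structural" parts: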

import codecs
import sys
input_file = sys.argv[1]
input_column = sys.argv[2]
match_file = sys.argv[3]
match_column = sys.argv[4]
output_file = sys.argv[5]

ifile = codecs.open(input_file, encoding = 'utf-8', mode="rb")
ofile = codecs.open(output_file, encoding = 'utf-8', mode="wb")

for line in ifile:
    # ???????? (the filtering logic I don't know how to write)
    ofile.write(line)

ifile.close()
ofile.close()

============================================

martineau's first solution produces

2   A

2   X

instead of

2   A
2   X

Can it be fixed?

Here is something to get you started. It is incomplete, but hopefully it points you in the right direction.

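import csv

# A set stores only unique values; add items with .add(item).
first_column_items = set()
# TODO: load the first column of 1.txt into first_column_items
# TODO: open 2.txt and wrap it in csv.reader(..., delimiter="\t") as infile_two

# Python 3 form: text mode with newline="" (Python 2 would use mode "wb").
with open("outfile.txt", "w", newline="") as out_f:
    # The files are tab-delimited, so write tabs rather than the default commas.
    writer = csv.writer(out_f, delimiter="\t")
    desired_column_idx = 1  # indexes start at zero, so 1 is the second column
    for row in infile_two:
        column_value = row[desired_column_idx]
        if column_value in first_column_items:
            writer.writerow(row)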
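The outline above leaves loading 1.txt and opening 2.txt unfinished. One way to fill in those steps might look like the sketch below. The function name `filter_rows` and its parameters are hypothetical, and it assumes Python 3, UTF-8 input, and data without CSV-style quoting.

```python
import csv


def filter_rows(input_path, input_col, match_path, match_col, output_path):
    """Copy rows of input_path whose input_col value occurs in match_col of match_path.

    Column indexes are zero-based. Returns the number of rows written.
    (Hypothetical helper, not part of the original answer.)
    """
    # Collect the unique values found in the match column.
    with open(match_path, encoding="utf-8", newline="") as match_f:
        wanted = {
            row[match_col]
            for row in csv.reader(match_f, delimiter="\t")
            if len(row) > match_col
        }

    kept = 0
    with open(input_path, encoding="utf-8", newline="") as in_f, \
            open(output_path, "w", encoding="utf-8", newline="") as out_f:
        writer = csv.writer(out_f, delimiter="\t", lineterminator="\n")
        for row in csv.reader(in_f, delimiter="\t"):
            # Skip blank or short rows instead of raising IndexError.
            if len(row) > input_col and row[input_col] in wanted:
                writer.writerow(row)
                kept += 1
    return kept
```

For the command in the question this would be called as `filter_rows("2.txt", 1, "1.txt", 0, "output.txt")`. Because the match values live in a set, each membership test takes roughly constant time, which matters once the files outgrow Excel.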

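You could use the csv module to read and write the files, but in this case it isn't really necessary, because doing it yourself is relatively simple. Note that in Python the indexes of the lines and of the values (columns) on a line are zero-based, so the first column corresponds to column number 0, the second to 1, and so on. The same is true for the lines.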
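To make the zero-based indexing concrete, here is a small sketch of converting the 1-based column numbers typed on the command line into list indexes. The helper name `to_zero_based` and the sample line are made up for illustration.

```python
def to_zero_based(column_number):
    """Convert a 1-based column number (as typed on the command line) to a list index."""
    index = int(column_number) - 1
    if index < 0:
        raise ValueError("column numbers start at 1")
    return index


cols = "2\tA\tmore".split("\t")     # hypothetical tab-delimited line
first = cols[to_zero_based("1")]    # column 1 is index 0
second = cols[to_zero_based("2")]   # column 2 is index 1
```

Rejecting numbers below 1 is a design choice: without the check, a column number of 0 would become index -1 and silently select the last column.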

import codecs
import sys

input_file_name    = sys.argv[1]
input_column_index = int(sys.argv[2]) - 1  # command-line column numbers are 1-based
match_file_name    = sys.argv[3]
match_column_index = int(sys.argv[4]) - 1
output_file_name   = sys.argv[5]

def split_columns(line):
    # split on tabs only, so values that contain spaces stay in one column
    return line.rstrip(u'\r\n').split(u'\t')

# create a set of all unique values in the match_column of match_file_name
matching_values = set()
with codecs.open(match_file_name, encoding='utf-8', mode="rb") as match_file:
    for cols in (split_columns(line) for line in match_file):
        if len(cols) > match_column_index:  # skip blank or short lines
            matching_values.add(cols[match_column_index])

with codecs.open(output_file_name, encoding='utf-8', mode="wb") as output_file:
    # copy lines from input_file to output file whose value in the input_column
    # is one of the ones in the match column of the match_file
    with codecs.open(input_file_name, encoding='utf-8', mode="rb") as input_file:
        for line in input_file:
            cols = split_columns(line)
            if len(cols) > input_column_index and cols[input_column_index] in matching_values:
                # line still ends with its own newline, so write it unchanged
                output_file.write(line)
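The five positional arguments read from sys.argv above could also be declared with the standard-library argparse module, which adds a usage message and integer checking. The following is only a sketch: the function name `parse_args` and the `*_index` attribute names are assumptions, not part of the answer.

```python
import argparse


def parse_args(argv=None):
    """Parse the five positional arguments used by the script (hypothetical wrapper)."""
    parser = argparse.ArgumentParser(
        description="Keep only the lines of input_file whose input_column value "
                    "appears in match_column of match_file."
    )
    parser.add_argument("input_file")
    parser.add_argument("input_column", type=int, help="1-based column number")
    parser.add_argument("match_file")
    parser.add_argument("match_column", type=int, help="1-based column number")
    parser.add_argument("output_file")
    args = parser.parse_args(argv)

    if args.input_column < 1 or args.match_column < 1:
        parser.error("column numbers start at 1")

    # Store the zero-based indexes alongside the raw 1-based numbers.
    args.input_column_index = args.input_column - 1
    args.match_column_index = args.match_column - 1
    return args
```

Calling `parse_args(["2.txt", "2", "1.txt", "1", "output.txt"])` mirrors the command line in the question. With no argument, the function reads `sys.argv[1:]` as usual.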

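You might want to look at argparse and csv.

Thank you very much. Your solution just leaves a blank line where each deleted line used to be. I've edited my question to clarify what I mean. Can this be fixed?

The blank lines were there because the code was adding an unnecessary newline to the lines that weren't being deleted, but my latest edit should fix that. I also rearranged the order in which some things are done to make it more logical. If my latest update fixes the blank-line problem, please consider accepting my answer.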
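The earlier revision of the answer is not shown here, so the following is only an assumed reconstruction of the kind of bug described: lines read from a file already end with a newline, and appending another one produces the blank lines seen in the question. Both function names are hypothetical.

```python
def copy_lines_buggy(lines):
    """Append a newline to lines that already end with one (assumed form of the bug)."""
    return "".join(line + "\n" for line in lines)


def copy_lines_fixed(lines):
    """Write each line unchanged, so its own newline is the only one."""
    return "".join(lines)


# Lines as they come from iterating over a file: each keeps its trailing "\n".
lines = ["2\tA\n", "2\tX\n"]

buggy_output = copy_lines_buggy(lines)   # "2\tA\n\n2\tX\n\n": a blank line after every row
fixed_output = copy_lines_fixed(lines)   # "2\tA\n2\tX\n"
```

The same symptom can appear if a line is stripped for comparison and then written back with an extra line terminator, so it helps to compare against a stripped copy while writing the original line.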
