Python: eliminate lines whose value in column n does not exist in column m of another file
I have two tab-delimited txt files, UTF-8 without BOM: 1.txt and 2.txt.
The command
python eliminate_lines.py 2.txt 2 1.txt 1 output.txt
would mean: if an element in column 2 of 2.txt does not exist in column 1 of 1.txt, delete that element's line. So output.txt would contain only the remaining lines of 2.txt.
I've been doing this by sorting the files on the relevant columns in Excel, but the files soon became too large.
To be honest, I'm a complete beginner in Python, so all I can see is that the code I need has this "structure":
import codecs
import sys
input_file = sys.argv[1]
input_column = sys.argv[2]
match_file = sys.argv[3]
match_column = sys.argv[4]
output_file = sys.argv[5]
ifile = codecs.open(input_file, encoding='utf-8', mode="rb")
ofile = codecs.open(output_file, encoding='utf-8', mode="wb")
for line in ifile:
    ????????
    ofile.write(line)
ifile.close()
ofile.close()
============================================
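martineau's first solution produces (note the empty line between the rows)
2 A

2 X
instead of
2 A
2 X
Can this be fixed?
Here's something to get you started. It's incomplete, but hopefully it points you in the right direction: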
import csv
# a set stores only unique values; add items with .add(item)
first_column_items = set()
# load 1st column of 1.txt items into first_column_items
# open 2.txt as infile_two (e.g. with csv.reader(..., delimiter="\t"))
with open("outfile.txt", "w", newline="", encoding="utf-8") as out_f:
    writer = csv.writer(out_f, delimiter="\t")  # the files are tab-delimited
    desired_column_idx = 1  # indexes are zero-based
    for row in infile_two:
        column_value = row[desired_column_idx]
        if column_value in first_column_items:
            writer.writerow(row)
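Filling in the two placeholder steps of the sketch above might look like the following. It is a minimal sketch, assuming Python 3 and 1-based column numbers as in the question's command; the function name eliminate_lines is hypothetical, and writing "\n" line endings is an assumption about the original files.

```python
import csv


def eliminate_lines(input_path, input_col, match_path, match_col, output_path):
    """Copy rows of input_path whose input_col value appears in match_col of match_path.

    Hypothetical helper; column numbers are 1-based, as on the question's command line.
    """
    # Load the match column of the match file into a set for fast membership tests.
    first_column_items = set()
    with open(match_path, encoding="utf-8", newline="") as match_f:
        for row in csv.reader(match_f, delimiter="\t"):
            if len(row) >= match_col:  # skip blank or too-short rows
                first_column_items.add(row[match_col - 1])

    # Stream the input file and keep only rows whose value is in the set.
    with open(input_path, encoding="utf-8", newline="") as in_f, \
            open(output_path, "w", encoding="utf-8", newline="") as out_f:
        # Assumption: the files use "\n" line endings (csv's default is "\r\n").
        writer = csv.writer(out_f, delimiter="\t", lineterminator="\n")
        for row in csv.reader(in_f, delimiter="\t"):
            if len(row) >= input_col and row[input_col - 1] in first_column_items:
                writer.writerow(row)
```

For the question's command this would be called as eliminate_lines("2.txt", 2, "1.txt", 1, "output.txt"). Because only the match column is held in memory and 2.txt is streamed row by row, file size is limited by the number of distinct match values rather than by the input size. Note that csv applies its default quoting rules, so fields containing double quotes may be re-quoted on output.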
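You can use the csv module to read and write the files, but in this case it isn't necessary, because doing it yourself is relatively simple. Note that in Python, lines and the values (columns) on a line are indexed from zero, so the first column is column number 0, the second is 1, and so on. The same goes for lines.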
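A small illustration of the zero-based indexing, assuming each line is split on tabs; the helper name column_value is hypothetical. It also shows why splitting on tabs, rather than on any whitespace, matters for tab-delimited data:

```python
def column_value(line, column_number):
    """Return the field in a 1-based column of one tab-delimited line (hypothetical helper)."""
    # Strip the line ending so it doesn't stay attached to the last field.
    fields = line.rstrip("\r\n").split("\t")
    # Python indexes from zero: column 1 is fields[0], column 2 is fields[1].
    return fields[column_number - 1]


print(column_value("2\tA\n", 1))  # 2
print(column_value("2\tA\n", 2))  # A

# split() with no argument splits on any run of whitespace, so a field that
# contains a space turns into two "columns"; split("\t") keeps it whole.
print("New York\tNY\n".split())           # ['New', 'York', 'NY']
print(column_value("New York\tNY\n", 2))  # NY
```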
import codecs
import sys
input_file_name = sys.argv[1]
input_column_index = int(sys.argv[2]) - 1  # command line is 1-based, Python is 0-based
match_file_name = sys.argv[3]
match_column_index = int(sys.argv[4]) - 1
output_file_name = sys.argv[5]
# create a set of all unique values in the match_column of match_file_name
matching_values = set()
with codecs.open(match_file_name, encoding='utf-8', mode="rb") as match_file:
    # split on tabs only (not on every space) after dropping the line ending
    for cols in (line.rstrip('\r\n').split('\t') for line in match_file):
        if len(cols) > match_column_index:  # skip blank or too-short lines
            matching_values.add(cols[match_column_index])
with codecs.open(output_file_name, encoding='utf-8', mode="wb") as output_file:
    # copy lines from input_file to output_file whose value in the input_column
    # is one of the ones in the match column of the match_file
    with codecs.open(input_file_name, encoding='utf-8', mode="rb") as input_file:
        for line in input_file:
            cols = line.rstrip('\r\n').split('\t')
            if len(cols) > input_column_index and cols[input_column_index] in matching_values:
                output_file.write(line)  # line already ends with its newline
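One of the comments suggests argparse. A minimal sketch of declaring the same five positional arguments with it is below; the function and argument names are hypothetical. Using type=int makes argparse reject a non-numeric column number with a usage message instead of a traceback from int().

```python
import argparse


def parse_args(argv=None):
    """Parse the five positional arguments used by the script (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Keep lines of input_file whose value in input_column "
                    "appears in match_column of match_file.")
    parser.add_argument("input_file")
    parser.add_argument("input_column", type=int, help="1-based column number")
    parser.add_argument("match_file")
    parser.add_argument("match_column", type=int, help="1-based column number")
    parser.add_argument("output_file")
    args = parser.parse_args(argv)
    # Convert the 1-based column numbers to Python's zero-based indexes.
    args.input_column -= 1
    args.match_column -= 1
    return args
```

For the question's command, parse_args(["2.txt", "2", "1.txt", "1", "output.txt"]) yields input_column 1 and match_column 0; calling parse_args() with no argument reads sys.argv instead.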
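You might want to look at argparse and csv.
Thanks a lot. Your solution just leaves an empty line where each deleted line used to be. I've edited my question to clarify what I mean. Can this be fixed?
The empty lines were there because the code added an unnecessary newline to the lines that weren't deleted, but my latest edit should fix that. I also rearranged the order in which some things are done to make it more logical. If my latest update fixes the empty-line problem, please consider accepting my answer (see).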
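A minimal illustration of the mechanism described in the last comment, assuming the extra newline came from something like write(line + "\n"): lines read from a file already end with "\n", so appending another one leaves an empty line after every kept row. The keep set is a stand-in for the values found in the match column.

```python
import io

keep = {"A", "X"}  # stand-in for the values found in the match column
# Iterating over a file (here an in-memory one) yields lines that keep their "\n".
lines = list(io.StringIO("2\tA\n2\tB\n2\tX\n"))

buggy = io.StringIO()
fixed = io.StringIO()
for line in lines:
    if line.rstrip("\n").split("\t")[1] in keep:
        buggy.write(line + "\n")  # extra newline -> an empty line after each kept row
        fixed.write(line)         # line already ends with "\n", so write it unchanged

print(repr(buggy.getvalue()))  # '2\tA\n\n2\tX\n\n'
print(repr(fixed.getvalue()))  # '2\tA\n2\tX\n'
```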