Python - Trying to compare two text files with different formats
File 1 is formatted as follows:
1111111111
2222222222
File 2 is formatted as follows:
3333333333:4444444444
1111111111:2222222222
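I'm trying to figure out a way to take the contents of file 1 and check whether they match only what is to the right of the colon in file 2. The end goal is to delete the complete line in file 2 if there is a match.
I know I could cut file 2 with standard commands so that it has exactly the same format as file 1. The problem is that I need the finished file in the 88888:99999 format, and splitting the lines apart and putting them back in the right order seems too complicated.
I have tried nested loops, regular expressions, sets, and lists, and my head is spinning.
I hope this makes sense. Thanks in advance.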
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    if line.split(":")[1] in keys:
IndexError: list index out of range
Assuming you want to delete a line in file 2 if the second part of that line matches any value in file 1, you could do the following:
# Warning: Untested code ahead
with open("file1", "r") as f1:
    # First, get the set of all the values in file 1.
    # Sets use hash tables under the covers, so this should
    # be fast enough for our use case (assuming sizes less than
    # the total memory available on the system)
    keys = set(f1.read().splitlines())

# Since we can't write back into the same file as we read through it,
# we'll pipe the valid lines into a new file
with open("file2", "r") as f2:
    with open("filtered_file", "w") as dest:
        for line in f2:
            line = line.strip()  # Remove newline
            # ASSUMPTION: All lines in file 2 have a colon
            if line.split(":")[1] in keys:
                continue
            else:
                # File objects have no writeline(); write() needs the newline added back
                dest.write(line + "\n")
This is how to get the elements to the right of the colon in file 2. Maybe not the cleanest, but you get the idea.
with open("file2") as f:
    str2 = f.read()
# Keep only lines that split into exactly two parts, and take the right-hand part
righttocolon = [s.split(":")[1] for s in str2.split("\n") if len(s.split(":")) == 2]
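The list comprehension above quietly skips any line that does not split into exactly two parts. The same guard can be folded into the filtering step itself. What follows is only a minimal sketch: the function name filter_lines is hypothetical, the sample data is taken from the question, and keeping lines that have no colon is an assumption, since the question does not say what should happen to them.

```python
def filter_lines(keys, lines):
    """Return the lines whose right-hand part is not in keys.

    Lines without a colon are kept unchanged (an assumption; the
    question does not specify how they should be treated).
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines entirely
        # partition() never raises: with no colon, sep and right are ""
        _left, sep, right = line.partition(":")
        if sep and right in keys:
            continue  # right-hand value appears in file 1: drop the line
        kept.append(line)
    return kept


# Sample data from the question, plus a trailing blank line
file1_lines = ["1111111111", "2222222222"]
file2_lines = ["3333333333:4444444444", "1111111111:2222222222", ""]

result = filter_lines(set(file1_lines), file2_lines)
print(result)  # ['3333333333:4444444444']
```

To use it on real files, the two lists could be read with f.read().splitlines() and the result written back with one write(line + "\n") call per kept line.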
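Do you mean that a line in file 2 should be deleted if any line in file 1 matches the component to the right of the colon, or if it matches the whole line?
Yes. We can basically ignore what is to the left of the colon completely, but if a line from file 1 appears anywhere in file 2, then the whole line in file 2 should be deleted.
...I think this is the right algorithm. What does the [1] do in if line.split(":")[1]?
@Craig - line.split(":") splits the string into an array of strings (splitting on the ":" character and dropping it). [1] accesses the second entry in the array (array indexes start at zero in most programming languages). So "1111111111:2222222222" -> ["1111111111", "2222222222"], and taking the second element of the array yields "2222222222", which we then check against the list of disallowed values.
Thanks for the help. I assume the line should be if line.split(":", 1) in keys: ? If so, I get an unhashable list error. I appreciate your help.
@Craig - no, it should be line.split(":")[1] - can you edit your question to show the code you are using and include the stack trace you get?
Sean - I am using exactly the code you posted, just with different file names. The stack trace is posted above.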
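The thread ends without a diagnosis. One likely cause of the IndexError, assuming file 2 contains a blank line or some other line with no colon, is that split(":") then returns a one-element list, so index 1 does not exist. The unhashable list error from the split(":", 1) variant has a different cause: that call passes a maxsplit argument and returns a whole list, and a list cannot be looked up in a set. The short sketch below reproduces both situations in memory; the variable names are illustrative only.

```python
keys = {"1111111111", "2222222222"}

# A line with no colon (here: an empty line) splits into a single element
blank_parts = "".split(":")
# A well-formed line splits into two elements
good_parts = "1111111111:2222222222".split(":")

try:
    blank_parts[1]  # there is no second element
    index_error = False
except IndexError:
    index_error = True

try:
    _ = good_parts in keys  # membership test has to hash the list, which fails
    type_error = False
except TypeError:
    type_error = True

print(blank_parts, index_error, type_error)
print(good_parts[1] in keys)  # the intended check: compare one string, not a list
```

If this is indeed the cause, checking that the line contains a colon before indexing, or skipping empty lines, would avoid the traceback.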