Python: diff that ignores certain specific characters
Tags: python, shell, unix, diff, cut

I have two huge text files, 1 to 5 GB each, and I have to compute the differences between them using shell commands. The problem is that, on every line of these files, I have to ignore some characters at specific positions.
My first attempt was to use diff after removing the characters I have to ignore from both files.
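diff is probably not the right tool here, since you only want to compare part of each line and only want output from the second file. You will need to write your own comparison script, which is easier because you are only interested in differences between corresponding lines of the two files. An example in Python: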
    # Compare corresponding lines, skipping columns 57..67.
    with open("FILE1.TXT", "r") as f1, open("FILE2.TXT", "r") as f2:
        for line1, line2 in zip(f1, f2):
            if (line1[:57] != line2[:57] or
                    line1[68:] != line2[68:]):
                # line2 already ends with a newline.
                print(line2, end="")
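To make the slice comparison concrete, here is a small sketch on in-memory strings rather than files. The helper name `differs_outside`, the sample records, and the shortened column range 3..5 are all made up for illustration; they are not from the original answer.

```python
def differs_outside(line1, line2, start, end):
    """Return True if the lines differ anywhere except columns start..end-1."""
    return line1[:start] != line2[:start] or line1[end:] != line2[end:]


# Hypothetical sample records; columns 3..5 hold the field to ignore.
old = ["AAA111ZZZ\n", "BBB222YYY\n", "CCC333XXX\n"]
new = ["AAA999ZZZ\n", "BBB222YYQ\n", "CCC000XXX\n"]

# Keep the lines of `new` that differ from the line at the same position in `old`.
changed = [b for a, b in zip(old, new) if differs_outside(a, b, 3, 6)]
print(changed)
```

Only the second record is reported: the first and third differ solely inside the ignored columns. Note that this compares lines by position, so it assumes both inputs have the same records in the same order.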
Thanks to the Python tip, I ended up with this:
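    def key(line):
        # Drop the trailing newline and the ignored columns 59..67.
        line = line.rstrip("\n")
        return line[:59] + line[68:]

    # Collect the comparison key of every line in the first file.
    with open("FILE1.TXT", "r") as file1:
        seen = {key(line1) for line1 in file1}

    # Write each line of the second file whose key is absent from the first.
    with open("FILE2.TXT", "r") as file2, open("OUTPUT.TXT", "w") as out:
        for line2 in file2:
            if key(line2) not in seen:
                # Keep the newline so output lines do not run together.
                out.write(line2)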
It takes about 20 seconds for two large 2.8 GB files.
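Thanks, everyone.

Comments:

Since you cut the characters away before diff ever sees them, it is obvious why they are missing from the output. If they were identical, you would not have to ignore them. Since they differ, which ones do you want to see in the output?

I just added an example, is it clear now? I need to see the lines of the second file in the output.

The result is not correct, because the second file contains some new lines that are not in the first file. I am updating the example right now.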
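The last comment points at the reason a position-by-position comparison falls short here: once the second file contains an extra line, every later pair of lines is misaligned. A lookup by key does not depend on line order. The following is a minimal sketch of that idea on in-memory lists; the function name `lines_not_in_first`, the sample records, and the column range 3..5 are assumptions for illustration only.

```python
def lines_not_in_first(first, second, start, end):
    """Yield lines of `second` whose key is absent from `first`.

    The key is the line without its trailing newline and without
    columns start..end-1.
    """
    def key(line):
        line = line.rstrip("\n")
        return line[:start] + line[end:]

    seen = {key(line) for line in first}
    for line in second:
        if key(line) not in seen:
            yield line


# Hypothetical data: `new` has one inserted record ("BBB...") in the middle.
old = ["AAA111ZZZ\n", "CCC333XXX\n"]
new = ["AAA999ZZZ\n", "BBB222YYY\n", "CCC000XXX\n"]

result = list(lines_not_in_first(old, new, 3, 6))
print(result)
```

Only the inserted record is reported, even though it shifts the position of the record after it. The trade-off is that the set of keys for the whole first file must fit in memory, and that a line counts as matched if its key appears anywhere in the first file, regardless of position or how many times it occurs.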