Python: diff that ignores certain characters at specific positions

Tags: python, shell, unix, diff, cut

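I have two huge text files, 1 to 5 GB each, and I have to compute the difference between them using shell commands. The problem is that, for every line of these files, I have to ignore some characters at specific positions.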

My first attempt was to run diff after removing the characters I have to ignore from both files.

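diff is probably not the right tool here, because you only want to compare part of each line and you only want output from the second file. You are better off writing your own comparison script, which is easy because you only care about differences between corresponding lines of the two files. An example in Python: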

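# Compare the two files line by line, ignoring the characters at indices 57 to 67.
with open("FILE1.TXT", "r") as f1, open("FILE2.TXT", "r") as f2:
    for line1, line2 in zip(f1, f2):
        if (line1[:57] != line2[:57] or
                line1[68:] != line2[68:]):
            # line2 already ends with a newline, so do not add another one
            print(line2, end="")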

Thanks to the Python tip, this is what I ended up with:

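# Collect the comparison key of every line in FILE1.TXT:
# the line without indices 59 to 67 and without the trailing newline.
seen = set()
with open("FILE1.TXT", "r") as file1:
    for line1 in file1:
        seen.add(line1[:59] + line1[68:-1])

# Write every line of FILE2.TXT whose key does not occur in FILE1.TXT.
with open("FILE2.TXT", "r") as file2, open("OUTPUT.TXT", "w") as out:
    for line2 in file2:
        key = line2[:59] + line2[68:-1]
        if key not in seen:
            # Keep the newline so that output lines stay separate.
            out.write(line2)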
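
The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: the names `masked_key` and `lines_not_in_first` and the `start`/`end` parameters are hypothetical and do not come from the original post, and the defaults 59 and 68 are simply taken from the slices above. It uses `rstrip("\n")` instead of `[:-1]`, on the assumption that a final line without a newline should not lose its last character.

```python
def masked_key(line, start=59, end=68):
    """Return the line without its newline and without indices start..end-1."""
    text = line.rstrip("\n")
    return text[:start] + text[end:]


def lines_not_in_first(first_lines, second_lines, start=59, end=68):
    """Yield the lines of second_lines whose masked key never occurs in first_lines."""
    seen = {masked_key(line, start, end) for line in first_lines}
    for line in second_lines:
        if masked_key(line, start, end) not in seen:
            yield line


if __name__ == "__main__":
    # Tiny in-memory example that ignores indices 2 and 3 of every line.
    first = ["abXXcd\n", "efYYgh\n"]
    second = ["abZZcd\n", "efYYgX\n", "new line\n"]
    for line in lines_not_in_first(first, second, start=2, end=4):
        print(line, end="")
```

Because both arguments are only iterated, open file objects can be passed in directly, so the second file is never loaded into memory as a whole.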
It takes about 20 seconds for two large files of 2.8 GB each.
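
With files of this size, the set of keys holds one string per distinct line of the first file, which can take a lot of memory. One possible way to reduce that, sketched here as an assumption and not as something from the original post, is to store a fixed-size digest of each key instead of the key itself. The function names and the choice of `hashlib.blake2b` with 16-byte digests are illustrative. How much memory this saves depends on the line length, and two different keys could in principle produce the same digest, although that is extremely unlikely at this digest size.

```python
import hashlib


def key_digest(line, start=59, end=68):
    """Return a 16-byte digest of the line without indices start..end-1."""
    text = line.rstrip("\n")
    key = text[:start] + text[end:]
    return hashlib.blake2b(key.encode("utf-8"), digest_size=16).digest()


def new_lines(first_lines, second_lines, start=59, end=68):
    """Yield lines of second_lines whose masked digest is absent from first_lines."""
    seen = {key_digest(line, start, end) for line in first_lines}
    for line in second_lines:
        if key_digest(line, start, end) not in seen:
            yield line
```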

Thanks, everyone.

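Since you cut the characters out before diff ever sees them, it is obvious why they are missing from the output. If they were identical, you would not have to ignore them. Since they differ, which ones do you want to see in the output?

I just added an example; is it clear now? I need to see the lines of the second file in the output.

The result is not correct, because the second file contains some new lines that are not in the first file. I am updating the example right now.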
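
The objection in the last comment can be reproduced on a small scale. The sketch below uses made-up data and hypothetical function names (`positional_diff`, `set_diff`), and it ignores indices 2 and 3 rather than the real columns. When the second file contains an extra line, a line-by-line comparison gets out of step: it flags lines that have merely shifted, and because `zip` stops at the shorter input it never looks at the trailing lines. A lookup against the set of keys from the first file reports only the new line.

```python
def key(line, start, end):
    """Return the line without its newline and without indices start..end-1."""
    text = line.rstrip("\n")
    return text[:start] + text[end:]


def positional_diff(first, second, start, end):
    """Compare line i of first with line i of second (the zip-based approach)."""
    return [b for a, b in zip(first, second)
            if key(a, start, end) != key(b, start, end)]


def set_diff(first, second, start, end):
    """Report lines of second whose key occurs nowhere in first."""
    seen = {key(a, start, end) for a in first}
    return [b for b in second if key(b, start, end) not in seen]


first = ["aa11bb\n", "cc22dd\n", "ee33ff\n"]
second = ["aa99bb\n", "NEWLINE\n", "cc88dd\n", "ee77ff\n"]

print(positional_diff(first, second, 2, 4))  # ['NEWLINE\n', 'cc88dd\n']
print(set_diff(first, second, 2, 4))         # ['NEWLINE\n']
```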