Comparing two files with Python


python python-3.x


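OK, I have a school assignment where I need to compare two files. It is very simple: the program needs to display all the unique words found across the two files, for example:

File 1: this is a test

File 2: this is not a test

Output: ["this", "is", "a", "test", "not"]

This is the output I expected from this code: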

def unique_words(file_1, file_2):
    unique_words_list = []
    for word in file_1:
        unique_words_list.append(word)
    for word in file_2:
        if word not in file_1:
            unique_words_list.append(word)
    return unique_words_list

But that is not what happens. Sadly, this is the output:

['this\n', 'is\n', 'a\n', 'test', 'this\n', 'is\n', 'not\n', 'a\n', 'test']

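I have several functions that work in basically the same way and produce similar output. I know why the \n shows up, but I don't know how to get rid of it.

If anyone can help me get the correct output, that would be a great help :)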

Here is a small snippet I wrote that reuses parts of your code:

#!/usr/bin/env python3.6

# Read both files into lists of lines, stripping the trailing newline from each.
with open('file1.txt', 'r') as file1, open('file2.txt', 'r') as file2:
    file_1 = file1.readlines()
    file_1 = [line.rstrip() for line in file_1]
    file_2 = file2.readlines()
    file_2 = [line.rstrip() for line in file_2]


def unique_words(file_1, file_2):
    # Start from the lines of the first file ...
    unique_words_list = file_1
    # ... then append each line of the second file that is not already present.
    for word in file_2:
        if word not in unique_words_list:
            unique_words_list.append(word)
    return unique_words_list


print(unique_words(file_1, file_2))

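This script assumes you have two files, named file1.txt and file2.txt, in the same directory as the script. Going by your example, it also assumes that each word is on its own line. Here is a walkthrough: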

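Open both files and read their lines into lists, using a list comprehension to strip the newline characters

Define a function that adds all the words from the first file to a list, then adds every word from the second file that is not already in that list

Print the output of that function, using the lists read earlier as input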
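
As for why the function in the question repeats words: the question does not show how file_1 and file_2 were created, so the following is only a guess. If they were open file objects rather than lists, the first loop would read file_1 to the end, and every later "word not in file_1" test would scan an already exhausted file and come back True. The sketch below reproduces that behaviour with io.StringIO standing in for the open files (an assumption made purely so the example runs without files on disk):

```python
import io


def unique_words(file_1, file_2):
    # The function from the question, unchanged.
    unique_words_list = []
    for word in file_1:
        unique_words_list.append(word)
    for word in file_2:
        # file_1 was fully consumed by the loop above, so this membership
        # test finds nothing and the condition is always True.
        if word not in file_1:
            unique_words_list.append(word)
    return unique_words_list


# Assumption: in-memory stand-ins for the two open files.
file_1 = io.StringIO("this\nis\na\ntest")
file_2 = io.StringIO("this\nis\nnot\na\ntest")

result = unique_words(file_1, file_2)
print(result)
# ['this\n', 'is\n', 'a\n', 'test', 'this\n', 'is\n', 'not\n', 'a\n', 'test']
```

Under that assumption, reading the lines into lists first, as the snippet above does, avoids the problem because a list can be searched any number of times.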

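Steampunkery's solution is incorrect: (1) it does not handle files with more than one word per line, and (2) it does not account for duplicate words within file1.txt (try a file1.txt that contains the line "word" four times: you should get a single "word" in the output, but you get four). In addition, the for/if construct is not needed.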
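
Both problems can be reproduced without any files by handing the function lists of already stripped lines. This is only a small illustration; the input lists are made up for the purpose:

```python
def unique_words(file_1, file_2):
    # Same logic as the snippet in the previous answer.
    unique_words_list = file_1
    for word in file_2:
        if word not in unique_words_list:
            unique_words_list.append(word)
    return unique_words_list


# (2) Duplicates inside the first input are never removed.
duplicates = unique_words(["word", "word", "word", "word"], [])
print(duplicates)
# ['word', 'word', 'word', 'word']

# (1) A line holding several words is treated as one single "word".
multi = unique_words(["the cat and the dog"], ["the mouse and the bunny"])
print(multi)
# ['the cat and the dog', 'the mouse and the bunny']
```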


Here is a concise and correct solution
Contents of file1.txt:
the cat and the dog
the lime and the lemon

Contents of file2.txt:
the mouse and the bunny
dogs really like meat

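Code:
def unique(infiles):
    """Return the set of distinct whitespace-separated words in the given files."""
    words = set()
    for infile in infiles:
        # The with block closes the file even if reading fails.
        with open(infile, 'r') as f:
            for line in f:
                # split() with no arguments also discards the trailing newline.
                words.update(line.split())
    return words

# Sets are unordered, so sort before printing to get a stable display.
print(sorted(unique(['file1.txt'])))
print(sorted(unique(['file2.txt'])))
print(sorted(unique(['file1.txt', 'file2.txt'])))

Output:
['and', 'cat', 'dog', 'lemon', 'lime', 'the']
['and', 'bunny', 'dogs', 'like', 'meat', 'mouse', 'really', 'the']
['and', 'bunny', 'cat', 'dog', 'dogs', 'lemon', 'like', 'lime', 'meat', 'mouse', 'really', 'the']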

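Two lessons for Python learners:
Use the tools the language provides, such as set
Think about input conditions that would break your algorithm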
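Sorry, the assignment specifically tells me to use lists :I

Well, this does work. There was a \n because each word in the file is on a separate line, since I only know how to loop over lines. Can you explain to me why comparing the files didn't work?

It works fine on my machine...

Ah, thank you :) I think I can work with this small snippet!

Check the other answer.

Oh wow, you are right, I didn't even notice. I will convert what you sent into my own code, thanks a lot!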
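
Since the assignment asks for lists, here is one possible list-only variant that still copes with several words per line and with repeats inside a single file. It is a sketch, not the assignment's reference solution: the function name unique_words_as_list is made up, and the temporary directory exists only so the example runs on its own with the sample file contents from the answer above.

```python
import os
import tempfile


def unique_words_as_list(paths):
    """Collect distinct words from the given files, in first-seen order."""
    seen = []
    for path in paths:
        with open(path, "r") as handle:
            for line in handle:
                # split() breaks the line on whitespace and drops the newline.
                for word in line.split():
                    if word not in seen:
                        seen.append(word)
    return seen


# Assumption: recreate the two sample files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path_1 = os.path.join(tmp, "file1.txt")
    path_2 = os.path.join(tmp, "file2.txt")
    with open(path_1, "w") as f:
        f.write("the cat and the dog\nthe lime and the lemon\n")
    with open(path_2, "w") as f:
        f.write("the mouse and the bunny\ndogs really like meat\n")
    combined = unique_words_as_list([path_1, path_2])

print(combined)
# ['the', 'cat', 'and', 'dog', 'lime', 'lemon', 'mouse', 'bunny', 'dogs', 'really', 'like', 'meat']
```

The "not in" test scans the whole list each time, so this gets slow on large files; a set avoids that cost, but a list keeps the words in the order they first appear.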