Python text document translation comparison

python for-loop

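This question is fairly simple. I am trying to create a "translation comparison" program that reads and compares two documents and then returns every word that does not appear in the other document. It is meant for beginners, so I am trying to avoid obscure built-in methods, even if that makes the code less efficient. This is what I have so far: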

def translation_comparison():
   import re
   file1 = open("Desktop/file1.txt","r")
   file2 = open("Desktop/file2.txt","r")
   text1 = file1.read()
   text2 = file2.read()
   text1 = re.findall(r'\w+',text1)
   text2 = re.findall(r'\w+',text2)
   for item in text2:
       if item not in text1:
           return item

You could try something like this:

#######Test data
#file1.txt = this is a test
#file2.txt = this a test
#results#
#is

def translation_comparison():
    with open("file1.txt", 'r') as f1:
        f1 = f1.read().split()
    with open("file2.txt", 'r') as f2:
        f2 = f2.read().split()

    for word in f1:
        if word not in f2:
            print(word)


translation_comparison()

Also, this is good practice:

with open("file1.txt", 'r') as f1:
    f1 = f1.read().split()

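because a file opened using with is closed automatically as soon as the block exits. Python is very good at releasing and managing memory, but you should still make sure the file is released, either this way or by calling

file1.close()

when you are done.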
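To make the point about closing files concrete, here is a minimal sketch that reads the same file both ways and checks that the file ends up closed. The file name demo.txt and the two helper names are made up for illustration, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile


def read_words_with(path):
    # The with block closes the file when the block exits, even on error.
    with open(path, "r") as f:
        words = f.read().split()
    return words, f.closed


def read_words_manual(path):
    # Equivalent manual version: close() has to be called explicitly.
    f = open(path, "r")
    try:
        words = f.read().split()
    finally:
        f.close()
    return words, f.closed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    with open(path, "w") as out:
        out.write("this is a test")
    result_with = read_words_with(path)
    result_manual = read_words_manual(path)
```

Both helpers return the same word list and report the file as closed; the with version simply needs less code to get there.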

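If the comparison instead needs to be done word by word, position by position, then comparing

abc

with

bac

reports each mismatched pair of words in turn (in contrast to the None returned by the original code). The generator-based version at the end of this page takes that approach.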

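Note that the split()-based version above works fine for small files, but the larger the files, the longer it takes. Python's built-in looping machinery is implemented in C, but every "word not in f2" test still scans the whole list, so a large file can take a long time. This is just for demonstration.

Thanks. This was very helpful. The only problem is that both examples return only the first instance, rather than all instances, of a word that is not in the other text document.

It will now just print them.

One thing I thought of today: the approach above works, but it reports every occurrence of a word that is not found, duplicates included. The best way to fix this is to convert the resulting list to a set.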
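The two remarks above (slow lookups on large files, and duplicate results) can both be addressed with sets. The following is only a sketch under assumptions: the function name missing_words and the sample strings are invented for illustration, and the words are passed in as lists rather than read from files.

```python
def missing_words(words1, words2):
    """Return each word of words1 that is absent from words2, once, in order."""
    lookup = set(words2)  # set membership is a hash lookup, not a list scan
    seen = set()          # words already reported, to skip duplicates
    missing = []
    for word in words1:
        if word not in lookup and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing


text1 = "this is is a small test is"
text2 = "this a test"
result = missing_words(text1.split(), text2.split())
print(result)
```

If the order of the results does not matter, wrapping a plain list of missing words in set(...) would remove the duplicates just as well; the extra seen set is only there to keep the words in first-appearance order.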

import string
import itertools

def read_by_word(file):
    # Read one character at a time and yield each whitespace-separated word.
    letters = []
    while True:
        l = file.read(1)
        if l and l not in string.whitespace:
            letters.append(l)
            continue
        # Reached whitespace or the end of the file: emit the pending word.
        # (This also keeps the last word when the file has no trailing newline.)
        if letters:
            yield "".join(letters)
            letters = []
        if not l:
            break

def translation_comparison():
    with open("file1.txt") as file1, open("file2.txt") as file2:
        words1 = read_by_word(file1)
        words2 = read_by_word(file2)

        for (word1, word2) in itertools.zip_longest(words1, words2, fillvalue=None):
            if word1 != word2:
                yield (word1, word2)
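To see what the position-by-position comparison produces without creating any files, here is a simplified sketch of the same idea that works on strings. The name compare_words and the sample inputs are assumptions for illustration, and it uses split() instead of the character-by-character reader above.

```python
import itertools


def compare_words(text1, text2):
    """Pair up words by position and return the pairs that differ."""
    words1 = text1.split()
    words2 = text2.split()
    # zip_longest pads the shorter sequence with None, so extra words
    # in the longer text are reported as (word, None) or (None, word).
    pairs = itertools.zip_longest(words1, words2, fillvalue=None)
    return [(w1, w2) for w1, w2 in pairs if w1 != w2]


same_length = compare_words("a b c", "b a c")
uneven = compare_words("this is a test", "this is")
```

Note that this kind of comparison is sensitive to position: a single inserted word shifts everything after it, so every later pair may be reported as a mismatch.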