Python: Autocorrect


python python-2.7


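I have two files, check.txt and orig.txt. I want to check each word in check.txt to see whether it matches any word in orig.txt. If it does, the code should replace the word with its first match; otherwise it should leave the word as it is. But somehow it does not work as intended. Please help.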

check.txt looks like this:

ukrain

troop

force

ukraine cnn should stop pretending & announce: we will not report news while it reflects bad on obama @bostonglobe @crowleycnn @hardball

rt @cbcnews: breaking: .@vice journalist @simonostrovsky, held in #ukraine now free and safe http://t.co/sgxbedktlu http://t.co/jduzlg6jou

russia 'outraged' at deadly shootout in east #ukraine -  moscow:... http://t.co/nqim7uk7zg
 #groundtroops #russianpresidentvladimirputin

and orig.txt looks like this:

ukrain

troop

force

ukraine cnn should stop pretending & announce: we will not report news while it reflects bad on obama @bostonglobe @crowleycnn @hardball

rt @cbcnews: breaking: .@vice journalist @simonostrovsky, held in #ukraine now free and safe http://t.co/sgxbedktlu http://t.co/jduzlg6jou

russia 'outraged' at deadly shootout in east #ukraine -  moscow:... http://t.co/nqim7uk7zg
 #groundtroops #russianpresidentvladimirputin

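There are two problems with your code:

1. When you loop over the words in `f`, each word still ends with a newline character, so your `in` check never succeeds.
2. You want to iterate over `orig` for each word in `f`, but a file is an iterator, so it is exhausted after the first word in `f`.

You can fix these by doing `word = word.strip()` and `orig = list(orig)`, or you can try something like this: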

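# assumes f is check.txt (one word stem per line) and orig is orig.txt
with open('check.txt') as f, open('orig.txt') as orig:
    # get all stemmed words, without newlines or blank lines
    stemmed = [line.strip() for line in f if line.strip()]
    # set of lowercased original words
    original = set(word.lower() for line in orig for word in line.split())
# map stemmed words to unstemmed words (None until a match is found)
unstemmed = {word: None for word in stemmed}
# find an original word that contains each word stem
for stem in unstemmed:
    for word in original:
        if stem in word:
            unstemmed[stem] = word
            break  # stop at the first match found; set order is arbitrary
print(unstemmed)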
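To see the two fixes in isolation, here is a small Python 3 sketch. It assumes `io.StringIO` objects can stand in for the opened files (the question's own loop is not shown), and `autocorrect` is a hypothetical helper name. It keeps the original words in a list rather than a set, so "first match" means first in file order, and it uses the same substring rule (`stem in word`) as the code above.

```python
import io

def autocorrect(check_lines, orig_lines):
    """Replace each word with the first original word containing it, else keep it."""
    # Fix 2: read the orig iterator once into a list so it can be scanned
    # again for every word; a bare file object is empty after one pass.
    orig_words = [w for line in orig_lines for w in line.split()]
    result = []
    for word in check_lines:
        # Fix 1: drop the trailing newline before comparing.
        word = word.strip()
        if not word:
            continue  # skip blank lines
        # First containing word in file order, or the word itself.
        result.append(next((w for w in orig_words if word in w), word))
    return result

# A file-like iterator yields its lines only once.
exhausted = io.StringIO(u"a\nb\n")
first_pass, second_pass = list(exhausted), list(exhausted)
print(len(first_pass), len(second_pass))  # 2 0

# io.StringIO stands in for the opened files (hypothetical sample data).
check_file = io.StringIO(u"ukrain\ntroop\nforce\n")
orig_file = io.StringIO(u"ukraine cnn should stop\n#groundtroops #ukraine\n")
corrected = autocorrect(check_file, orig_file)
print(corrected)  # ['ukraine', '#groundtroops', 'force']
```

Because `orig_words` is built once, the inner search works for every word, not just the first one.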

Or, shorter (without the final double loop), using `difflib`, as suggested in the comments:

import difflib
unstemmed = {word: difflib.get_close_matches(word, original, 1) for word in stemmed}
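Note that `difflib.get_close_matches` returns a list (empty when no candidate reaches the similarity cutoff, 0.6 by default), so the one-liner maps words to lists. A minimal Python 3 sketch of the question's "replace or keep as is" rule on top of it might look like this; `closest_or_same` is a hypothetical helper and the word lists are made-up sample data.

```python
import difflib

def closest_or_same(word, candidates, cutoff=0.6):
    """Return the closest candidate, or the word unchanged if none is close enough."""
    # get_close_matches returns a list of at most n matches, best first,
    # and an empty list when no candidate reaches the cutoff ratio.
    matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Hypothetical sample data modelled on the question's files.
stemmed = ["ukrain", "troop", "force"]
original = ["ukraine", "cnn", "russia", "#groundtroops", "journalist"]

unstemmed = {word: closest_or_same(word, original) for word in stemmed}
print(unstemmed)  # {'ukrain': 'ukraine', 'troop': 'troop', 'force': 'force'}
```

Fuzzy matching behaves differently from the substring check: here "troop" stays unchanged because its similarity ratio to "#groundtroops" is 2·5/18 ≈ 0.56, just under the default cutoff, while `"troop" in "#groundtroops"` is true. Which behaviour you want depends on your data.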

Also, remember to close your files, or use the `with` keyword to close them automatically.
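A minimal Python 3 sketch of the `with` form: one statement can manage both files, and both are closed when the block exits. The temporary directory and sample contents are only there to make the example runnable; with real files you would pass the actual paths.

```python
import os
import tempfile

# Hypothetical demo: write small sample files to a temporary directory.
tmp_dir = tempfile.mkdtemp()
check_path = os.path.join(tmp_dir, "check.txt")
orig_path = os.path.join(tmp_dir, "orig.txt")
with open(check_path, "w") as out:
    out.write("ukrain\ntroop\nforce\n")
with open(orig_path, "w") as out:
    out.write("ukraine cnn should stop\n#groundtroops #ukraine\n")

# Both files are closed when this block exits, even if an exception
# is raised inside it.
with open(check_path) as f, open(orig_path) as orig:
    stemmed = [line.strip() for line in f if line.strip()]
    original = list(orig)  # read once so it can be re-scanned later

print(f.closed, orig.closed)  # True True
print(stemmed)  # ['ukrain', 'troop', 'force']
```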

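"If it matches, the code should replace the word with its first match, otherwise it should leave the word as it is": replace it where? In the original file or in the check file? Note that if you open a file with "r" (read mode), you cannot write to it.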
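To illustrate the point about read mode: in Python 3, calling `write()` on a file opened with "r" raises `io.UnsupportedOperation`. One common approach, sketched below, is to read the input and write the corrected words to a separate file opened with "w" ("r+" would open a file for both reading and writing, but rewriting a file in place is more error-prone). The output name `corrected.txt` and the `replacements` table are hypothetical.

```python
import io
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
check_path = os.path.join(tmp_dir, "check.txt")    # hypothetical sample input
out_path = os.path.join(tmp_dir, "corrected.txt")  # hypothetical output file
with open(check_path, "w") as out:
    out.write("ukrain\ntroop\nforce\n")

# A file opened with "r" is read-only: write() raises io.UnsupportedOperation.
try:
    with open(check_path, "r") as f:
        f.write("ukraine\n")
    write_failed = False
except io.UnsupportedOperation:
    write_failed = True

# Hypothetical replacement table, e.g. built by one of the approaches above.
replacements = {"ukrain": "ukraine"}

# Read from the input file and write corrected words to a separate file.
with open(check_path) as src, open(out_path, "w") as dst:
    for line in src:
        word = line.strip()
        dst.write(replacements.get(word, word) + "\n")

with open(out_path) as result:
    corrected = result.read().split()
print(write_failed, corrected)  # True ['ukraine', 'troop', 'force']
```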

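`for word in f:` ... `for line in orig:`

If the second loop iterates over lines, what does the first loop iterate over? It would help if you provided some sample input and the expected output; right now we are just guessing. Check this out, it might be useful.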