Python: using a dictionary to map misspelled words to line numbers


This is the code I have so far:

from collections import defaultdict

goodwords = set()

with open("soccer.txt", "rt") as f:
    for word in f.readlines():
        goodwords.add(word.strip())

badwords = defaultdict(list)

with open("soccer.txt", "rt") as f:
    for line_no, line in enumerate(f):
        for word in line.split():
            if word not in text:
                badwords[word].append(line_no)

print(badwords)
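How can I fix the code so that it prints each misspelled word (one that is not in the word list) together with the line numbers where it occurs?

For example, if a word is misspelled on lines 5 and 7, it should print something like: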

togeher 5 7

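When you insert a new `counter` into `d`, you first check whether `word` is contained in `words`. You probably want to check whether `word` is already contained in `d` instead: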

if word not in d:
    d[word] = [counter]
else:
    d[word].append(counter)
The check for whether `word` is contained in `words` or in `line` should be a separate `if`.
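
As a minimal sketch of the two separate checks: the names `words`, `d`, and `counter` follow the snippet above, while the helper `record_misspelling` and the sample data are hypothetical and only there for illustration.

```python
def record_misspelling(word, counter, words, d):
    """Record line number `counter` for `word` if it is not a known word."""
    # First check: is the word misspelled at all?
    if word not in words:
        # Second, separate check: has this misspelling been seen before?
        if word not in d:
            d[word] = [counter]
        else:
            d[word].append(counter)


words = {"we", "play", "together"}  # hypothetical set of correct words
d = {}
record_misspelling("togeher", 5, words, d)
record_misspelling("together", 6, words, d)  # correctly spelled, so ignored
record_misspelling("togeher", 7, words, d)
print(d)  # {'togeher': [5, 7]}
```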

You can also simplify this logic with the dict's `setdefault()` method:

d.setdefault(word, []).append(counter)
Or make `d` a `defaultdict`, which simplifies the assignment even further:

from collections import defaultdict
d = defaultdict(list)
...
d[word].append(counter)
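
To see that the three variants build the same mapping, here is a small sketch. The (word, line number) pairs are made-up sample data, not taken from the question.

```python
from collections import defaultdict

# Hypothetical (word, line number) pairs, for illustration only.
occurrences = [("helo", 5), ("wrold", 6), ("helo", 8)]

# Variant 1: explicit membership test.
explicit = {}
for word, counter in occurrences:
    if word not in explicit:
        explicit[word] = [counter]
    else:
        explicit[word].append(counter)

# Variant 2: dict.setdefault() inserts the empty list only if the key is missing.
with_setdefault = {}
for word, counter in occurrences:
    with_setdefault.setdefault(word, []).append(counter)

# Variant 3: defaultdict(list) creates the empty list on first access.
with_defaultdict = defaultdict(list)
for word, counter in occurrences:
    with_defaultdict[word].append(counter)

print(explicit == with_setdefault == dict(with_defaultdict))  # True
```

One thing to keep in mind with `defaultdict`: merely looking up a missing key with `d[key]` also creates an entry for it, so use `key in d` or `d.get(key)` when you only want to test for presence.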

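Regarding the overall algorithm, note that you first iterate over all the lines to increment the counter, and only then, when the counter has already reached its maximum value, do you start checking for misspelled words. You should probably check each line inside the same loop that increments the counter.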
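
A sketch of that restructuring, assuming the text is read line by line and `words` holds the correctly spelled words. The sample text and word set are invented for illustration, and `io.StringIO` stands in for the opened file.

```python
import io
from collections import defaultdict

words = {"we", "play", "soccer", "together"}  # hypothetical correct words
text = io.StringIO("we play soccer\nwe play togeher\nsoccer togeher\n")

d = defaultdict(list)
counter = 0
for line in text:
    counter += 1  # the counter advances once per line...
    for word in line.split():
        # ...and the spelling check happens while the counter still
        # refers to the line currently being read.
        if word not in words:
            d[word].append(counter)

print(dict(d))  # {'togeher': [2, 3]}
```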

Judging by what you are doing, I think the following would suit you well:

from collections import defaultdict

text = ( "cat", "dog", "rat", "bat", "rat", "dog",
         "man", "woman", "child", "child")

d = defaultdict(list)

for lineno, word in enumerate(text):
    d[word].append(lineno)

print(d)

This gives you the following output (wrapped here for readability):

defaultdict(<class 'list'>, {'cat': [0], 'dog': [1, 5], 'rat': [2, 4],
                             'bat': [3], 'man': [6], 'woman': [7],
                             'child': [8, 9]})
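Taking only the keys as a set (for example with set(d)) produces something like the following; the order of a set may vary:

{'bat', 'woman', 'dog', 'cat', 'rat', 'child', 'man'}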
Or, to print just the words:

for word in d:
    print(word)
Edit 3:

I think this may be the final version. It is a (deliberately) very crude but nearly complete spell checker:

from collections import defaultdict

# Build a set of all the words we know, assuming they're one word per line
good_words = set() # Use a set, as this will have the fastest look-up time.
with open("words.txt", "rt") as f:
    for word in f.readlines():
        good_words.add(word.strip())

bad_words = defaultdict(list)

with open("text_to_check.txt", "rt") as f:
    # For every line of text, get the line number, and the text.
    for line_no, line in enumerate(f):
        # Split into separate words - note there is an issue with punctuation,
        # case sensitivity, etc.
        for word in line.split():
            # If the word is not recognised, record the line where it occurred.
            if word not in good_words:
                bad_words[word].append(line_no)

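At the end, `bad_words` will be a dictionary with the unrecognised words as keys and, as the matching values, lists of the line numbers on which each word occurs.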
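
To print the result in the format asked for in the question (the word followed by its line numbers), one possible sketch is below. It assumes line numbers should start at 1, which `enumerate(..., start=1)` provides. The helper name `find_bad_words` and the sample data are hypothetical, and the in-memory `io.StringIO` object stands in for `text_to_check.txt`. Stripping punctuation and lower-casing is just one simple way to reduce the punctuation and case issue noted in the code comments, not a complete solution.

```python
import io
import string
from collections import defaultdict


def find_bad_words(lines, good_words):
    """Map each unrecognised word to the 1-based line numbers where it occurs."""
    bad_words = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for raw in line.split():
            # Assumption: surrounding punctuation and case should be ignored.
            word = raw.strip(string.punctuation).lower()
            if word and word not in good_words:
                bad_words[word].append(line_no)
    return bad_words


good_words = {"we", "play", "soccer", "together"}  # hypothetical word list
sample = io.StringIO("We play soccer.\nWe play togeher!\nSoccer, togeher\n")

bad_words = find_bad_words(sample, good_words)
for word, line_numbers in bad_words.items():
    # Unpack the list so the numbers are printed space-separated.
    print(word, *line_numbers)  # togeher 2 3
```

Note that a word misspelled twice on the same line gets that line number recorded twice; collect the numbers in a set instead of a list if each line should appear only once.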

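You need to format the code so that we can run it. I did a quick reformat, but it is still missing some indentation.

You shouldn't need to count the lines; use `len(words)`.

If you get an error and want to ask about it, please tell us what the error is.

The error I get is: d[word].append(counter) KeyError: 'a'

The text file is actually called soccer.txt, but I am using sys.argv. I have only been programming for 2 months, so I don't understand everything.

I changed `if word not in words` to `if words not in d`, but I still get an error: print(word, d[counter]) KeyError: 329

I have a list of incorrect words, and I want to collect the line numbers of the incorrect words in my txt file into a set, then print something like "helo 5 8", with 5 and 8 being the line numbers in the txt file. Any suggestions on how to do this, please?

I actually have a list of correct spellings called dictset = []. It is a dictionary with a lot of words. I will try this though, thanks.

This is what I have: a txt file containing words and a list of incorrect words. I just want to append the line numbers of the incorrect words to one another.

What you said works, but I want it printed as a set. I have a set, but all I can do is loop over incorrectwords with print(inwords), which prints my set of incorrect words. How would I do that with the code you showed me? Cheers.

I'm thankful for everything you've done, and I believe it should work if I add one more thing: instead of printing the line numbers of the incorrect words, I want to print the line numbers of the incorrect words in the txt file. What would I add? I tried adding `if word in txtfile:` ???

Updated to a minimal but complete example.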