Python: using a dictionary to map misspelled words to line numbers

This is the code I have so far:

from collections import defaultdict

goodwords = set()

with open("soccer.txt", "rt") as f:
    for word in f.readlines():
        goodwords.add(word.strip())

badwords = defaultdict(list)

with open("soccer.txt", "rt") as f:
    for line_no, line in enumerate(f):
        for word in line.split():
            if word not in goodwords:
                badwords[word].append(line_no)

print(badwords)

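How can I fix the code so that it prints the misspelled words stored in the word list together with their line numbers?
For example, if a word is misspelled on lines 5 and 7, it should print something like:
togeher 5 7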
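A minimal sketch of the printing step, assuming bad_words maps each misspelled word to a list of 0-based line numbers (which is what enumerate() produces by default). The sample data and the helper name format_report are hypothetical; 1 is added to each number so the output uses 1-based line numbers like the example above.

```python
from collections import defaultdict

# Hypothetical sample data: word -> 0-based line numbers, as built with enumerate().
bad_words = defaultdict(list)
bad_words["togeher"].extend([4, 6])
bad_words["helo"].extend([4, 7])


def format_report(bad_words):
    """Return one string per word: the word followed by its 1-based line numbers."""
    lines = []
    for word in sorted(bad_words):
        # Convert each 0-based index to a 1-based line number and join with spaces.
        numbers = " ".join(str(n + 1) for n in bad_words[word])
        lines.append(f"{word} {numbers}")
    return lines


for report_line in format_report(bad_words):
    print(report_line)
# helo 5 8
# togeher 5 7
```

If you don't need the strings, print(word, *(n + 1 for n in bad_words[word])) prints the same thing directly, because print() separates its arguments with spaces.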

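When you insert a new counter into d, you first check whether word is contained in words. You probably want to check whether word is already contained in d instead: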
if word not in d:
    d[word] = [counter]
else:
    d[word].append(counter)

Whether the word is contained in words (or in the line) should be checked in a separate if.
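A small sketch of what "a separate if" could look like: one check decides whether the word is misspelled, and a second, independent check decides whether d already has a list for it. The word list and text lines here are hypothetical stand-ins for the real files.

```python
# Hypothetical data standing in for the word list and the text file.
good_words = {"the", "players", "played", "together"}
lines = ["the players played togeher", "togeher the players"]

d = {}
for counter, line in enumerate(lines, start=1):  # counter = 1-based line number
    for word in line.split():
        if word not in good_words:      # check 1: is the word misspelled?
            if word not in d:           # check 2: is this its first occurrence?
                d[word] = [counter]
            else:
                d[word].append(counter)

print(d)  # {'togeher': [1, 2]}
```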

You can also simplify this logic with the dict's setdefault() method:
d.setdefault(word, []).append(counter)

Or make d a defaultdict, which simplifies the assignment even further:
from collections import defaultdict
d = defaultdict(list)
...
d[word].append(counter)
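To show that the three variants above build the same mapping, here is a self-contained comparison on some hypothetical (word, line) pairs:

```python
from collections import defaultdict

pairs = [("helo", 5), ("togeher", 5), ("helo", 8)]  # hypothetical (word, line) pairs

# 1. Explicit membership test.
d1 = {}
for word, counter in pairs:
    if word not in d1:
        d1[word] = [counter]
    else:
        d1[word].append(counter)

# 2. dict.setdefault(): returns the existing list, or inserts [] and returns it.
d2 = {}
for word, counter in pairs:
    d2.setdefault(word, []).append(counter)

# 3. defaultdict(list): a missing key gets an empty list on first access.
d3 = defaultdict(list)
for word, counter in pairs:
    d3[word].append(counter)

print(d1 == d2 == d3)  # True: a defaultdict compares equal to a dict with the same items
```

One difference worth knowing: reading a missing key from a defaultdict with d3[key] silently inserts an empty list, so use key in d3 when you only want to test membership.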

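Regarding the general algorithm, note that you first iterate over all the lines to increment the counter, and only then, once the counter has already reached its maximum value, do you start checking for misspelled words. You should probably check each line inside the same loop that increments the counter.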
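A minimal sketch of that single-loop structure, in which enumerate() supplies the line counter and each line is checked as soon as it is read. The function name find_misspellings is hypothetical, and io.StringIO stands in for an open text file such as soccer.txt.

```python
import io
from collections import defaultdict


def find_misspellings(lines, good_words):
    """Check each line as it is read; enumerate() keeps the line counter."""
    bad_words = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):  # 1-based line numbers
        for word in line.split():
            if word not in good_words:
                bad_words[word].append(line_no)
    return bad_words


# io.StringIO behaves like an open text file: iterating it yields lines.
text_file = io.StringIO("the game\nteh game\nthe gmae\nteh end\n")
good = {"the", "game", "end"}
print(dict(find_misspellings(text_file, good)))
# {'teh': [2, 4], 'gmae': [3]}
```

Because there is no separate counting pass, the counter and the line being checked can never get out of step.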
Going by what you're doing, I think the following should work well for you:
from collections import defaultdict

text = ("cat", "dog", "rat", "bat", "rat", "dog",
        "man", "woman", "child", "child")

d = defaultdict(list)

for lineno, word in enumerate(text):
    d[word].append(lineno)

print(d)

This gives you the following output:
defaultdict(<class 'list'>, {'cat': [0], 'dog': [1, 5], 'rat': [2, 4], 'bat': [3],
                             'man': [6], 'woman': [7], 'child': [8, 9]})

The set of distinct words (the dictionary's keys) is (element order may vary):
{'bat', 'woman', 'dog', 'cat', 'rat', 'child', 'man'}

Or, to print just the words:
for word in d:
    print(word)

Edit 3:
I think this is probably the final version:
This is a (deliberately) very crude but almost complete spell checker:
from collections import defaultdict

# Build a set of all the words we know, assuming they're one word per line
good_words = set() # Use a set, as this will have the fastest look-up time.
with open("words.txt", "rt") as f:
    for word in f.readlines():
        good_words.add(word.strip())

bad_words = defaultdict(list)

with open("text_to_check.txt", "rt") as f:
    # For every line of text, get the line number, and the text.
    for line_no, line in enumerate(f):
        # Split into separate words - note there is an issue with punctuation,
        # case sensitivity, etc.
        for word in line.split():
            # If the word is not recognised, record the line where it occurred.
            if word not in good_words:
                bad_words[word].append(line_no)

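At the end, bad_words will be a dictionary with the unrecognised words as keys and, as the matching values, the line numbers on which each word occurs. Note that enumerate() counts from 0; use enumerate(f, start=1) if you want line numbers that start at 1.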
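The comment in the code above notes that punctuation and case are not handled. One possible policy, which is an assumption of this sketch and not part of the answer, is to lowercase each token and strip leading/trailing punctuation with str.strip(string.punctuation). The helper names normalize and check_lines and the sample data are hypothetical.

```python
import string
from collections import defaultdict


def normalize(word):
    """Lowercase a token and strip leading/trailing punctuation (assumed policy)."""
    return word.strip(string.punctuation).lower()


def check_lines(lines, good_words):
    bad_words = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            word = normalize(token)
            # Skip tokens that were pure punctuation, e.g. "--".
            if word and word not in good_words:
                bad_words[word].append(line_no)
    return bad_words


# Normalize the word list the same way so both sides compare consistently.
good_words = {normalize(w) for w in ["The", "ball", "goal", "scored", "a"]}
text = ["The ball, the GOAL!", "Scored -- a gaol."]
print(dict(check_lines(text, good_words)))  # {'gaol': [2]}
```

strip() only removes characters at the ends of a token, so internal apostrophes (as in "don't") are kept.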
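You need to format the code so that we can run it. I did some simple formatting, but it is missing some indentation. To count the lines, use len(words). If you run into an error and want to ask about it, tell us what error you got.
The error I get is d[word].append(counter) KeyError: 'a'. The text file is actually called soccer.txt, but I'm using sys.argv. I've only been programming for 2 months, so I don't understand everything.
I changed if word not in words to if words not in d, but I still get the error print(word, d[counter]) KeyError: 329
I have a list of incorrect words, and I want to put the line numbers where those words appear in my txt file into a set, then print something like helo 5 8 (5 and 8 being line numbers in the txt file). Any advice on how to do it, plzzz.
I actually have a list of correctly spelled words called dictset=[]. It's a dictionary with lots of words, but I'll try this, thanks.
This is my situation: I have a TXT file containing words and a list of incorrect words, and I just want to collect the line numbers of each incorrect word together.
What you said works, but I want it to print as a set.
I have a set, but I can only do for inwords in incorrectwords: print(inwords). This prints a set of my incorrect words, but how do I do this with the code you showed me?
Cheers, thanks for everything you've done. I believe it should work if I add one more thing: instead of printing the line numbers of the incorrect words, I want to print the line numbers of the incorrect words in the txt file. What would I add? I tried adding if word in txtfile: ???
Updated to a minimal but complete example.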