Python: using a dictionary to map misspelled words to line numbers
This is the code I have so far:
from collections import defaultdict

# Build the set of known words, one word per line.
goodwords = set()
with open("soccer.txt", "rt") as f:
    for word in f.readlines():
        goodwords.add(word.strip())

# Record the line numbers of every word that is not in the known set.
badwords = defaultdict(list)
with open("soccer.txt", "rt") as f:
    for line_no, line in enumerate(f):
        for word in line.split():
            if word not in goodwords:
                badwords[word].append(line_no)
print(badwords)
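How can I fix the code so that it prints each misspelled word together with the line numbers stored in its list?
For example, if a word is misspelled on lines 5 and 7, it would print something like this:
togeher 5 7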
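When you insert a new counter into d, you first check whether word is contained in words. You probably want to check whether word is already contained in d instead: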
if word not in d:
    d[word] = [counter]
else:
    d[word].append(counter)
The check for whether the word is contained in words (or in the line) should be a separate if.
You can also simplify this logic with dict's setdefault() method:
d.setdefault(word, []).append(counter)
Or make d a defaultdict, which simplifies the assignment even further:
from collections import defaultdict
d = defaultdict(list)
...
d[word].append(counter)
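The three variants above can be compared side by side. This is only a minimal sketch: the words list is made-up sample data, and the variable names explicit, with_setdefault and with_defaultdict are chosen here for illustration.

```python
from collections import defaultdict

words = ["cat", "dog", "cat"]  # hypothetical sample data

# 1. Explicit membership test before appending.
explicit = {}
for counter, word in enumerate(words):
    if word not in explicit:
        explicit[word] = [counter]
    else:
        explicit[word].append(counter)

# 2. dict.setdefault() creates the empty list on first use.
with_setdefault = {}
for counter, word in enumerate(words):
    with_setdefault.setdefault(word, []).append(counter)

# 3. defaultdict(list) creates the empty list automatically.
with_defaultdict = defaultdict(list)
for counter, word in enumerate(words):
    with_defaultdict[word].append(counter)

print(explicit)  # {'cat': [0, 2], 'dog': [1]}
```

All three build the same mapping; they differ only in how the first occurrence of a word is handled.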
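As for the general algorithm, note that you first iterate over all the lines to increment the counter, and only then, once the counter has already reached its maximum value, do you start checking for misspelled words. You should probably check each line inside the loop that increments the counter. Judging from what you are doing, I think the following would suit you well: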
from collections import defaultdict

text = ("cat", "dog", "rat", "bat", "rat", "dog",
        "man", "woman", "child", "child")

# Map each word to the list of positions at which it occurs.
d = defaultdict(list)
for lineno, word in enumerate(text):
    d[word].append(lineno)
print(d)
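This gives you the following output (shown as printed by Python 2; Python 3 displays <class 'list'> and lists the keys in insertion order):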
defaultdict(<type 'list'>, {'bat': [3], 'woman': [7], 'dog': [1, 5],
'cat': [0], 'rat': [2, 4], 'child': [8, 9],
'man': [6]})
Collecting just the words into a set produces:
set(['bat', 'woman', 'dog', 'cat', 'rat', 'child', 'man'])
Or, to print only the words:
for word in d.keys():
    print(word)
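The earlier point about checking each line inside the same loop that advances the counter can be sketched as follows. The word set and the in-memory text are hypothetical stand-ins for the real files, and start=1 is an assumption that line numbers should be counted from 1.

```python
import io

# Hypothetical stand-ins for the word list and the file being checked.
words = {"the", "cat", "sat", "on", "mat"}
text = io.StringIO("the cat\nsat on teh mat\nteh end\n")

d = {}
# enumerate() advances the counter and yields the line in the same step,
# so each word is recorded against the line it was actually found on.
for counter, line in enumerate(text, start=1):
    for word in line.split():
        if word not in words:
            d.setdefault(word, []).append(counter)

print(d)  # {'teh': [2, 3], 'end': [3]}
```

Because the check happens while the counter is still moving, there is no separate counting pass and the counter never sits at its final value during the check.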
Edit 3:
I think this may be the final version.
This is a (deliberately) very crude but almost complete spell-checking tool:
from collections import defaultdict

# Build a set of all the words we know, assuming they're one word per line.
good_words = set()  # Use a set, as this will have the fastest look-up time.
with open("words.txt", "rt") as f:
    for word in f:
        good_words.add(word.strip())

bad_words = defaultdict(list)
with open("text_to_check.txt", "rt") as f:
    # For every line of text, get the line number and the text.
    for line_no, line in enumerate(f):
        # Split into separate words - note there is an issue with punctuation,
        # case sensitivity, etc.
        for word in line.split():
            # If the word is not recognised, record the line where it occurred.
            if word not in good_words:
                bad_words[word].append(line_no)
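In the end, bad_words will be a dictionary with the unrecognised words as keys and, as the matching values, the line numbers on which each word occurred.

You need to format your code so that we can run it. I did a quick format, but it left out some of the indentation.
There is no need to count the lines; use len(words).
If you get an error and want to ask a question about it, please tell us which error you got.
The error I get is d[word].append(counter) KeyError: 'a'
The text file is actually called soccer.txt, but I am using sys.argv. I have only been programming for 2 months, so I don't understand everything.
I changed if word not in words to if words not in d, but I still get an error: print(word, d[counter]) KeyError: 329
I have a list of incorrect words, and I want to collect the line numbers of the incorrect words in my txt file into a set and print them like helo 5 8 # 5 and 8 being the line numbers in the txt file. Any suggestions on how to do this, please?
I actually have a list of correct spellings called dictset = [], which is a dictionary containing a lot of words, but I will try this, thanks.
What I have is a TXT file containing words and a list of incorrect words; I just want to append the line numbers of the incorrect words to each other.
What you said works, but I want it printed as a set. I have a set, but all I can do is loop over incorrectwords with print(inwords), which prints my set of incorrect words. How would I do that with the code you showed me? Cheers.
I'm grateful for everything you've done, and I believe it should work once I add one more thing: instead of printing the line numbers of the incorrect words, I want to print the line numbers of the incorrect words in the txt file. What would I add? I tried adding if word in txtfile: ???
Updated to a minimal but complete example.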
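To get output in the form the question asks for (each word followed by its line numbers, as in togeher 5 7), the resulting dictionary still has to be formatted. The following is a rough sketch rather than a definitive solution: the helper names find_bad_words and format_report and the sample data are hypothetical, and numbering lines from 1 is an assumption.

```python
from collections import defaultdict


def find_bad_words(lines, good_words):
    """Map each unrecognised word to the 1-based line numbers it appears on."""
    bad_words = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for word in line.split():
            if word not in good_words:
                bad_words[word].append(line_no)
    return bad_words


def format_report(bad_words):
    """Return one 'word n1 n2 ...' string per unrecognised word."""
    return [
        " ".join([word] + [str(n) for n in line_numbers])
        for word, line_numbers in bad_words.items()
    ]


# Hypothetical sample data standing in for words.txt and the text file.
good_words = {"we", "play", "together", "soccer"}
lines = ["we play togeher", "soccer", "we play helo togeher"]

for report_line in format_report(find_bad_words(lines, good_words)):
    print(report_line)
# togeher 1 3
# helo 3
```

Any iterable of lines works here, so an open file object can be passed in place of the sample list. A word that occurs twice on one line is recorded twice; storing the line numbers in a set instead of a list would avoid that.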