Python spell checker

Tags: python, python-2.7, nltk, spell-checking, pyenchant

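I'm fairly new to Python and NLTK. I'm working on an application that performs spell checking (replacing a misspelled word with the correct one). I'm currently using the Enchant library through PyEnchant, together with NLTK, on Python 2.7. The code below is the class that handles the correction/replacement: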

import enchant
from nltk.metrics import edit_distance


class SpellingReplacer:
    def __init__(self, dict_name='en_GB', max_dist=2):
        self.spell_dict = enchant.Dict(dict_name)
        self.max_dist = max_dist  # honour the constructor argument

    def replace(self, word):
        # Correctly spelled words are returned unchanged.
        if self.spell_dict.check(word):
            return word
        suggestions = self.spell_dict.suggest(word)

        # Accept the top suggestion only if it is close enough to the input.
        if suggestions and edit_distance(word, suggestions[0]) <= self.max_dist:
            return suggestions[0]
        else:
            return word

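I don't really like this, because it isn't very accurate, and I'm looking for a better way to check spelling and replace words. I also need something that can catch misspellings like "caaaar". Are there better approaches to spell checking? If so, what are they? How does Google do it? Their spelling suggestions are very good.

Any suggestions?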

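I'd suggest starting with a careful read of Peter Norvig's write-up of his spelling corrector. (I had to do something similar, and I found it very useful.)

The function below in particular contains the ideas you need to make your spell checker more sophisticated: splitting, deleting, transposing, replacing, and inserting characters in an irregular word to "correct" it.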

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # Every way of cutting the word into a (left, right) pair.
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b     for a, b in splits for c in alphabet]
    # All strings exactly one edit away from the input word.
    return set(deletes + transposes + replaces + inserts)

Note: the above is a snippet from Norvig's spelling corrector.
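
To show how edits1 fits into a complete corrector, here is a minimal sketch along the lines of Norvig's approach. The tiny WORD_COUNTS table is invented for illustration (a real corrector would count words in a large text corpus), and the helper names known and correction are assumptions made for this example rather than part of the snippet above.

```python
from collections import Counter

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

# Toy frequency table, made up for this example only.
WORD_COUNTS = Counter({'the': 50, 'spelling': 10, 'car': 5, 'message': 3})


def edits1(word):
    """Return all strings exactly one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in ALPHABET if b]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the candidates that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}


def correction(word):
    """Pick the most frequent known word within two edits, else the input."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    return max(candidates, key=lambda w: WORD_COUNTS[w])


print(correction('speling'))
print(correction('hte'))
print(correction('caaar'))
```

With this toy table the three calls print spelling, the, and car. A word such as caaaar is three deletions away from car, so a two-edit search like this one leaves it unchanged; that is why long runs of repeated letters need separate handling.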

The good news is that you can add to your spell checker incrementally and keep improving it.

Hope that helps.

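Spelling corrector: you need to put a corpus (a word list) on your desktop; if you store it somewhere else, change the path in the code. I also added a small GUI with Tkinter. This only handles non-word errors.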

from __future__ import print_function  # lets print() work on Python 2.7 too

try:
    from Tkinter import Tk, Label, Entry, Button  # Python 2
except ImportError:
    from tkinter import Tk, Label, Entry, Button  # Python 3

# Change this if the word list is stored somewhere else.
DICTIONARY_PATH = r"C:\Documents and Settings\Owner\Desktop\Dictionary.txt"


def min_edit_dist(word1, word2):
    len_1 = len(word1)
    len_2 = len(word2)
    # x[i][j] is the edit distance between word1[:i] and word2[:j];
    # the bottom-right cell holds the final answer.
    x = [[0] * (len_2 + 1) for _ in range(len_1 + 1)]
    # Base cases: converting to or from the empty string.
    for i in range(len_1 + 1):
        x[i][0] = i
    for j in range(len_2 + 1):
        x[0][j] = j
    for i in range(1, len_1 + 1):
        for j in range(1, len_2 + 1):
            if word1[i - 1] == word2[j - 1]:
                x[i][j] = x[i - 1][j - 1]
            else:
                x[i][j] = min(x[i][j - 1], x[i - 1][j], x[i - 1][j - 1]) + 1
    return x[len_1][len_2]


def retrieve_text():
    word = app_entry.get()
    print("Suggestions coming right up, count to 10")
    with open(DICTIONARY_PATH, 'r') as ffile:
        for line in ffile:
            candidate = line.strip()  # drop the trailing newline before comparing
            if candidate and min_edit_dist(word, candidate) <= 2:
                print(candidate)


if __name__ == "__main__":
    app_win = Tk()
    app_win.title("spell")
    app_label = Label(app_win, text="Enter the incorrect word")
    app_label.pack()
    app_entry = Entry(app_win)
    app_entry.pack()
    app_button = Button(app_win, text="Get Suggestions", command=retrieve_text)
    app_button.pack()
    # Start the GUI event loop
    app_win.mainloop()
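
The same lookup can be tried without the GUI or the dictionary file. Below is a minimal sketch, assuming a small in-memory word list (WORDS is made up for illustration) and a hypothetical helper suggest that ranks candidates by distance instead of printing them in file order. The distance function here keeps only one row of the matrix at a time, which is a common variation and not the code from the answer.

```python
def min_edit_dist(word1, word2):
    """Levenshtein distance, keeping only the previous row of the matrix."""
    previous = list(range(len(word2) + 1))
    for i, ch1 in enumerate(word1, start=1):
        current = [i]
        for j, ch2 in enumerate(word2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


# Illustrative word list; a real run would load a dictionary file instead.
WORDS = ["car", "care", "card", "cat", "message", "service"]


def suggest(word, words=WORDS, max_dist=2, limit=5):
    """Return up to `limit` words within `max_dist` edits, closest first."""
    scored = [(min_edit_dist(word, w), w) for w in words]
    close = sorted((d, w) for d, w in scored if d <= max_dist)
    return [w for _, w in close[:limit]]


print(suggest("caar"))
```

Sorting by distance puts the one-edit match (car) ahead of the two-edit matches, which is usually what a user expects from a suggestion list.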

You can use the autocorrect library to do spell checking in Python.

Example usage:
from autocorrect import Speller

spell = Speller(lang='en')

print(spell('caaaar'))
print(spell('mussage'))
print(spell('survice'))
print(spell('hte'))

Result:

caesar
message
service
the

For this you need to install the package (preferably with Anaconda). It only works on single words, not on sentences, so that is a limitation you will run into.
from autocorrect import spell
print(spell('intrerpreter'))
# output: interpreter
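
The word-only limitation mentioned above can be worked around by splitting a sentence into words and correcting each one in turn. The following is a rough sketch using only the standard library; correct_sentence, toy_correct, and TOY_FIXES are hypothetical names invented for this example, and toy_correct merely stands in for whatever word-level corrector you actually use.

```python
import re

# Matches plain words, optionally with one apostrophe part (e.g. "don't").
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def correct_sentence(text, correct_word):
    """Apply a word-level corrector to every word, keeping punctuation."""
    def fix(match):
        word = match.group(0)
        lowered = word.lower()
        fixed = correct_word(lowered)
        if fixed == lowered:
            return word  # nothing changed, keep the original casing
        # Restore a leading capital if the original word had one.
        return fixed.capitalize() if word[0].isupper() else fixed
    return WORD_PATTERN.sub(fix, text)


# Stand-in corrector for demonstration only.
TOY_FIXES = {'hte': 'the', 'mussage': 'message'}


def toy_correct(word):
    return TOY_FIXES.get(word, word)


print(correct_sentence("Hte mussage, please!", toy_correct))
```

Because the corrector is passed in as a function, the same wrapper works with any library that maps one word to one corrected word.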

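The best approaches to spell checking in Python are SymSpell, BK-tree, or Peter Norvig's method.
The fastest is SymSpell.

Method 1: pyspellchecker. This library is based on Peter Norvig's implementation.

pip install pyspellchecker

Method 2:

pip install -U symspellpy

Maybe it's too late, but I'm answering for the benefit of future searches.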
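To correct spelling mistakes, you first need to deal with words that are nonsensical or slang-like because of repeated letters, such as caaaar or amazzing. So the first step is to get rid of the extra letters. In English, a word usually has at most two identical letters in a row (for example, hello), so we first strip the extra repetitions from each word and then check its spelling.
To remove the extra letters, you can use Python's regular expression module (re).
Once that is done, use the pyspellchecker library to correct the spelling.
For an implementation, visit the linked page.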
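
The repeated-letter step described above can be sketched with the standard re module. This is only an illustration under stated assumptions: the helper name squeeze_repeats is made up here, and the rule (collapse any run of three or more identical characters down to two) is one reading of the description, not the answer's actual code.

```python
import re

# A character followed by two or more copies of itself.
_REPEATS = re.compile(r'(.)\1{2,}')


def squeeze_repeats(word):
    """Collapse runs of 3+ identical characters down to exactly two."""
    return _REPEATS.sub(r'\1\1', word)


for w in ['caaaar', 'amazzzing', 'lettters', 'hello']:
    print(w, '->', squeeze_repeats(w))
```

The output of this step is not necessarily a real word (caaaar becomes caar), so it still has to be passed to a spell checker afterwards; the point is only to bring the word within reach of an edit-distance search.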
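Spark NLP is another option I have used, and it works very well. A simple tutorial is linked from the original answer.

Try jamspell. It works very well for automatic spelling correction: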
import jamspell

corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')

corrector.FixFragment('Some sentnec with error')
# u'Some sentence with error'

corrector.GetCandidates(['Some', 'sentnec', 'with', 'error'], 1)
# ('sentence', 'senate', 'scented', 'sentinel')

pyspellchecker is one of the best solutions for this problem. The pyspellchecker library is based on Peter Norvig's blog post.
It uses an edit-distance algorithm to find permutations within an edit distance of 2 from the original word.
There are two ways to install this library. The official documentation strongly recommends using the package.

Install using pip:

pip install pyspellchecker

Install from source:

git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python setup.py install

The following code is the example provided in the documentation:

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

Another option is an open-source, language-independent, trainable spell-checking tool that outperforms Norvig's approach and can be used from several programming languages.

In the terminal:

pip install gingerit

For the code:

from gingerit.gingerit import GingerIt

text = input("Enter text to be corrected: ")
result = GingerIt().parse(text)
corrections = result['corrections']
correctText = result['result']

print("Correct Text:", correctText)
print()
print("CORRECTIONS")
for d in corrections:
    print("________________")
    print("Previous:", d['text'])
    print("Correction:", d['correct'])
    print("Definition:", d['definition'])

Comments:

Removing words with more than two repeated lettters is not a good idea. (Oh, I just misspelled letters.)

I didn't say to remove the whole word; I described removing the extra letters from the word, so lettters becomes letters. Please read the answer carefully.

At least on Python 3, indexer is deprecated, which currently breaks the pyspellchecker module.

pyspellchecker is very slow and strips punctuation (but it does work on Python 3.6).

print(spell('Stanger things')) gives Stenger things

This doesn't seem to be compatible with Python 3? spell = Speller(lang='en') throws TypeError: the JSON object must be str, not 'bytes'

Unfortunately, this library is not reliable. Out of 100 relatively common words, 6 were autocorrected to a different word: sardine -> marine, stewardess -> stewards, snob -> snow, crutch -> clutch, pelt -> felt, toaster -> coaster.

Which one is better?