How do I stop Python from changing the characters in my dictionary code?


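My teacher gave us an assignment to write a spell checker for any language other than English, so I chose Dutch because its alphabet is close to the English one.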

import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model


NWORDS = train(words(open('dutch2.txt').read()))

alphabet = 'aäbßcdefghijklmnoöpqrstuüvwxyz'

def edits1(word):
   splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
   deletes    = [a + b[1:] for a, b in splits if b]
   transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
   replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
   inserts    = [a + c + b     for a, b in splits for c in alphabet]
   return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)

dutch2.txt contains these special characters. When I run the program, the output is:

    *** Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32. ***
>>> 
>>> correct("de")
'e'
>>>

This is incorrect. The special characters in the alphabet also get changed:

import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model


NWORDS = train(words(open('dutch2.txt').read()))

alphabet = 'aÃ¤bÃŸcdefghijklmnoÃ¶pqrstuÃ¼vwxyz'
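
The garbled alphabet above matches what appears when UTF-8 bytes are decoded as Windows-1252 (cp1252). The question does not say how the file was saved or opened, so that encoding mismatch is an assumption here. A minimal sketch that reproduces and reverses the effect:

```python
# Assumption: the text was saved as UTF-8 but read back as Windows-1252 (cp1252).
alphabet = 'aäbßcdefghijklmnoöpqrstuüvwxyz'

# Encode as UTF-8, then decode the same bytes with the wrong codec.
garbled = alphabet.encode('utf-8').decode('cp1252')
print(garbled)

# Undoing the mistake (encode with cp1252, decode as UTF-8) restores the text.
restored = garbled.encode('cp1252').decode('utf-8')
print(restored == alphabet)
```

If this is the cause, the usual remedy is to keep one encoding everywhere: save the script and dutch2.txt as UTF-8 and pass `encoding='utf-8'` to `open()`, since Python 3 otherwise falls back to the platform's locale encoding.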

How can I stop the characters from being changed?

I have tried a lot of things, but I can't get it to work.
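
Separately from the encoding issue, the `words()` function in the question only accepts `a-z`, so any word containing ä, ö, ü or ß is cut into fragments before it reaches `train()`. That may be why short fragments such as 'e' end up in `NWORDS`, though the question does not confirm it. The sketch below compares the question's pattern with a Unicode-aware one; `words_ascii`, `words_unicode` and the sample sentence are hypothetical names and data chosen for illustration.

```python
import re

def words_ascii(text):
    # The question's tokenizer: only ASCII letters a-z count as word characters.
    return re.findall('[a-z]+', text.lower())

def words_unicode(text):
    # [^\W\d_] is "word character, but not a digit or underscore",
    # which for str patterns in Python 3 covers Unicode letters too.
    return re.findall(r'[^\W\d_]+', text.lower())

sample = 'Der Käse ist größer'  # hypothetical sample line

print(words_ascii(sample))    # ['der', 'k', 'se', 'ist', 'gr', 'er']
print(words_unicode(sample))  # ['der', 'käse', 'ist', 'größer']
```

With the Unicode-aware pattern the model is trained on whole words, so the extra letters in `alphabet` can actually produce candidates that exist in `NWORDS`.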

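Could you get it to run?

I also strongly recommend working with graphemes rather than Unicode code points (which is what you get when you index a string). The simplest way is to import the regex package and use

regex.findall(r"\X", "êo")

So your

re.findall('[a-z]+', text.lower())

should become

regex.findall(r'\X+', text.lower())

and you should then split each result with a further call, because \X matches any grapheme, including whitespace and punctuation.

Also, this is not Dutch, it is German.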
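
To illustrate the difference between code points and graphemes mentioned above without the third-party `regex` package, here is a small sketch using only the standard library. It swaps in NFC normalization via `unicodedata`, which is not full grapheme segmentation but does collapse an accented letter typed as base letter plus combining mark into a single code point, and that is enough for letters such as ä, ö, ü and ê.

```python
import unicodedata

composed = '\u00ea'      # 'ê' stored as a single code point
decomposed = 'e\u0302'   # 'e' followed by COMBINING CIRCUMFLEX ACCENT

# Both render as the same grapheme, but they differ as code point sequences.
print(len(composed), len(decomposed))  # 1 2
print(composed == decomposed)          # False

# NFC normalization composes the pair into the single-code-point form.
nfc = unicodedata.normalize('NFC', decomposed)
print(nfc == composed, len(nfc))       # True 1
```

Normalizing the corpus text and the input word the same way before calling `edits1()` would keep slicing such as `word[:i]` from splitting a letter from its accent. Whether dutch2.txt actually contains decomposed characters is not known from the question.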