How do I stop Python from changing the characters in my dictionary code?


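My teacher gave us an assignment to write a spell checker for any language other than English, so I chose Dutch because its alphabet is close to the English one.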

import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model


NWORDS = train(words(open('dutch2.txt').read()))

alphabet = 'aäbßcdefghijklmnoöpqrstuüvwxyz'

def edits1(word):
   splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
   deletes    = [a + b[1:] for a, b in splits if b]
   transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
   replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
   inserts    = [a + c + b     for a, b in splits for c in alphabet]
   return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)
dutch2.txt is the text file the model is trained on. When I run the program, the output is:

    *** Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32. ***
>>> 
>>> correct("de")
'e'
>>> 
That is incorrect. The non-English characters in the alphabet also get changed:

alphabet = 'aäbßcdefghijklmnoöpqrstuüvwxyz'
How can I stop the characters from changing?
I have tried a lot, but I can't get it to work.

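Were you able to get it to run? I would also strongly recommend working with graphemes rather than Unicode code points (which is what you get when you index a string). The simplest way is to import the third-party regex package and use its regex.findall(r'\X', 'êo'). So your re.findall('[a-z]+', text.lower()) should become regex.findall(r'\X+', text.lower()), and you should then split each result into words in a separate step, because \X matches any grapheme cluster, including spaces and punctuation.

Also, that alphabet is not Dutch but German.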
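
For comparison, here is a minimal standard-library sketch of the same idea. It swaps the regex/\X approach for re with a Unicode-aware letter class plus NFC normalization, so it is an alternative rather than the method suggested above. It assumes the corpus is UTF-8 text. WORD_RE and read_corpus are hypothetical names that do not appear in the original post.

```python
import re
import unicodedata

# Assumption: "letters" are word characters that are neither digits nor
# underscore. In Python 3, \w is Unicode-aware for str patterns, so this
# class also matches letters such as ë, ö, ü and ß.
WORD_RE = re.compile(r'[^\W\d_]+')


def words(text):
    """Return the lowercase words in text, keeping accented letters intact."""
    # NFC composes sequences such as 'o' + U+0308 into the single code
    # point 'ö', so every letter is one character when the string is
    # later sliced index by index.
    text = unicodedata.normalize('NFC', text)
    return WORD_RE.findall(text.lower())


def read_corpus(path, encoding='utf-8'):
    """Hypothetical helper: read the training text with an explicit encoding."""
    # Without the encoding argument, open() falls back to a
    # platform-dependent default, which can garble non-ASCII letters.
    with open(path, encoding=encoding) as handle:
        return handle.read()


# The original pattern breaks a word at every non-ASCII letter:
broken = re.findall('[a-z]+', 'een reëel getal')   # ['een', 're', 'el', 'getal']

# The Unicode-aware version keeps the word whole:
fixed = words('een reëel getal')                   # ['een', 'reëel', 'getal']
```

With this in place, the training line would read NWORDS = train(words(read_corpus('dutch2.txt'))), provided the file really is UTF-8. If it was saved in another encoding, pass that encoding instead.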