How to replace unicode characters by ascii characters in Python (perl script given)?

python perl unicode

I am trying to learn Python and could not figure out how to translate the following Perl script to Python:

#!/usr/bin/perl -w                     

use open qw(:std :utf8);

while(<>) {
  s/\x{00E4}/ae/;
  s/\x{00F6}/oe/;
  s/\x{00FC}/ue/;
  print;
}

The script just changes unicode umlauts to an alternative ascii spelling, so the complete output is ascii. I would be grateful for any hints. Thanks!

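Use the fileinput module to loop over standard input or a list of files,
decode the lines you read from UTF-8 to unicode objects,
then map any unicode characters you like with the translate method.

translit.py would look like this: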

#!/usr/bin/env python2.6
# -*- coding: utf-8 -*-

import fileinput

table = {
          0xe4: u'ae',
          ord(u'ö'): u'oe',
          ord(u'ü'): u'ue',
          ord(u'ß'): None,
        }

for line in fileinput.input():
    s = line.decode('utf8')
    print s.translate(table),

You can use it like this:

$ cat utf8.txt 
sömé täßt
sömé täßt
sömé täßt

$ ./translit.py utf8.txt 
soemé taet
soemé taet
soemé taet

Update:

In Python 3, strings are unicode by default, so you do not need to decode a string that contains non-ASCII or even non-Latin characters. The solution then looks like this:

line = 'Verhältnismäßigkeit, Möglichkeit'

table = {
    ord('ä'): 'ae',
    ord('ö'): 'oe',
    ord('ü'): 'ue',
    ord('ß'): 'ss',
}

line.translate(table)

>>> 'Verhaeltnismaessigkeit, Moeglichkeit'
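The same mapping can be applied to a whole stream, which is roughly what the Perl filter in the question does. The following is only a sketch for Python 3: the names UMLAUT_TABLE, transliterate and translate_lines are made up for illustration, and it assumes the input lines are already decoded str objects.

```python
# Mapping from the answer above; str.maketrans accepts single-character
# keys and replacement strings of any length.
UMLAUT_TABLE = str.maketrans({
    'ä': 'ae',
    'ö': 'oe',
    'ü': 'ue',
    'ß': 'ss',
})


def transliterate(text):
    """Replace every mapped character in text, not only the first one."""
    return text.translate(UMLAUT_TABLE)


def translate_lines(lines):
    """Yield each line of an iterable with the mapping applied."""
    for line in lines:
        yield transliterate(line)
```

To use it as a filter, something like sys.stdout.writelines(translate_lines(sys.stdin)) should work, provided standard input and output use a UTF-8 locale. Characters that are not in the table, such as é, pass through unchanged.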

To convert to ASCII, one option is a recipe that boils down to:

>>> title = u"Klüft skräms inför på fédéral électoral große"
>>> import unicodedata
>>> unicodedata.normalize('NFKD', title).encode('ascii','ignore')
'Kluft skrams infor pa federal electoral groe'
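Note that this strips the diacritics instead of transliterating them, and that ß disappears completely because it has no decomposition. The snippet above is Python 2; a rough Python 3 equivalent might look like the sketch below. The helper names strip_diacritics and german_to_ascii are hypothetical, and combining the two steps is an assumption about what is wanted, not part of the recipe.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters, then drop everything that is not ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    # In Python 3, encode() returns bytes, so decode back to str.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def german_to_ascii(text):
    """Transliterate the German letters first, then strip what is left."""
    table = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss'})
    return strip_diacritics(text.translate(table))
```

With this, german_to_ascii keeps ß as ss and ü as ue, while other accented letters such as é still fall back to their base letter.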

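I use a transliteration codec. As far as I know, the 'translit/long' codec used below is registered by the third-party translitcodec package once it is imported.

You can change the encoding passed to decode to whatever you need. You may want a simple function to keep each individual call short:

import translitcodec  # assumed source of the 'translit/long' codec (Python 2)

def fancy2ascii(s):
    # decode the byte string, transliterate, then encode as plain ASCII
    return s.decode('latin-1').encode('translit/long').encode('ascii')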

Instead of writing regular expressions by hand, you could try unidecode to convert Unicode to ascii. It is a Python port of the Text::Unidecode Perl module:

#!/usr/bin/env python
import fileinput
import locale
from contextlib import closing
from unidecode import unidecode # $ pip install unidecode

def toascii(files=None, encoding=None, bufsize=-1):
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    with closing(fileinput.FileInput(files=files, bufsize=bufsize)) as file:
        for line in file: 
            print unidecode(line.decode(encoding)),

if __name__ == "__main__":
    import sys
    toascii(encoding=sys.argv.pop(1) if len(sys.argv) > 1 else None)

It uses the FileInput class to avoid global state.

Example:

$ echo 'äöüß' | python toascii.py utf-8
aouss
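The script above is Python 2. As a sketch only, the same idea of using a private FileInput instance could look like this in Python 3. To stay free of third-party packages it uses a plain translate table instead of unidecode, so the output differs (ae rather than a); the names TABLE and translated_lines are made up for illustration.

```python
import fileinput

# Hand-written mapping; unidecode would cover far more characters.
TABLE = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss'})


def translated_lines(paths, encoding='utf-8'):
    """Yield transliterated lines from the files named in paths."""
    # A private FileInput instance keeps state out of the module-level
    # fileinput.input() singleton; hook_encoded decodes each file.
    with fileinput.FileInput(files=paths,
                             openhook=fileinput.hook_encoded(encoding)) as f:
        for line in f:
            yield line.translate(TABLE)
```

The generator takes a list of file names; printing each yielded line with end='' reproduces the behaviour of the script for named files.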

Quick and dirty (Python 2):

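Search for "transliteration" to find related questions. The given Perl script actually replaces only the first match on each line, but that is surely an accident.

To get ascii output, the last line should be print s.translate(table).encode('ascii', 'ignore'), I guess. The goal seems to be to reduce German text to ascii while keeping it understandable.

The ord(u'ß'): None in this code deletes the ß ("eszett") character. It should be ord(u'ß'): u'ss'. Upvotes?? Accepted answer??? Oh. Come. On.

I was trying to show the different possibilities for the map.

You picked a really bad example of how to do something that the OP did not say he wants or needs.

@john: if you combine the OP's question with his comment above ("ignore"), the result is exactly the same, so stop nitpicking.

This is not at all what the original .pl does (mostly correct transliteration of German special characters). Removing the dots from German umlauts makes about as much sense as removing a leg from "x" and writing "y", or replacing "d" with "b" because "it looks similar".

No, because mapping different strings to the same string can produce collisions.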
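Two of the points raised in these comments can be illustrated with a short Python 3 sketch: the difference between replacing only the first match (Perl's s/// without /g) and replacing every match, and the collision that transliteration can cause. The sample strings are chosen purely for illustration.

```python
import re

line = 'täglich später'

# Like the Perl s/\x{00E4}/ae/ without /g: only the first ä is replaced.
first_only = re.sub('\u00e4', 'ae', line, count=1)

# Like s/\x{00E4}/ae/g: every ä on the line is replaced.
every_match = re.sub('\u00e4', 'ae', line)

# Collision: two different words map to the same ASCII string.
table = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss'})
collision = 'Maße'.translate(table) == 'Masse'.translate(table)
```

Here first_only is 'taeglich später', every_match is 'taeglich spaeter', and collision is True, so the mapping cannot be reversed reliably.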