Python: How to remove some characters from a string. replace() doesn't work

I need to strip Polish characters from a string obtained from an XML file. I'm using .replace(), but in this case it doesn't work. Why? The code:

# -*- coding: utf-8
from prestapyt import PrestaShopWebService
from xml.etree import ElementTree

prestashop = PrestaShopWebService('http://localhost/prestashop/api', 
                              'key')
prestashop.debug = True

name = ElementTree.tostring(prestashop.search('products', options=
{'display': '[name]', 'filter[id]': '[2]'}), encoding='cp852',  
method='text')

print name
print name.replace('ł', 'l')

Output:

Naturalne mydło odświeżające
Naturalne mydło odświeżające

But when I try to replace a non-Polish character, it works fine:

print name
print name.replace('a', 'o')

Result:

Naturalne mydło odświeżające
Noturolne mydło odświeżojące

This also works fine:

name = "Naturalne mydło odświeżające"
print name.replace('ł', 'l')

Any suggestions?

If I understand your problem correctly, you can use:

You may first have to decode the cp852-encoded string with name.decode('cp852').

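You are mixing up the encodings of your byte strings. Here is a short working example that reproduces the problem. I assume you are running in a Windows console whose default encoding is cp852: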

#!python2
# coding: utf-8
from xml.etree import ElementTree as et
name_element = et.Element('data')
name_element.text = u'Naturalne mydło odświeżające'
name = et.tostring(name_element,encoding='cp852', method='text')
print name
print name.replace('ł', 'l')

Output (no replacement made):

Naturalne mydło odświeżające
Naturalne mydło odświeżające

The reason is that the name string is encoded in cp852, but the byte string constant 'ł' is encoded in the source file's encoding, utf-8:

print repr(name)
print repr('ł')

Output:

'Naturalne myd\x88o od\x98wie\xbeaj\xa5ce'
'\xc5\x82'
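The mismatch described above can also be reproduced without the XML step. The following is only a minimal Python 3 sketch, assuming the data arrives as cp852 bytes; the variable names (cp852_bytes, utf8_needle, unchanged, fixed) are made up for illustration and do not come from the original code.

```python
# Python 3 sketch: the same character has different byte representations
# in different encodings, so a byte-level replace can silently miss it.
text = 'Naturalne mydło odświeżające'

# Roughly what et.tostring(..., encoding='cp852') hands back: cp852 bytes.
cp852_bytes = text.encode('cp852')

# What a 'ł' byte string literal contains in a UTF-8 source file under Python 2.
utf8_needle = 'ł'.encode('utf-8')

print(cp852_bytes)   # 'ł' is the single byte 0x88 here
print(utf8_needle)   # 'ł' is the two bytes 0xc5 0x82 here

# The UTF-8 byte sequence never occurs in the cp852 data, so nothing changes.
unchanged = cp852_bytes.replace(utf8_needle, b'l')
assert unchanged == cp852_bytes

# Decoding to text first makes the comparison independent of any encoding.
fixed = cp852_bytes.decode('cp852').replace('ł', 'l')
print(ascii(fixed))  # ascii() keeps the output printable on any console
```

Doing the replacement on decoded text, rather than re-encoding the search string to cp852, keeps the code correct even if the encoding of the data changes later.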

The best solution is to use Unicode strings:

#!python2
# coding: utf-8
from xml.etree import ElementTree as et
name_element = et.Element('data')
name_element.text = u'Naturalne mydło odświeżające'
name = et.tostring(name_element,encoding='cp852', method='text').decode('cp852')
print name
print name.replace(u'ł', u'l')
print repr(name)
print repr(u'ł')

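Output (replacement made):

Naturalne mydło odświeżające
Naturalne mydlo odświeżające
u'Naturalne myd\u0142o od\u015bwie\u017caj\u0105ce'
u'\u0142'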

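Note that Python 3's et.tostring has a Unicode option, and string constants are Unicode by default. The repr() version of a string is also more readable, but ascii() implements the old behavior. You'll also find that Python 3.6 can print Polish even on consoles that don't use a Polish code page, so you may not need to replace the characters at all: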

#!python3
# coding: utf-8
from xml.etree import ElementTree as et
name_element = et.Element('data')
name_element.text = 'Naturalne mydło odświeżające'
name = et.tostring(name_element,encoding='unicode', method='text')
print(name)
print(name.replace('ł','l'))
print(repr(name),repr('ł'))
print(ascii(name),ascii('ł'))

Output:

Naturalne mydło odświeżające
Naturalne mydlo odświeżające
'Naturalne mydło odświeżające' 'ł'
'Naturalne myd\u0142o od\u015bwie\u017caj\u0105ce' '\u0142'
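If the goal is to strip every Polish diacritic rather than only 'ł', chaining .replace() calls gets tedious. One possible approach in Python 3 is a translation table, sketched below. The names POLISH_TO_ASCII and strip_polish are hypothetical helpers invented for this example, and the NFC normalization step is an added assumption, there to make composed and decomposed spellings of the same letter behave identically.

```python
import unicodedata

# Translation table for the nine Polish diacritic letters, lower and upper case.
# (Hypothetical helper, not part of the original question or answers.)
POLISH_TO_ASCII = str.maketrans(
    'ąćęłńóśźżĄĆĘŁŃÓŚŹŻ',
    'acelnoszzACELNOSZZ',
)

def strip_polish(text):
    """Return text with Polish diacritic letters replaced by plain ASCII letters."""
    # NFC merges 'base letter + combining mark' pairs into single code points,
    # so both spellings of a letter such as 'ś' hit the same table entry.
    return unicodedata.normalize('NFC', text).translate(POLISH_TO_ASCII)

print(strip_polish('Naturalne mydło odświeżające'))
```

A table is used instead of Unicode decomposition alone because 'ł' has no decomposed form, so stripping combining marks would leave it untouched.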

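You need to normalize both strings to the same Unicode form.

Thank you very much! The encoding/decoding business is still a bit tricky for me, so I guess I need to study Unicode. I will also consider moving to Python 3.x.

Thanks! I've applied your suggestion and everything works now.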