Python: replacing a special character in a string has no effect

python python-3.x encoding

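I have a long string that contains the text

Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,

I am trying to replace "\xe2\x82\xac" with "EUR" in Python 3.6.

If I print the string, I see that it has a b in front of it, i.e. it is a bytes literal: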

 b'<div dir="ltr"><br ...' etc.

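None of my replace attempts work. Calling

html.decode("utf-8")

raises an error:

'str' object has no attribute 'decode'

For context, the string is produced by reading the contents of an email with the mailbox library: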

for message in mbox:
   for part in message.walk():
       html = str(part.get_payload(decode=True))

It does not work like that:

html="Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"
html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")

html = html.encode("utf-8", "strict")

print("Encoded String: " + str(html))
print("Decoded String: " + html.decode("utf-8",'strict'))
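
One detail of the snippet above may explain why it behaves differently from the real data: inside a str literal, "\xe2\x82\xac" is three separate code points (U+00E2, U+0082, U+00AC), not the euro sign. The sketch below, which uses a shortened hypothetical sample string, shows that such a string can be replaced directly, or repaired by re-encoding it as Latin-1 and decoding the resulting bytes as UTF-8. It assumes every character in the string fits in Latin-1.

```python
# Hypothetical shortened sample: three Latin-1 code points followed by "17.50".
s = "\xe2\x82\xac17.50"

# The escape sequences are interpreted when the literal is parsed,
# so the string holds 3 + 5 = 8 characters and no backslashes.
print(len(s))

# A plain replace works because the same three code points are searched for.
replaced = s.replace("\xe2\x82\xac", "EUR")
print(replaced)

# Alternatively, undo the mis-decoding: Latin-1 maps each code point back
# to the original byte, and those bytes are valid UTF-8 for the euro sign.
repaired = s.encode("latin-1").decode("utf-8")
print(repaired == "\u20ac17.50")
```

If the real string does not respond to either approach, it probably does not contain these three code points at all.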

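You should use:

html = html.replace(r"\xe2\x82\xac", "EUR")

so that the string \xe2\x82\xac is replaced with EUR, assuming the literal text \xe2\x82\xac (backslashes included) really is in html.

Otherwise, you should use

html = html.replace('\u20ac', 'EUR')

but that does not seem to be the case, because replacing the Unicode symbol did not work.

Do not assume that Python uses UTF-8 for strings (in fact, it does not use UTF-8 internally).

Note: since Python 3.3, CPython stores a str with 1, 2, or 4 bytes per code point depending on its contents (PEP 393), not as UTF-8 bytes, so Python would never write \xe2\x82\xac for a correctly decoded string. So either \xe2\x82\xac is literal text, or some output process has mangled it.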
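
A minimal sketch of how the literal text can arise, assuming the payload is a bytes object like the one `part.get_payload(decode=True)` returns in the question. The `payload` value here is a made-up stand-in. Calling `str()` on bytes returns its repr, including the `b'...'` wrapper and backslash escapes as ordinary characters, whereas `decode` produces real text.

```python
# Hypothetical stand-in for the bytes returned by part.get_payload(decode=True).
payload = b"[image: Uber logo]\n\xe2\x82\xac17.50\n"

# str() on bytes gives the repr: the b'...' wrapper plus literal backslashes.
as_repr = str(payload)
print(as_repr.startswith("b'"))      # the leading b seen when printing
print(r"\xe2\x82\xac" in as_repr)    # the escape is now plain text

# Only a raw-string pattern matches that literal text.
print(as_repr.replace(r"\xe2\x82\xac", "EUR"))

# Decoding instead yields a real str that contains the euro sign.
as_text = payload.decode("utf-8")
print(as_text.replace("\u20ac", "EUR"))
```

Decoding is usually the cleaner route, because the repr form also keeps `\n` as two characters and leaves the `b'` prefix in the text.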

import unicodedata

jil = """Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"""
# NFKD only normalizes the existing characters; it does not turn them into a euro sign.
data = unicodedata.normalize("NFKD", jil)
print(data)

Output:

Your Sunday evening order with Uber Eats
To: test@email.com


[image: map]

[image: Uber logo]
â¬17.50
Thanks for choosing Uber,

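Your first replace line works for me in Python 3.6.

html.replace('\xe2\x82\xac', "EUR")

only works on UTF-8 text. When I copy and paste the string from my question into Python, it works as well. However, it does not work on my original string, which is the one I copied and pasted into the question. This is a bit puzzling.

I am also using Python 3.6 and it works here. Check that your Python file is encoded as UTF-8 Unicode text, or try putting

# -*- coding: utf-8 -*-

at the top. Note that the u"string" prefix is unnecessary in Python 3.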
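
The puzzle in the comments (the pasted string works, the original one does not) is consistent with the loop in the question wrapping a bytes payload in `str()`. The following is a sketch only, under the assumption that each part is text and declares its charset. The sample message and the `texts` list are made up for illustration, and the fallback to UTF-8 is a guess rather than something the thread confirms.

```python
import email

# Hypothetical one-part message; "=E2=82=AC" is the quoted-printable form
# of the UTF-8 bytes for the euro sign.
raw = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "=E2=82=AC17.50\n"
)
message = email.message_from_string(raw)

texts = []
for part in message.walk():
    payload = part.get_payload(decode=True)  # bytes, or None for multipart containers
    if payload is None:
        continue
    # Use the charset declared by the part; falling back to UTF-8 is an assumption.
    charset = part.get_content_charset() or "utf-8"
    html = payload.decode(charset, errors="replace")  # real text, no b'...' wrapper
    texts.append(html.replace("\u20ac", "EUR"))

print(texts)
```

Messages yielded by a `mailbox` object expose the same `walk`, `get_payload`, and `get_content_charset` methods, so the loop body should carry over to the code in the question.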
