Matching Unicode characters in Python 2 and Python 3

python python-2.7

Python 3

>>> import re
>>> 
>>> REGEX_KHMER = re.compile(r"[\u1780-\u17dd\u17e0-\u17e9\u17f0-\u17f9]+")
>>> value = "ហួយ"
>>> 
>>> re.search(REGEX_KHMER, value)
<_sre.SRE_Match object; span=(0, 3), match='ហួយ'>

I want the Python 3 behavior. Why does this regular expression pattern fail to match Unicode characters such as ហួយ in Python 2, when it works fine in Python 3?
If you really have to use Python 2, this should work there:
# coding=utf-8
import re

REGEX_KHMER = re.compile(ur"[\u1780-\u17dd\u17e0-\u17e9\u17f0-\u17f9]+", re.UNICODE)

value = ur"ហួយ"
match = re.search(REGEX_KHMER, value)
print(match.group(0))

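To summarize:

Declare the source file's encoding with coding=utf-8, so the Khmer literal in the file is decoded correctly.
Prefix the string literals with ur. This makes them raw Unicode strings (not UTF-8 strings); in Python 2 an unprefixed literal is a byte string, and only a Unicode literal has its \uXXXX escapes expanded into the actual characters.
Pass re.UNICODE to the regular expression engine. This flag affects classes such as \w and \b; the explicit code point ranges used here should not depend on it, so it is harmless but probably optional.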
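Note that the ur prefix is a syntax error in Python 3, so the Python 2 snippet will not run there. Below is a minimal sketch of a variant that should run unchanged on both Python 2.7 and Python 3.3+, assuming a plain (non-raw) u prefix is acceptable for the pattern. The helper name find_khmer is hypothetical and only used for illustration; it is not part of the original answer.

```python
# -*- coding: utf-8 -*-
import re

# Non-raw Unicode literal: the \uXXXX escapes are expanded by the Python
# parser itself, so the regex engine receives the real Khmer characters.
# The u prefix is valid in Python 2 and again in Python 3.3 and later.
KHMER_PATTERN = u"[\u1780-\u17dd\u17e0-\u17e9\u17f0-\u17f9]+"
REGEX_KHMER = re.compile(KHMER_PATTERN)


def find_khmer(text):
    """Return the first run of Khmer characters in text, or None.

    Hypothetical helper: text is expected to be a Unicode string
    (unicode in Python 2, str in Python 3), not a byte string.
    """
    match = REGEX_KHMER.search(text)
    return match.group(0) if match else None
```

Calling find_khmer(u"abc ហួយ xyz") should return the three-character Khmer run, and a string without Khmer characters should give None. The pattern avoids the r prefix on purpose: it contains no backslash sequences other than \uXXXX, so nothing is lost by letting Python expand the escapes.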
I assume you have to use Python 2? In almost every case the best course is to move to Python 3, especially since working with non-English Unicode text in Python 2 can be very painful.