Python error: Can't use a string pattern on a bytes-like object


python regex string compilation


I'm running this code with Python 3.2.3:

import re

regex = '<title>(.+?)</title>'
pattern = re.compile(regex)

The html object holds the HTML code fetched from a specific URL:

html = response.read()

I get the error "can't use a string pattern on a bytes-like object". I've tried using:

regex = b'<title>(.+?)</title>'

regex = b'(.+?)'

But doesn't that add a "b" to my results? Thanks.

The urllib.request response gives you bytes, not a Unicode string. That's why the re pattern also needs to be a bytes object, and you then get bytes results back as well.
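The mismatch can be reproduced without a network connection. This is a minimal sketch in which a hard-coded bytes literal stands in for what response.read() returns (the HTML content is made up for illustration):

```python
import re

# Stand-in for response.read(): urllib.request hands back bytes, not str.
html = b"<html><head><title>Example Domain</title></head></html>"

# A str pattern cannot be applied to bytes; re raises TypeError.
error_message = None
str_pattern = re.compile('<title>(.+?)</title>')
try:
    str_pattern.search(html)
except TypeError as exc:
    error_message = str(exc)

# A bytes pattern works on bytes, but every match comes back as bytes too.
bytes_pattern = re.compile(b'<title>(.+?)</title>')
title = bytes_pattern.search(html).group(1)
print(error_message)
print(title)  # b'Example Domain'
```

The exact wording of the TypeError message differs slightly between Python versions ("can't" vs "cannot"), but the cause is the same.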

You can decode the response using the encoding the server gives you in the HTTP headers:

html = response.read()
# no codec set? We default to UTF-8 instead, a reasonable assumption
codec = response.info().get_param('charset', 'utf8')
html = html.decode(codec)
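The same decoding step can be tried offline. This is a sketch under assumptions: an http.client.HTTPMessage (the header object that response.info() returns for HTTP responses in Python 3) is built by hand to stand in for real server headers, and decode_body is a hypothetical helper name, not part of any library:

```python
from http.client import HTTPMessage


def decode_body(body, headers, default='utf8'):
    """Hypothetical helper: decode raw response bytes using the charset
    parameter of the Content-Type header, falling back to `default`."""
    codec = headers.get_param('charset', default)
    return body.decode(codec)


# Hand-built headers standing in for response.info().
latin1_headers = HTTPMessage()
latin1_headers['Content-Type'] = 'text/html; charset=ISO-8859-1'
text = decode_body('<title>Café</title>'.encode('iso-8859-1'), latin1_headers)

# No charset parameter: the UTF-8 default is used instead.
plain_headers = HTTPMessage()
plain_headers['Content-Type'] = 'text/html'
fallback = decode_body('<title>Café</title>'.encode('utf-8'), plain_headers)

print(text, fallback)
```

The header object also provides get_content_charset(failobj), which returns the charset lowercased; either method works for picking the codec.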

Now you have Unicode, and you can use a Unicode (str) regular expression as well.
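Once the text is a str, the original str pattern from the question works unchanged. A minimal sketch, assuming html has already been decoded (the sample HTML is made up):

```python
import re

# Assume html has already been decoded, e.g. via html.decode(codec).
html = '<html><head><title>Example Domain</title></head></html>'

pattern = re.compile('<title>(.+?)</title>')  # an ordinary str pattern
match = pattern.search(html)
title = match.group(1) if match else None
print(title)  # Example Domain
```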

The above can still raise a UnicodeDecodeError if the server lies about the encoding, or if no encoding is set and the UTF-8 default is also wrong.
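One possible way to guard against that, which is not part of the answer above, is to catch the error and fall back to a lossy decode. This sketch assumes that replacing undecodable bytes with U+FFFD is acceptable for your use case; safe_decode is a hypothetical helper name:

```python
def safe_decode(body, codec='utf8'):
    """Hypothetical helper: decode bytes, degrading gracefully when the
    declared codec is wrong or unknown."""
    try:
        return body.decode(codec)
    except UnicodeDecodeError:
        # Wrong charset: replace undecodable bytes with U+FFFD instead of failing.
        return body.decode(codec, errors='replace')
    except LookupError:
        # The server named a codec Python does not know: fall back to UTF-8.
        return body.decode('utf8', errors='replace')


print(safe_decode(b'caf\xc3\xa9'))            # valid UTF-8
print(safe_decode(b'caf\xe9'))                # Latin-1 bytes mislabeled as UTF-8
print(safe_decode(b'abc', 'no-such-codec'))   # unknown codec name
```

Using errors='replace' trades correctness for robustness: the program keeps running, but some characters may be lost.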

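In any case, any return value represented as b'…' is a bytes object: raw string data that has not yet been decoded to Unicode. You don't need to worry about that as long as you know the correct encoding of the data.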
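If you do keep a bytes pattern, the b prefix disappears once you decode the match with the right encoding. A minimal sketch, assuming the page is UTF-8 encoded:

```python
import re

# Raw bytes, as returned by response.read(); assumed to be UTF-8 here.
raw = '<title>Café</title>'.encode('utf-8')

match = re.search(b'<title>(.+?)</title>', raw)
title_bytes = match.group(1)          # b'Caf\xc3\xa9', still undecoded bytes
title = title_bytes.decode('utf-8')   # 'Café', once the encoding is applied
print(repr(title_bytes), repr(title))
```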

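What is html, and what kind of object is it? Try using str(html). What happens?

Which Python HTML parser would you suggest, Ignacio?

This reflects the general rule for reading and writing string data: decode input to Unicode when you read it, and encode Unicode strings before you write them. All text inside the program should be handled as Unicode.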
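The rule above ("decode on input, work in Unicode, encode on output") can be sketched as a single function. The name extract_title and its codec parameters are hypothetical, chosen for illustration:

```python
import re


def extract_title(raw, in_codec='utf8', out_codec='utf8'):
    """Hypothetical helper illustrating decode-on-input / encode-on-output."""
    # 1. Decode at the input boundary: bytes -> str.
    text = raw.decode(in_codec)
    # 2. Work only with str inside the program.
    match = re.search('<title>(.+?)</title>', text)
    title = match.group(1) if match else ''
    # 3. Encode at the output boundary: str -> bytes, ready to be written.
    return title.encode(out_codec)


# Latin-1 input, UTF-8 output: the str in the middle is encoding-agnostic.
result = extract_title('<title>Café</title>'.encode('latin-1'), 'latin-1', 'utf-8')
print(result)
```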
