Why can't I use "regex" in Python?

import re
html = """ 0 0天 0 0时 0 0分 0 0秒 """
tmp = re.compile(u"()*?([\u4e00-\u9fa5])*?", re.U)
result = re.findall(tmp, html.decode("utf-8"))
print result

[]


python regex


As shown above, why can't my code match the Chinese characters?

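You used single quotes in your regex pattern, while the html uses double quotes for the div class. I think there is a simpler pattern that extracts what you want: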

tmp = re.compile(u"(?m)([\u4e00-\u9fa5])+", re.U)
result = re.findall(tmp, html)
print(result)

Output:

['天', '时', '分', '秒']
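
As a minimal sketch of the pattern above (assuming Python 3, and using the tag-stripped text from the question as the input), the snippet below shows the character class at work. The names `text` and `cjk_run` are illustrative and not from the original post. It also shows one detail of the capturing form: when a group is repeated, `re.findall` returns only the last character matched by that group, so dropping the group returns whole runs instead.

```python
import re

# Input text as it appears in the question once the tags are stripped.
text = " 0 0天 0 0时 0 0分 0 0秒 "

# The range \u4e00-\u9fa5 covers the common CJK ideographs used in the answer.
cjk_run = re.compile(r"[\u4e00-\u9fa5]+")

result = cjk_run.findall(text)
print(result)

# A repeated capturing group keeps only its last iteration,
# so the two forms differ once characters are adjacent.
last_only = re.findall(r"([\u4e00-\u9fa5])+", "中文")
whole_run = re.findall(r"[\u4e00-\u9fa5]+", "中文")
print(last_only, whole_run)
```

For the countdown text the two forms give the same list, because every Chinese character there stands alone between digits.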
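If your html is larger than what is shown in the question and you only need the Chinese characters inside the div, you can first extract the text inside the div and then search within that text: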
inside_text = re.search(r'<div class="tB-mb">[\s\S]+</div>', html).group()
result = re.findall(tmp,inside_text)


The output will be as desired.
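
The two-step approach can be sketched as follows, assuming Python 3. The sample page is hypothetical, since the original markup around the countdown did not survive; only the `tB-mb` class name comes from the answer. This variant makes two changes that are assumptions of the sketch rather than part of the answer: it accepts either quote style around the class name, which sidesteps the single-versus-double-quote mismatch, and it uses a non-greedy `+?` so the match stops at the first `</div>` instead of running to the last one on the page.

```python
import re

# Hypothetical page: the markup around the countdown is assumed for illustration.
html = (
    '<div class="other">标题</div>'
    '<div class="tB-mb"><span>0</span><span>0</span>天'
    '<span>0</span><span>0</span>时</div>'
    '<div class="footer">页脚</div>'
)

# Accept either quote style around the class name and stop at the first </div>.
div_pattern = re.compile(r"""<div class=["']tB-mb["']>([\s\S]+?)</div>""")
cjk_char = re.compile(r"[\u4e00-\u9fa5]")

match = div_pattern.search(html)
inside_text = match.group(1) if match else ""
result = cjk_char.findall(inside_text)
print(result)

# For comparison: the greedy form runs to the last </div> and picks up the footer.
greedy_text = re.search(r'<div class="tB-mb">[\s\S]+</div>', html).group()
greedy_result = cjk_char.findall(greedy_text)
print(greedy_result)
```

A regex like this does not handle nested divs; for real pages an HTML parser is likely the more robust choice.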
Hi @Stephen2017, don't forget to use the PEP8 coding style to format your code. Thanks.
Have you tried …? Possible duplicate of your question.
Not exactly Chinese characters, I guess…
You can't extract all the Chinese characters inside the div with the pattern you used.