Python regex can't find a substring, but it should

python regex

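I am using BeautifulSoup to parse HTML and extract the title of a web page. Sometimes this fails because the site is badly written, for example with broken closing tags. When that happens, I fall back to a manual regular expression.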

I have the following text:

<html xmlns="http://www.w3.org/1999/xhtml"\n      xmlns:og="http://ogp.me/ns#"\n      xmlns:fb="https://www.facebook.com/2008/fbml">\n<head>\n    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>\n    <title>\n                    .@wolfblitzercnn prepping questions for the Cheney intvw. @CNNSitRoom today. 5p. \n            </title>\n    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />...

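I am trying to get the value between the <title> and </title> tags. It should be fairly simple, but it isn't working. Here is my Python code: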

result = re.search('\<title\>(.+?)\</title\>', html)
if result is not None:
    title = result.group(0)

For whatever reason, this does not work for this text. The search result comes back as None, so calling result.group() fails with: AttributeError: 'NoneType' object has no attribute 'group'

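I have pasted this text into online Python regex testers and tried every option (re.match, re.findall, re.search). They all work there, but for whatever reason my script cannot find anything between these tags. I even tried other regular expressions, such as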

<title>(.*?)</title>

etc

If you want to get the text between the <title> and </title> tags, you should use the following regexp:

pattern = "<title>([^<]+)</title>"

re.findall(pattern, html_string)
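As a rough illustration of why this pattern copes with a title that spans several lines, here is a minimal sketch. The `html_string` value is a shortened stand-in for the page source in the question (an assumption, not the full document), and the `.strip()` step is an addition for tidiness rather than part of the answer.

```python
import re

# Shortened, hypothetical stand-in for the page source from the question;
# the title still spans several lines, as in the original text.
html_string = (
    "<head>\n"
    "    <title>\n"
    "        .@wolfblitzercnn prepping questions for the Cheney intvw.\n"
    "    </title>\n"
    "</head>"
)

pattern = "<title>([^<]+)</title>"

# [^<] matches any character except '<', newlines included,
# so no extra flag is needed for a multi-line title.
matches = re.findall(pattern, html_string)

# findall returns only the captured group; strip surrounding whitespace.
titles = [m.strip() for m in matches]
print(titles)
```

Note that this pattern would stop short if the title itself contained a '<' character.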

You should use re.DOTALL so that . also matches newlines:
result = re.search('\<title\>(.+?)\</title\>', html, re.DOTALL)

As the documentation says:
... without this flag, '.' will match anything except a newline.
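The difference the flag makes can be seen in a small sketch like the one below. The sample string is hypothetical, and the use of group(1) with .strip() is an assumption about what the asker ultimately wants (the title text without the tags), since group(0) would return the whole match including the tags.

```python
import re

# Hypothetical sample: a title whose text is surrounded by newlines.
html = "<title>\n    Some page title\n</title>"

pattern = r"<title>(.+?)</title>"

# Without re.DOTALL, '.' refuses to cross the newline after <title>,
# so the search finds nothing and returns None.
without_flag = re.search(pattern, html)

# With re.DOTALL, '.' matches newlines too, so the search succeeds.
with_flag = re.search(pattern, html, re.DOTALL)

if with_flag is not None:
    # group(1) is the captured text only; group(0) would include the tags.
    title = with_flag.group(1).strip()
    print(title)
```

Checking for None before calling .group() also avoids the AttributeError described in the question.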
Why are you using the re.DOTALL flag? You are not even using a dot (.).