Can't use a regex pattern in Python

python regex

I have the following (repeating) HTML text, from which I need to extract some values using Python and regular expressions.


Not required.

I can use:

import re
match_det = re.compile(r'<td width="35.+?">(.+?)</td>').findall(html_source_det)
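A minimal sketch of the regex route, assuming each field is laid out like the sample row quoted in the answer (a label cell with width 35%, then the value inside an input in the width 65% cell). By default "." does not match a newline, so a pattern meant to reach the second cell has to allow for the line break explicitly; \s* does that and also tolerates "\r\n". The variable names are illustrative only.

```python
import re

# Sample row copied from the answer; the real page is assumed to repeat
# this structure once per field.
row = '''<tr>
<td width="35%">Demand No</td>
<td width="65%"><input type="text" name="T1" size="12" onFocus="this.blur()" value="876716001"></td>
</tr>'''

# The pattern from the question: it only ever sees the first cell.
labels = re.findall(r'<td width="35.+?">(.+?)</td>', row)

# "." does not cross line breaks, so bridge the two cells with \s*
# (which matches "\n" as well as "\r\n") and capture the input's value.
pair = re.compile(
    r'<td width="35[^"]*">(.+?)</td>\s*'
    r'<td width="65[^"]*">\s*<input[^>]*\bvalue="([^"]*)"'
)
pairs = pair.findall(row)

print(labels)  # ['Demand No']
print(pairs)   # [('Demand No', '876716001')]
```

Because findall returns one tuple per match, the same pattern should pick up every repetition of the row, as long as the markup really is this regular.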

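Please don't try to parse HTML with regular expressions, because HTML is not a regular language. Use an HTML parsing library such as BeautifulSoup instead. It will make your life much easier! Here is an example with BeautifulSoup: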

from bs4 import BeautifulSoup

html = '''<tr>
<td width="35%">Demand No</td>
<td width="65%"><input type="text" name="T1" size="12" onFocus="this.blur()" value="876716001"></td>
</tr>'''

# Name the parser explicitly; bs4 warns when it has to guess one.
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('td', attrs={'width': '65%'}).find_next('input')['value'])
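If adding a dependency is not an option, the standard library's html.parser module can do a similar job. This swaps in a different technique from the BeautifulSoup code above; it is a sketch that assumes every label sits in a td cell with width="35%" and is followed by the input whose value is wanted. The RowParser class and the second table row are hypothetical, invented only to show the repetition.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect (label, value) pairs from rows shaped like the sample:
    a <td width="35%"> label cell followed by an <input value="...">."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._in_label = False
        self._label = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td" and attrs.get("width") == "35%":
            self._in_label = True
            self._label = ""
        elif tag == "input" and self._label is not None:
            # Pair the most recent label with this input's value attribute.
            self.pairs.append((self._label.strip(), attrs.get("value")))
            self._label = None

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_label = False

    def handle_data(self, data):
        if self._in_label:
            self._label += data


html = '''<tr>
<td width="35%">Demand No</td>
<td width="65%"><input type="text" name="T1" size="12" onFocus="this.blur()" value="876716001"></td>
</tr>
<tr>
<td width="35%">Account</td>
<td width="65%"><input type="text" name="T2" size="12" value="ABC123"></td>
</tr>'''  # the second row is hypothetical

parser = RowParser()
parser.feed(html)
parser.close()
print(parser.pairs)  # [('Demand No', '876716001'), ('Account', 'ABC123')]
```

This is more code than the BeautifulSoup version, which is a fair argument for using the library when you can.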

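Why use regular expressions instead of BeautifulSoup?

Have a look at the regex anchors "^" and "$" (instead of using \n).

Please provide the shortest possible complete program that demonstrates your error.

Here is a post advising that you should not use regular expressions for this.

Rob: Indeed, your code works! Probably because html_source is a static string. I posted this string so you could see it, but in reality I get it by downloading it. I have updated my question with code showing how I obtain html_source. Maybe there are encoding issues or dirty non-printable characters I need to get rid of... Thanks for this, and for your advice not to use regular expressions with HTML. I will definitely take a look at the BeautifulSoup library.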

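Update: this is how html_source_det is obtained:

from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen

new_url = "https://webaccess.site.int/curracc/" + url_details  # not a real url; url_details is defined elsewhere
myresponse_det = urlopen(new_url)
# read() returns bytes; decode with the charset the server declares, falling back to UTF-8
html_source_det = myresponse_det.read().decode(myresponse_det.headers.get_content_charset() or 'utf-8')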
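One plausible cause of "works on the pasted string, fails on the download" is Windows-style "\r\n" line endings in the served HTML. That is an assumption; the thread does not confirm it. The sketch below uses a hypothetical downloaded string to show a pattern with a literal \n failing, then matching once line endings are normalised.

```python
import re

# Hypothetical downloaded text: the same cells, but with "\r\n" line
# endings (an assumption about what the real server sends).
downloaded = ('<td width="35%">Demand No</td>\r\n'
              '<td width="65%"><input type="text" name="T1" value="876716001"></td>\r\n')

# A pattern that hard-codes "\n" between the two cells...
strict = re.compile(r'</td>\n<td width="65%"><input[^>]*value="([^"]*)"')

# ...finds nothing, because "\r" sits in front of the "\n".
strict_hits = strict.findall(downloaded)

# Normalising line endings first (or using \s+ in the pattern) fixes it.
normalised = downloaded.replace('\r\n', '\n').replace('\r', '\n')
fixed_hits = strict.findall(normalised)

print(strict_hits)  # []
print(fixed_hits)   # ['876716001']
```

Printing repr(html_source_det) is a quick way to check whether "\r" or other invisible characters are actually present before changing the pattern.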

Alternatively, select the input directly by its name attribute:

print(soup.find('input', attrs={'name': 'T1'})['value'])
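For a standard-library counterpart to the lookup by name, a small html.parser subclass can record every input's value keyed by its name attribute. InputValues is a hypothetical helper name; this is a sketch under that assumption, not part of BeautifulSoup's API.

```python
from html.parser import HTMLParser


class InputValues(HTMLParser):
    """Map each <input name="..."> to its value attribute."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "input":
            attrs = dict(attrs)
            if "name" in attrs:
                self.values[attrs["name"]] = attrs.get("value")


html = '<td width="65%"><input type="text" name="T1" size="12" onFocus="this.blur()" value="876716001"></td>'
parser = InputValues()
parser.feed(html)
parser.close()
print(parser.values["T1"])  # 876716001
```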