Python: Selecting multiple values inside an HTML tag with BeautifulSoup


python web-scraping


I have created an HTML page that contains multiple blocks like the following:

<div data-pnref="all" class="clearfix _5qo4">
<a data-hovercard="/ajax/hovercard/user.php?id=671948073&amp;extragetparams=%7B%22hc_location%22%3A%22friends_tab%22%7D" ... />

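I want to retrieve the value of the data-hovercard attribute, in particular the id in the URL: "671948073".
I have tried findAll and select from the BeautifulSoup module, but so far without success.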
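On the select attempt: select() returns Tag objects, not attribute values, so the attribute still has to be read from each matched tag. The sketch below assumes bs4 4.7 or later (which uses the soupsieve CSS engine) and uses the sample markup from the question with the "..." part dropped; the selector string is illustrative, not taken from the original thread.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question; the elided "..." attributes are omitted.
html = (
    '<div data-pnref="all" class="clearfix _5qo4">'
    '<a data-hovercard="/ajax/hovercard/user.php?id=671948073'
    '&amp;extragetparams=%7B%22hc_location%22%3A%22friends_tab%22%7D"></a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selectors: any <a> with data-hovercard inside a div[data-pnref="all"].
anchors = soup.select('div[data-pnref="all"] a[data-hovercard]')

# select() gives Tag objects; read the attribute from each one.
# .get() returns None instead of raising KeyError if the attribute is missing.
hovercards = [a.get("data-hovercard") for a in anchors]
print(hovercards)
```

Note that the parser decodes "&amp;" in the attribute value to a plain "&", so the string can be handed straight to a URL parser.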
Find the div first, then find the a inside it.
Yes, but that retrieves the whole block, and then I can't extract the id.
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

html = '<div data-pnref="all" class="clearfix _5qo4"><a data-hovercard="/ajax/hovercard/user.php?id=671948073&amp;extragetparams=%7B%22hc_location%22%3A%22friends_tab%22%7D"/></div>'
soup = BeautifulSoup(html, "html.parser")

# Find the <div> first, then the <a> inside it
div = soup.find('div')
anchor = div.find('a')

# Tag attributes are accessed like dictionary keys
data_hovercard = anchor['data-hovercard']

print(data_hovercard)
# /ajax/hovercard/user.php?id=671948073&extragetparams=%7B%22hc_location%22%3A%22friends_tab%22%7D

# Split off the query string and parse it into a dict of lists
parsed = urlparse(data_hovercard)
parsed_dict = parse_qs(parsed.query)
hovercard_id = parsed_dict['id'][0]  # parse_qs returns a list for each key

print(hovercard_id)
# 671948073
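Since the page contains several such blocks, the same steps can be applied in a loop to collect every id. This is a minimal sketch: the two-block page and the second id "123456789" are hypothetical, made up for illustration, and the helper name hovercard_ids is not from the original answer.

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

# Hypothetical page with two friend blocks; the second id is invented for illustration.
html = """
<div data-pnref="all" class="clearfix _5qo4">
  <a data-hovercard="/ajax/hovercard/user.php?id=671948073&amp;extragetparams=%7B%7D"></a>
</div>
<div data-pnref="all" class="clearfix _5qo4">
  <a data-hovercard="/ajax/hovercard/user.php?id=123456789&amp;extragetparams=%7B%7D"></a>
</div>
"""


def hovercard_ids(markup):
    """Return the id query parameter of every data-hovercard link, in page order."""
    soup = BeautifulSoup(markup, "html.parser")
    ids = []
    # attrs={"data-hovercard": True} matches any <a> that has the attribute at all.
    for anchor in soup.find_all("a", attrs={"data-hovercard": True}):
        query = urlparse(anchor["data-hovercard"]).query
        # parse_qs maps each key to a list of values; skip links without an id.
        values = parse_qs(query).get("id")
        if values:
            ids.append(values[0])
    return ids


print(hovercard_ids(html))
```

Using .get("id") on the parsed query keeps the loop from failing on a hovercard URL that happens to lack an id parameter.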