Python: extracting text from a chart with Beautiful Soup
I'm fairly new to BeautifulSoup and I'm trying to extract data from this web page: I want to grab the numbers under the headings "Program Completers" and "Employed Second Quarter". The relevant part of the HTML is:
<ul class="listbox">
<li class="li1">
<p style="cursor:help" class="listtop" title="WIA Adult
completers are those individuals who have exited a WIA Adult program from
which the individual received a core staff-assisted service (such as job
search or placement assistance) or an intensive service (such as
counseling, career planning, or job training). Those individuals who
participated in WIA through self-service, like OhioMeansJobs.com, or other
less intensive programs are not included in the dashboard.">Program
Completers</p>
<p id="programcompleters1">18</p></li>
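This returns text, but other parts of the page are also marked up with "ul". I haven't been able to get any text from inside the chart area. How can I retrieve the text I need?

Thanks for your help.

The elements you need are inside an iframe. Try extracting from the iframe's own page rather than the outer page. So, this should work: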
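# Python 3 imports; the original answer used urllib2 (Python 2)
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "http://reports.workforce.test.ohio.gov/WIAReports/WIA_COUNTY.ASPX?level=county&DataType=hIp9ibmBIwbKor1WvT5Bkg==&name=GTL8gAmmdulY5GSlycy7WQ==&programDate=Kf/2jvCFFRgQJjODWV7l08ATxxM/adw9p1FWfZ9J7O8="
page = urlopen(url)
soup = BeautifulSoup(page, "html.parser")
# Each chart lives in a <div class="chartcontain">
chartcontainers = soup.find_all('div', {"class": "chartcontain"})
for container in chartcontainers:
    print(container)
    # then do whatever you need with each container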
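
If you only have the outer page, one way to find the iframe's address is to read the `src` attribute of each `iframe` tag and resolve it against the outer URL. The following is a minimal sketch, not the method from the answer above: the outer URL, the inline HTML, and the helper name `iframe_urls` are all made up for illustration, and it assumes the iframe is present in the static HTML rather than injected by JavaScript.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical outer page: this URL and the iframe src are invented for the example.
OUTER_URL = "http://example.com/dashboard/index.html"
OUTER_HTML = """
<html><body>
  <h1>Dashboard</h1>
  <iframe src="reports/county.aspx?level=county"></iframe>
</body></html>
"""


def iframe_urls(html, base_url):
    """Return the absolute URL of every iframe that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, frame["src"])  # resolve relative src against the outer URL
        for frame in soup.find_all("iframe", src=True)
    ]


frame_urls = iframe_urls(OUTER_HTML, OUTER_URL)
print(frame_urls)
```

Each URL returned this way can then be fetched and parsed on its own, as in the answer above.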
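As already mentioned, the data you're looking for is inside an iframe; follow @selected_codex's instructions to get to it. You can then access the fields you're interested in like this: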
data = {}
# Collect the text of every <p> that has an id, keyed by that id
for tag in soup.find_all('p'):
    if tag.get('id'):
        data[tag.get('id')] = tag.text
print(data)

>>> print(data.get('programcompleters1'))
18
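
For reference, here is a self-contained sketch that runs against a trimmed copy of the HTML from the question, so it can be tried without fetching the page. The long `title` attribute is shortened here, and pairing each `listtop` label with the `<p>` that follows it is an assumption based on that one snippet; the rest of the page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the snippet from the question (title attribute shortened).
SNIPPET = """
<ul class="listbox">
  <li class="li1">
    <p style="cursor:help" class="listtop" title="WIA Adult completers ...">Program
    Completers</p>
    <p id="programcompleters1">18</p>
  </li>
</ul>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

# Direct lookup by id when the id is known in advance.
value_tag = soup.find("p", id="programcompleters1")
completers = int(value_tag.get_text(strip=True)) if value_tag else None

# Pair each heading with the value in the <p> that follows it.
labels = {}
for label in soup.find_all("p", class_="listtop"):
    value = label.find_next_sibling("p")
    if value is not None:
        name = " ".join(label.get_text().split())  # collapse the line break in the label
        labels[name] = value.get_text(strip=True)

print(completers)
print(labels)
```

Looking values up by heading text rather than by id can be handy when the ids are not known ahead of time.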
Thank you very much. Both answers work, but @Matt_Davidson's solution got me the more specific data I was looking for.