使用Python从html（包括图像）中提取表_Python_Html_Python 3.x_Beautifulsoup_Html Parsing

使用Python从html（包括图像）中提取表

python html python-3.x

使用Python从html（包括图像）中提取表,python,html,python-3.x,beautifulsoup,html-parsing,Python,Html,Python 3.x,Beautifulsoup,Html Parsing,我正在尝试从html下载一个表，它不是通常的td/tr格式，并且包含图像 html代码如下所示： <div class="dynamicBottom"> <div class="dynamicLeft"> <div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS"> <div class="header_with_improve wrap"> <a

我正在尝试从html下载一个表，它不是通常的td/tr格式，并且包含图像

html代码如下所示：

<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<a href="/UpdateListing.html" onclick="ta.setEvtCookie('UpdateListing', 'entry-detail-moreinfo', null, 0, '/UpdateListingRedesign')"><div class="improve_listing_btn ui_button primary small">improve this entry</div></a>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>

代码呢

data18= soup.find(attrs={'class':  'sprite-rating_s_fill rating_s_fill s45'})
print(data18["alt"] if data18 else "No meta title given")

这也无济于事，因为它不清楚它代表了哪种评级，因为它只打印出“50分中的45分”，但不清楚是哪一类。此外，图像标签（“sprite-rating_s_fill rating_s_fill s45”）在其他表格中根据等级而变化

有没有办法提取完整的表？或者告诉Python在某个单词后提取图像，例如“Location”

非常感谢你的帮助

html = '''<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<a href="/UpdateListing.html" onclick="ta.setEvtCookie('UpdateListing', 'entry-detail-moreinfo', null, 0, '/UpdateListingRedesign')"><div class="improve_listing_btn ui_button primary small">improve this entry</div></a>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', class_="ratingRow wrap"):
    text = div.text.strip()
    alt = div.find('img').get('alt')
    print(text, alt)

html = '''<div class="dynamicBottom">
<div class="dynamicLeft">
<div class="content_block details_block scroll_tabs" data-tab="TABS_DETAILS">
<div class="header_with_improve wrap">
<a href="/UpdateListing.html" onclick="ta.setEvtCookie('UpdateListing', 'entry-detail-moreinfo', null, 0, '/UpdateListingRedesign')"><div class="improve_listing_btn ui_button primary small">improve this entry</div></a>
<h3 class="tabs_header">Details</h3> </div>
<div class="details_tab">
<div class="table_section">
<div class="row">
<div class="ratingSummary wrap">
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">
Rating
</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Location</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s45" src="https://static.tacdn.com/img2/x.gif" alt="45 out of fifty points">
</span>
</div>
</div>
</li>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', class_="ratingRow wrap"):
    text = div.text.strip()
    alt = div.find('img').get('alt')
    print(text, alt)

Location 45 out of fifty points
Service 45 out of fifty points