Web scraping: navigating a website and extracting phone numbers

I'm trying to use BeautifulSoup to scrape phone numbers for multiple countries from a website.

The HTML has the following structure:

<div class="Table">
<div class="Title"> </div>
<div class="Row">
<div class="Cell">
<div>United Kingdom<br>
<a href="447465832167-UnitedKingdom"><img src="flags/flag-uk.png" alt="SMS - United Kingdom" style="vertical-align: middle;">&nbsp;&nbsp;+447465832167<br>
</a><strong> SMS received:23304<section style="border:none; height: auto; padding: 1px; width: auto; background: #33FF66;"></section></strong></div>
</div>
<div class="Cell">
<div>Germany<br>
<a href="4915902933699-Germany"><img src="flags/german_flag.gif" alt="SMS - Germany" style="vertical-align: middle;">&nbsp;&nbsp;+4915902933699<br>
</a><strong> SMS received:21712<section style="border:none; height: auto; padding: 1px; width: auto; background: #33FF66;"></section></strong></div>
</div>
<div class="Cell">
<div>India<br>
<a href="919532351442-India"><img src="flags/flag-india.png" alt="SMS - India" style="vertical-align: middle;">&nbsp;&nbsp;+919532351442<br>
</a><strong> SMS received:1593<section style="border:none; height: auto; padding: 1px; width: auto; background: #33FF66;"></section></strong></div>
</div>
....
</div>
...
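
My code so far, followed by the value of mydivs:

import bs4 as bs  # assumed import; source is the page HTML, parser is the parser name (e.g. "html.parser")
soup = bs.BeautifulSoup(source, parser)
mydivs = soup.find_all("div", {"class": "Cell"})  # find_all is the current name for findAll
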
[<div class="Cell">
<div>United Kingdom<br/>
<a href="447465832167-UnitedKingdom"><img alt="SMS - United Kingdom" src="flags/flag-uk.png" style="vertical-align: middle;"/>  +447465832167<br/>
</a><strong> SMS received:23324<section style="border:none; height: auto; padding: 1px; width: auto; background: #33FF66;"></section></strong></div>
</div>, <div class="Cell">
<div>Germany<br/>
<a href="4915902933699-Germany"><img alt="SMS - Germany" src="flags/german_flag.gif" style="vertical-align: middle;"/>  +4915902933699<br/>
</a><strong> SMS received:21739<section style="border:none; height: auto; padding: 1px; width: auto; background: #33FF66;"></section></strong></div>
</div>, <div class="Cell">
<div>India<br/>
...
]
Result:

[
United Kingdom
SMS received:23324 , Germany
SMS received:21739 , India
... ]
Question:
How can I retrieve the phone numbers from this?

You're almost there. Try:

>>> [div.a.text.strip() for div in mydivs]
['+447465832167', '+4915902933699', '+919532351442']
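
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. It assumes the cell layout shown in the question and uses the built-in "html.parser". The names `SAMPLE_HTML` and `extract_numbers` are hypothetical, and the third cell without a link is invented only to show why a guard can be useful: `div.a` is `None` when a cell has no `<a>`, so `div.a.text` would raise an `AttributeError` there.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup. The last cell (no <a>) is a
# hypothetical addition to exercise the guard below.
SAMPLE_HTML = """
<div class="Table">
<div class="Row">
<div class="Cell">
<div>United Kingdom<br>
<a href="447465832167-UnitedKingdom"><img src="flags/flag-uk.png" alt="SMS - United Kingdom">&nbsp;&nbsp;+447465832167<br>
</a><strong> SMS received:23304</strong></div>
</div>
<div class="Cell">
<div>Germany<br>
<a href="4915902933699-Germany"><img src="flags/german_flag.gif" alt="SMS - Germany">&nbsp;&nbsp;+4915902933699<br>
</a><strong> SMS received:21712</strong></div>
</div>
<div class="Cell"><div>No link here</div></div>
</div>
</div>
"""


def extract_numbers(html):
    """Return the stripped text of the first <a> inside each div.Cell."""
    soup = BeautifulSoup(html, "html.parser")
    numbers = []
    for cell in soup.find_all("div", class_="Cell"):
        link = cell.find("a")  # same element that cell.a would return
        if link is None:       # skip cells that carry no link
            continue
        # strip=True removes the non-breaking spaces and the trailing newline
        numbers.append(link.get_text(strip=True))
    return numbers


numbers = extract_numbers(SAMPLE_HTML)
print(numbers)  # ['+447465832167', '+4915902933699']
```

If every cell is guaranteed to contain a link, the one-line list comprehension above does the same job.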

Thank you very much! I had just found another solution, but yours is much shorter and more elegant!