Python 3.x 使用嵌套for循环、python3中的BeautifulSoup进行网页抓取
我正在抓取一个html文件。 我编写了以下代码Python 3.x 使用嵌套for循环、python3中的BeautifulSoup进行网页抓取,python-3.x,beautifulsoup,Python 3.x,Beautifulsoup,我正在抓取一个html文件。 我编写了以下代码 with open('Basic Materials.htm') as fp: soup=BeautifulSoup(fp,'lxml') table=soup.find('div',{'class':'sfe-break-bottom'}) for row in table.find_all('tr'): cells=row.find_all('td') print(cells) 现在,打
with open('Basic Materials.htm') as fp:
soup=BeautifulSoup(fp,'lxml')
table=soup.find('div',{'class':'sfe-break-bottom'})
for row in table.find_all('tr'):
cells=row.find_all('td')
print(cells)
现在,打印输出(单元格)如下所示:
[<td colspan="2" style="text-align:left"><b>Gainers (% price change)</b>
</td>, <td width="15%">Last Trade
</td>, <td width="20%">Change
</td>, <td width="15%">
Mkt Cap
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:GFI&ei=H7pKWbBtgoabAZ7Kv7gI">Gold Fields Limited (ADR)</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:GFI&ei=H7pKWbBtgoabAZ7Kv7gI">GFI</a>
</td>, <td>3.53
</td>, <td width="20%">
<span class="chg">+0.11</span>
<span class="chg">(3.22%)</span>
</td>, <td>2.84B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:VALE&ei=H7pKWbBtgoabAZ7Kv7gI">Vale SA (ADR)</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:VALE&ei=H7pKWbBtgoabAZ7Kv7gI">VALE</a>
</td>, <td>7.94
</td>, <td width="20%">
<span class="chg">+0.17</span>
<span class="chg">(2.19%)</span>
</td>, <td>39.61B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:CLF&ei=H7pKWbBtgoabAZ7Kv7gI">Cliffs Natural Resources</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:CLF&ei=H7pKWbBtgoabAZ7Kv7gI">CLF</a>
</td>, <td>5.97
</td>, <td width="20%">
<span class="chg">+0.12</span>
<span class="chg">(2.14%)</span>
</td>, <td>1.69B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:AUY&ei=H7pKWbBtgoabAZ7Kv7gI">Yamana Gold Inc. (USA)</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:AUY&ei=H7pKWbBtgoabAZ7Kv7gI">AUY</a>
</td>, <td>2.40
</td>, <td width="20%">
<span class="chg">+0.05</span>
<span class="chg">(1.91%)</span>
</td>, <td>2.27B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:HL&ei=H7pKWbBtgoabAZ7Kv7gI">Hecla Mining Company</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:HL&ei=H7pKWbBtgoabAZ7Kv7gI">HL</a>
</td>, <td>5.20
</td>, <td width="20%">
<span class="chg">+0.09</span>
<span class="chg">(1.86%)</span>
</td>, <td>2.03B
</td>]
[<td colspan="2" style="text-align:left"><b>Losers (% price change)</b>
</td>, <td colspan="3">
</td>]
[<td style="text-align:left;">
<a href="/finance?cid=717954&ei=H7pKWbBtgoabAZ7Kv7gI">Jaguar Mining Inc (USA)</a>
</td>, <td style="text-align:left;">
<a href="/finance?cid=717954&ei=H7pKWbBtgoabAZ7Kv7gI"></a>
</td>, <td>11.92
</td>, <td width="20%">
<span class="chr">-0.74</span>
<span class="chr">(-5.85%)</span>
</td>, <td>2.52B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:OLN&ei=H7pKWbBtgoabAZ7Kv7gI">Olin Corporation</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:OLN&ei=H7pKWbBtgoabAZ7Kv7gI">OLN</a>
</td>, <td>28.64
</td>, <td width="20%">
<span class="chr">-1.52</span>
<span class="chr">(-5.04%)</span>
</td>, <td>4.81B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NASDAQ:GPRE&ei=H7pKWbBtgoabAZ7Kv7gI">Green Plains Inc</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NASDAQ:GPRE&ei=H7pKWbBtgoabAZ7Kv7gI">GPRE</a>
</td>, <td>19.12
</td>, <td width="20%">
<span class="chr">-0.98</span>
<span class="chr">(-4.85%)</span>
</td>, <td>708.77M
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:IPI&ei=H7pKWbBtgoabAZ7Kv7gI">Intrepid Potash, Inc.</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:IPI&ei=H7pKWbBtgoabAZ7Kv7gI">IPI</a>
</td>, <td>2.09
</td>, <td width="20%">
<span class="chr">-0.09</span>
<span class="chr">(-4.13%)</span>
</td>, <td>261.35M
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NASDAQ:CENX&ei=H7pKWbBtgoabAZ7Kv7gI">Century Aluminum Co</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NASDAQ:CENX&ei=H7pKWbBtgoabAZ7Kv7gI">CENX</a>
</td>, <td>13.62
</td>, <td width="20%">
<span class="chr">-0.56</span>
<span class="chr">(-3.95%)</span>
</td>, <td>1.17B
</td>]
[<td colspan="2" style="text-align:left"><b>Most Actives (dollar volume)</b>
</td>, <td colspan="3">
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:X&ei=H7pKWbBtgoabAZ7Kv7gI">United States Steel Corp.</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:X&ei=H7pKWbBtgoabAZ7Kv7gI">X</a>
</td>, <td>21.27
</td>, <td width="20%">
<span class="chg">+0.20</span>
<span class="chg">(0.95%)</span>
</td>, <td>3.77B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:DOW&ei=H7pKWbBtgoabAZ7Kv7gI">Dow Chemical Co</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:DOW&ei=H7pKWbBtgoabAZ7Kv7gI">DOW</a>
</td>, <td>64.01
</td>, <td width="20%">
<span class="chr">-1.09</span>
<span class="chr">(-1.67%)</span>
</td>, <td>78.06B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:NUE&ei=H7pKWbBtgoabAZ7Kv7gI">Nucor Corporation</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:NUE&ei=H7pKWbBtgoabAZ7Kv7gI">NUE</a>
</td>, <td>56.15
</td>, <td width="20%">
<span class="chg">+0.02</span>
<span class="chg">(0.04%)</span>
</td>, <td>18.02B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:VALE&ei=H7pKWbBtgoabAZ7Kv7gI">Vale SA (ADR)</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:VALE&ei=H7pKWbBtgoabAZ7Kv7gI">VALE</a>
</td>, <td>7.94
</td>, <td width="20%">
<span class="chg">+0.17</span>
<span class="chg">(2.19%)</span>
</td>, <td>39.61B
</td>]
[<td style="text-align:left;">
<a href="/finance?q=NYSE:MT&ei=H7pKWbBtgoabAZ7Kv7gI">ArcelorMittal SA (ADR)</a>
</td>, <td style="text-align:left;">
<a href="/finance?q=NYSE:MT&ei=H7pKWbBtgoabAZ7Kv7gI">MT</a>
</td>, <td>20.16
</td>, <td width="20%">
<span class="chg">+0.28</span>
<span class="chg">(1.38%)</span>
</td>, <td>20.06B
</td>][/python]
[python]
with open('Basic Materials.htm') as fp:
soup=BeautifulSoup(fp,'lxml')
table=soup.find('div',{'class':'sfe-break-bottom'})
for row in table.find_all('tr'):
cells=row.find_all('td')
for link in cells.find_all('a', limit=3):
print(link.get_text()) # gets the name
print(link.get('href')) # gets the links
但是我得到了以下错误
AttributeError回溯(最近的调用)
最后)
在()
4用于表中的行。查找所有('tr'):
5个单元格=行。查找所有('td'))
---->6用于单元格中的链接。查找所有('a',限制=3):
7打印(link.get_text())#获取名称
8打印(link.get('href'))#获取链接
getattr中的~\Anaconda3\envs\practice\lib\site packages\bs4\element.py(self,key)
1805 defgetattr(自身,键):
1806提高属性错误(
->1807“ResultSet对象没有属性“%s”。您可能将项目列表视为单个项目。是否
要调用find()?%key时,请调用find_all()
1808 )
AttributeError:ResultSet对象没有“全部查找”属性。您可能将项目列表视为单个项目。是吗
要调用find()时,请调用find_all()
你能告诉我为什么会出现这个错误吗?
如何获得前3个“a”和带有这些标记的文本。
谢谢
单元格
是一个列表,因此您不能直接调用该方法。从中,尝试创建一个列表来替换您所指的单元格。查找所有('a',limit=3)
,您可以执行以下操作:
for cell in cells:
atags = cell.findAll('a',limit=3)
for link in atags:
print(link.text)
print(link['href'])
或使用列表理解:
atags = [cell.findAll('a',limit=3) for cell in cells]
for link in atags:
print(link[0].text)
print(link[0]['href'])
@Jonelya很乐意帮忙,如果有用请告诉我!我按照您的指导更改了代码:open('Basic Materials.htm')为fp:soup=BeautifulSoup(fp,'lxml')table=soup.find('div',{'class':'sfe-break-bottom'})为表中的行。find_all('tr'):cells=row.find_all('td')atags=[cell.find_all('a',limit=3)为单元格中的单元格]为ATag中的链接:print(links.get_text())为links('href'))但我再次收到错误消息:AttributeError:ResultSet对象没有“get_text”属性。@Jonelya如果正确,请检查我的编辑,改为尝试.text
。无法工作。我尝试了打印(link.text())和打印(link['href'])另外还有…但不起作用你的兄弟…起作用了…谢谢你如果阿吉亚尔先生提供了你需要的答案,那么你应该,请,标记他的答案“接受”。我收到了错误。这回答了你的问题吗?