Python: problem scraping with BeautifulSoup

I am trying to scrape a URL using BeautifulSoup:

torrents = bs.findAll('tr',id = re.compile('torrent_*'))

torrents picks up all the torrents on that page, so each element of torrents is a tr element.

My problem is that len(torrents[0].td) is 5, but I am not able to iterate over the td elements. I mean that something like for x in torrents[0].td does not work.

The data I get for torrents[0] is:

<tr class="odd" id="torrent_2962816">
<td class="fontSize12px torrentnameCell">
<div class="iaconbox floatedRight">
<a title="Torrent magnet link" href="magnet:?xt=urn:btih:0898a4b562c1098eb69b9b801c61a51d788df0f5&amp;dn=the+beatles+2009+greatest+hits+cdrip+ikmn+reupld&amp;tr=http%3A%2F%2Ftracker.publicbt.com%2Fannounce" onclick="_gaq.push(['_trackEvent', 'Download', 'Magnet Link', 'Music']);" class="imagnet icon16"></a>
<a title="Download torrent file" href="http://torrage.com/torrent/0898A4B562C1098EB69B9B801C61A51D788DF0F5.torrent?title=[kat.ph]the.beatles.2009.greatest.hits.cdrip.ikmn.reupld" onclick="_gaq.push(['_trackEvent', 'Download', 'Download torrent file', 'Music']);" class="idownload icon16"></a>
<a class="iPartner2 icon16" href="http://www.downloadweb.org/checking.php?acode=b146a357c57fddd450f6b5c446108672&amp;r=d&amp;qb=VGhlIEJlYXRsZXMgWzIwMDldIEdyZWF0ZXN0IEhpdHMgQ0RSaXAtIGlLTU4gUmVVUGxk" onclick="_gaq.push(['_trackEvent', 'Download', 'Download movie']);"></a>
<a class="iverif icon16" href="/the-beatles-2009-greatest-hits-cdrip-ikmn-reupld-t2962816.html" title="Verified Torrent"></a> <a rel="2962816,0" class="icomment" href="/the-beatles-2009-greatest-hits-cdrip-ikmn-reupld-t2962816.html#comments_tab">
<span class="icommentdiv"></span>145
    </a>
</div>
<div class="torrentname">
<a href="/the-beatles-2009-greatest-hits-cdrip-ikmn-reupld-t2962816.html" class="torType musicType"></a>
<a href="/the-beatles-2009-greatest-hits-cdrip-ikmn-reupld-t2962816.html">The <strong class="red">Beatles</strong> [2009] Greatest Hits CDRip- iKMN ReUPld</a>
<span>
                Posted by <a class="plain" href="/user/iKMN/">iKMN</a>
<img src="http://static.kat.ph/images/verifup.png" alt="verified" /> in 
                    <span id="cat_2962816">
<a href="/music/">Music</a>
</span></span>
</div>
</td>
<td class="nobr">168.26 <span>MB</span></td>
<td>42</td>
<td>1&nbsp;year</td>
<td class="green">1368</td>
<td class="red lasttd">94</td>
</tr>



I would suggest using lxml instead of BeautifulSoup; you can also use XPath to get the links:

import lxml.html
doc = lxml.html.parse('http://www.kat.ph/search/beatles/?categories[]=music')
links = doc.xpath('//a[contains(@class,"idownload")]/@href')
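If you would rather stay with BeautifulSoup, here is a minimal sketch of what is probably going on. It assumes the modern bs4 package (where findAll is spelled find_all) and uses a trimmed-down copy of the row from the question instead of the live page. The SAMPLE string and the stricter id pattern are illustrative assumptions, not part of the original code. The point is that row.td returns only the first td of the row, so len(row.td) counts that cell's child nodes (three newline strings and two div tags, hence 5), while find_all('td') returns every cell.

```python
import re

from bs4 import BeautifulSoup

# Trimmed-down stand-in for the row shown in the question (an assumption;
# the real page has many such rows and more markup in the first cell).
SAMPLE = """
<table>
<tr class="odd" id="torrent_2962816">
<td class="fontSize12px torrentnameCell">
<div class="iaconbox floatedRight"></div>
<div class="torrentname"><a href="/x.html">The Beatles</a></div>
</td>
<td class="nobr">168.26 <span>MB</span></td>
<td>42</td>
<td>1&nbsp;year</td>
<td class="green">1368</td>
<td class="red lasttd">94</td>
</tr>
</table>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Slightly stricter pattern than 'torrent_*' (illustrative choice).
torrents = soup.find_all("tr", id=re.compile(r"^torrent_\d+"))
row = torrents[0]

# row.td is only the FIRST <td>; len() counts that cell's child nodes.
first_cell_children = len(row.td)

# find_all("td") returns every cell; recursive=False keeps it to direct children.
cells = row.find_all("td", recursive=False)

# Collapse whitespace (including the non-breaking space) in each cell's text.
values = [" ".join(cell.get_text().split()) for cell in cells]

for value in values:
    print(value)
```

From here, values[1] holds the size, values[4] the seeders, and so on, if the column order matches the sample row.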

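Do you work for the RIAA?

Why are you taking len(torrents[0].td) and then iterating over something completely different???

@Matt - this is still a perfectly valid programming question, with no evil intent stated. @Bunny, would you mind editing the post to include a sample of the data? The whole context of this question depends on the link being valid.

@Tim I wasn't implying that it isn't, just joking. In fact, now that I've seen this question, I'm going to start looking into BeautifulSoup :)

Can you give an example of the output you want?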