Python Beautiful Soup throws IndexError

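I'm using Python 2.7 and Beautiful Soup 3.2 to scrape a website. I'm new to both, but I've started learning from the documentation.

I'm reading the following documentation:

What I have so far (the part that fails):

The tag.contents output for hometeamsTd and then for awayteamsTd looks like this: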

[
    [<img class="flag" src="/gfx/flags/nl.gif" alt="nl" />, u'Harkemase Boys', <img src="/gfx/favourite_off.gif" alt="fav icon" class="fav off" id="team-6077" />],
    [<img class="flag" src="/gfx/flags/nl.gif" alt="nl" />, u'RKC Waalwijk', <img src="/gfx/favourite_off.gif" alt="fav icon" class="fav off" id="team-427" />],
    [<img class="flag" src="/gfx/flags/nl.gif" alt="nl" />, u'Dutch KNVB Beker', <img src="/gfx/favourite_off.gif" alt="fav icon" class="fav off" id="team-6758" />],
    [<img class="flag" src="/gfx/flags/nl.gif" alt="nl" />, u'PSV', <img src="/gfx/favourite_off.gif" alt="fav icon" class="fav off" id="team-3" />],
    [<img class="flag" src="/gfx/flags/nl.gif" alt="nl" />, u'Ajax', <img src="/gfx/favourite_off.gif" alt="fav icon" class="fav off" id="team-2" />],
    [<img class="flag" src="/gfx/flags/nl.gif" alt="nl" />, u'Dutch KNVB Beker', <img src="/gfx/favourite_off.gif" alt="fav icon" class="fav off" id="team-6758" />],
    [<img class="flag" src="/gfx/flags/nl.gif" alt="nl" />, u'SC Heerenveen', <img src="/gfx/favourite_off.gif" alt="fav icon" class="fav off" id="team-14" />],
    [<img class="flag" src="/gfx/flags/nl.gif" alt="nl" />, u'Feyenoord', <img src="/gfx/favourite_off.gif" alt="fav icon" class="fav off" id="team-9" />],
    [<img class="flag" src="/gfx/flags/nl.gif" alt="nl" />, u'Dutch KNVB Beker', <img src="/gfx/favourite_off.gif" alt="fav icon" class="fav off" id="team-6758" />]
]
[
    [u'Away-team'], 
    [<img src="/gfx/favourite_off.gif" class="fav off" alt="fav icon" id="team-13" />, u'NEC', <img class="flag" src="/gfx/flags/nl.gif" alt="nl" />], 
    [<img src="/gfx/favourite_off.gif" class="fav off" alt="fav icon" id="team-11" />, u'Heracles', <img class="flag" src="/gfx/flags/nl.gif" alt="nl" />], 
    [<img src="/gfx/favourite_off.gif" class="fav off" alt="fav icon" id="team-428" />, u'Stormvogels Telstar', <img class="flag" src="/gfx/flags/nl.gif" alt="nl" />], 
    [<img src="/gfx/favourite_off.gif" class="fav off" alt="fav icon" id="team-419" />, u'FC Volendam', <img class="flag" src="/gfx/flags/nl.gif" alt="nl" />],
    [<img src="/gfx/favourite_off.gif" class="fav off" alt="fav icon" id="team-7" />, u'FC Twente', <img class="flag" src="/gfx/flags/nl.gif" alt="nl" />],
    [<img src="/gfx/favourite_off.gif" class="fav off" alt="fav icon" id="team-415" />, u'FC Dordrecht', <img class="flag" src="/gfx/flags/nl.gif" alt="nl" />]
]
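The problems I've been trying to solve, but haven't fully solved, are:

  • The code
    awayteams = [tag.contents[1] for tag in awayteamsTd]
    raises the error
    IndexError: list index out of range
    This is of course expected, because, as you can see in the tag.contents output for awayteamsTd, the first entry is
    [u'Away-team']
    which has only one element, so index 1 does not exist. That is why it fails. But how do I remove or skip this entry?
  • In the home-team output everything works, but I want to exclude the entries where the text Dutch KNVB Beker appears.
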
The problem is that the "Away-team" cell (the column header) is also inside a td with the away class:

<thead class="title">
    ...
    <tr class="sub">
      ...
      <td>Home-team</td>
      <td></td>
      <td class="away">Away-team</td>
      <td class="broadcast">Broadcast</td>
    </tr>
</thead>

Since that header cell is the first match, skip it by slicing the result:

awayteamsTd = soup.findAll('td', { "class" : "away" })[1:]

Also, if you want to exclude Dutch KNVB Beker from the home-team list, add a condition to the list comprehension:

hometeams = [tag.contents[1] for tag in hometeamsTd if tag.contents[1] != 'Dutch KNVB Beker']
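
If you would rather not rely on the header cell being the first match, a length check is one possible alternative to slicing with [1:]. The following is only a minimal sketch: it uses plain lists as stand-ins for tag.contents, and the sample data and the helper name team_names are hypothetical, not part of the original code.

```python
# Stand-ins for the tag.contents of each <td> cell. The '<img ...>' strings
# are placeholders for the real Tag objects (hypothetical sample data).
away_contents = [
    [u'Away-team'],                            # header cell: only one child
    ['<img fav>', u'NEC', '<img flag>'],
    ['<img fav>', u'Heracles', '<img flag>'],
]
home_contents = [
    ['<img flag>', u'PSV', '<img fav>'],
    ['<img flag>', u'Dutch KNVB Beker', '<img fav>'],
    ['<img flag>', u'Ajax', '<img fav>'],
]


def team_names(contents_list, exclude=()):
    """Return the second child of each cell.

    Cells with fewer than two children (such as the header) are skipped,
    as are names listed in `exclude`.
    """
    return [
        contents[1]
        for contents in contents_list
        if len(contents) > 1 and contents[1] not in exclude
    ]


awayteams = team_names(away_contents)
hometeams = team_names(home_contents, exclude=(u'Dutch KNVB Beker',))
print(awayteams)
print(hometeams)
```

With real Beautiful Soup results you would presumably pass [tag.contents for tag in awayteamsTd] instead of the sample lists. The length check also protects against any other cell that happens to contain a single child.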