Python 3.x: scraping a specific td from a row in a table

python-3.x

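I am scraping a website to get some data. My code works up to a point: it finds the specific table and rows I want, then selects the cells and puts them into a dict. My problem is selecting the last cell in each row.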

import urllib
import urllib.request
from bs4 import BeautifulSoup
import re
import os
import pandas as pd

theurl = "http://www.nationsonline.org/oneworld/IATA_Codes/airport_code_list.htm"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")
air = []
init_data = open('/Users/paribaker/Desktop/air.txt', 'a')
count = 0
while count ...

Some debugging statements show that some rows contain only two cells. In fact, this is true of the very first row in rows:
for i, row in enumerate(rows):
    print("Row {}:\n".format(i))
    for j, td in enumerate(row.find_all('td')):
        print(" Cell {}:\n{}".format(j, td))
    try:
        col3 = row.find_all('td')[2]
    except IndexError as e:
        print("ERROR on Row {}: {}".format(i, e))
        break

Output:
Row 0:

 Cell 0:
<td style="width:730px;"><script async="" src="http://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script><!-- Top-Banner 728x90, Erstellt 25.12.09 --><ins class="adsbygoogle" data-ad-client="ca-pub-7193398479241689" data-ad-slot="6570665833" style="display:inline-block;width:728px;height:90px"></ins><script>(adsbygoogle = window.adsbygoogle || []).push({});</script></td>

 Cell 1:
<td class="logotd"><a href="/oneworld/first.shtml"><img alt="Nations Online Logo" class="displayed" height="60" src="/buttons/OWNO_logo06-60.png" width="60"/>    </a><br><b>One World<br>Nations Online</br></b></br></td>

ERROR on Row 0: list index out of range

City: Aarhus, Country: Denmark, Code: AAR
City: Abadan, Country: Iran, Code: ABD
City: Abeche, Country: Chad, Code: AEH
...
City: Zinder, Country: Niger, Code: ZND
City: Zouerate, Country: Mauritania, Code: OUZ
City: Zurich (Zürich) - Kloten, Country: Switzerland, Code: ZRH
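
Lines in the City/Country/Code shape above can be produced once the two-cell layout rows are skipped. The following is a minimal sketch of one way to do that, assuming the data rows are exactly the rows with three td cells. The inline sample HTML and the extract_airports helper are hypothetical stand-ins for illustration, not code from the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the real page: one layout row with
# two cells, followed by data rows with three cells each.
SAMPLE_HTML = """
<table>
  <tr><td>banner</td><td class="logotd">logo</td></tr>
  <tr><td>Aarhus</td><td>Denmark</td><td>AAR</td></tr>
  <tr><td>Abadan</td><td>Iran</td><td>ABD</td></tr>
</table>
"""


def extract_airports(html):
    """Return one dict per row that has exactly three td cells."""
    soup = BeautifulSoup(html, "html.parser")
    airports = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) != 3:
            # Layout rows lack a third cell; skip them rather than
            # indexing [2] and raising IndexError.
            continue
        city, country, code = (td.get_text(strip=True) for td in cells)
        airports.append({"city": city, "country": country, "code": code})
    return airports


for airport in extract_airports(SAMPLE_HTML):
    print("City: {city}, Country: {country}, Code: {code}".format(**airport))
```

Checking the cell count before indexing keeps the loop running over the whole table instead of stopping at the first short row.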

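Well, I am not scraping every table or every row. I only scrape the third table, and then every row except the first. Each row has three data points: city, country, and code, and I want to collect all three. Looking at the table, every row has a value in all three columns. Am I missing something?

It looks that way. The output I posted comes directly from your code, with only the debugging statements added. Those cell contents are definitely not city/country/code data. See my updated answer: I have shown a way to narrow things down to the data you are interested in. In general, use BeautifulSoup attribute filtering (class, name, id, etc.) together with element tag names (td, tr, etc.) to get what you want.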
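
As an illustration of the attribute filtering mentioned in the last comment, here is a small sketch that combines tag names with id and class filters. The markup is hypothetical: the id values and the city/country/code class names are invented for the example, and only the logotd class appears in the debug output above. The real page may need different selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the id values and the city/country/code classes
# are made up for illustration.
SAMPLE_HTML = """
<table id="layout"><tr><td class="logotd">One World</td></tr></table>
<table id="codes">
  <tr>
    <td class="city">Zurich</td>
    <td class="country">Switzerland</td>
    <td class="code">ZRH</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Tag name plus id: pick out a single table.
codes_table = soup.find("table", id="codes")

# Tag name plus class, searched only inside that table.
codes = [td.get_text(strip=True)
         for td in codes_table.find_all("td", class_="code")]

# The same kind of class filter finds layout cells such as the logo cell.
logo_cells = soup.find_all("td", class_="logotd")

print(codes)
print(len(logo_cells))
```

Restricting the search to one table first means cells from unrelated layout tables never reach the extraction step.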