How to extract URLs from the <td> cells of a table in Python
Tags: python, xml, csv, beautifulsoup, web-crawler


I am using Python and BeautifulSoup to scrape a table. I want to extract the URLs from an HTML table with the following content:

<tbody>
   <tr>
      <td colspan="4" style="height:10px"></td>
   </tr>
   <tr class="header" id="a">
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
      <td>A</td>
      <td><a class="fa fa-angle-up goToTop pull-right" href="#" onclick="$('html, body').animate({scrollTop: 0}, 1000);return false;" title="Scroll to top"></a></td>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
   </tr>
   <tr>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
      <td colspan="2"><a data-iata="ARD" data-lat="-8.13234" data-lon="124.597" href="https://www.flightradar24.com/data/airports/ard" title="Alor Island Airport"><img class="icon-airport" src="https://www.flightradar24.com/static/images/airport_pin_40_blue.png"/> Alor Island Airport <small>(ARD/WATM)</small> </a> </td>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
   </tr>
   <tr>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
      <td colspan="2"><a data-iata="AMQ" data-lat="-3.71026" data-lon="128.089096" href="https://www.flightradar24.com/data/airports/amq" title="Ambon Pattimura Airport"><img class="icon-airport" src="https://www.flightradar24.com/static/images/airport_pin_40_blue.png"/> Ambon Pattimura Airport <small>(AMQ/WAPP)</small> </a> <span class="pull-right">Rating: 79%</span> </td>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
   </tr>
   <tr>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
      <td colspan="2"><a data-iata="ABU" data-lat="-9.07444" data-lon="124.904404" href="https://www.flightradar24.com/data/airports/abu" title="Atambua Haliwen Airport"><img class="icon-airport" src="https://www.flightradar24.com/static/images/airport_pin_40_blue.png"/> Atambua Haliwen Airport <small>(ABU/WATA)</small> </a> </td>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
   </tr>
   <tr class="header" id="b">
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
      <td>B</td>
      <td><a class="fa fa-angle-up goToTop pull-right" href="#" onclick="$('html, body').animate({scrollTop: 0}, 1000);return false;" title="Scroll to top"></a></td>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
   </tr>
   <tr>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
      <td colspan="2"><a data-iata="BXB" data-lat="-2.53224" data-lon="133.438797" href="https://www.flightradar24.com/data/airports/bxb" title="Babo Airport"><img class="icon-airport" src="https://www.flightradar24.com/static/images/airport_pin_40_blue.png"/> Babo Airport <small>(BXB/WASO)</small> </a> </td>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
   </tr>
   <tr>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
      <td colspan="2"><a data-iata="BJW" data-lat="-8.7125" data-lon="121.0625" href="https://www.flightradar24.com/data/airports/bjw" title="Bajawa Turelelo Soa Airport"><img class="icon-airport" src="https://www.flightradar24.com/static/images/airport_pin_40_blue.png"/> Bajawa Turelelo Soa Airport <small>(BJW/WATB)</small> </a> </td>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
   </tr>
   <tr>
      <td class="w40 hidden-xs hidden-sm hidden-xxs"> </td>
      <td colspan="2"><a data-iata="BPN" data-lat="-1.26827" data-lon="116.894402" href="https://www.flightradar24.com/data/airports/bpn" title="Balikpapan Sepinggan Airport"><img class="icon-airport" src="https://www.flightradar24.com/static/images/airpo,...
My code is as follows:

bs=BeautifulSoup(page.content, 'html.parser')
table_body=bs.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols=row.find_all('td')
    for link in cols:
        a = link.get("href")
        print(a)

But this does not give me the URLs I expect.
Is there a way to do this in Python?

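You are missing a loop. The href attribute is contained in the <a> tag, not in the <td> cell, so you need to iterate over the <a> tags inside each cell.

The following code produces the correct output: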

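from bs4 import BeautifulSoup

# `html` is the page source (for example, page.content from the question)
bs = BeautifulSoup(html, 'html.parser')
table_body = bs.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    for col in cols:
        # The href attribute lives on the <a> tags inside each cell
        a_list = col.find_all('a')
        for a in a_list:
            href = a.get("href")
            print(href)

Output: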

#
https://www.flightradar24.com/data/airports/ard
https://www.flightradar24.com/data/airports/amq
https://www.flightradar24.com/data/airports/abu
#
https://www.flightradar24.com/data/airports/bxb
https://www.flightradar24.com/data/airports/bjw
https://www.flightradar24.com/data/airports/bpn
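
To see why the extra loop matters, here is a minimal sketch that compares reading href from a cell with reading it from the link inside that cell. The `sample` string is a trimmed, hypothetical stand-in for one row of the table in the question, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical sample modeled on one row of the table in the question.
sample = """
<table><tbody>
<tr>
  <td class="w40"> </td>
  <td colspan="2"><a data-iata="ARD" href="https://www.flightradar24.com/data/airports/ard">Alor Island Airport</a></td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(sample, "html.parser")
cell = soup.find("td", colspan="2")

# The <td> itself has no href attribute, so .get() falls back to None.
td_href = cell.get("href")

# The attribute belongs to the <a> tag nested inside the cell.
a_href = cell.find("a").get("href")

print(td_href)
print(a_href)
```

Calling .get("href") on a tag that lacks the attribute returns None instead of raising an error, which is why the code in the question runs but prints no URLs.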

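To scrape all the links from the table, you can use the CSS selector tbody td a[data-iata][href], which means "every <a> element that has both a data-iata attribute and an href attribute, inside a <td> under <tbody>". Because the "Scroll to top" links have no data-iata attribute, the selector skips them.

Output: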

https://www.flightradar24.com/data/airports/ard
https://www.flightradar24.com/data/airports/amq
https://www.flightradar24.com/data/airports/abu
https://www.flightradar24.com/data/airports/bxb
https://www.flightradar24.com/data/airports/bjw
https://www.flightradar24.com/data/airports/bpn
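
The selector approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's original code: the `sample` string is a shortened, hypothetical version of the table in the question (two header rows and two airport rows), and `urls` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample modeled on the table in the question.
sample = """
<table><tbody>
<tr class="header" id="a">
  <td>A</td>
  <td><a class="goToTop" href="#" title="Scroll to top"></a></td>
</tr>
<tr>
  <td colspan="2"><a data-iata="ARD" href="https://www.flightradar24.com/data/airports/ard">Alor Island Airport</a></td>
</tr>
<tr class="header" id="b">
  <td>B</td>
  <td><a class="goToTop" href="#" title="Scroll to top"></a></td>
</tr>
<tr>
  <td colspan="2"><a data-iata="BXB" href="https://www.flightradar24.com/data/airports/bxb">Babo Airport</a></td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(sample, "html.parser")

# Keep only <a> tags that carry both data-iata and href attributes;
# the "Scroll to top" anchors have no data-iata and are left out.
urls = [a["href"] for a in soup.select("tbody td a[data-iata][href]")]

for url in urls:
    print(url)
```

Compared with the nested loops, the selector expresses the filter in one place and leaves out the "#" entries without an extra check.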