
Python: BeautifulSoup not capturing the full table


Without `mechanize`, this works:

from bs4 import BeautifulSoup
import requests

page = 'http://www.airchina.com.cn/www/jsp/airlines_operating_data/exlshow_en.jsp'
r = requests.get(page)
r.encoding = 'utf-8'

# No parser given: BeautifulSoup picks the best one installed (lxml if available).
soup = BeautifulSoup(r.text)

div = soup.find('div', class_='mainRight').find_all('div')[1]
table = div.find('table', recursive=False)

for row in table.find_all('tr', recursive=False):
    for cell in row('td', recursive=False):
        print(cell.text.split())
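As a side note, the `recursive=False` arguments used above restrict the search to direct children of the tag, which matters when tables or rows are nested. A minimal self-contained illustration (the markup here is invented for demonstration):

```python
from bs4 import BeautifulSoup

html = "<div><span>direct</span><p><span>nested</span></p></div>"
div = BeautifulSoup(html, "html.parser").div

# Default search walks all descendants; recursive=False stops at direct children.
print(len(div.find_all("span")))                   # 2
print(len(div.find_all("span", recursive=False)))  # 1
```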
But this does not:

import mechanize
from bs4 import BeautifulSoup

URL = 'http://www.airchina.com.cn/www/jsp/airlines_operating_data/exlshow_en.jsp'
control_year = ['2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014']
control_month = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']

br = mechanize.Browser()
r = br.open(URL)

br.select_form("exl")
control_m = br.form.find_control('month')
control_y = br.form.find_control('year')

br[control_m.name] = ['06']
br[control_y.name] = ['2012']
response = br.submit()

soup = BeautifulSoup(response, 'html.parser')

div = soup.find('div', class_='mainRight').find_all('div')[1]
table = div.find('table', recursive=False)
for row in table.find_all('tr', recursive=False):
    for cell in row('td', recursive=False):
        print(cell.text.strip())
The `mechanize` version only produces the following, even though in Firebug I can see all the `tr` and `td` elements:

Jun 2012
% change vs Jun 2011
% change vs May 2012
Cumulative Jun 2012
% cumulative change

Combining the two works without problems, so it is probably related to the `html.parser` you are using:

import mechanize
from bs4 import BeautifulSoup

URL = ('http://www.airchina.com.cn/www/jsp/airlines_operating_data/'
       'exlshow_en.jsp')
control_year = ['2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
                '2014']
control_month = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10',
                 '11', '12']

br = mechanize.Browser()
r = br.open(URL)

br.select_form("exl")
control_m = br.form.find_control('month')
control_y = br.form.find_control('year')

br[control_m.name] = ['06']
br[control_y.name] = ['2012']
response = br.submit()

# No explicit parser: BeautifulSoup picks the best one installed (lxml if
# available) instead of the stricter stdlib html.parser.
soup = BeautifulSoup(response)

div = soup.find('div', class_='mainRight').find_all('div')[1]
table = div.find('table', recursive=False)

for row in table.find_all('tr', recursive=False):
    for cell in row('td', recursive=False):
        print(cell.text.split())
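The parser difference is observable directly: the stdlib `html.parser` keeps the tree exactly as written, whereas lenient parsers such as `html5lib` normalize markup the way browsers do (for example by inserting `<tbody>` into tables). A small sketch using only the stdlib parser, with invented example markup:

```python
from bs4 import BeautifulSoup

# A table written without an explicit <tbody>.
html = "<table><tr><td>1</td><td>2</td></tr></table>"

# html.parser does not insert <tbody>, which is why
# table.find_all('tbody') in the comments below returned [].
soup = BeautifulSoup(html, "html.parser")
print(soup.table.tbody)                               # None
print([td.text for td in soup.table.find_all("td")])  # ['1', '2']
```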

Most likely it is automatically adding `tbody` elements to the table. Try looping over all the `tbody` elements inside the `table` before the `tr`s.

@Wolph I tried `table.find_all('tbody')`, but it returns `[]`.

I believe it may be related to the `html.parser` you are using; see my answer for a working version!

Thanks for the help, I would never have thought of that. I don't understand `html.parser`: sometimes it is the only thing that works, and sometimes it doesn't work at all.

`html.parser` is part of the Python distribution, so it is always available. But it is not the best you can get: `lxml` is faster and more lenient, but it is a separate dependency, which means you have to install it yourself. You can find the list of parsers here:
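Since `lxml` is an optional dependency, one hedged pattern is to prefer it when installed and fall back to the always-available stdlib parser otherwise (the helper name `make_soup` is our own, not part of BeautifulSoup):

```python
from bs4 import BeautifulSoup

def make_soup(markup):
    # Prefer lxml (faster, more lenient) when it is installed; otherwise
    # fall back to the stdlib parser so the code still runs everywhere.
    try:
        import lxml  # noqa: F401  -- only probing for availability
        return BeautifulSoup(markup, "lxml")
    except ImportError:
        return BeautifulSoup(markup, "html.parser")

soup = make_soup("<p>hello</p>")
print(soup.p.text)  # hello
```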