Python: unable to read columns for HTML scraping

Tags: python, beautifulsoup, mechanize

I am trying to extract data from a table.

I used the following code:

#!/usr/bin/env python
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

mech = Browser()
url = "http://en.wikipedia.org/wiki/Hybrid_electric_vehicles_in_the_United_States"
page = mech.open(url)
html = page.read()
soup = BeautifulSoup(html)
table = soup.find("table",{ "class" : "wikitable" })

for row in table.findAll('tr')[1:]:
col = row.findAll('th')
Vehicle = col[0].string
Year1 = col[2].string
Year2 = col[3].string
Year3 = col[4].string
Year4 = col[5].string
Year5 = col[6].string
Year6 = col[7].string
Year7 = col[8].string
Year8 = col[9].string
Year9 = col[10].string
Year10 = col[11].string
Year11 = col[12].string
Year12 = col[13].string
Year13 = col[14].string
Year14 = col[15].string
Year15 = col[16].string
Year16 = col[17].string
record =(Vehicle,Year1,Year2,Year3,Year4,Year5,Year6,Year7,Year8,Year9,Year10,Year11,Year12,Year13,Year14,Year15,Year16)
print "|".join(record)

I get this error:

 File "scrap1.ph", line 13
    col = row.findAll('th')
      ^
IndentationError: expected an indented block

Can someone tell me what I am doing wrong?
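The error can be reproduced without running the scraper at all, because Python rejects the file while compiling it: every statement that belongs to the `for` loop must be indented under the `for` line. The following is only a minimal sketch in Python 3 (the question uses Python 2); the names `BROKEN`, `FIXED` and `indentation_problem` are made up for illustration.

```python
# Two tiny programs: one with an unindented loop body, one with a proper body.
BROKEN = "for row in [1, 2]:\nprint(row)\n"      # body is not indented
FIXED = "for row in [1, 2]:\n    print(row)\n"   # body indented by four spaces


def indentation_problem(source, filename="<snippet>"):
    """Return the line number of the first IndentationError, or None if the source compiles."""
    try:
        # compile() only parses the code; nothing is executed.
        compile(source, filename, "exec")
    except IndentationError as exc:
        return exc.lineno
    return None


print(indentation_problem(BROKEN))  # line number where an indented block was expected
print(indentation_problem(FIXED))   # None: the fixed version compiles
```

Applied to the script in the question, the same rule means that every line from `col = row.findAll('th')` down to the `print` statement has to be indented under `for row in ...:`.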

In addition to @traceur's point about the indentation error, here are a few things that can greatly simplify the code:

from mechanize import Browser
from bs4 import BeautifulSoup

mech = Browser()
url = "http://en.wikipedia.org/wiki/Hybrid_electric_vehicles_in_the_United_States"
soup = BeautifulSoup(mech.open(url))
table = soup.find("table", class_="wikitable")

for row in table('tr')[1:]:
    print "|".join(col.text.strip() for col in row.find_all('th'))

Note that instead of using from BeautifulSoup import BeautifulSoup (BeautifulSoup version 3), you should use from bs4 import BeautifulSoup (version 4), because version 3 is no longer maintained.

Also note that you can pass mech.open(url) directly to the BeautifulSoup constructor instead of reading it manually.
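For readers on Python 3, the same idea can be tried offline. This is only a sketch under stated assumptions: `SAMPLE_HTML` is a made-up stand-in for the Wikipedia table, `table_rows` is a hypothetical helper, and the data cells are assumed to be `td` elements, so both `th` and `td` are collected (the code above reads `th` only).

```python
from bs4 import BeautifulSoup

# Hypothetical sample table so the sketch runs without network access.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Vehicle</th><th>2012</th><th>2011</th></tr>
  <tr><th>Model A</th><td>10</td><td>20</td></tr>
  <tr><th>Model B</th><td> 30 </td><td>40</td></tr>
</table>
"""


def table_rows(html):
    """Return each data row of the first 'wikitable' as a list of stripped cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all(["th", "td"])  # assumption: data cells may be <td>
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


for record in table_rows(SAMPLE_HTML):
    print("|".join(record))
```

Collecting the cells in a list avoids the seventeen separate `Year` variables from the question, and `get_text()` always returns a string, whereas `.string` can be `None` for a cell with nested markup, which would make `"|".join(...)` fail.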


Hope this helps.

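Comments:

I still see an indentation error with your script. Please help me get rid of that error.

@Auguster hm, there is no indentation problem here; please check that you pasted the code correctly.

I pasted the same code, but I get this error. File "scrap1.py", line 10 print "|".join(col.text.strip() for col in row.find_all('th')) ^ IndentationError: expected an indented block

@Auguster indent the line that starts with print.

All I needed to do was put the print line on the same line as the for loop. This is probably because I am editing in Windows and running the code with cygwin.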
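The workaround in the last comment is valid because Python allows a single simple statement to follow the colon on the `for` line, so no indented block is needed. A minimal Python 3 sketch, where `rows` is made-up sample data rather than the real table:

```python
# Hypothetical rows standing in for the scraped table cells.
rows = [["Model A", "10"], ["Model B", "30"]]

lines = []
# One simple statement after the colon: no indented block is required.
for row in rows: lines.append("|".join(row))

print("\n".join(lines))
```

PEP 8 discourages this one-line form, so indenting the loop body on its own line remains the more conventional fix.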