Web scraping: parsing HTML tables in Python with BeautifulSoup
I'm trying to access the data starting from the second table (i.e. the specifications tab), but my code only returns the data from the first table. After reading many other posts I arrived at the following, which does not produce the list I want:
import csv

import requests
from bs4 import BeautifulSoup

html = "http://www.carwale.com/marutisuzuki-cars/baleno/sigma12/"
html_content = requests.get(html).text
soup = BeautifulSoup(html_content, "lxml")
table = soup.find("table")

# Collect the cell text of every row in the selected table
output_rows = []
for table_row in table.findAll('tr'):
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)

output_rows
.find() returns only the first matching element/tag. You want .find_all(), which returns a list of all matching elements/tags.
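To make the difference concrete, here is a minimal sketch that runs on an inline HTML string rather than the real site. The markup, the id values and the table_rows helper are hypothetical and only there for illustration; "html.parser" is used so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical two-table page standing in for the real site.
SAMPLE_HTML = """
<table id="summary"><tr><td>Overview</td><td>placeholder</td></tr></table>
<table id="specs"><tr><td>Fuel Type</td><td>Petrol</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_table = soup.find("table")     # a single Tag, or None if nothing matches
all_tables = soup.find_all("table")  # a list-like ResultSet with every match


def table_rows(table):
    """Return the stripped cell text of a table, one list per row."""
    return [[td.get_text(strip=True) for td in tr.find_all("td")]
            for tr in table.find_all("tr")]


print(first_table["id"])
print(len(all_tables))
print(table_rows(all_tables[1]))
```

Indexing or slicing the result of find_all() (for example all_tables[1:]) is then enough to skip the first table.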
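In this case, though, may I recommend pandas? pandas' .read_html() parses the page with lxml or BeautifulSoup under the hood and looks for the <table> tags. It then returns them as a list of DataFrames, so it is just a matter of selecting the index positions of the tables you want. Looking at the site, the tables you are after are returned at index positions 1-4: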
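import pandas as pd

# read_html returns a list with one DataFrame per <table> on the page
dfs = pd.read_html('http://www.carwale.com/marutisuzuki-cars/baleno/sigma12/')

# Tables 1-4 hold the specifications. DataFrame.append was removed in
# pandas 2.0, so stack the frames with pd.concat instead.
result = pd.concat(dfs[1:5], sort=False).reset_index(drop=True)
print(result)

Output: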
    0                       1
0   Engine                  1197cc, 4 Cylinders Inline, 4 Valves/Cylinder,...
1   Engine Type             VVT
2   Fuel Type               Petrol
3   Max Power (bhp@rpm)     82 bhp @ 6000 rpm
4   Max Torque (Nm@rpm)     115 Nm @ 4000 rpm
5   Mileage (ARAI)          21.01 kmpl
6   Drivetrain              FWD
7   Transmission            Manual - 5 Gears
8   Emission Standard       BS 6
9   Length                  3995 mm
10  Width                   1745 mm
11  Height                  1510 mm
12  Wheelbase               2520 mm
13  Ground Clearance        170 mm
14  Kerb Weight             865 kg
15  Doors                   5 Doors
16  Seating Capacity        5 Person
17  No of Seating Rows      2 Rows
18  Bootspace               339 litres
19  Fuel Tank Capacity      37 litres
20  Suspension Front        McPherson Strut
21  Suspension Rear         Torsion Beam
22  Front Brake Type        Disc
23  Rear Brake Type         Drum
24  Minimum Turning Radius  4.9 metres
25  Steering Type           Power assisted (Electric)
26  Wheels                  Steel Rims
27  Spare Wheel             Steel
28  Front Tyres             185 / 65 R15
29  Rear Tyres              185 / 65 R15
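The index-based selection can be tried without hitting the site. The sketch below feeds read_html a hypothetical three-table HTML string (the markup and the "placeholder" cells are assumptions, not taken from the real page) and keeps everything after the first table. It assumes an HTML parser that pandas can use is installed (lxml, or bs4 together with html5lib).

```python
from io import StringIO

import pandas as pd

# Hypothetical page with three tables; only the last two hold specifications.
SAMPLE_HTML = """
<table>
  <tr><td>Overview</td><td>placeholder</td></tr>
  <tr><td>Summary</td><td>placeholder</td></tr>
</table>
<table>
  <tr><td>Engine Type</td><td>VVT</td></tr>
  <tr><td>Fuel Type</td><td>Petrol</td></tr>
</table>
<table>
  <tr><td>Length</td><td>3995 mm</td></tr>
  <tr><td>Width</td><td>1745 mm</td></tr>
</table>
"""

# read_html returns one DataFrame per <table>, in document order.
dfs = pd.read_html(StringIO(SAMPLE_HTML))

# Skip table 0 and stack the rest into one frame with a fresh 0..n-1 index.
result = pd.concat(dfs[1:], ignore_index=True)
print(result)
```

Passing ignore_index=True to pd.concat has the same effect here as calling .reset_index(drop=True) afterwards.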
When you are targeting specific tables, you need to select them by their class name. Try the following CSS selector code:
import requests
from bs4 import BeautifulSoup

html = "http://www.carwale.com/marutisuzuki-cars/baleno/sigma12/"
html_content = requests.get(html).text
soup = BeautifulSoup(html_content, "lxml")

# Keep tables with class "specs", excluding those that also have class "features"
tables = soup.select("table.specs:not(.features)")

output_rows = []
for table in tables:
    for table_row in table.find_all('tr'):
        columns = table_row.find_all('td')
        output_row = []
        for column in columns:
            # strip() removes the whitespace surrounding each cell's text
            output_row.append(column.text.strip())
        output_rows.append(output_row)

print(output_rows)

Output:
[['Engine', '1197cc, 4 Cylinders Inline, 4 Valves/Cylinder, DOHC'], ['Engine Type', 'VVT'], ['Fuel Type', 'Petrol'], ['Max Power (bhp@rpm)', '82 bhp @ 6000 rpm'], ['Max Torque (Nm@rpm)', '115 Nm @ 4000 rpm'], ['Mileage (ARAI)', '21.01 kmpl'], ['Drivetrain', 'FWD'], ['Transmission', 'Manual - 5 Gears'], ['Emission Standard', 'BS 6'], ['Length', '3995 mm'], ['Width', '1745 mm'], ['Height', '1510 mm'], ['Wheelbase', '2520 mm'], ['Ground Clearance', '170 mm'], ['Kerb Weight', '865 kg'], ['Doors', '5 Doors'], ['Seating Capacity', '5 Person'], ['No of Seating Rows', '2 Rows'], ['Bootspace', '339 litres'], ['Fuel Tank Capacity', '37 litres'], ['Suspension Front', 'McPherson Strut'], ['Suspension Rear', 'Torsion Beam'], ['Front Brake Type', 'Disc'], ['Rear Brake Type', 'Drum'], ['Minimum Turning Radius', '4.9 metres'], ['Steering Type', 'Power assisted (Electric)'], ['Wheels', 'Steel Rims'], ['Spare Wheel', 'Steel'], ['Front Tyres', '185 / 65 R15'], ['Rear Tyres', '185 / 65 R15']]
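How the selector filters tables can be checked on a small inline document. This is only a sketch: the class names mirror the selector used in the answer, but the markup itself is hypothetical and the real page's structure is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names are taken from the selector.
SAMPLE_HTML = """
<table class="overview"><tr><td>Overview</td><td>placeholder</td></tr></table>
<table class="specs"><tr><td>Fuel Type</td><td> Petrol </td></tr></table>
<table class="specs features"><tr><td>Feature</td><td>placeholder</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "table.specs" keeps tables carrying the specs class;
# ":not(.features)" then drops those that also carry the features class.
tables = soup.select("table.specs:not(.features)")

rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for table in tables
        for tr in table.find_all("tr")]
print(rows)
```

Only the middle table survives the selector, and get_text(strip=True) does the same whitespace trimming as .text.strip() in the answer.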