Web scraping into a pandas DataFrame with Python

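I am trying to extract data for my project. I want to put the top 20 cities into a pandas DataFrame laid out like this: Rank | City | Latitude | Longitude

That way I can pull out the coordinates later in the code and compute the various parameters I need. This is what I have come up with so far, but it seems to fail: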

rank=[]
city=[]
state=[]
population_present=[]
population_past=[]
changepercent=[]


info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')

for row in bs.find('table').find_all('tr'):
    p = row.find_all('td')


for row in bs.find('table').find_all('tr'):
    p= row.find_all('td')
    if(len(p) > 0):
        rank.append(p[0].text)
        city.append(p[1].text)
        latitude.append(p[2].text.rstrip('\n'))

You are accessing the wrong element on the page. To reach the table that contains the data you want, use the following:

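import requests
from bs4 import BeautifulSoup

info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')

# The index 4 reflects the page layout at the time of writing and may need adjusting.
for tr in bs.find_all('table')[4].find_all('tr'):
    # Now take the data from this row that you want, and put it in a DataFrame
    cells = [td.text.strip() for td in tr.find_all('td')]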
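
A hard-coded index such as [4] stops working as soon as the page gains or loses a table. One alternative is to pick the table by its header text instead. The following is only a minimal sketch under stated assumptions: the helper name find_table_by_headers and the miniature SAMPLE_HTML page are hypothetical and stand in for the real Wikipedia markup, whose header labels may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page standing in for the real Wikipedia HTML.
SAMPLE_HTML = """
<table><tr><th>Legend</th></tr><tr><td>State capital</td></tr></table>
<table>
  <tr><th>Rank</th><th>City</th><th>Location</th></tr>
  <tr><td>1</td><td>New York</td><td>40.6635°N 73.9387°W</td></tr>
  <tr><td>2</td><td>Los Angeles</td><td>34.0194°N 118.4108°W</td></tr>
</table>
"""


def find_table_by_headers(soup, required_headers):
    """Return the first <table> whose <th> cells include every required header."""
    for table in soup.find_all("table"):
        headers = {th.get_text(strip=True) for th in table.find_all("th")}
        if set(required_headers) <= headers:
            return table
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_by_headers(soup, ["City", "Location"])

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # header rows contain only <th>, so they yield no <td> cells
        rows.append(cells)

print(rows)
```

On the real page, the header strings passed to the helper would have to match whatever the table actually uses, so it is worth printing the headers of each table once before relying on them.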

You can do this with pandas in Python. Try the code below:

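from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')

# The city table was the second 'wikitable' when this was written;
# the index may need adjusting if the page layout has changed.
table = bs.find_all('table', class_='wikitable')[1]

# Newer pandas versions deprecate passing a raw HTML string, so wrap it in StringIO.
df = pd.read_html(StringIO(str(table)))[0]

# Get the first 20 records
df1 = df.iloc[:20]

# The column name '2018rank' matches the page at the time of writing.
Rank = df1['2018rank'].values.tolist()
City = df1['City'].values.tolist()

# Get the locations as a list
locationlist = df1['Location'].values.tolist()
Latitude = []
Longitude = []
for val in locationlist:
    # Keep the part after the last '/', which holds the decimal coordinates
    val1 = val.split("/")[-1]
    Latitude.append(val1.split()[0])
    Longitude.append(val1.split()[-1])

df2 = pd.DataFrame({"Rank": Rank, "City": City, "Latitude": Latitude, "Longitude": Longitude})
print(df2)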
Output:

                City    Latitude   Longitude  Rank
0        New York[d]  40.6635°N   73.9387°W     1
1        Los Angeles  34.0194°N  118.4108°W     2
2            Chicago  41.8376°N   87.6818°W     3
3         Houston[3]  29.7866°N   95.3909°W     4
4            Phoenix  33.5722°N  112.0901°W     5
5    Philadelphia[e]  40.0094°N   75.1333°W     6
6        San Antonio  29.4724°N   98.5251°W     7
7          San Diego  32.8153°N  117.1350°W     8
8             Dallas  32.7933°N   96.7665°W     9
9           San Jose  37.2967°N  121.8189°W    10
10            Austin  30.3039°N   97.7544°W    11
11   Jacksonville[f]  30.3369°N   81.6616°W    12
12        Fort Worth  32.7815°N   97.3467°W    13
13          Columbus  39.9852°N   82.9848°W    14
14  San Francisco[g]  37.7272°N  123.0322°W    15
15         Charlotte  35.2078°N   80.8310°W    16
16   Indianapolis[h]  39.7767°N   86.1459°W    17
17           Seattle  47.6205°N  122.3509°W    18
18         Denver[i]  39.7619°N  104.8811°W    19
19     Washington[j]  38.9041°N   77.0172°W    20
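
The coordinates above are still strings such as 73.9387°W, and some city names carry footnote markers such as [d], so they need cleaning before any calculation. The following is a minimal sketch of one way to do that, not part of the original answer: the helper names clean_city and to_decimal_degrees are hypothetical, and the sketch assumes every value has the form number, degree sign, hemisphere letter. It also strips the invisible U+FEFF character, on the assumption that scraped cells may contain it.

```python
import re


def clean_city(name):
    """Drop footnote markers such as '[d]' or '[3]' from a city name."""
    return re.sub(r"\[[^\]]*\]", "", name).strip()


def to_decimal_degrees(text):
    """Convert a string like '73.9387°W' to a signed float (here -73.9387).

    South and west values become negative; north and east stay positive.
    """
    # Assumption: scraped text may contain an invisible U+FEFF character.
    cleaned = text.replace("\ufeff", "").strip()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*°\s*([NSEW])", cleaned)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    value = float(match.group(1))
    return -value if match.group(2) in "SW" else value


print(clean_city("New York[d]"))
print(to_decimal_degrees("40.6635°N"))
print(to_decimal_degrees("73.9387°W"))
```

With a DataFrame like df2, the helpers could be applied column by column, for example df2["Latitude"].map(to_decimal_degrees), after which the columns hold plain floats.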