Python: How do I correctly put web-scraped data into a DataFrame?

python pandas

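Problem: I used BeautifulSoup to scrape Wikipedia for the per-capita meat consumption of every country in the world. I am having trouble putting the result into a DataFrame with pandas: my DataFrame comes out empty.

Wikipedia page: https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption

Goal: put the web-scraped data into a DataFrame.

Code: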

import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

url_meat1='https://en.wikipedia.org/wiki/List_of_countries_by_meat_consumption'
page=urllib.request.urlopen(url_meat1)
soup= BeautifulSoup(page, "lxml")# parse the HTML from our URL into the BeautifulSoup parse tree format
print(soup.prettify()) #print results of the web page scrape

table_meat1 = soup.find('table', class_='wikitable sortable')

# one list per output column
A=[]
B=[]
C=[]

for row in table_meat1.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==3:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))

df_meat1=pd.DataFrame(A,columns=['Country'])
df_meat1['kg/person (2009)']=B
df_meat1['kg/person (2017)']=C
df_meat1

I get an empty DataFrame...

Replace your for loop with this for loop:

for row in table_meat1.findAll('tr'):
    cells=row.find_all('td')
    if len(cells)==4:
        A.append(cells[0].a['title'])
        B.append(cells[2].find(text=True))
        C.append(cells[3].find(text=True).strip())

Output:

                 Country kg/person (2009) kg/person (2017)
0                Albania             None                 
1                Algeria             19.5            17.33
2         American Samoa             26.8                 
3                 Angola             22.4                 
4    Antigua and Barbuda             84.3                 
..                   ...              ...              ...
183            Venezuela             76.8                 
184              Vietnam             49.9            52.90
185                Yemen             17.9                 
186               Zambia             12.3                 
187             Zimbabwe             21.3            13.64

[188 rows x 3 columns]
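
The output above still holds strings, with None and blank entries where the page has no value. If numeric columns are wanted, one possible follow-up step is pd.to_numeric. This is a minimal sketch and not part of the original answer; df_sample is a hypothetical stand-in for df_meat1, built from a few rows of the output shown above.

```python
import pandas as pd

# Hypothetical stand-in for df_meat1: scraped values arrive as strings,
# with None or "" where the table cell had nothing usable.
df_sample = pd.DataFrame({
    "Country": ["Albania", "Algeria", "American Samoa"],
    "kg/person (2009)": [None, "19.5", "26.8"],
    "kg/person (2017)": ["", "17.33", ""],
})

value_columns = ["kg/person (2009)", "kg/person (2017)"]
for column in value_columns:
    # errors="coerce" converts anything that cannot be parsed into NaN
    # instead of raising an exception.
    df_sample[column] = pd.to_numeric(df_sample[column], errors="coerce")

print(df_sample.dtypes)
print(df_sample)
```

After this step the missing entries are NaN, so methods such as mean() or sort_values() treat the columns as numbers.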

The same data in a csv file:
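
The answer does not show how the csv file was produced. One common way is DataFrame.to_csv; the sketch below assumes that approach. df_sample is a hypothetical stand-in for df_meat1, and the file name in the final comment is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for df_meat1, using two rows from the output above.
df_sample = pd.DataFrame({
    "Country": ["Algeria", "Vietnam"],
    "kg/person (2009)": ["19.5", "49.9"],
    "kg/person (2017)": ["17.33", "52.90"],
})

# Write to an in-memory buffer so the result can be inspected directly.
# index=False leaves out the 0, 1, 2, ... row labels.
buffer = io.StringIO()
df_sample.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# To write a file on disk instead (hypothetical file name):
# df_sample.to_csv("meat_consumption.csv", index=False)
```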


Works great! Thank you so much. Can you explain what you did? I'm trying to understand the code.

Sure. I made 2 changes:

1. Changed if len(cells)==3: to if len(cells)==4:, because no row has 3 cells. Every row has 4 cells, so the original condition never matched and nothing was appended.
2. Changed A.append(cells[0].find(text=True)) to A.append(cells[0].a['title']), because the title attribute contains the country name.

Thank you so much, Sushil. You've been a great help!
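
The two changes can be checked without touching the network. The sketch below runs on a small inline table rather than the live Wikipedia page. SAMPLE_HTML is a hypothetical, simplified copy of the structure the answer describes (four td cells per data row, country name in the link's title attribute); its second column is a made-up placeholder. It first counts the cells per row, which is a quick way to see why a len(cells)==3 test matches nothing, then extracts the same three fields as the answer (cells 0, 2 and 3). It uses get_text(strip=True) instead of find(text=True), and the built-in "html.parser" instead of "lxml", purely to keep the example dependency-light.

```python
from collections import Counter

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample: same shape as described in the answer, not real page markup.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Country</th><th>placeholder</th><th>kg/person (2009)</th><th>kg/person (2017)</th></tr>
  <tr><td><a href="/wiki/Algeria" title="Algeria">Algeria</a></td><td>-</td><td>19.5</td><td>17.33</td></tr>
  <tr><td><a href="/wiki/Vietnam" title="Vietnam">Vietnam</a></td><td>-</td><td>49.9</td><td>52.90</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="wikitable sortable")

# Step 1: count how many <td> cells each row has.
# The header row uses <th>, so it contributes a count of 0.
cell_counts = Counter(len(row.find_all("td")) for row in table.find_all("tr"))
print(cell_counts)

# Step 2: keep only the rows with the expected number of cells
# and collect one record per row.
records = []
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 4:
        records.append({
            "Country": cells[0].a["title"],  # country name from the link's title
            "kg/person (2009)": cells[2].get_text(strip=True),
            "kg/person (2017)": cells[3].get_text(strip=True),
        })

df_sample = pd.DataFrame(records)
print(df_sample)
```

Building a list of dicts and passing it to pd.DataFrame once is a design choice here: it keeps each row's values together, so the columns cannot drift out of step the way three separate lists can.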