Python: I'm getting an empty DataFrame when trying to scrape HTML from a web page. Why?
I'm trying to get salary data from Basketball Reference using Python 3.x and pandas. I don't get any error message, but I don't get any output either. I want the second and fourth columns of the table: "Player" and the "2019-20" salary. What am I doing wrong? This is what I have so far:
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# URL of the page we will be scraping
salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
salaries_response = requests.get(salaries_url)
page = salaries_response.text
# this is the HTML from the given URL
soup = BeautifulSoup(html)  # note: the downloaded text is stored in `page`; `html` is not defined in this snippet
# This takes the player salary data and creates a list of lists, where each inner list holds one player's values
salaries = []
for x in soup.find_all('tr')[2:]:
    tds_salaries = x.find_all('td')
    name_s = tds_salaries[0].text
    salary = tds_salaries[2].text
    salaries.append([name_s, salary[1:]])
# create a salary pandas DataFrame
salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])
salaries_df.head()  # only displayed in a notebook/REPL; in a script, wrap it in print()
```
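The absence of any error is consistent with how pandas treats an empty list of rows: if the loop never appends anything, `pd.DataFrame` quietly returns an empty DataFrame with the requested columns. The following is a minimal sketch that needs no network access; it assumes the scraping loop matched no player rows, so the empty `salaries` list stands in for that result. It also shows a simple length check that separates a scraping problem from a DataFrame-conversion problem.

```python
import pandas as pd

# Stand-in (assumption) for the loop's result when no <tr> produced a player row
salaries = []

# Debugging step: inspect the scraped list before converting it
print(len(salaries))
if not salaries:
    print('Nothing was scraped: the problem is in the scraping step, not in pandas')

# pandas raises no error for an empty list; it returns an empty DataFrame
salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])
print(salaries_df.empty)          # True
print(list(salaries_df.columns))  # ['name', 'salary']
```

If the length check prints a non-zero count, the scraping worked and the issue lies in the conversion or display step instead.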
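Comments:

- Does the site use JavaScript to create the page content dynamically? The `requests` library does not run JavaScript.
- @John Gordon I don't think so. I was able to scrape player stats from the site successfully; I'm only having trouble with the salary table.
- As a debugging step, print `salaries` after the loop. That will at least tell you whether this is a page-scraping problem or a DataFrame-conversion problem.
- If you look at the page source, you can see that the content is static.

Answer:

It works fine for me. All I did was skip the header rows inside the for loop: rows without the expected `<td>` cells raise an `IndexError`, which is now caught. (The code below also passes `page`, not `html`, to `BeautifulSoup`.)

Code: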
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
salaries_response = requests.get(salaries_url)
page = salaries_response.text
soup = BeautifulSoup(page, 'html.parser')
salaries = []
for x in soup.find_all('tr')[2:]:
    try:
        tds_salaries = x.find_all('td')
        name_s = tds_salaries[0].text        # "Player" column
        salary = tds_salaries[2].text        # "2019-20" column
        salaries.append([name_s, salary[1:]])  # drop the leading "$"
    except IndexError:
        # repeated header rows lack the expected <td> cells, so indexing fails
        print('This is a header!')
salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])
print(salaries_df)
```
Output:

```
name salary
0 Stephen Curry 40,231,758
1 Russell Westbrook 38,506,482
2 Chris Paul 38,506,482
3 John Wall 38,199,000
4 James Harden 38,199,000
.. ... ...
570 Hollis Thompson 50,000
571 Tyler Ulis 50,000
572 Demetrius Jackson 18,312
573 Jordan Caroline 6,000
574 Anthony Bennett 6,000
[575 rows x 2 columns]
```
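An alternative to catching `IndexError` is to skip any row that lacks the expected `<td>` cells before indexing into it. The sketch below runs against a hypothetical, trimmed-down copy of the table markup (a `<th>` rank cell followed by Player, Tm and 2019-20 `<td>` cells, plus a repeated header row); the real page's markup may differ, so check its source first. `parse_salaries` and `sample_html` are illustrative names, and unlike the answer this version also converts the salary strings to integers.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical, simplified markup; the live page may be structured differently
sample_html = """
<table>
  <thead>
    <tr><th>Rk</th><th>Player</th><th>Tm</th><th>2019-20</th></tr>
  </thead>
  <tbody>
    <tr><th>1</th><td>Stephen Curry</td><td>GSW</td><td>$40,231,758</td></tr>
    <tr class="thead"><th>Rk</th><th>Player</th><th>Tm</th><th>2019-20</th></tr>
    <tr><th>2</th><td>Chris Paul</td><td>OKC</td><td>$38,506,482</td></tr>
  </tbody>
</table>
"""


def parse_salaries(html):
    """Return a DataFrame of (name, salary) from rows that have at least 3 <td> cells."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        tds = tr.find_all('td')
        # Header rows contain only <th> cells, so skip them instead of indexing
        if len(tds) < 3:
            continue
        name = tds[0].get_text(strip=True)
        # Remove the leading "$" and the thousands separators, e.g. "$40,231,758" -> "40231758"
        salary_text = tds[2].get_text(strip=True).lstrip('$').replace(',', '')
        # An empty salary cell becomes None rather than raising ValueError
        rows.append([name, int(salary_text) if salary_text else None])
    return pd.DataFrame(rows, columns=['name', 'salary'])


salaries_df = parse_salaries(sample_html)
print(salaries_df)
```

Against the live page you would pass `requests.get(salaries_url).text` to `parse_salaries`. Converting to integers makes the column sortable and summable, at the cost of raising `ValueError` if a cell holds something other than a dollar amount.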