Web scraping: how to show more than 100 results per page?
I want to change the number of results shown on this page: https://fifatracker.net/players/ to more than 100, and then export the table to Excel to make things easier on myself. I tried to do it with Python following a tutorial, but I couldn't get it to work. It would also help if there were a way to extract the table from all of the pages.

As noted, each request is capped at 100 results. Just iterate the page number in the query payload sent to the API to fetch every page:
import pandas as pd
import requests

url = 'https://fifatracker.net/api/v1/players/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}

# The API caps per_page at 100; the page number is incremented in the loop below.
payload = {
    "pagination": {"per_page": "100", "page": 1},
    "filters": {
        "attackingworkrate": [],
        "defensiveworkrate": [],
        "primarypositions": [],
        "otherpositions": [],
        "nationality": [],
        "order_by": "-overallrating"},
    "context": {"username": "guest", "slot": "1", "season": 1},
    "currency": "eur"}

# The first request tells us how many pages there are in total.
jsonData = requests.post(url, headers=headers, json=payload).json()
last_page = jsonData['pagination']['last_page']

dfs = []
for page in range(1, last_page + 1):
    if page > 1:  # page 1 was already fetched above
        payload['pagination']['page'] = page
        jsonData = requests.post(url, headers=headers, json=payload).json()
    dfs.append(pd.json_normalize(jsonData['result']))
    print('Page %s of %s' % (page, last_page))

df = pd.concat(dfs).reset_index(drop=True)
Output:
print(df)
slug ... info.contract.loanedto_clubname
0 lionel-messi ... NaN
1 cristiano-ronaldo ... NaN
2 robert-lewandowski ... NaN
3 neymar-jr ... NaN
4 kevin-de-bruyne ... NaN
... ... ...
19137 levi-kaye ... NaN
19138 phillip-cancar ... NaN
19139 julio-pérez ... NaN
19140 alan-mclaughlin ... NaN
19141 tatsuki-yoshitomi ... NaN
[19142 rows x 92 columns]
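Since the original goal was an Excel export, the combined DataFrame can be written out with pandas' to_excel (this requires the openpyxl package; the file name and the tiny stand-in frame below are just examples, the real `df` from the loop above works the same way):

```python
import pandas as pd

# Stand-in for the scraped result; the real `df` built by the loop
# above has the same shape (one row per player).
df = pd.DataFrame({
    'slug': ['lionel-messi', 'cristiano-ronaldo'],
    'info.contract.loanedto_clubname': [None, None],
})

# Write the table to an Excel workbook (needs openpyxl installed).
df.to_excel('players.xlsx', index=False)
```

`index=False` drops pandas' integer index column so the spreadsheet starts with `slug`.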
Their API allows a maximum of 100 results per request; any larger value falls back to the default of 50. You don't need a full scraper for this, though: simple code like the above can do it. To see the underlying request, open the Network tab in your browser's developer tools and change the page size on the site.
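As a quick sanity check before looping, the number of requests needed can be computed up front from the total result count and the 100-per-request cap (a small sketch; the 19,142 total comes from the output above):

```python
import math

def pages_needed(total_results: int, per_page: int = 100) -> int:
    """Number of paginated requests needed to fetch all results."""
    return math.ceil(total_results / per_page)

# With 19,142 players and 100 results per request:
print(pages_needed(19142))  # 192
```

So the loop above makes 192 POST requests in total, one per page.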