Web scraping: how to show more than 100 results per page?


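I want to change the number of results shown on this page:
https://fifatracker.net/players/
to more than 100, and then export the table to Excel to make things easier for me. I tried to do it in Python by following a tutorial, but I couldn't get it to work. A way to extract the table from all pages would also help.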

As mentioned, each request is limited to 100 results. Just iterate over the page number in the API's request payload to get every page:

import pandas as pd
import requests

url = 'https://fifatracker.net/api/v1/players/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}

# Request body as sent by the site; the server caps per_page at 100
payload = {
    "pagination": {"per_page": "100", "page": 1},
    "filters": {
        "attackingworkrate": [],
        "defensiveworkrate": [],
        "primarypositions": [],
        "otherpositions": [],
        "nationality": [],
        "order_by": "-overallrating"},
    "context": {"username": "guest", "slot": "1", "season": 1},
    "currency": "eur"}

# The first response also reports how many pages there are
jsonData = requests.post(url, headers=headers, json=payload).json()
last_page = jsonData['pagination']['last_page']

dfs = []
for page in range(1, last_page + 1):
    if page > 1:
        # Page 1 is already fetched; for the rest, change only the page number
        payload['pagination']['page'] = page
        jsonData = requests.post(url, headers=headers, json=payload).json()

    # Flatten the nested player records into columns
    players = pd.json_normalize(jsonData['result'])
    dfs.append(players)
    print('Page %s of %s' % (page, last_page))

df = pd.concat(dfs).reset_index(drop=True)
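The same pagination loop can be written with the HTTP call passed in as a function, which lets you check the page logic without hitting the site. This is a minimal sketch assuming the response shape used in the script (a 'result' list plus 'pagination' with 'last_page'); fetch_all_pages and fake_fetch are hypothetical names, and fake_fetch only simulates the API with made-up data.

```python
from typing import Any, Callable, Dict, List


def fetch_all_pages(fetch_page: Callable[[int], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect the 'result' records of every page into one list.

    fetch_page(page) is assumed to return the decoded JSON for that page,
    shaped like the API response: {'pagination': {'last_page': ...}, 'result': [...]}.
    """
    first = fetch_page(1)
    last_page = first['pagination']['last_page']
    records = list(first['result'])
    # Page 1 is already in hand, so only pages 2..last_page are requested
    for page in range(2, last_page + 1):
        records.extend(fetch_page(page)['result'])
    return records


# Hypothetical stand-in for the API: 5 made-up players, 2 per page
def fake_fetch(page: int) -> Dict[str, Any]:
    players = [{'slug': f'player-{i}'} for i in range(1, 6)]
    per_page = 2
    last_page = -(-len(players) // per_page)  # ceiling division -> 3 pages
    start = (page - 1) * per_page
    return {'pagination': {'current_page': page, 'last_page': last_page},
            'result': players[start:start + per_page]}


records = fetch_all_pages(fake_fetch)
```

With the real endpoint, fetch_page would set payload['pagination']['page'] = page and return requests.post(url, headers=headers, json=payload).json(); pd.json_normalize(records) then builds one DataFrame from all pages.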
Output of the scraping script above:

print(df)
                     slug  ... info.contract.loanedto_clubname
0            lionel-messi  ...                             NaN
1       cristiano-ronaldo  ...                             NaN
2      robert-lewandowski  ...                             NaN
3               neymar-jr  ...                             NaN
4         kevin-de-bruyne  ...                             NaN
                  ...  ...                             ...
19137           levi-kaye  ...                             NaN
19138      phillip-cancar  ...                             NaN
19139         julio-pérez  ...                             NaN
19140     alan-mclaughlin  ...                             NaN
19141   tatsuki-yoshitomi  ...                             NaN

[19142 rows x 92 columns]
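The question also asks for an Excel file. With pandas, df.to_excel('players.xlsx', index=False) writes one, provided an Excel writer engine such as openpyxl is installed, and df.to_csv works too, since Excel opens CSV files. As a dependency-free alternative, here is a minimal sketch using only the standard library: it flattens the nested player dictionaries into dot-separated column names, similar to the columns pd.json_normalize produced above (for example info.contract.loanedto_clubname), and writes a CSV. The function names, the file name players_sample.csv and the sample record are hypothetical.

```python
import csv
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = '') -> Dict[str, Any]:
    """Flatten nested dicts into dot-separated keys, e.g. 'info.contract.x'.

    Lists and other values are kept as-is, as pd.json_normalize does by default.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f'{prefix}{key}'
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + '.'))
        else:
            flat[name] = value
    return flat


def write_csv(records: List[Dict[str, Any]], path: str) -> None:
    rows = [flatten(r) for r in records]
    # Union of all column names, in first-seen order
    fieldnames = list(dict.fromkeys(name for row in rows for name in row))
    # utf-8-sig writes a BOM so Excel detects UTF-8 (names such as julio-pérez)
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty cells


# Illustrative record with the same nesting as the API, not real data
sample = [{'slug': 'julio-pérez', 'info': {'contract': {'loanedto_clubname': None}}}]
write_csv(sample, 'players_sample.csv')
```

For the real data, pass the raw player dictionaries (the jsonData['result'] lists from each page) rather than df, so that missing values become empty cells instead of nan.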

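Their API accepts a maximum of 100 results per request; any larger number falls back to the default of 50. You can still send the same request the site sends, with the page size increased, so no scraper is needed; a few lines of code will do. Check the browser's Network tab while you change the page size to see that request.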
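To see why asking for a larger page size does not reduce the number of requests, here is a small model of the limits described in this answer: values up to 100 are honoured, anything larger falls back to 50. These rules come from the answerer's observation of the site rather than documented API behaviour, and the constant and function names are hypothetical. The row count 19142 is taken from the output shown above.

```python
MAX_PER_PAGE = 100     # largest page size the API honours, per this answer
DEFAULT_PER_PAGE = 50  # what the API reportedly uses for larger values


def effective_per_page(requested: int) -> int:
    """Page size the server would actually use, under the stated limits."""
    return requested if requested <= MAX_PER_PAGE else DEFAULT_PER_PAGE


def requests_needed(total_rows: int, requested: int) -> int:
    """Number of page requests needed to fetch total_rows records."""
    per_page = effective_per_page(requested)
    return -(-total_rows // per_page)  # integer ceiling division


print(requests_needed(19142, 100))  # 192
print(requests_needed(19142, 500))  # 383, because 500 falls back to 50 per page
```

So 100 per page is the best you can do: 192 requests for the full table, versus 383 if the oversized value silently falls back to 50.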