How do I paginate an API using a cursor in Python?

python json pandas api pagination

Sample JSON response

{
    "Data": {
        "City": [
            {
                "loc": "Sector XYZ",
                "Country": "AUS",
            },
            
            {
            .
            .
            .
            .
            .
            },
        ]
    },
    "Meta": {},
    "ResourceType": 40,
    "StatusCode": 200,
    "Message": null,
    "Cursor": "apicursor-ad39609e-5fb2-4a66-9402-6def95e75655"
}
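
As a minimal sketch of how the two fields that matter for pagination could be read from such a response, assuming it has already been parsed into a Python dict. The helper name `split_page` is hypothetical, and the sample values are copied from the response above.

```python
def split_page(page):
    """Return (rows, cursor) from one parsed response shaped like the sample."""
    data = page.get("Data") or {}
    # "Data" holds a single list ("City" in the sample); take the first one found.
    rows = next(iter(data.values()), [])
    cursor = page.get("Cursor")
    return rows, cursor


sample = {
    "Data": {"City": [{"loc": "Sector XYZ", "Country": "AUS"}]},
    "Meta": {},
    "ResourceType": 40,
    "StatusCode": 200,
    "Message": None,
    "Cursor": "apicursor-ad39609e-5fb2-4a66-9402-6def95e75655",
}

rows, cursor = split_page(sample)
print(len(rows), cursor)
```

The rows go into the DataFrame, while the cursor is only needed to request the next page.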

The cursor is dynamic and changes with every paginated response; the next one might be "apicursor-53ee8993-022c-41df-8be7-9bdedfd91e52", and so on.

The next URL then takes the following form:

https://myurl123.com/api/V2/data/{}?size=10&cursor=apicursor-53ee8993-022c-41df-8be7-9bdedfd91e52
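
A URL of this form could be assembled as in the sketch below. The helper name `page_url` and the resource name "City" are placeholders for illustration; the assumption is that the first request simply leaves the cursor out.

```python
from urllib.parse import urlencode

BASE = "https://myurl123.com/api/V2/data/{}"


def page_url(name, cursor=None, size=10):
    """Build the URL for one page; the cursor is omitted on the first request."""
    params = {"size": size}
    if cursor:
        params["cursor"] = cursor
    return BASE.format(name) + "?" + urlencode(params)


print(page_url("City"))
print(page_url("City", "apicursor-53ee8993-022c-41df-8be7-9bdedfd91e52"))
```

With `requests`, the same effect can be had by passing the dict through the `params` argument instead of formatting the query string by hand.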

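For very large datasets, I cannot work out how to paginate through the responses and append them to a DataFrame. Here is what I tried; it does not handle pagination: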

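import json
from io import StringIO

import pandas as pd
import requests

def foo(name):
    # Fetches only the first page (10 rows); the cursor is not followed.
    url = "https://myurl123.com/api/V2/data/{}?size=10".format(name)
    print(url)
    headers = {
        'Authorization': 'ApiKey xyz123',
        'Content-Type': 'application/json'
    }

    # A GET request needs no body, so no payload is sent.
    response = requests.get(url, headers=headers)
    try:
        x = response.json()
        # Take the first list under "Data" ("City" in the sample response).
        xs = next(iter(x['Data'].values()))
        df = pd.read_json(StringIO(json.dumps(xs)), orient='records')
        df.reset_index(drop=True, inplace=True)
        return df
    except (ValueError, KeyError, StopIteration) as exc:
        # Report the actual cause instead of hiding it.
        print('fetch failed:', exc)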

I just want to paginate through the API, collect all of the data in df, and return it from the function above.

I could not follow some of the other answers posted here, so I apologize if this is a duplicate. Thanks for your help and suggestions.

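Do I understand correctly that you need to keep reading from the API until no more data comes back? You could do it like this. The function get_data() is a generator that yields every row from every request; to the caller it looks like one long list.

But for 100,000 rows this will take a long time: each request reads 10 rows, so that is 10,000 requests, one after another.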

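import json
from io import StringIO

import pandas as pd
import requests

def get_data(name):
    csr = ""
    baseurl = "https://myurl123.com/api/V2/data/{}".format(name)
    headers = {
        'Authorization': 'ApiKey xyz123',
        'Content-Type': 'application/json'
    }

    while True:
        url = "{}?size=10&cursor={}".format(baseurl, csr)
        # A GET request needs no body, so no payload is sent.
        res = requests.get(url, headers=headers)
        res.raise_for_status()
        data = res.json()
        rows = (data.get("Data") or {}).get("City")
        if not rows:
            # Stop once a page comes back without rows.
            break
        # Carry the cursor from this response into the next request.
        csr = data.get("Cursor") or ""

        for row in rows:
            yield row

        # Assumption: a missing or empty cursor means there are no more pages.
        if not csr:
            break

def get_df(name):
    # Consume the generator and build one DataFrame from all rows.
    data = get_data(name)

    df = pd.read_json(StringIO(json.dumps(list(data))), orient='records')
    df.reset_index(drop=True, inplace=True)

    return df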
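
To check that a cursor loop like this terminates and follows the cursor correctly, it can help to separate the loop from the HTTP call and run it against canned pages. The following is only a sketch under assumptions: `paginate`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names, and the end of the data is assumed to be signalled by an empty "City" list or a missing cursor, which the real API may do differently.

```python
def paginate(fetch_page):
    """Yield rows page by page until a page has no rows or no cursor.

    fetch_page(cursor) must return one parsed response (a dict).
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        rows = (page.get("Data") or {}).get("City") or []
        if not rows:
            break
        yield from rows
        # The cursor from this response selects the next page.
        cursor = page.get("Cursor")
        if not cursor:
            break


# Fake three-response service, used only to exercise the loop offline.
FAKE_PAGES = {
    None: {"Data": {"City": [{"loc": "A"}, {"loc": "B"}]}, "Cursor": "c1"},
    "c1": {"Data": {"City": [{"loc": "C"}]}, "Cursor": "c2"},
    "c2": {"Data": {"City": []}, "Cursor": None},
}

calls = []


def fake_fetch(cursor):
    calls.append(cursor)
    return FAKE_PAGES[cursor]


all_rows = list(paginate(fake_fetch))
print(all_rows)
```

In real use, `fetch_page` would wrap the `requests` call, and the collected rows could be passed to `pd.DataFrame(all_rows)`.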

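Do you get any errors? Well, you cannot know, because you wrapped the whole thing in a blanket try/except block :). Remove that first and see what the error message says. Then maybe add it to your question.

@C14L I do not get any errors, but this returns only the first 10 rows (as a DataFrame). Edit: if I remove the size parameter from the URL, I get the first 100,000 rows, which I guess is a design limit. Either way, I still need pagination here.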
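
The cost of a small page size noted in the answer can be estimated with a little arithmetic. This is a rough sketch: `requests_needed` is a hypothetical helper, and whether the API accepts a `size` other than 10 (1000 is used here purely as an illustration) is an assumption.

```python
import math


def requests_needed(total_rows, page_size):
    """Number of sequential requests needed to read total_rows rows."""
    return math.ceil(total_rows / page_size)


for size in (10, 1000):
    print(size, requests_needed(100_000, size))
```

If the service allows it, a larger page size reduces the number of round trips proportionally.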