Calling an API concurrently in Python


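I need to talk to an API to get information about teams. Each team has a unique id. I call the API with that id and get back the list of players on the team (a list of dicts). One of the keys for a player is another id that I can use to get more information about that player. I can bundle all of these player_ids and call the API to get the additional information for every player in a single API call.

My question is: I expect the number of teams to grow, and it could get quite large. In addition, the number of players on each team could also grow.

What is the best way to make these API calls concurrently? I could use the ThreadPool from multiprocessing.dummy, and I have also seen gevent used for something like this.

The calls to the API take some time to return (1-2 seconds per bulk API call).

Right now, what I do is: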

for each team:
    get the list of players
    store the player_ids in a list
    get the player information for all the players (passing the list of player_ids)
assemble and process the information

If I use a ThreadPool, I could do the following:

from multiprocessing.dummy import Pool as ThreadPool

# api is my API client; x and team_ids are placeholders

def get_all_player_stats(player_ids):
    # one bulk API call for all players on a team
    return api.call(player_ids)

def get_players_information(players):
    player_ids = [player['id'] for player in players]
    return get_all_player_stats(player_ids)

def function_to_get_team_info(team_id):
    players = api.call(team_id)  # list of player dicts for this team
    return get_players_information(players)

pool = ThreadPool(x)  # x = number of worker threads
results = pool.map(function_to_get_team_info, team_ids)
pool.close()
pool.join()
# process results

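This would process each team concurrently and gather all the information back in the ThreadPool results.

To make this fully concurrent, I think I would need to make the ThreadPool as large as the number of teams. But I don't think this scales well. So I was wondering whether using gevent to process this information would be a better approach.

Any suggestions would be welcome.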

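One solution is:

Prepare a list of tasks to perform, e.g. the list of team ids to process
Create a fixed pool of N worker threads
Each worker thread pops a task from the list and processes it (downloads the team data), then pops another task when it is done
When the task list is empty, the worker threads stop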
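
The steps above can be sketched with the standard library's queue and threading modules. This is only an illustration under assumptions: fetch_team is a hypothetical stand-in for the real API call, and run_workers and num_workers are names made up for this example.

```python
import queue
import threading


def fetch_team(team_id):
    # Hypothetical placeholder: a real version would call the API here.
    return {"team_id": team_id, "players": []}


def run_workers(team_ids, num_workers=4):
    # Step 1: prepare the list of tasks.
    tasks = queue.Queue()
    for team_id in team_ids:
        tasks.put(team_id)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # Step 3: pop the next task without blocking.
                team_id = tasks.get_nowait()
            except queue.Empty:
                # Step 4: the task list is empty, so this worker stops.
                return
            info = fetch_team(team_id)
            with lock:
                results[team_id] = info

    # Step 2: create a fixed pool of worker threads.
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling run_workers(team_ids, num_workers=8) returns a dict keyed by team id once every worker has finished. In practice pool.map does the same bookkeeping for you; the explicit version just makes the steps visible.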

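This solution holds up when one particular team takes, say, 100 time units to process while the other teams take 1 time unit on average: the slow team occupies only one worker while the remaining workers keep draining the list.

You can tune the number of worker threads according to the number of teams, the average team processing time, the number of CPU cores, and so on.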
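
As a rough sketch of that tuning, the pool size can be capped at a fixed upper bound instead of growing with the number of teams. The cap of 16, the fetch_team placeholder, and the fetch_all helper are assumptions made for this example, not values from the question.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 16  # assumed cap; tune it against the API's limits and latency


def fetch_team(team_id):
    # Hypothetical placeholder: a real version would call the API here.
    return {"team_id": team_id}


def fetch_all(team_ids, max_workers=MAX_WORKERS):
    if not team_ids:
        return []
    # Never start more threads than there are teams to process.
    workers = min(max_workers, len(team_ids))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map returns results in the same order as team_ids.
        return list(executor.map(fetch_team, team_ids))
```

With this shape the pool stays the same size whether there are 20 teams or 20,000; extra teams simply wait in the executor's internal queue until a worker is free.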

Extended answer

This can be implemented in Python with multiprocessing.Pool:

from multiprocessing import Pool

def api_call(team_id):
    pass  # call the API for the given id

if __name__ == '__main__':
    # fixed pool of 5 worker processes
    with Pool(5) as p:
        results = p.map(api_call, [1, 2, 3])
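
Because the API calls spend most of their time waiting on the network, a thread-backed pool is usually enough. The sketch below swaps in multiprocessing.dummy.Pool, which the question mentions and which exposes the same map interface backed by threads instead of processes. The body of api_call and the call_all helper are hypothetical placeholders.

```python
from multiprocessing.dummy import Pool as ThreadPool


def api_call(team_id):
    # Hypothetical placeholder: a real version would call the API here.
    return team_id * 2


def call_all(team_ids, size=5):
    # Same map interface as multiprocessing.Pool, but using threads.
    with ThreadPool(size) as pool:
        return pool.map(api_call, team_ids)
```

Threads avoid pickling arguments and results between processes, which matters when each result is a large list of player dicts.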