Python: using an asyncio generator with asyncio.as_completed

Tags: python, python-asyncio, python-3.7

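I have some code that scrapes URLs, parses the information, and then stores it in a database using SQLAlchemy. I'm trying to do this asynchronously while limiting the maximum number of concurrent requests.
Here is my code: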

    async def get_url(aiohttp_session, url1, url2):
        async with session.get(url1) as r_url1:
            if r_url1.status == 200:
                async with session.get(url2) as r_url2:
                    if r_url2.status == 200:
                        return await r_url1.json(), await r_url2.json()

    async def url_generator(formatted_start_date, formatted_end_date, machine_id, interval):
        interval_start = formatted_start_date
        interval_end = formatted_start_date + interval
        while interval_end <= formatted_end_date:
            yield (f"https://example.org/start={interval_start}"
                   f"Start={datetime.strftime(interval_start, DATETIME_FORMAT)}"
                   f"&End={datetime.strftime(interval_end, DATETIME_FORMAT)}"
                   f"&machines={machine_id}",
                   f"https://example.org/start={interval_start}"
                   f"Start={datetime.strftime(interval_start, DATETIME_FORMAT)}"
                   f"&End={datetime.strftime(interval_end, DATETIME_FORMAT)}"
                   f"&machines={machine_id}&groupby=Job"
                   )
            interval_start += interval
            interval_end += interval

    async def parse(database, url1_json, url2_json):
        """ Do some parsing and save it using credentials stored in the database object """

    def main(database, formatted_start_date, formatted_end_date, machine_id, interval):
        async for url1_json, url2_json in asyncio.as_completed(url_generator(formatted_start_date, formatted_end_date, machine_id, interval)):
            parse(database, url1_json, url2_json)

The posted code has several problems:
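- You are trying to use as_completed as an async iterator, iterating over its results with async for. However, as_completed does not return an async iterator (at least not in Python 3.7); it must be iterated with a regular for, and each yielded object must be awaited explicitly, as shown in the documentation.
- You are passing an async iterator to as_completed, whereas it accepts an ordinary container or a (regular) iterable.
- You are using async for in a function that is not defined with async def, which should be a syntax error. Also, parse() is defined as a coroutine, and you are not awaiting it.

The good news is that, since url_generator is already a generator, you don't need as_completed at all; you should be able to iterate over it directly:

    async def main(database, formatted_start_date, formatted_end_date, machine_id, interval):
        async for url1_json, url2_json in url_generator(
                formatted_start_date, formatted_end_date, machine_id, interval):
            await parse(database, url1_json, url2_json)

Note, however, that async for does not automatically parallelize the iteration; it only allows other coroutines to run in parallel with the coroutine that iterates. To parallelize the iteration, you need to call create_task to submit the tasks in parallel, and use a semaphore to limit the number of parallel tasks. For example:

    import asyncio

    async def parse(database, url1_json, url2_json, limit):
        # "async with" applied to a semaphore ensures that no more than N
        # coroutines that use the same semaphore enter the "with" block
        # in parallel
        async with limit:
            ...  # code goes here

    async def main(database, formatted_start_date, formatted_end_date, machine_id, interval):
        limit = asyncio.Semaphore(10)

        # create all coroutines in advance using create_task
        # and run them in parallel, relying on the semaphore
        # to limit the number of simultaneous requests
        tasks = []
        async for url1_json, url2_json in url_generator(
                formatted_start_date, formatted_end_date, machine_id, interval):
            # this create_task just creates the task - it will
            # start running when we return to the event loop
            tasks.append(asyncio.create_task(parse(database, url1_json, url2_json, limit)))

        # suspend to the event loop, resuming this coroutine only after
        # all the tasks have finished (or any of them raises)
        await asyncio.gather(*tasks)

Note that url_generator doesn't need to be async, because it doesn't await anything. You could define it with def and iterate over it with for.

Comments:

Thanks for the in-depth answer. Part of the reason I made url_generator a generator is that I didn't want to hold all of the tasks in memory at the same time. The task list could run into the millions. [...] when the semaphore has release() called.

That said, if you are dealing with millions of items, you may want to avoid creating millions of tasks in advance and having them "fight" over the semaphore. (The fighting should still be algorithmically efficient, but it wastes memory.)

@TMarks You can create a bounded queue, create a fixed number of coroutines (as many as your desired limit on simultaneous access) that drain the queue and contact the database, and fill the queue from main(). See the linked example.

Thanks @user4815162342, I appreciate all the effort you've put into answering the question. I'll check out the link you posted and try to implement a solution like that.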
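
The bounded-queue approach suggested in the last comments can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked example: the names worker, process_all, handle_item and fake_parse are hypothetical, and fake_parse merely sleeps in place of the real fetch-and-parse step. Because the producer blocks on a full queue, a plain (non-async) generator of URLs is consumed lazily and millions of tasks are never created at once.

```python
import asyncio


async def worker(queue, handle_item):
    # Each worker handles one item at a time, so the number of workers
    # is the concurrency limit.
    while True:
        item = await queue.get()
        try:
            await handle_item(item)
        except asyncio.CancelledError:
            raise  # let cancellation through (needed on Python 3.7)
        except Exception as exc:
            # Keep the worker alive; a real program would log or retry here.
            print(f"failed on {item!r}: {exc!r}")
        finally:
            queue.task_done()


async def process_all(items, handle_item, concurrency=10):
    # The bounded queue keeps at most `concurrency` items waiting in memory.
    queue = asyncio.Queue(maxsize=concurrency)
    workers = [asyncio.create_task(worker(queue, handle_item))
               for _ in range(concurrency)]
    try:
        for item in items:         # `items` may be a lazy generator
            await queue.put(item)  # suspends while the queue is full
        await queue.join()         # wait until every item was handled
    finally:
        for task in workers:
            task.cancel()
        await asyncio.gather(*workers, return_exceptions=True)


async def demo():
    # Hypothetical stand-in for "fetch both URLs, then parse and save".
    active = 0
    peak = 0
    seen = []

    async def fake_parse(item):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        seen.append(item)
        active -= 1

    await process_all(range(25), fake_parse, concurrency=3)
    return peak, sorted(seen)


if __name__ == "__main__":
    peak, seen = asyncio.run(demo())
    print("peak concurrency:", peak, "items handled:", len(seen))
```

In the original setting, handle_item would presumably call get_url with a shared aiohttp session and then await parse; the semaphore is no longer needed because the fixed number of workers already caps concurrency.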