Python 3.x: Processing tasks in batches with asyncio

python-3.x concurrency


I have a function that generates tasks (I/O-bound tasks):

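I am trying to write a consumer with asyncio that processes at most 10 tasks concurrently and picks up a new task as soon as one finishes. I am not sure whether I should use a semaphore, or whether asyncio has some kind of pool executor. I started by writing pseudocode with threads: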

def run(self):
    while True:
        self.semaphore.acquire()  # first acquire, then get task
        t = get_task()
        self.process_task(t)

def process_task(self, task):
    try:
        self.execute_task(task)
        self.mark_as_done(task)
    except Exception:
        self.mark_as_failed(task)
    finally:
        # Always free the slot, even if marking the task fails.
        self.semaphore.release()

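Can anyone help me? I don't know where to put the async/await keywords.

async is not threading. For example, if your tasks are bound by file I/O, write them as coroutines, then replace task in the example below with your own tasks. If you want only 10 running at a time, await asyncio.gather after every 10 tasks.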

import asyncio

async def task(x):
    # Stand-in for an I/O-bound operation.
    await asyncio.sleep(0.5)
    print(x, "is done")

async def run():
    # Create all 50 coroutines, then let gather schedule and await them together.
    futs = [task(x) for x in range(50)]
    await asyncio.gather(*futs)

asyncio.run(run())
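The "await asyncio.gather every 10 tasks" suggestion could look roughly like the sketch below. The names run_in_batches and batch_size are made up for illustration, and task is again only a stand-in for real I/O work. Note that each batch waits for its slowest member before the next batch starts, so this is batching rather than a sliding window of 10.

```python
import asyncio

async def task(x):
    # Stand-in for an I/O-bound operation (hypothetical workload).
    await asyncio.sleep(0.01)
    return x

async def run_in_batches(items, batch_size=10):
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # At most batch_size coroutines are in flight; the next batch
        # starts only after every coroutine in this one has finished.
        results.extend(await asyncio.gather(*(task(x) for x in batch)))
    return results

if __name__ == "__main__":
    print(len(asyncio.run(run_in_batches(list(range(50))))))
```

asyncio.gather returns results in the order the awaitables were passed, so the combined list follows the input order.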

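If you cannot write your tasks asynchronously and need threads, here is a basic example that uses a ThreadPoolExecutor with asyncio. Note that with max_workers=5, only 5 tasks run at a time.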

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking(x):
    # Blocking call; it runs in a worker thread, not on the event loop.
    time.sleep(1)
    print(x, "is done")

async def run():
    loop = asyncio.get_running_loop()
    executor = ThreadPoolExecutor(max_workers=5)
    # Submit all 15 jobs; the pool runs at most 5 of them at a time.
    futs = [loop.run_in_executor(executor, blocking, x) for x in range(15)]

    # The threads keep working while the event loop is busy elsewhere.
    await asyncio.sleep(4)
    res = await asyncio.gather(*futs)

asyncio.run(run())

A simple task cap using asyncio.Semaphore:

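async def max10(task_generator):
    semaphore = asyncio.Semaphore(10)

    async def bounded(task):
        async with semaphore:
            return await task

    async for task in task_generator:
        asyncio.ensure_future(bounded(task))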

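The problem with this solution is that tasks are pulled from the generator greedily. For example, if the generator reads from a large database, the program could run out of memory.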

Other than that, it is idiomatic and well-behaved.
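One way to keep the semaphore while avoiding the greedy pull is to acquire a slot before asking the generator for the next item, which mirrors the "first acquire, then get task" order of the threaded pseudocode in the question. The following is only a minimal sketch under that assumption: bounded_consume, run_one and limit are illustrative names, and the generator is assumed to yield awaitables (ideally coroutine objects that have not been scheduled yet).

```python
import asyncio

async def bounded_consume(task_generator, limit=10):
    semaphore = asyncio.Semaphore(limit)
    running = set()

    async def run_one(aw):
        try:
            return await aw
        finally:
            # Free the slot whether the awaitable succeeded or failed.
            semaphore.release()

    it = task_generator.__aiter__()
    while True:
        # Wait for a free slot BEFORE pulling the next item, so the
        # generator is never advanced further than we can process.
        await semaphore.acquire()
        try:
            aw = await it.__anext__()
        except StopAsyncIteration:
            semaphore.release()
            break
        t = asyncio.ensure_future(run_one(aw))
        running.add(t)
        t.add_done_callback(running.discard)

    # Wait for the tasks that are still in flight. Error handling for
    # tasks that failed earlier is deliberately left out of this sketch.
    await asyncio.gather(*running)
```

Only finished tasks are dropped from the running set, so memory use stays proportional to limit rather than to the total number of generated items.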

A solution that uses the async generator protocol to pull new tasks on demand:

async def max10(task_generator):
    tasks = set()
    gen = task_generator
    try:
        while True:
            while len(tasks) < 10:
                tasks.add(await gen.__anext__())
            _done, tasks = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    except StopAsyncIteration:
        await asyncio.gather(*tasks)

It might be considered suboptimal, because it does not start executing tasks until 10 of them are available.

And here is a concise, somewhat magical solution using the worker pattern:

async def max10(task_generator):
    async def worker():
        async for task in task_generator:
            await task

    await asyncio.gather(*[worker() for i in range(10)])

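It relies on a somewhat counter-intuitive property: several async iterators can run over the same async generator, in which case each generated item is seen by only one of them.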
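A caveat worth checking on your Python version: as far as I know, since Python 3.8 an async generator that is suspended at an await rejects a second concurrent __anext__() call with a RuntimeError, so sharing one generator between workers only works unguarded when the generator never awaits between items. A hedged sketch of the same worker pattern with the pull protected by a lock follows; run_with_workers, pull_lock and limit are names invented for this example.

```python
import asyncio

async def run_with_workers(task_generator, limit=10):
    it = task_generator.__aiter__()
    pull_lock = asyncio.Lock()

    async def worker():
        while True:
            # Only one worker at a time may advance the generator.
            async with pull_lock:
                try:
                    task = await it.__anext__()
                except StopAsyncIteration:
                    return
            # The lock is released here, so other workers can pull
            # their next item while this one is being processed.
            await task

    await asyncio.gather(*(worker() for _ in range(limit)))
```

Each item is still seen by exactly one worker, and at most limit items are awaited at any time.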

My gut tells me that none of these solutions behaves quite correctly.

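As already pointed out, limiting concurrency with a semaphore makes it easy to exhaust the task generator too eagerly, because there is no backpressure between fetching a task and submitting it to the event loop. A better option, also explored in the other answer, is not to spawn a task as soon as the generator produces an item, but to create a fixed number of workers that exhaust the generator concurrently.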
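Another common way to get backpressure, shown here only as an illustrative sketch and not as the approach this answer takes, is a bounded asyncio.Queue between a single producer and a fixed set of workers. The names throttle_with_queue, producer and worker are assumptions of this example, and the generator is assumed to yield awaitables.

```python
import asyncio

async def throttle_with_queue(task_generator, max_tasks):
    # A bounded queue: put() blocks while the queue is full, which is
    # what keeps the producer from racing ahead of the workers.
    queue = asyncio.Queue(maxsize=max_tasks)

    async def producer():
        async for item in task_generator:
            await queue.put(item)
        # One sentinel per worker tells each of them to stop.
        for _ in range(max_tasks):
            await queue.put(None)

    async def worker():
        while True:
            item = await queue.get()
            if item is None:
                return
            await item

    # Note: if one worker raises, gather propagates the exception but
    # does not cancel the others; cancellation handling is omitted here.
    await asyncio.gather(producer(), *(worker() for _ in range(max_tasks)))
```

Because only the producer touches the generator, no two coroutines ever advance it at the same time, and at most max_tasks items wait in the queue while max_tasks more are being processed.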

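There are two ways in which the code can be improved:

There is no need for a semaphore: it is superfluous when the number of workers is fixed.
Cancellation should be handled, both of the generated tasks and of the throttling task itself.

Here is an implementation that addresses both issues: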

import asyncio

async def throttle(task_generator, max_tasks):
    it = task_generator.__aiter__()
    # Serializes calls to __anext__(): since Python 3.8 an async generator
    # that is suspended at an await rejects a second concurrent
    # __anext__() call with RuntimeError.
    pull_lock = asyncio.Lock()
    cancelled = False
    async def worker():
        while True:
            async with pull_lock:
                try:
                    task = await it.__anext__()
                except StopAsyncIteration:
                    return
            try:
                await task
            except asyncio.CancelledError:
                # If a generated task is canceled, let its worker
                # proceed with other tasks - except if it's the
                # outer coroutine that is cancelling us.
                if cancelled:
                    raise
            # other exceptions are propagated to the caller
    worker_tasks = [asyncio.create_task(worker())
                    for i in range(max_tasks)]
    try:
        await asyncio.gather(*worker_tasks)
    except:
        # In case of exception in one worker, or in case we're
        # being cancelled, cancel all workers and propagate the
        # exception.
        cancelled = True
        for t in worker_tasks:
            t.cancel()
        raise

A simple test case:

import asyncio
import random

async def mock_task(num):
    print('running', num)
    await asyncio.sleep(random.uniform(1, 5))
    print('done', num)

async def mock_gen():
    # Endless generator: stop the demo with Ctrl-C.
    tnum = 0
    while True:
        await asyncio.sleep(.1 * random.random())
        print('generating', tnum)
        yield asyncio.create_task(mock_task(tnum))
        tnum += 1

if __name__ == '__main__':
    asyncio.run(throttle(mock_gen(), 3))

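Thanks for the reply. The problem is that I don't want to go 10 by 10, but to start the next task as soon as any one task finishes. So at the beginning I start 10 tasks, one of them finishes, and the next one is added to the pool. That way 10 tasks are being processed at all times. The problem with the solution is that a new task is fetched before the semaphore has been acquired, which causes problems in my case.

Where does the requirement of running at most 10 tasks at a time come from?

An existing question, with a solution that can be applied here:

I like the didactic approach of this answer, but the last snippet could be much simpler. Since you have a fixed number of workers, you can get rid of the semaphore. Without the semaphore, the workers can use a plain async for loop.

Thanks, edited. I also added an explanation of why this works :)

Note that the friendlier asyncio.create_task is now favored.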
