How does Python asyncio actually work?


This question is motivated by another question of mine.

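There are tons of articles and blog posts about asyncio on the web, but they are all very superficial. I couldn't find any information about how asyncio is actually implemented, or about what makes I/O asynchronous. I tried reading the source code, but it is not the highest grade of C code, and much of it deals with auxiliary objects. Most crucially, it is hard to connect Python syntax with the C code it would translate into.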

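Asyncio's own documentation is even less helpful. There is no information there about how it works, only some guidelines on how to use it, and these are sometimes misleading or very poorly written.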

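I'm familiar with Go's implementation of coroutines, and was hoping that Python did the same thing. If that were the case, the code I came up with in the post linked above would have worked. Since it didn't, I'm now trying to find out why. My best guess so far is as follows; please correct me where I'm wrong: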

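  • Procedure definitions of the form async def foo(): … are actually interpreted as methods of a class inheriting from coroutine.
  • Perhaps async def is actually split into multiple methods at the await statements, and the object on which these methods are called keeps track of the progress made through the execution so far.
  • If the above is true, then executing a coroutine essentially boils down to some global manager (the loop?) calling methods of the coroutine object.
  • The global manager is somehow (how?) aware of when I/O operations are performed by Python (only?) code, and is able to choose one of the pending coroutine methods to execute after the currently executing method has relinquished control (hit an await statement).
  • In other words, here is my attempt at "desugaring" some asyncio syntax into something more understandable: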

    async def coro(name):
        print('before', name)
        await asyncio.sleep(1)  # sleep() needs a delay; one second is an arbitrary choice
        print('after', name)
    
    asyncio.gather(coro('first'), coro('second'))
    
    # translated from async def coro(name)
    class Coro(coroutine):
        def before(self, name):
            print('before', name)
    
        def after(self, name):
            print('after', name)
    
        def __init__(self, name):
            self.name = name
            self.parts = self.before, self.after
            self.pos = 0
    
        def __call__(self):
            self.parts[self.pos](self.name)
            self.pos += 1
    
        def done(self):
            return self.pos == len(self.parts)
    
    
    # translated from asyncio.gather()
    class AsyncIOManager:
    
        def gather(self, *coros):
            while not all(c.done() for c in coros):
                # only pick among coroutines that still have parts left to run
                coro = random.choice([c for c in coros if not c.done()])
                coro()
    

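If my guess proves correct, then I have a problem. How does I/O actually happen in this scenario? In a separate thread? Is the whole interpreter suspended while I/O happens outside the interpreter? What exactly is meant by I/O? If my Python procedure called the C open() procedure, which in turn sent an interrupt to the kernel and relinquished control to it, how does the Python interpreter know about this and manage to keep running other code while the kernel does the actual I/O, until it wakes up the Python procedure that sent the interrupt originally? How can the Python interpreter, in principle, be aware of this happening?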

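It all boils down to the two main challenges that asyncio addresses:

  • How to perform multiple I/O operations in a single thread?
  • How to implement cooperative multitasking?

The answer to the first point has been around for a long time and is called a select loop. In Python, it is implemented in the selectors module.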
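As a rough illustration of the select-loop idea, the sketch below waits on two I/O sources from a single thread using the standard selectors module. The socket pairs and the "a"/"b" labels are stand-ins invented for this example; a real program would register network connections instead.

```python
import selectors
import socket

# Two connected socket pairs stand in for two independent I/O sources.
sel = selectors.DefaultSelector()
left_a, right_a = socket.socketpair()
left_b, right_b = socket.socketpair()

# Register interest in "readable" events and attach a label as user data.
sel.register(left_a, selectors.EVENT_READ, data="a")
sel.register(left_b, selectors.EVENT_READ, data="b")

# Make both sources readable; normally a remote peer would do this.
right_a.send(b"ping")
right_b.send(b"pong")

received = {}
while len(received) < 2:
    # Block (up to one second) until at least one registered socket is ready.
    for key, _events in sel.select(timeout=1.0):
        received[key.data] = key.fileobj.recv(1024)
        sel.unregister(key.fileobj)

for sock in (left_a, right_a, left_b, right_b):
    sock.close()
sel.close()
print(received)
```

The single call to sel.select() is the only place where the thread waits, regardless of how many sources are registered.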

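The second question is related to the concept of coroutines, i.e. functions that can stop their execution and be resumed later. In Python, coroutines are implemented using generators and the yield from statement. This is what hides behind the async/await syntax.

More resources are available on this topic.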


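EDIT: Addressing your comment about goroutines:

The closest equivalent to a goroutine in asyncio is actually not a coroutine but a task (note the difference between the two). In Python, a coroutine (or a generator) knows nothing about the concepts of event loop or I/O. It is simply a function that can stop its execution using yield while keeping its current state, so that it can be resumed later. The yield from syntax allows for chaining them in a transparent way.

Now, within an asyncio task, the coroutine at the very bottom of the chain always ends up yielding a future. This future then bubbles up to the event loop and gets integrated into the inner machinery. When the future is set to done by some other inner callback, the event loop can resume the task by sending the future back into the coroutine chain.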
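The following sketch shows that behaviour with the public asyncio API. The coroutine names bottom and middle are made up for the example, and loop.call_later merely stands in for whatever internal I/O callback would normally complete the future.

```python
import asyncio

async def bottom(fut):
    # Awaiting a pending future suspends the whole chain of coroutines.
    return await fut

async def middle(fut):
    value = await bottom(fut)
    return value.upper()

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # Stand-in for an I/O callback: complete the future a little later.
    loop.call_later(0.05, fut.set_result, "done")
    task = asyncio.create_task(middle(fut))
    await asyncio.sleep(0)  # let the task run until it suspends on fut
    suspended = not task.done()
    result = await task     # resumed once the callback has set the result
    return suspended, result

suspended, result = asyncio.run(main())
print(suspended, result)
```

The task is suspended while the future is pending, and it completes only after the callback sets the result.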


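EDIT: Addressing some of the questions in your post:

"How does I/O actually happen in this scenario? In a separate thread? Is the whole interpreter suspended while I/O happens outside the interpreter?"

No, nothing happens in a thread. I/O is always managed by the event loop, mostly through file descriptors. However, the registration of those file descriptors is usually hidden by high-level coroutines, which do the dirty work for you.

"What exactly is meant by I/O? If my Python procedure called the C open() procedure, which in turn sent an interrupt to the kernel and relinquished control to it, how does the Python interpreter know about this and manage to keep running other code while the kernel does the actual I/O, until it wakes up the Python procedure that sent the interrupt originally? How can the Python interpreter, in principle, be aware of this happening?"

An I/O is any blocking call. In asyncio, all I/O operations should go through the event loop because, as you said, the event loop has no way of knowing that a blocking call is being performed in some synchronous code. This means you are not supposed to use a synchronous open within the context of a coroutine. Instead, use a dedicated library that provides an asynchronous version of open.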
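To make the point about blocking calls concrete, here is a small timing sketch. It uses time.sleep as a stand-in for any synchronous blocking call (such as a plain open or read) and compares it with asyncio.sleep; the function names and the 0.2 second delay are arbitrary choices for the illustration.

```python
import asyncio
import time

async def blocking_worker():
    time.sleep(0.2)           # synchronous call: the event loop is stuck here

async def cooperative_worker():
    await asyncio.sleep(0.2)  # suspends and hands control back to the loop

async def timed(factory):
    """Run two workers concurrently and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(factory(), factory())
    return time.perf_counter() - start

blocking_elapsed = asyncio.run(timed(blocking_worker))
cooperative_elapsed = asyncio.run(timed(cooperative_worker))
print('blocking: %.2fs, cooperative: %.2fs' % (blocking_elapsed, cooperative_elapsed))
```

The two blocking workers run one after the other because the loop never regains control, while the two cooperative workers overlap their waiting time.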

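Your coro desugaring is conceptually correct, but slightly incomplete.

await does not suspend unconditionally, but only when it encounters a blocking call. How does it know that a call is blocking? This is decided by the code being awaited. For example, an awaitable implementation of a socket read could be desugared to: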

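    def read(sock, n):
        # sock must be in non-blocking mode
        try:
            return sock.recv(n)
        except BlockingIOError:  # raised when recv() would block (EWOULDBLOCK)
            # event_loop, current_task and SUSPEND are pseudocode placeholders
            event_loop.add_reader(sock.fileno(), current_task())
            return SUSPEND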
    
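In real asyncio, the equivalent code modifies the state of a future rather than returning a special value. On the caller's side, a line in your coroutine such as:

    data = await read(sock, 1024)

desugars into something close to:

    data = read(sock, 1024)
    if data is SUSPEND:
        return SUSPEND
    self.pos += 1
    self.parts[self.pos](...)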
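For comparison, a future-based variant of read can be sketched with the public event loop API. This is only an illustration under a few assumptions: it relies on a selector-based event loop (the default on Unix-like systems), the helper on_readable is invented for the example, and asyncio's own socket helpers are implemented differently.

```python
import asyncio
import socket

def read(loop, sock, n):
    """Return a future that completes with up to n bytes from sock."""
    fut = loop.create_future()
    try:
        # Fast path: data is already available, no suspension needed.
        fut.set_result(sock.recv(n))
    except BlockingIOError:
        # Slow path: ask the loop to call us back when sock is readable.
        def on_readable():
            loop.remove_reader(sock.fileno())
            fut.set_result(sock.recv(n))
        loop.add_reader(sock.fileno(), on_readable)
    return fut

async def main():
    loop = asyncio.get_running_loop()
    receiver, sender = socket.socketpair()
    receiver.setblocking(False)  # recv() raises instead of blocking
    # Simulate a peer that sends data a little later.
    loop.call_later(0.05, sender.send, b"hello")
    data = await read(loop, receiver, 1024)
    receiver.close()
    sender.close()
    return data

data = asyncio.run(main())
print(data)
```

Here the awaiting coroutine suspends only because the future is still pending; had the data already been there, await would have returned immediately.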
    
    >>> def test():
    ...     yield 1
    ...     yield 2
    ...
    >>> gen = test()
    >>> next(gen)
    1
    >>> next(gen)
    2
    >>> next(gen)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    
    >>> def test():
    ...     val = yield 1
    ...     print(val)
    ...     yield 2
    ...     yield 3
    ...
    >>> gen = test()
    >>> next(gen)
    1
    >>> gen.send("abc")
    abc
    2
    >>> gen.throw(Exception())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 4, in test
    Exception
    
    >>> def test():
    ...     yield 1
    ...     return "abc"
    ...
    >>> gen = test()
    >>> next(gen)
    1
    >>> try:
    ...     next(gen)
    ... except StopIteration as exc:
    ...     print(exc.value)
    ...
    abc
    
    >>> def inner():
    ...     inner_result = yield 2
    ...     print('inner', inner_result)
    ...     return 3
    ...
    >>> def outer():
    ...     yield 1
    ...     val = yield from inner()
    ...     print('outer', val)
    ...     yield 4
    ...
    >>> gen = outer()
    >>> next(gen)
    1
    >>> next(gen) # Goes inside inner() automatically
    2
    >>> gen.send("abc")
    inner abc
    outer 3
    4
    
    async def inner():
        return 1
    
    async def outer():
        await inner()
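A coroutine object can also be driven by hand, which shows that it is resumed with send just like a generator. The sketch below is a variation on the inner/outer pair above in which outer returns a value; that return value is an addition made for the example.

```python
async def inner():
    return 1

async def outer():
    # Variation for illustration: pass the awaited value on to the caller.
    return await inner() + 1

coro = outer()
result = None
try:
    coro.send(None)     # start the coroutine; nothing here ever suspends
except StopIteration as exc:
    result = exc.value  # the return value travels on StopIteration
print(result)
```

Because neither coroutine awaits anything that yields, the very first send runs both to completion.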
    
    def subfoo(bar):
         qux = 3
         return qux * bar
    
    root -\
      :    \- subfoo --\
      :/--<---return --/
      |
      V
    
     def cofoo(bar):
          qux = yield bar  # yield marks a break point
          return qux
    
    root -\
      :    \- cofoo --\
      :/--<+--yield --/
      |    :
      V    :
    
    def wrap():
        yield 'before'
        yield from cofoo('during')  # cofoo() requires its bar argument; the value is arbitrary
        yield 'after'
    
    root -\
      :    \-> coro_a -yield-from-> coro_b --\
      :/ <-+------------------------yield ---/
      |    :
      :\ --+-- coro_a.send----------yield ---\
      :                             coro_b <-/
    
    def foo():  # subroutine?
         return None
    
    def foo():  # coroutine?
         yield from foofoo()  # generator? coroutine?
    
    async def foo():  # coroutine!
         await foofoo()  # coroutine!
         return None
    
    loop -\
      :    \-> coroutine --await--> event --\
      :/ <-+----------------------- yield --/
      |    :
      |    :  # loop waits for event to happen
      |    :
      :\ --+-- send(reply) -------- yield --\
      :        coroutine <--yield-- event <-/
    
    class AsyncSleep:
        """Event to sleep until a point in time"""
        def __init__(self, until: float):
            self.until = until
    
        # used whenever someone ``await``s an instance of this Event
        def __await__(self):
            # yield this Event to the loop
            yield self
        
        def __repr__(self):
            return '%s(until=%.1f)' % (self.__class__.__name__, self.until)
    
    import time
    
    async def asleep(duration: float):
        """await that ``duration`` seconds pass"""
        await AsyncSleep(time.time() + duration / 2)
        await AsyncSleep(time.time() + duration / 2)
    
    coroutine = asleep(100)
    while True:
        print(coroutine.send(None))
        time.sleep(0.1)
    
    def run(*coroutines):
        """Cooperatively run all ``coroutines`` until completion"""
        # store wake-up-time and coroutines
        waiting = [(0, coroutine) for coroutine in coroutines]
        while waiting:
            # 2. pick the first coroutine that wants to wake up
            until, coroutine = waiting.pop(0)
            # 3. wait until this point in time
            time.sleep(max(0.0, until - time.time()))
            # 4. run this coroutine
            try:
                command = coroutine.send(None)
            except StopIteration:
                continue
            # 1. sort coroutines by their desired suspension
            if isinstance(command, AsyncSleep):
                waiting.append((command.until, coroutine))
                waiting.sort(key=lambda item: item[0])
    
    async def sleepy(identifier: str = "coroutine", count=5):
        for i in range(count):
            print(identifier, 'step', i + 1, 'at %.2f' % time.time())
            await asleep(0.1)
    
    run(*(sleepy("coroutine %d" % j) for j in range(5)))
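For comparison with the toy run() loop above, roughly the same program can be written against the real asyncio API. This is a sketch rather than a drop-in replacement: the log list and the shorter counts and delays are additions made so the behaviour can be inspected.

```python
import asyncio
import time

async def sleepy(identifier, count=3, log=None):
    for i in range(count):
        if log is not None:
            log.append((identifier, i + 1))
        print(identifier, 'step', i + 1, 'at %.2f' % time.time())
        # asyncio.sleep plays the role of the hand-written asleep()
        await asyncio.sleep(0.05)

async def main():
    log = []
    # gather wraps each coroutine in a task and runs them concurrently
    await asyncio.gather(*(sleepy("coroutine %d" % j, log=log) for j in range(3)))
    return log

log = asyncio.run(main())
```

As with the toy loop, every coroutine performs its first step before any of them performs its second.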
    
    readable, writeable, _ = select.select(rlist, wlist, xlist, timeout)
    
    write_target = open('/tmp/foo', 'w')  # open for writing, since we wait for writability
    readable, writeable, _ = select.select([], [write_target], [])
    
    class AsyncRead:
        def __init__(self, file, amount=1):
            self.file = file
            self.amount = amount
            self._buffer = b'' if 'b' in file.mode else ''  # match the file's text/binary mode
    
        def __await__(self):
            while len(self._buffer) < self.amount:
                yield self
                # we only get here if ``read`` should not block
                self._buffer += self.file.read(1)
            return self._buffer
    
        def __repr__(self):
            return '%s(file=%s, amount=%d, progress=%d)' % (
                self.__class__.__name__, self.file, self.amount, len(self._buffer)
            )
    
    # new
    waiting_read = {}  # type: Dict[file, coroutine]
    
    # old
    time.sleep(max(0.0, until - time.time()))
    # new
    readable, _, _ = select.select(list(waiting_read), [], [])
    
    # new - reschedule waiting coroutine, run readable coroutine
    if readable:
        waiting.append((until, coroutine))
        waiting.sort(key=lambda item: item[0])  # sort by wake-up time only
        coroutine = waiting_read[readable[0]]
    
    # new
    if isinstance(command, AsyncSleep):
        ...
    elif isinstance(command, AsyncRead):
        ...
    
    def run(*coroutines):
        """Cooperatively run all ``coroutines`` until completion"""
        waiting_read = {}  # type: Dict[file, coroutine]
        waiting = [(0, coroutine) for coroutine in coroutines]
        while waiting or waiting_read:
            # 2. wait until the next coroutine may run or read ...
            try:
                until, coroutine = waiting.pop(0)
            except IndexError:
                until, coroutine = float('inf'), None
                readable, _, _ = select.select(list(waiting_read), [], [])
            else:
                timeout = max(0.0, until - time.time())
                readable, _, _ = select.select(list(waiting_read), [], [], timeout)
            # ... and select the appropriate one
            if readable and time.time() < until:
                if until and coroutine:
                    waiting.append((until, coroutine))
                    waiting.sort(key=lambda item: item[0])  # sort by wake-up time only
                coroutine = waiting_read.pop(readable[0])
            # 3. run this coroutine
            try:
                command = coroutine.send(None)
            except StopIteration:
                continue
            # 1. sort coroutines by their desired suspension ...
            if isinstance(command, AsyncSleep):
                waiting.append((command.until, coroutine))
                waiting.sort(key=lambda item: item[0])
            # ... or register reads
            elif isinstance(command, AsyncRead):
                waiting_read[command.file] = coroutine
    
    async def ready(path, amount=1024*32):
        print('read', path, 'at', '%d' % time.time())
        with open(path, 'rb') as file:
            result = await AsyncRead(file, amount)
        print('done', path, 'at', '%d' % time.time())
        print('got', len(result), 'B')
    
    run(sleepy('background', 5), ready('/dev/urandom'))
    
    id background round 1
    read /dev/urandom at 1530721148
    id background round 2
    id background round 3
    id background round 4
    id background round 5
    done /dev/urandom at 1530721148
    got 1024 B
    
    class AsyncRecv:
        def __init__(self, connection, amount=1, read_buffer=1024):
            assert not connection.getblocking(), 'connection must be non-blocking for async recv'
            self.connection = connection
            self.amount = amount
            self.read_buffer = read_buffer
            self._buffer = b''
    
        def __await__(self):
            while len(self._buffer) < self.amount:
                try:
                    self._buffer += self.connection.recv(self.read_buffer)
                except BlockingIOError:
                    yield self
            return self._buffer
    
        def __repr__(self):
            return '%s(file=%s, amount=%d, progress=%d)' % (
                self.__class__.__name__, self.connection, self.amount, len(self._buffer)
            )
    
    # old
    elif isinstance(command, AsyncRead):
        waiting_read[command.file] = coroutine
    # new
    elif isinstance(command, AsyncRead):
        waiting_read[command.file] = coroutine
    elif isinstance(command, AsyncRecv):
        waiting_read[command.connection] = coroutine
    
    # file
    file = open(path, 'rb')
    # non-blocking socket
    connection = socket.socket()
    connection.setblocking(False)
    # open without blocking - retry on failure
    try:
        connection.connect((url, port))
    except BlockingIOError:
        pass
    
    id background round 1
    read localhost:25000 at 1530783569
    read /dev/urandom at 1530783569
    done localhost:25000 at 1530783569 got 32768 B
    id background round 2
    id background round 3
    id background round 4
    done /dev/urandom at 1530783569 got 4096 B
    id background round 5