How does Python asyncio actually work?

This question is motivated by my other question. There are tons of articles and blog posts on the web about asyncio, but they are all very superficial. I couldn't find any information about how asyncio is actually implemented and what makes I/O asynchronous. I tried to read the source code, but it is not the highest-grade C code, a lot of it deals with auxiliary objects, and, most crucially, it is hard to connect between the Python syntax and the C code it translates into.

Asyncio's own documentation is even less helpful. There is no information there about how it works, only some guidelines on how to use it, which are sometimes misleading / very poorly written.

I'm familiar with Go's implementation of coroutines, and was hoping that Python did the same thing. If that were the case, the code from the post linked above would have worked. Since it didn't, I'm now trying to figure out why. My best guess so far is as follows, please correct me where I'm wrong:

- Procedure definitions of the form async def foo(): ... are actually interpreted as methods of a class inheriting coroutine.
- async def is actually split into multiple methods by its await statements, and the object on which these methods are called keeps track of how far through the execution it has progressed so far (i.e. which await statement it last stopped at).

In other words, here is my attempt at "desugaring" some asyncio syntax into something more understandable:
async def coro(name):
    print('before', name)
    await asyncio.sleep(1)
    print('after', name)

asyncio.gather(coro('first'), coro('second'))
# translated from async def coro(name)
class Coro(coroutine):
    def before(self, name):
        print('before', name)

    def after(self, name):
        print('after', name)

    def __init__(self, name):
        self.name = name
        self.parts = self.before, self.after
        self.pos = 0

    def __call__(self):
        self.parts[self.pos](self.name)
        self.pos += 1

    def done(self):
        return self.pos == len(self.parts)
# translated from asyncio.gather()
class AsyncIOManager:
    def gather(*coros):
        while not all(c.done() for c in coros):
            coro = random.choice(coros)
            coro()
If my guesses turn out to be correct, then I have a problem: how does I/O actually happen in this scenario? In a separate thread? Is the whole interpreter suspended, and I/O happens outside the interpreter? What exactly is meant by I/O? If my Python procedure calls the C open() procedure, which in turn sends an interrupt to the kernel and relinquishes control to it, how does the Python interpreter know about this and is able to keep running other code, while the kernel code performs the actual I/O, until it wakes up the Python procedure which originally sent the interrupt? How can the Python interpreter, in principle, be aware of this happening?

It all boils down to the two main challenges that asyncio is addressing:
- how to perform multiple I/O operations in a single thread
- how to implement cooperative multitasking
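The first challenge is classically solved by a select loop: one thread registers several file descriptors and only wakes up for the ones that are ready. A minimal sketch using the stdlib selectors module (the socketpair setup here is just illustrative scaffolding):

```python
import selectors
import socket

# A sketch of the "select loop" idea: a single thread watches several
# file descriptors at once and only touches the ready ones.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
c, d = socket.socketpair()
for sock in (b, d):
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ)

a.send(b'hello')                      # make only one of the two readable
for key, _ in sel.select(timeout=1):
    print(key.fileobj.recv(1024))     # only b is ready: prints b'hello'
```

asyncio's event loop is built around exactly this primitive (via the selectors module); the rest of the machinery decides which coroutine to resume when a descriptor becomes ready.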
EDIT: Addressing your comment about goroutines: the closest equivalent to a goroutine in asyncio is actually not a coroutine but a task (see the difference). In Python, a coroutine (or a generator) knows nothing about the concepts of an event loop or I/O. It is simply a function that can stop its execution using yield while keeping its current state, so it can be restored later on. The yield from syntax allows chaining them in a transparent way.
Now, within an asyncio task, the coroutine at the very bottom of the chain always ends up yielding a future. This future then bubbles up to the event loop and gets integrated into its inner machinery. When the future is set to done by some other inner callback, the event loop can restore the task by sending the future's result back into the coroutine chain.
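That round trip can be sketched with plain generators. The Future class and drive function below are made-up stand-ins for asyncio's internals, not its real API:

```python
# Made-up stand-ins: a bare-bones "future" and a driver playing the event loop.
class Future:
    def __init__(self):
        self.result = None

def bottom():
    fut = Future()
    result = yield fut        # yield the future up the chain and suspend
    return result

def top():
    value = yield from bottom()   # the future bubbles up transparently
    return value * 2

def drive(gen):
    fut = gen.send(None)      # run until the bottom coroutine yields its future
    fut.result = 21           # "I/O completed": the loop fills in the result
    try:
        gen.send(fut.result)  # resume the chain with that result
    except StopIteration as exc:
        return exc.value      # the chain's final return value

print(drive(top()))  # 42
```

The key point is that neither top nor bottom knows anything about the driver; they only yield and receive values, which is exactly how a task stays decoupled from the event loop.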
EDIT: Addressing some of the questions in your post:

"How does I/O actually happen in this scenario? In a separate thread? Is the whole interpreter suspended, and I/O happens outside the interpreter?"

No, nothing happens in a thread. I/O is always managed by the event loop, mostly through file descriptors. The registration of those file descriptors, however, is usually hidden by high-level coroutines that do the dirty work for you.

"What exactly is meant by I/O? If my Python procedure calls the C open() procedure, which in turn sends an interrupt to the kernel and relinquishes control to it, how does the Python interpreter know about this and is able to keep running other code, while the kernel code performs the actual I/O, until it wakes up the Python procedure which originally sent the interrupt? How can the Python interpreter, in principle, be aware of this happening?"

An I/O is any blocking call. In asyncio, all I/O operations should go through the event loop, because, as you said, the event loop has no way of knowing that a blocking call is being performed in some synchronous code. That means you shouldn't use a synchronous open within the context of a coroutine. Instead, use a dedicated library such as aiofiles, which provides an async version of open.
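Besides reaching for a library like aiofiles, a stdlib-only workaround (sketched here with asyncio.to_thread, available since Python 3.9) is to push the blocking call into a worker thread so the event loop stays free:

```python
import asyncio
import os
import tempfile

async def read_file(path):
    def blocking_read():
        with open(path, 'rb') as f:   # ordinary, blocking open
            return f.read()
    # to_thread hands the blocking call to a worker thread and
    # suspends this coroutine until the result is available
    return await asyncio.to_thread(blocking_read)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello')
print(asyncio.run(read_file(tmp.name)))  # b'hello'
os.unlink(tmp.name)
```

This does not make the disk access itself asynchronous; it merely moves the blocking wait off the loop's thread, which is often what file-I/O helpers do under the hood.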
Your coro desugaring is conceptually correct, but slightly incomplete. await doesn't suspend unconditionally, but only if it encounters a blocking call. How does it know that a call is blocking? This is decided by the code being awaited. For example, an awaitable implementation of a socket read could be desugared to:
def read(sock, n):
    # sock must be in non-blocking mode
    try:
        return sock.recv(n)
    except EWOULDBLOCK:
        event_loop.add_reader(sock.fileno, current_task())
        return SUSPEND
In real asyncio the equivalent code modifies the state of a Future instead of returning magic values, but the concept is the same.
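The same flow can be observed with asyncio's public API: awaiting a pending Future suspends the task, and set_result() is what allows the loop to resume it:

```python
import asyncio

async def waiter(fut):
    return await fut              # suspends until the future is completed

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    task = asyncio.ensure_future(waiter(fut))
    await asyncio.sleep(0)        # let the task run up to its await
    fut.set_result('done')        # completing the future wakes the task
    return await task

print(asyncio.run(main()))  # done
```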
The awaiting code, data = await read(sock, 1024), would then desugar into something close to:

data = read(sock, 1024)
if data is SUSPEND:
    return SUSPEND
self.pos += 1
self.parts[self.pos](...)
To understand asyncio's machinery, start with plain generators. A generator can be paused at each yield, resumed with next(), and raises StopIteration once exhausted:

>>> def test():
...     yield 1
...     yield 2
...
>>> gen = test()
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Generators are also a two-way channel: send() resumes the generator with a value, which becomes the result of the paused yield expression, while throw() raises an exception at the point where the generator is suspended:

>>> def test():
...     val = yield 1
...     print(val)
...     yield 2
...     yield 3
...
>>> gen = test()
>>> next(gen)
1
>>> gen.send("abc")
abc
2
>>> gen.throw(Exception())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in test
Exception
Since Python 3.3, a generator may return a value; it arrives as the value attribute of the StopIteration it raises:

>>> def test():
...     yield 1
...     return "abc"
...
>>> gen = test()
>>> next(gen)
1
>>> try:
...     next(gen)
... except StopIteration as exc:
...     print(exc.value)
...
abc
yield from delegates to an inner generator transparently: next(), send() and exceptions pass straight through to the innermost generator, and the inner return value becomes the result of the yield from expression:

>>> def inner():
...     inner_result = yield 2
...     print('inner', inner_result)
...     return 3
...
>>> def outer():
...     yield 1
...     val = yield from inner()
...     print('outer', val)
...     yield 4
...
>>> gen = outer()
>>> next(gen)
1
>>> next(gen)  # Goes inside inner() automatically
2
>>> gen.send("abc")
inner abc
outer 3
4
async/await builds on exactly this mechanism: await behaves much like yield from, operating on coroutine objects instead of plain generators:

async def inner():
    return 1

async def outer():
    await inner()
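This equivalence can be checked by hand: a coroutine object created by async def responds to send() just like a generator, and its return value arrives wrapped in StopIteration (the example values here are arbitrary):

```python
async def inner():
    return 1

async def outer():
    return await inner() + 1

# Drive the coroutine manually, playing the event loop ourselves.
coro = outer()
try:
    coro.send(None)
except StopIteration as exc:
    print(exc.value)  # 2
```

Since nothing in this chain ever suspends, a single send(None) runs it to completion; a real event loop keeps calling send() with the results of whatever the coroutine awaited on.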
A subroutine follows a strict call/return discipline: its stack frame is created on call and destroyed on return:

def subfoo(bar):
    qux = 3
    return qux * bar
root -\
: \- subfoo --\
:/--<---return --/
|
V
A coroutine, in contrast, can suspend at a break point and later resume with its frame intact:

def cofoo(bar):
    qux = yield bar  # yield marks a break point
    return qux
root -\
: \- cofoo --\
:/--<+--yield --/
| :
V :
Coroutines can be stacked via yield from, which forwards suspension and resumption through the intermediate frames:

def wrap():
    yield 'before'
    yield from cofoo()
    yield 'after'
root -\
: \-> coro_a -yield-from-> coro_b --\
:/ <-+------------------------yield ---/
| :
:\ --+-- coro_a.send----------yield ---\
: coro_b <-/
def foo():  # subroutine?
    return None

def foo():  # coroutine?
    yield from foofoo()  # generator? coroutine?

async def foo():  # coroutine!
    await foofoo()  # coroutine!
    return None
loop -\
: \-> coroutine --await--> event --\
:/ <-+----------------------- yield --/
| :
| : # loop waits for event to happen
| :
:\ --+-- send(reply) -------- yield --\
: coroutine <--yield-- event <-/
class AsyncSleep:
    """Event to sleep until a point in time"""
    def __init__(self, until: float):
        self.until = until

    # used whenever someone ``await``s an instance of this Event
    def __await__(self):
        # yield this Event to the loop
        yield self

    def __repr__(self):
        return '%s(until=%.1f)' % (self.__class__.__name__, self.until)
import time

async def asleep(duration: float):
    """await that ``duration`` seconds pass"""
    await AsyncSleep(time.time() + duration / 2)
    await AsyncSleep(time.time() + duration / 2)

coroutine = asleep(100)
while True:
    print(coroutine.send(None))
    time.sleep(0.1)
def run(*coroutines):
    """Cooperatively run all ``coroutines`` until completion"""
    # store wake-up-time and coroutines
    waiting = [(0, coroutine) for coroutine in coroutines]
    while waiting:
        # 2. pick the first coroutine that wants to wake up
        until, coroutine = waiting.pop(0)
        # 3. wait until this point in time
        time.sleep(max(0.0, until - time.time()))
        # 4. run this coroutine
        try:
            command = coroutine.send(None)
        except StopIteration:
            continue
        # 1. sort coroutines by their desired suspension
        if isinstance(command, AsyncSleep):
            waiting.append((command.until, coroutine))
            waiting.sort(key=lambda item: item[0])
async def sleepy(identifier: str = "coroutine", count=5):
    for i in range(count):
        print(identifier, 'step', i + 1, 'at %.2f' % time.time())
        await asleep(0.1)

run(*(sleepy("coroutine %d" % j) for j in range(5)))
To multiplex I/O in a single thread, the operating system provides select: it blocks until one of the watched file handles is ready and reports which ones:

readable, writeable, _ = select.select(rlist, wlist, xlist, timeout)
write_target = open('/tmp/foo')
readable, writeable, _ = select.select([], [write_target], [])
Analogous to AsyncSleep, we can define an event for reading that suspends for as long as the read would block:

class AsyncRead:
    def __init__(self, file, amount=1):
        self.file = file
        self.amount = amount
        self._buffer = b''

    def __await__(self):
        while len(self._buffer) < self.amount:
            yield self
            # we only get here if ``read`` should not block
            self._buffer += self.file.read(1)
        return self._buffer

    def __repr__(self):
        return '%s(file=%s, amount=%d, progress=%d)' % (
            self.__class__.__name__, self.file, self.amount, len(self._buffer)
        )
# new
waiting_read = {} # type: Dict[file, coroutine]
# old
time.sleep(max(0.0, until - time.time()))
# new
readable, _, _ = select.select(list(waiting_read), [], [])
# new - reschedule waiting coroutine, run readable coroutine
if readable:
    waiting.append((until, coroutine))
    waiting.sort()
    coroutine = waiting_read[readable[0]]
# new
if isinstance(command, AsyncSleep):
    ...
elif isinstance(command, AsyncRead):
    ...
def run(*coroutines):
    """Cooperatively run all ``coroutines`` until completion"""
    waiting_read = {}  # type: Dict[file, coroutine]
    waiting = [(0, coroutine) for coroutine in coroutines]
    while waiting or waiting_read:
        # 2. wait until the next coroutine may run or read ...
        try:
            until, coroutine = waiting.pop(0)
        except IndexError:
            until, coroutine = float('inf'), None
            readable, _, _ = select.select(list(waiting_read), [], [])
        else:
            readable, _, _ = select.select(list(waiting_read), [], [], max(0.0, until - time.time()))
        # ... and select the appropriate one
        if readable and time.time() < until:
            if until and coroutine:
                waiting.append((until, coroutine))
                waiting.sort()
            coroutine = waiting_read.pop(readable[0])
        # 3. run this coroutine
        try:
            command = coroutine.send(None)
        except StopIteration:
            continue
        # 1. sort coroutines by their desired suspension ...
        if isinstance(command, AsyncSleep):
            waiting.append((command.until, coroutine))
            waiting.sort(key=lambda item: item[0])
        # ... or register reads
        elif isinstance(command, AsyncRead):
            waiting_read[command.file] = coroutine
async def ready(path, amount=1024*32):
    print('read', path, 'at', '%d' % time.time())
    with open(path, 'rb') as file:
        result = await AsyncRead(file, amount)
    print('done', path, 'at', '%d' % time.time())
    print('got', len(result), 'B')

run(sleepy('background', 5), ready('/dev/urandom'))
id background round 1
read /dev/urandom at 1530721148
id background round 2
id background round 3
id background round 4
id background round 5
done /dev/urandom at 1530721148
got 1024 B
class AsyncRecv:
    def __init__(self, connection, amount=1, read_buffer=1024):
        assert not connection.getblocking(), 'connection must be non-blocking for async recv'
        self.connection = connection
        self.amount = amount
        self.read_buffer = read_buffer
        self._buffer = b''

    def __await__(self):
        while len(self._buffer) < self.amount:
            try:
                self._buffer += self.connection.recv(self.read_buffer)
            except BlockingIOError:
                yield self
        return self._buffer

    def __repr__(self):
        return '%s(file=%s, amount=%d, progress=%d)' % (
            self.__class__.__name__, self.connection, self.amount, len(self._buffer)
        )
# old
elif isinstance(command, AsyncRead):
    waiting_read[command.file] = coroutine

# new
elif isinstance(command, AsyncRead):
    waiting_read[command.file] = coroutine
elif isinstance(command, AsyncRecv):
    waiting_read[command.connection] = coroutine
# file
file = open(path, 'rb')

# non-blocking socket
connection = socket.socket()
connection.setblocking(False)
# open without blocking - retry on failure
try:
    connection.connect((url, port))
except BlockingIOError:
    pass
id background round 1
read localhost:25000 at 1530783569
read /dev/urandom at 1530783569
done localhost:25000 at 1530783569 got 32768 B
id background round 2
id background round 3
id background round 4
done /dev/urandom at 1530783569 got 4096 B
id background round 5