Python lazy list of Deferreds reaches maximum recursion depth
I have a large number of documents to upsert into MongoDB (possibly n > 100,000). I don't want to create 100,000 Deferreds at once, but I also don't want to execute and wait on each query sequentially, because I have a connection pool to MongoDB and I want to use it fully. So I have a generator function that yields Deferreds for a DeferredLazyList to consume.
Here is the code linking the generation of the update Deferreds with the DeferredLazyList:
from twisted.internet import defer

def generate_update_deferreds(collection, many_docs):
    for doc in many_docs:
        d = collection.update({'_id': doc['_id']}, doc, upsert=True)
        yield d

@defer.inlineCallbacks
def update_docs(collection, many_docs):
    gen_deferreds = generate_update_deferreds(collection, many_docs)
    # pool_size is the size of my MongoDB connection pool (defined elsewhere).
    results = yield DeferredLazyList(gen_deferreds, count=pool_size, consume_errors=True)
DeferredLazyList is similar to DeferredList, but instead of taking a list of Deferreds to wait on, it takes an iterator. Deferreds are retrieved from the iterator so that only count of them are active at a time. This effectively batches the Deferreds, since they are only created as they are consumed:
class DeferredLazyList(defer.Deferred):
    """
    The ``DeferredLazyList`` class is used for collecting the results of
    many deferreds. This is similar to ``DeferredList``
    (``twisted.internet.defer.DeferredList``) but works with an iterator
    yielding deferreds. This will only maintain a certain number of
    deferreds simultaneously. Once one of the deferreds finishes, another
    will be obtained from the iterator.
    """

    def __init__(self, deferreds, count=None, consume_errors=None):
        defer.Deferred.__init__(self)
        if count is None:
            count = 1
        self.__consume_errors = bool(consume_errors)
        self.__iter = enumerate(deferreds)
        self.__results = []
        for _i in xrange(count):
            # Start specified number of simultaneous deferreds.
            if not self.called:
                self.__next_save_result(None, None, None)
            else:
                break

    def __next_save_result(self, result, success, index):
        """
        Called when a deferred completes.
        """
        # Make sure we can save result at index.
        if index is not None:
            results_len = len(self.__results)
            if results_len <= index:
                self.__results += [NO_RESULT] * (index - results_len + 1)
            # Save result.
            self.__results[index] = (success, result)
        # Get next deferred.
        try:
            i, d = self.__iter.next()
            d.addCallbacks(self.__next_save_result, self.__next_save_result,
                           callbackArgs=(True, i), errbackArgs=(False, i))
        except StopIteration:
            # Iterator is exhausted, callback self with results.
            self.callback(self.__results)
        # Pass through result.
        return result if success or not self.__consume_errors else None
Any help would be greatly appreciated. By the way, I'm running Python 2.7.3 with Twisted 12.1.0; the MongoDB parts are really only relevant for context.
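For what it's worth, the failure mode can be reproduced without Twisted: when addCallbacks is called on a Deferred that has already fired, the callback runs immediately on the same call stack, so each completed query synchronously starts the next one and the stack grows with every item. A minimal sketch of that synchronous chaining (Python 3 here for brevity; on the Python 2.7 above the same failure surfaces as RuntimeError, and chain_next is just a hypothetical stand-in for __next_save_result):

```python
def chain_next(work_iter, results):
    """Stand-in for __next_save_result: when one item 'completes', the
    next one is started synchronously on the same call stack, which is
    what happens when addCallbacks fires an already-completed Deferred."""
    try:
        item = next(work_iter)
    except StopIteration:
        return results
    results.append(item)                   # "save result"
    return chain_next(work_iter, results)  # start the next item recursively

shallow = chain_next(iter(range(100)), [])  # fine: only ~100 stack frames
assert len(shallow) == 100

try:
    chain_next(iter(range(100000)), [])     # far past the default limit
except RecursionError:
    print("maximum recursion depth exceeded")
```

With the default recursion limit of 1000, any run of more than about a thousand already-fired Deferreds chained this way blows the stack, which matches the n > 100,000 workload above.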
I wanted the results from each Deferred, but cooperate() does not return them, so I added a callback to each Deferred before yielding it to the CooperativeTasks:
from twisted.internet.defer import DeferredList, inlineCallbacks
from twisted.internet.task import cooperate

NO_RESULT = object()

def generate_update_deferreds(collection, many_docs, save_results):
    for i, doc in enumerate(many_docs):
        d = collection.update({'_id': doc['_id']}, doc, upsert=True)
        d.addBoth(save_result, i, save_results)  # Save result
        yield d

def save_result(result, i, save_results):
    save_results[i] = result

@inlineCallbacks
def update_docs(collection, many_docs):
    save_results = [NO_RESULT] * len(many_docs)
    gen_deferreds = generate_update_deferreds(collection, many_docs, save_results)
    # One CooperativeTask per pooled connection.
    workers = [cooperate(gen_deferreds).whenDone() for _i in xrange(pool_size)]
    yield DeferredList(workers)
    # Handle save_results...
There are tools in Twisted that make this easier. For example, cooperate():
from twisted.internet.defer import DeferredList
from twisted.internet.task import cooperate

def generate_update_deferreds(collection, many_docs):
    for doc in many_docs:
        d = collection.update({'_id': doc['_id']}, doc, upsert=True)
        yield d

work = generate_update_deferreds(...)
worker_tasks = []
for i in range(count):
    task = cooperate(work)
    worker_tasks.append(task)

all_done_deferred = DeferredList([task.whenDone() for task in worker_tasks])
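The key point is that every cooperate() call shares the same generator: each CooperativeTask pulls the next Deferred only when its current one finishes, so at most count updates are outstanding and new Deferreds are created on demand, without any recursive callback chaining. A rough plain-Python analogy of that shared-iterator worker pattern (threads stand in for CooperativeTasks here; in Twisted the workers interleave cooperatively in one thread, and the names are illustrative):

```python
import threading

def run_workers(work_iter, count):
    """Several workers draining one shared iterator: each item is taken
    exactly once, and at most `count` items are in flight at a time."""
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            with lock:                # next() on a shared iterator must
                try:                  # not race between workers
                    item = next(work_iter)
                except StopIteration:
                    return            # iterator exhausted: worker is done
            results.append(item * 2)  # stand-in for one MongoDB update

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)

print(run_workers(iter(range(10)), 3))  # every item processed exactly once
```

The difference from the DeferredLazyList attempt above is that a finished item never starts the next one on its own call stack; each worker loops flatly over the iterator, so the depth stays constant no matter how many documents there are.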