How to get the amount of "work" left to be done by a Python multiprocessing Pool?
So far, whenever I needed to use multiprocessing, I did it by manually creating a "process pool" and sharing a work queue with all the subprocesses. For example:
import logging
from multiprocessing import Process, Queue

class MyClass:
    def __init__(self, num_processes):
        self._log = logging.getLogger()
        self.process_list = []
        self.work_queue = Queue()
        for i in range(num_processes):
            p_name = 'CPU_%02d' % (i+1)
            self._log.info('Initializing process %s', p_name)
            # do_stuff is the worker function, defined elsewhere
            p = Process(target=do_stuff,
                        args=(self.work_queue, 'arg1'),
                        name=p_name)
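This way I could add items to the queue, and the subprocesses would consume them. I could then monitor how far along the processing was by checking Queue.qsize().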
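A monitoring loop along these lines might look like the following. This is a minimal sketch, not the original code: the worker body of `do_stuff`, the `None` sentinel, the helper name `wait_until_drained`, and the item counts are all assumptions made for illustration. Note that `Queue.qsize()` is only approximate and raises `NotImplementedError` on macOS.

```python
import time
from multiprocessing import Process, Queue


def do_stuff(work_queue, label):
    """Hypothetical worker: handle items until a None sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is None:
            break
        time.sleep(0.01)  # stand-in for real work on `item`


def wait_until_drained(work_queue, poll_interval=0.05):
    """Poll qsize() until the queue is empty; return the sizes observed."""
    seen = []
    while True:
        qsize = work_queue.qsize()  # approximate number of unclaimed items
        if qsize == 0:
            return seen
        seen.append(qsize)
        print('%d items still waiting in the queue' % qsize)
        time.sleep(poll_interval)


if __name__ == '__main__':
    work_queue = Queue()
    workers = [Process(target=do_stuff, args=(work_queue, 'arg1'),
                       name='CPU_%02d' % (i + 1)) for i in range(2)]
    for p in workers:
        p.start()
    for item in range(50):
        work_queue.put(item)
    wait_until_drained(work_queue)
    for _ in workers:
        work_queue.put(None)  # one sentinel per worker so each one exits
    for p in workers:
        p.join()
```

An empty queue only means that every item has been claimed by a worker; the last few items may still be in progress when the loop returns.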
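Now I figure that a multiprocessing Pool could simplify a lot of this code.
What I couldn't work out is how to monitor the amount of "work" still left to be done.
Take the following example: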
from multiprocessing import Pool

class MyClass:
    def __init__(self, num_processes):
        self.process_pool = Pool(num_processes)
        # ...
        result_list = []
        for i in range(1000):
            result = self.process_pool.apply_async(do_stuff, ('arg1',))
            result_list.append(result)
        # ---> here: how do I monitor the Pool's processing progress?
        # ...?
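Any ideas?
From the docs, it looks to me like what you want to do is collect your results in a list or other sequence, then iterate over the result list, checking ready() on each, to build your output list. You can then compute the processing status by comparing the number of result objects that are not yet ready to the total number of jobs dispatched. See the documentation for multiprocessing.pool.AsyncResult.
Use a Manager queue. This is a queue that is shared between worker processes. A normal queue passed as a task argument would have to be pickled and unpickled for every worker, which does not give the workers one shared queue to update (multiprocessing.Queue refuses to be pickled this way and raises a RuntimeError).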
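You then have your workers add items to the queue, and you monitor the queue's state while the workers are working. You need to do this with map_async, because it lets you see when the entire result is ready, so you can break out of the monitoring loop.
For example: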
import time
from multiprocessing import Pool, Manager

def play_function(args):
    """Mock function that takes a single argument consisting
    of (input, queue). Alternatively, you could use another function
    as a wrapper.
    """
    i, q = args
    time.sleep(0.1)  # mock work
    q.put(i)
    return i

if __name__ == '__main__':
    p = Pool()
    m = Manager()
    q = m.Queue()
    inputs = range(20)
    args = [(i, q) for i in inputs]
    result = p.map_async(play_function, args)
    # monitor loop: q.qsize() is the number of tasks completed so far
    while True:
        if result.ready():
            break
        else:
            size = q.qsize()
            print(size)
            time.sleep(0.1)
    outputs = result.get()
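The docstring above mentions using another function as a wrapper instead of changing the work function's signature. A minimal sketch of that idea follows, assuming a hypothetical work function `square` and a hypothetical wrapper `with_progress`; neither name comes from the original answer. `functools.partial` binds the work function and the queue, so `map_async` still passes one argument per item.

```python
import time
from functools import partial
from multiprocessing import Manager, Pool


def square(x):
    """Hypothetical work function that knows nothing about progress."""
    time.sleep(0.05)  # stand-in for real work
    return x * x


def with_progress(func, queue, item):
    """Run func(item), then put the item on the queue to record completion."""
    result = func(item)
    queue.put(item)
    return result


if __name__ == '__main__':
    with Manager() as manager, Pool(4) as pool:
        done_queue = manager.Queue()
        inputs = list(range(20))
        task = partial(with_progress, square, done_queue)
        result = pool.map_async(task, inputs)
        while not result.ready():
            print('%d of %d done' % (done_queue.qsize(), len(inputs)))
            time.sleep(0.1)
        outputs = result.get()
        print(outputs[:5])
```

The wrapper keeps progress reporting out of the work function, at the cost of one extra queue operation per item.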
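I came up with the solution below for asynchronous calls.
It is a trivial toy script, but I think it should apply broadly.
Basically, in an infinite loop, poll the ready() value of each result object in a generator expression and sum to get the number of dispatched Pool tasks still remaining.
Once none remain, break out of the loop, then close() and join() the pool.
Add a sleep in the loop as desired.
The principle is the same as in the solutions above, but without a queue. If you also keep track of how many tasks you initially sent to the Pool, you can calculate the percentage complete, and so on.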
import multiprocessing
import os
import time
from random import randrange

def worker():
    print(os.getpid())
    # simulate work
    time.sleep(randrange(5))

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=8)
    result_objs = []
    print("Begin dispatching work")
    task_count = 10
    for x in range(task_count):
        result_objs.append(pool.apply_async(func=worker))
    print("Done dispatching work")
    while True:
        # count the result objects whose task has not finished yet
        incomplete_count = sum(1 for x in result_objs if not x.ready())
        if incomplete_count == 0:
            print("All done")
            break
        print(str(incomplete_count) + " Tasks Remaining")
        print(str(float(task_count - incomplete_count) / task_count * 100) + "% Complete")
        time.sleep(.25)
    pool.close()
    pool.join()
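I had the same problem and came up with a somewhat simpler solution for MapResult objects (albeit one that uses internal MapResult data).
Note that the remaining value computed in the code below is not always exact, since the chunk size is often rounded up depending on the number of items to process.
You can get around this by using pool.map_async(get_stuff, todo, chunksize=1).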
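import sys
from multiprocessing import Pool
from time import sleep

# POOL_SIZE, get_stuff and todo are defined elsewhere
pool = Pool(POOL_SIZE)
result = pool.map_async(get_stuff, todo)
while not result.ready():
    # _number_left and _chunksize are private MapResult attributes
    remaining = result._number_left * result._chunksize
    sys.stderr.write('\r\033[2KRemaining: %d' % remaining)
    sys.stderr.flush()
    sleep(.1)
print('\r\033[2KRemaining: 0', file=sys.stderr)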
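To see why the estimate above can overshoot, here is a small sketch of the arithmetic. It assumes the pool splits `n` items into `ceil(n / chunksize)` chunks and that `_number_left` counts unfinished chunks; these are CPython internals and may change between versions. The helper names `chunks_for` and `estimate_remaining`, and the clamp to the total, are illustrative additions rather than part of the original answer.

```python
import math


def chunks_for(total_items, chunksize):
    """Number of chunks needed to cover total_items (the last may be short)."""
    return math.ceil(total_items / chunksize)


def estimate_remaining(number_left, chunksize, total_items):
    """Remaining-items estimate (chunks left * chunk size), clamped to the total.

    The clamp is an illustrative addition, not part of the original answer.
    """
    return min(number_left * chunksize, total_items)


if __name__ == '__main__':
    total, chunksize = 10, 4
    chunks = chunks_for(total, chunksize)   # 3 chunks: 4 + 4 + 2 items
    print(chunks * chunksize)               # raw estimate: 12, more than 10 items
    print(estimate_remaining(chunks, chunksize, total))  # clamped: 10
    # With chunksize=1 every chunk is exactly one item, so the count is exact.
    print(chunks_for(total, 1) * 1)         # 10
```

With `chunksize=1` the count is exact, but each item is dispatched separately, which can add overhead for very small tasks.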