Python multiprocessing script slower than a normal script
I can't seem to wrap my head around multiprocessing. I'm trying to do some basic operations, but the multiprocessing version of the script seems to take forever:
import multiprocessing, time, psycopg2

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        proc_name = self.name
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                print ('Tasks Complete')
                self.task_queue.task_done()
                break
            answer = next_task()
            self.task_queue.task_done()
            self.result_queue.put(answer)
        return


class Task(object):
    def __init__(self, a):
        self.a = a

    def __call__(self):
        # Some more work will go in here but for now just return the value
        return self.a

    def __str__(self):
        return 'ARC'

    def run(self):
        print ('IN')


if __name__ == '__main__':
    start_time = time.time()
    numberList = []
    for x in range(1000000):
        numberList.append(x)

    result = []
    counter = 0
    total = 0
    for id in numberList:
        total =+ id
        counter += 1
    print(counter)
    print("Finished in Seconds: %s" %(time.time()-start_time))

    ###############################################################################################################################
    # Multiprocessing starts here....
    ###############################################################################################################################

    start_time = time.time()
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()

    num_consumers = multiprocessing.cpu_count()
    consumers = [Consumer(tasks, results) for i in range(num_consumers)]

    for w in consumers:
        w.start()

    num_jobs = len(numberList)
    for i in range(num_jobs):
        tasks.put(Task(numberList[i]))

    for i in range(num_consumers):
        tasks.put(None)

    print("So far: %s" %(time.time()-start_time))

    result = []
    while num_jobs:
        result.append(results.get())
        num_jobs -= 1
    print (len(result))

    print("Finished in Seconds: %s" %(time.time()-start_time))
The first, plain for loop finishes in about 0.4 seconds on average, while the multiprocessing version takes about 56 seconds on average, whereas I expected it to be the other way around.
Am I missing some logic here, or is it genuinely slower? If so, how should I structure this so that it runs faster than the standard for loop?

Passing each object between processes through a queue adds overhead; you have just measured that overhead to be 56 seconds for a million objects. Passing fewer, larger objects reduces the overhead, but does not eliminate it. To benefit from multiprocessing, the computation each task performs should be relatively heavy compared with the amount of data that has to be transferred.
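As a rough sketch of that "fewer, larger objects" idea (an illustration only, not part of the original answers), one way to cut the per-item cost is to put chunks of numbers on the queue instead of one Task object per number; the chunk size below is an arbitrary choice:

import multiprocessing

def sum_chunks(task_queue, result_queue):
    # Each message is a whole chunk, so pickling/queue overhead is paid per chunk, not per item.
    while True:
        chunk = task_queue.get()
        if chunk is None:              # poison pill: no more work
            break
        result_queue.put(sum(chunk))   # one small result per chunk

if __name__ == '__main__':
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    num_workers = multiprocessing.cpu_count()
    workers = [multiprocessing.Process(target=sum_chunks, args=(tasks, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()

    data = list(range(1000000))
    chunk_size = 100000                # arbitrary; tune so each task carries enough work
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for chunk in chunks:
        tasks.put(chunk)
    for _ in range(num_workers):
        tasks.put(None)

    total = sum(results.get() for _ in range(len(chunks)))
    for w in workers:
        w.join()
    print(total)                       # 499999500000

Even then, as the next answer shows, the queue traffic only pays off once each chunk carries enough real work.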
Your multiprocessing code is really over-engineered, and it is not actually doing the work it is supposed to do. I rewrote it to be simpler, to actually do what it should, and it is now faster than the simple loop:

import multiprocessing
import time

def add_list(l):
    total = 0
    counter = 0
    for ent in l:
        total += ent
        counter += 1
    return (total, counter)

def split_list(l, n):
    # Split `l` into `n` equal lists.
    # Borrowed from http://stackoverflow.com/a/2136090/2073595
    return [l[i::n] for i in xrange(n)]

if __name__ == '__main__':
    start_time = time.time()
    numberList = range(1000000)
    counter = 0
    total = 0
    for id in numberList:
        total += id
        counter += 1
    print(counter)
    print(total)
    print("Finished in Seconds: %s" %(time.time()-start_time))

    start_time = time.time()
    num_consumers = multiprocessing.cpu_count()
    # Split the list up so that each consumer can add up a subsection of the list.
    lists = split_list(numberList, num_consumers)
    p = multiprocessing.Pool(num_consumers)
    results = p.map(add_list, lists)
    total = 0
    counter = 0
    # Combine the results each worker returned.
    for t, c in results:
        total += t
        counter += c
    print(counter)
    print(total)
    print("Finished in Seconds: %s" %(time.time()-start_time))
Here is the output:
Standard:
1000000
499999500000
Finished in Seconds: 0.272150039673
Multiprocessing:
1000000
499999500000
Finished in Seconds: 0.238755941391
As @Aruistante pointed out, your workload is very light, so you are not really feeling the benefit of multiprocessing here. If you were doing heavier processing, you would see a bigger difference.
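To give a feel for that "heavier processing" case (again just a sketch with an arbitrarily chosen workload, not from the original answer), a CPU-bound task such as counting primes by trial division makes the process startup and data-transfer costs small relative to the work, so the pool version should pull ahead of the serial loop:

import multiprocessing
import time

def count_primes(bounds):
    # Count primes in [lo, hi) by trial division -- deliberately CPU-heavy.
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        for d in range(2, int(n ** 0.5) + 1):
            if n % d == 0:
                break
        else:
            count += 1
    return count

if __name__ == '__main__':
    limit = 200000
    num_workers = multiprocessing.cpu_count()
    step = limit // num_workers
    # Give each worker a contiguous slice of the range.
    bounds = [(i * step, limit if i == num_workers - 1 else (i + 1) * step)
              for i in range(num_workers)]

    start = time.time()
    serial = count_primes((0, limit))
    print("Serial: %s primes in %s seconds" % (serial, time.time() - start))

    start = time.time()
    pool = multiprocessing.Pool(num_workers)
    parallel = sum(pool.map(count_primes, bounds))
    pool.close()
    pool.join()
    print("Pool:   %s primes in %s seconds" % (parallel, time.time() - start))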
Forking child processes (which is what multiprocessing does) is quite expensive, so for a toy workload like yours it is going to be very slow; the rough timing sketch at the end of this post illustrates that fixed startup cost.

I have adopted this in my workflow, and now that I have a heavy workload I can see the difference. If you can, please take a look at the follow-up question here:
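As a rough illustration of that fixed process-startup cost (a sketch with arbitrary sizes, not from the thread), mapping a do-nothing function over a small list spends essentially all of its time creating and tearing down the worker processes:

import multiprocessing
import time

def identity(x):
    return x   # no real work per item

if __name__ == '__main__':
    data = range(1000)

    start = time.time()
    inline = [identity(x) for x in data]
    print("Plain loop: %s seconds" % (time.time() - start))

    start = time.time()
    # Almost all of this time is spent starting and stopping the worker processes.
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    forked = pool.map(identity, data)
    pool.close()
    pool.join()
    print("Pool of processes: %s seconds" % (time.time() - start))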