Python multiprocessing script slower than a normal script


I can't seem to wrap my head around multiprocessing. I'm trying to do something basic, but the multiprocessing script seems to take forever:

import multiprocessing, time, psycopg2

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        proc_name = self.name
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                print ('Tasks Complete')
                self.task_queue.task_done()
                break            
            answer = next_task()
            self.task_queue.task_done()
            self.result_queue.put(answer)
        return

class Task(object):
    def __init__(self, a):
        self.a = a

    def __call__(self):
        #Some more work will go in here but for now just return the value
        return self.a

    def __str__(self):
        return 'ARC'
    def run(self):
        print ('IN')


if __name__ == '__main__':
    start_time = time.time()
    numberList = []

    for x in range(1000000):
        numberList.append(x) 

    result = []
    counter = 0
    total = 0
    for id in numberList:
        total += id
        counter += 1
    print(counter)
    print("Finished in Seconds: %s" %(time.time()-start_time))
    ###############################################################################################################################
    #Multiprocessing starts here....
    ###############################################################################################################################        
    start_time = time.time()
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()

    num_consumers = multiprocessing.cpu_count() 
    consumers = [Consumer(tasks, results) for i in range(num_consumers)]
    for w in consumers:
        w.start()

    num_jobs = len(numberList)

    for i in range(num_jobs):
        tasks.put(Task(numberList[i]))

    for i in range(num_consumers):
        tasks.put(None)

    print("So far: %s" %(time.time()-start_time))
    result = []
    while num_jobs:
        result.append(results.get())
        num_jobs -= 1
    print (len(result))
    print("Finished in Seconds: %s" %(time.time()-start_time))

The first plain for loop finishes in about 0.4 seconds on average, while the multiprocessing loop takes about 56 seconds, whereas I expected it to be the other way around.


Am I missing some logic, or is it actually just slower? If the latter, how would I structure this so that it is faster than the standard for loop?

Passing every single object between processes through a queue adds overhead. You have now measured that overhead: 56 seconds for a million objects. Passing fewer, larger objects will reduce the overhead, but not eliminate it. To benefit from multiprocessing, the computation each task performs should be relatively heavy compared to the amount of data that has to be transferred.

Your multiprocessing code is genuinely over-engineered, and it doesn't actually do the work it is supposed to. I rewrote it to be simpler, to actually do what it should, and now it is faster than the plain loop:

import multiprocessing
import time


def add_list(l):
    total = 0 
    counter = 0 
    for ent in l:
        total += ent 
        counter += 1
    return (total, counter)

def split_list(l, n): 
    # Split `l` into `n` roughly equal lists (round-robin).
    # Borrowed from http://stackoverflow.com/a/2136090/2073595
    return [l[i::n] for i in range(n)]

if __name__ == '__main__':
    start_time = time.time()
    numberList = range(1000000)

    counter = 0 
    total = 0 
    for id in numberList:
        total += id
        counter += 1
    print(counter)
    print(total)
    print("Finished in Seconds: %s" %(time.time()-start_time))
    start_time = time.time()

    num_consumers = multiprocessing.cpu_count() 
    # Split the list up so that each consumer can add up a subsection of the list.
    lists = split_list(numberList, num_consumers)
    p = multiprocessing.Pool(num_consumers)
    results = p.map(add_list, lists)
    total = 0 
    counter = 0 
    # Combine the results each worker returned.
    for t, c in results:
        total += t
        counter += c
    print(counter)
    print(total)

    print("Finished in Seconds: %s" %(time.time()-start_time))
Here is the output:

Standard:
1000000
499999500000
Finished in Seconds: 0.272150039673
Multiprocessing:
1000000
499999500000
Finished in Seconds: 0.238755941391
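For reference, the `split_list` helper deals elements out round-robin rather than in contiguous blocks, so neighbors land in different sublists while order within each sublist is preserved; a quick standalone check:

```python
def split_list(l, n):
    # Round-robin split: element i goes to sublist i % n.
    return [l[i::n] for i in range(n)]

print(split_list(list(range(10)), 3))
# → [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

For a pure sum the split strategy doesn't matter, but round-robin keeps the sublists balanced even when the length isn't divisible by `n`.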

As @Aruistante pointed out, your workload is very light, so the benefits of multiprocessing aren't really felt here. If you were doing heavier processing, you would see a much bigger difference.

Forking child processes (which is what multiprocessing does) is quite expensive, so for a toy workload like yours it will be very slow. I have since adopted this approach in my workflow, and now that I have a heavy workload I can see the difference. If you can, please see the follow-up question here: