Python: using multiprocessing is slower than not using it
After spending much time trying to wrap my head around multiprocessing, I came up with the following code, which is a benchmark test:

Example 1:
from multiprocessing import Process

class Alter(Process):
    def __init__(self, word):
        Process.__init__(self)
        self.word = word
        self.word2 = ''

    def run(self):
        # Alter string + test processing speed
        for i in range(80000):
            self.word2 = self.word2 + self.word

if __name__ == '__main__':
    # Send a string to be altered
    thread1 = Alter('foo')
    thread2 = Alter('bar')
    thread1.start()
    thread2.start()
    # wait for both to finish
    thread1.join()
    thread2.join()
    print(thread1.word2)
    print(thread2.word2)
This completes in 2 seconds (half the time of multithreading). Out of curiosity, I decided to run this next:
Example 2:
word2 = 'foo'
word3 = 'bar'
word = 'foo'
for i in range(80000):
    word2 = word2 + word
word = 'bar'
for i in range(80000):
    word3 = word3 + word
print(word2)
print(word3)
To my horror, this ran in less than half a second!

What is going on here? I expected multiprocessing to run faster. If Example 1 is Example 2 split into two processes, shouldn't it complete in roughly half the time of Example 2?
Update:

After considering Chris' feedback, I am including the "actual" code that consumes the most processing time and led me to consider multiprocessing:
self.ListVar = [[13379+ strings], [13379+ strings],
                [13379+ strings], [13379+ strings]]

for b in range(len(self.ListVar)):
    self.list1 = []
    self.temp = []
    for n in range(len(self.ListVar[b])):
        if not self.ListVar[b][n] in self.temp:
            self.list1.insert(n, self.ListVar[b][n] + '(' +
                str(self.ListVar[b].count(self.ListVar[b][n])) +
                ')')
            self.temp.insert(0, self.ListVar[b][n])
    self.ListVar[b] = list(self.list1)
This example is too small to benefit from multiprocessing. There is a LOT of overhead when starting a new process. If heavy processing were involved, that overhead would be negligible. But your example really isn't that intensive, so you are bound to notice the overhead.
You'd probably notice a bigger difference with real threads; too bad Python (well, CPython) has problems with CPU-bound threading.

ETA: Now that you've posted your code, I can tell you there is a simple way to do what you're doing much faster (>100 times faster).

I see that what you're doing is appending a frequency in parentheses to each item in a list of strings. Instead of counting all the elements each time (which, as you can confirm using cProfile, is by far the largest bottleneck in your code), you can just create a mapping from each element to its frequency. That way, you only have to go through the list twice: once to create the frequency dictionary, once to use it to add the frequency.

Here I'll show my new method, time it, and compare it to the old method using a generated test case. The test case even shows the new result to be exactly identical to the old one. Note: all you really need to pay attention to below is new_method.
import random
import time
import collections
import cProfile

LIST_LEN = 14000

def timefunc(f):
    t = time.time()
    f()
    return time.time() - t

def random_string(length=3):
    """Return a random string of given length"""
    return "".join([chr(random.randint(65, 90)) for i in range(length)])

class Profiler:
    def __init__(self):
        self.original = [[random_string() for i in range(LIST_LEN)]
                         for j in range(4)]

    def old_method(self):
        self.ListVar = self.original[:]
        for b in range(len(self.ListVar)):
            self.list1 = []
            self.temp = []
            for n in range(len(self.ListVar[b])):
                if not self.ListVar[b][n] in self.temp:
                    self.list1.insert(n, self.ListVar[b][n] + '(' + str(self.ListVar[b].count(self.ListVar[b][n])) + ')')
                    self.temp.insert(0, self.ListVar[b][n])
            self.ListVar[b] = list(self.list1)
        return self.ListVar

    def new_method(self):
        self.ListVar = self.original[:]
        for i, inner_lst in enumerate(self.ListVar):
            freq_dict = collections.defaultdict(int)
            # create frequency dictionary
            for e in inner_lst:
                freq_dict[e] += 1
            temp = set()
            ret = []
            for e in inner_lst:
                if e not in temp:
                    ret.append(e + '(' + str(freq_dict[e]) + ')')
                    temp.add(e)
            self.ListVar[i] = ret
        return self.ListVar

    def time_and_confirm(self):
        """
        Time the old and new methods, and confirm they return the same value
        """
        time_a = time.time()
        l1 = self.old_method()
        time_b = time.time()
        l2 = self.new_method()
        time_c = time.time()
        # confirm that the two are the same
        assert l1 == l2, "The old and new methods don't return the same value"
        return time_b - time_a, time_c - time_b

p = Profiler()
print(p.time_and_confirm())
When I run this, it gets times of (15.963812112808228, 0.0596117973276367), which means it's about 250 times faster, though this advantage depends both on how long the lists are and on the frequency distribution within each list. I'm sure you'll agree that with this speed advantage, you probably won't need to use multiprocessing :)

(My original answer is left below for posterity)

ETA: By the way, it is worth noting that this algorithm is roughly linear in the length of the lists, while the code you used is quadratic. That means it performs with even more of an advantage the larger the number of elements. For example, if you increase the length of each list to 1000000, it takes only 5 seconds to run. Based on extrapolation, the old code would take over a day :)
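As an aside, the frequency-dictionary idea described above can be written even more compactly with collections.Counter. This is a sketch of the same two-pass technique, not code from the answer itself; the helper name annotate_counts is my own:

```python
from collections import Counter

def annotate_counts(strings):
    # pass 1: build the element -> frequency mapping in O(n)
    freq = Counter(strings)
    # pass 2: emit each unique string once, annotated with its count
    seen = set()
    out = []
    for s in strings:
        if s not in seen:
            out.append('%s(%d)' % (s, freq[s]))
            seen.add(s)
    return out

print(annotate_counts(['foo', 'bar', 'foo']))  # ['foo(2)', 'bar(1)']
```

Like new_method, this stays linear in the list length because neither pass ever rescans the list.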
It depends on what you are performing. For example:
import time
from multiprocessing import Process

NUM_RANGE = 100000000

def timefunc(f):
    t = time.time()
    f()
    return time.time() - t

def multi():
    class MultiProcess(Process):
        def __init__(self):
            Process.__init__(self)

        def run(self):
            # Alter string + test processing speed
            for i in range(NUM_RANGE):
                a = 20 * 20

    thread1 = MultiProcess()
    thread2 = MultiProcess()
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()

def single():
    for i in range(NUM_RANGE):
        a = 20 * 20
    for i in range(NUM_RANGE):
        a = 20 * 20

if __name__ == '__main__':
    print(timefunc(multi) / timefunc(single))
On my machine, the multiprocessed operation takes only ~60% of the time of the single-threaded one. Multiprocessing may be useful for what you're doing, but not in the way you're thinking of using it. Since you're basically doing a computation on every member of a list, you could do it using the multiprocessing.Pool.map method, to perform the computation on the list members in parallel.

Here is an example that shows your code's performance using a single process and using multiprocessing.Pool.map:
from multiprocessing import Pool
from random import choice
from string import printable
from time import time

def build_test_list():
    # Builds a test list consisting of 5 sublists of 10000 strings each.
    # Each string is 20 characters long.
    testlist = [[], [], [], [], []]
    for sublist in testlist:
        for _ in range(10000):
            sublist.append(''.join(choice(printable) for _ in range(20)))
    return testlist

def process_list(l):
    # the time-consuming code
    result = []
    tmp = []
    for n in range(len(l)):
        if l[n] not in tmp:
            result.insert(n, l[n] + ' (' + str(l.count(l[n])) + ')')
            tmp.insert(0, l[n])
    return result

def single(l):
    # process the test list elements using a single process
    results = []
    for sublist in l:
        results.append(process_list(sublist))
    return results

def multi(l):
    # process the test list elements in parallel
    pool = Pool()
    results = pool.map(process_list, l)
    return results

if __name__ == '__main__':
    print("Building the test list...")
    testlist = build_test_list()

    print("Processing the test list using a single process...")
    starttime = time()
    singleresults = single(testlist)
    singletime = time() - starttime

    print("Processing the test list using multiple processes...")
    starttime = time()
    multiresults = multi(testlist)
    multitime = time() - starttime

    # make sure they both return the same thing
    assert singleresults == multiresults

    print("Single process: {0:.2f}sec".format(singletime))
    print("Multiple processes: {0:.2f}sec".format(multitime))
Output:
Building the test list...
Processing the test list using a single process...
Processing the test list using multiple processes...
Single process: 34.73sec
Multiple processes: 24.97sec
I found this thread useful. Just a quick observation on the good second code provided by David Robinson above (answered Jan 8, 2012 at 5:34), which was the code more suitable to my current needs.

In my case I had previous records of the running times of a target function without multiprocessing. When using his code to implement a multiprocessing function, his timefunc(multi) did not reflect the actual time of multi; rather, it seemed to reflect the time expended in the parent process.

What I did was to externalise the timing function, and the times I got looked more like expected:
start = time.time()
multi()  # or single()
elapsed = (time.time() - start) / (--number of workers--)
print(elapsed)
In the case of my dual core, the total time taken by 'x' workers running the target function was twice as fast as running a simple for loop with 'x' iterations over the target function.
Comments:

- I'm new to multiprocessing, so take this observation with a grain of salt. What would you consider "heavy processing"? I have increased the range in both examples to 100000. Example 1 completed in 17 seconds! Example 2 completed in 0 seconds. I tried going higher in range(), but Example 1 literally didn't return after 10 minutes.
- @Rhys First of all, you have an example that keeps gobbling up memory, which is bound to cause problems. I don't know; truly CPU-bound processing code might be something like matrix decomposition.
- Asking a list of strings (a list of 17000 strings) whether (each) has any duplicate entries, and if so, appending the number of duplicates of the string item in parentheses... should I use multiprocessing for that?
- @Rhys: Maybe you should post a piece of the actual code? We could probably also suggest other performance optimizations.
- @Rhys Let me give you the single most important piece of advice: when it comes to optimization, measure, measure, and measure again. Until you start running a profiler to see exactly where the bottlenecks occur, everything is guesswork. Processes do have more overhead than threads; that's a fact. But I can't say with 100% certainty how that affects your particular code.
- I also can't decide who to give the points to: both your answer and David's are good. I think I'll give them to him because he doesn't have as many, but I'm sure I'll be using this code in the future. Thanks, I've learned a lot.
- @Rhys No problem ;) I'm glad it was useful. (You can upvote multiple answers, but only select one as the accepted answer.)