What is the best way to run this loop concurrently in Python?

python python-3.x multithreading performance

I have the following code:

data = [2,5,3,16,2,5]        

def f(x):       
    return 2*x

f_total = 0
for x in data:
    f_total += f(x)

print(f_total/len(data))

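I want to speed up the for loop. (In reality the code is more complex, and I want to run it on a supercomputer with many processing cores.) I have read that I can do this with the multiprocessing library, which lets Python 3 run different chunks of the loop at the same time, but I am a bit lost with it.

Could you explain how to do this with this minimal version of my program?

Thanks!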

import multiprocessing
from numpy import random

"""
Number of CPU cores to target. Note that multiprocessing.Pool starts worker
processes, not threads. Choose the number of workers according to the number
of cores in your system. When you call 'map', the input values are split into
chunks and distributed among the workers.
"""
NUM_CORES = 6
data = random.rand(100, 1)

"""
+2 so that the cores are not left idle while a worker is waiting for I/O.
Choose the value empirically, depending on the function you are computing;
NUM_CORES itself may work just as well. NUM_THREADS is the number of worker
processes (the name is kept for the discussion below). You can also vary the
chunk size depending on the size of 'data'.
"""
NUM_THREADS = NUM_CORES + 2
CHUNKSIZE = max(1, len(data) // NUM_THREADS)


def f(x):       
    return 2*x

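# The guard is required on platforms that start workers by re-importing this
# module (e.g. Windows and macOS), and is harmless elsewhere.
if __name__ == "__main__":
    # Create the pool of worker processes that will be assigned the jobs
    with multiprocessing.Pool(NUM_THREADS) as pool:
        # map vs imap: imap yields results lazily, which helps when the data is large.
        it = pool.imap(f, data, chunksize=CHUNKSIZE)

        f_total = 0
        # Each value is a one-element array because data has shape (100, 1)
        for value in it:
            f_total += sum(value)

    print(f_total / len(data))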
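
For the six-element list in the question, a smaller sketch without NumPy may be easier to follow. It assumes the same f as above; the helper name parallel_mean and the values workers=2 and chunksize=4 are made up for illustration. With a flat list each result is a plain number, so sum(value) is not needed, and the chunk size does not have to divide len(data) evenly: the final chunk is simply shorter.

```python
import multiprocessing


def f(x):
    return 2 * x


def parallel_mean(data, workers=2, chunksize=4):
    """Hypothetical helper: mean of f(x) over data, computed with a process pool."""
    with multiprocessing.Pool(workers) as pool:
        # map blocks until every result is back and preserves the input order.
        # With 6 items and chunksize=4, the workers receive chunks of 4 and 2.
        results = pool.map(f, data, chunksize=chunksize)
    return sum(results) / len(data)


if __name__ == "__main__":
    data = [2, 5, 3, 16, 2, 5]
    print(parallel_mean(data))  # 11.0
```

For a function as cheap as 2*x, starting processes and shipping data to them will probably cost more than the work itself, so a speed-up is only to be expected when f does substantial work per item.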

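Does this answer your question?

The supercomputer belongs to my institute.

@ranka47 It may answer my question, but I can't fully understand it. Perhaps a more detailed or simpler answer would work for me?

Thank you for such a detailed answer! So I guess I can picture a worker as a single core in my computer working on one particular independent task, or can the number of workers be anything I like? If so, how do I choose the number of workers wisely? Also, if int(len(data)/(NUM_CORES-2)) does not divide len(data) evenly, will Python know that it needs to give some workers a few extra iterations so that data is fully consumed? Finally, I don't think sum(value) is needed; wouldn't f_total += value be enough, since value is already a number? Bonus: I have been playing with NUM_CORES and found that even though my computer has 8 cores (according to os.cpu_count()), I get better performance with a slightly larger number, such as NUM_CORES=10 (at least for this silly example, although with more data). How do I choose the best value for NUM_CORES? (I guess this is related to my first question.)

It is sum(value) because each value yielded by imap is a one-element array here, not a plain number. You could also replace it with value[0]. Regarding the choice of NUM_THREADS, I was wrong: you can use a larger value, but only up to a point. I don't know of a formula for choosing the number of workers, so I suggest an empirical analysis. The main point is to keep a core busy while another worker is waiting for I/O. However, very high values may add overhead because of context switching.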
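
The empirical analysis suggested above could be sketched as a small timing loop. The helper name time_pool, the candidate worker counts and the input size are all assumptions chosen for illustration, and the measured time includes starting the pool. Results depend entirely on the machine and on how expensive f is. Note that os.cpu_count() reports logical CPUs, which may be more than the number of physical cores.

```python
import multiprocessing
import os
import time


def f(x):
    return 2 * x


def time_pool(workers, data, chunksize):
    """Hypothetical helper: return (elapsed seconds, sum of f over data)."""
    start = time.perf_counter()
    with multiprocessing.Pool(workers) as pool:
        # Consume the iterator inside the with-block, before the pool is shut down.
        total = sum(pool.imap(f, data, chunksize=chunksize))
    return time.perf_counter() - start, total


if __name__ == "__main__":
    data = list(range(200_000))
    cores = os.cpu_count() or 1
    # Candidate worker counts around the reported CPU count (illustrative only).
    for workers in sorted({1, max(1, cores // 2), cores, cores + 2}):
        chunksize = max(1, len(data) // (workers * 4))
        elapsed, total = time_pool(workers, data, chunksize)
        print(f"{workers:3d} workers: {elapsed:.3f} s (mean {total / len(data):.1f})")
```

Running the loop several times and comparing the fastest run per setting gives a steadier picture than a single measurement.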