Python: How do I use a multiprocessing pool inside a for loop while saving data?


I have some data that I am trying to process with multiprocessing.Pool, since my machine has 16 processors. Here I generate some dummy data:

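from datetime import datetime, timedelta

import numpy as np
import pandas as pd

y = pd.Series(np.random.randint(400, high=600, size=1250))
date_today = datetime.now()
# The range includes both endpoints, so x has 1251 dates and the last Price is NaN
x = pd.date_range(date_today, date_today + timedelta(1250), freq='D')
data = pd.DataFrame(columns=['Date','Price'])
data['Date'] = x
data['Price'] = y
# Integer division by len(data) puts every row in group 0, so d holds a single chunk
d = {name: group for name, group in data.groupby(np.arange(len(data)) // (len(data)))}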

What I really want is to apply the pool to the for loop over the parameters, so that each constant gets its own processor:

parameters = range(300,550,50)
portfolio = pd.DataFrame(columns=['Parameter','Date','Price','Calculation'])
for key, value in sorted(d.items()):
    for constante in parameters:
        print('Constante:',constante)
        # HERE I WANT TO USE MP.POOL()

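In my real code I use a kind of moving window to perform the calculation; this is the simplest version of it. So I would like to assign one process to each constant in parameters while writing the results to the DataFrame. How can this be done?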

You probably want to use something like multiprocessing.Pool.map, though you may need to adapt it to your needs:

from functools import partial
from multiprocessing import Pool

import pandas as pd

def pool_map_fn(i, value=None, constante=None):
    # `i` comes first because pool.map passes each item positionally,
    # while `value` and `constante` are bound by keyword through partial.
    window = value[i:i+constante][['Date', 'Price']].copy()
    window['Price'] = pd.to_numeric(window['Price'], errors='coerce').fillna(0)
    calc = window['Price'].mean()
    date_variable = window['Date'].iloc[-1]
    price_var = window['Price'].iloc[-1]
    if price_var < calc:
        print('Parameter',constante,'Lower than average',date_variable,price_var,calc)
        # Return the row instead of appending: a worker process cannot
        # modify the `portfolio` object that lives in the parent process.
        return {'Parameter': constante,
                'Date': date_variable,
                'Price': price_var,
                'Calculation': calc}
    if price_var > calc:
        print('Parameter',constante,'Higher than average',date_variable,price_var,calc)
    return None

# On platforms that spawn worker processes (Windows, macOS), run the code
# below under an `if __name__ == '__main__':` guard.
parameters = range(300,550,50)
rows = []
for key, value in sorted(d.items()):
    for constante in parameters:
        with Pool() as pool:
            results = pool.map(partial(pool_map_fn, value=value, constante=constante),
                               range(len(value) - constante + 1))
        # Collect the rows returned by the workers in the parent process
        rows.extend(r for r in results if r is not None)
portfolio = pd.DataFrame(rows, columns=['Parameter','Date','Price','Calculation'])


Note: this is untested but should work. If you get errors, try to work through them, since the concept should be sound.
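One thing worth checking with this pattern is how functools.partial combines with the single positional item that pool.map hands to the function. The following is a minimal sketch outside of any pool; window_size and window_size_fixed are hypothetical stand-ins for pool_map_fn, not code from the answer. It suggests that the positional item should be the function's first parameter when the other arguments are bound by keyword.

```python
from functools import partial


def window_size(value=None, constante=None, i=None):
    # Hypothetical stand-in: `value` is the first parameter here
    return (i, constante, len(value))


def window_size_fixed(i, value=None, constante=None):
    # Same body, but the item supplied by map() is the first parameter
    return (i, constante, len(value))


error_message = None
bound = partial(window_size, value=[1, 2, 3], constante=2)
try:
    # 0 is passed positionally, so it targets `value`, which partial
    # has already bound by keyword
    bound(0)
except TypeError as exc:
    error_message = str(exc)

fixed = partial(window_size_fixed, value=[1, 2, 3], constante=2)
result = fixed(0)  # 0 now lands in `i`
```

The same reasoning applies inside pool.map, which calls the function with exactly one positional argument per item.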

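TypeError: pool_map_fn() got multiple values for argument 'value'. I am struggling with this error.

Try moving the i arg to be the first arg of that fn. I may have messed up how partial receives args; this was mostly off the top of my head.

Yes, that solved it, but now I get a local variable referenced error (portfolio). Do you suggest using a global? One more question: is there a parameter for specifying the exact number of processors to use? Edit: You may have misunderstood the question. Wouldn't the output of the line "print('Parameter', constante, 'Lower than average', date_variable, price_var, calc)" vary in the parameter constante, say if I use 4 processors and run parameters 300, 350, 400 and 450 in parallel?

JayDough, my answer is in the low-quality queue, and I flagged your last comment as rude. If the answer has errors, ask for a fix. If the answer does not suit you, downvote it. Don't ask people to delete things they invested time in; that is just not done. Nobody forces you to accept this answer.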
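Two of the questions in the comments above, how to fix the number of worker processes and how to run one parameter per process, can be illustrated together. What follows is a sketch under stated assumptions rather than the answerer's code: scan_parameter and build_portfolio are hypothetical names, the demo data uses a fixed start date and seed, and only the "lower than average" rows are kept. The processes argument of multiprocessing.Pool sets the worker count, and each worker returns its rows so the parent process can assemble the DataFrame without any global.

```python
from multiprocessing import Pool

import numpy as np
import pandas as pd


def scan_parameter(args):
    """Run the whole moving-window scan for one window length (hypothetical helper)."""
    constante, prices, dates = args
    rows = []
    for i in range(len(prices) - constante + 1):
        window = prices[i:i + constante]
        calc = float(np.mean(window))
        price_var = float(window[-1])
        if price_var < calc:
            # Return plain dicts; the parent process builds the DataFrame
            rows.append({'Parameter': constante,
                         'Date': dates[i + constante - 1],
                         'Price': price_var,
                         'Calculation': calc})
    return rows


def build_portfolio(data, parameters, processes=4):
    """Map one task per parameter over a pool with a fixed number of workers."""
    prices = pd.to_numeric(data['Price'], errors='coerce').fillna(0).to_numpy()
    dates = data['Date'].to_numpy()
    tasks = [(c, prices, dates) for c in parameters]
    with Pool(processes=processes) as pool:
        per_parameter = pool.map(scan_parameter, tasks)
    rows = [row for rows in per_parameter for row in rows]
    return pd.DataFrame(rows, columns=['Parameter', 'Date', 'Price', 'Calculation'])


if __name__ == '__main__':
    rng = np.random.default_rng(0)  # assumed seed, for a repeatable demo
    data = pd.DataFrame({
        'Date': pd.date_range('2020-01-01', periods=1250, freq='D'),
        'Price': rng.integers(400, 600, size=1250),
    })
    portfolio = build_portfolio(data, range(300, 550, 50), processes=4)
    print(portfolio.groupby('Parameter').size())
```

With five parameters and four workers, one worker handles two parameters. Because pool.map returns results in task order, the rows for each parameter stay grouped even though the workers run concurrently.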