
Python multiprocessing.Pool.map throws a MemoryError


I am rewriting a reinforcement learning framework from serial execution to parallel execution (multiprocessing) to reduce training time. It works, but after a few hours of training it throws a MemoryError. I tried adding gc.collect() after every loop iteration, but nothing changed.

Here is the for loop that uses multiprocessing:

for episode in episodes:
    env.episode = episode
    flex_list = [0,1,2]                                                                                          
    for machine in env.list_of_machines:                                                                            
        flex_plan = []                                                                                              
        for time_step in range(0,env.steplength):
            flex_plan.append(random.choice(flex_list))
        machine.flex_plan = flex_plan
    env.current_step = 0                                                                                            
    steps = []
    state = env.reset(restricted=True)                                                                              
    steps.append(state)

    # multiprocessing part, has condition to use a specific amount of CPUs or 'all' of them
    ####################################################
    func_part = partial(parallel_pool, episode=episode, episodes=episodes, env=env, agent=agent, state=state, log_data_qvalues=log_data_qvalues, log_data=log_data, steps=steps)
    if CPUs_used == 'all':
        mp.Pool().map(func_part, range(env.steplength-1))
    else:
        mp.Pool(CPUs_used).map(func_part, range(env.steplength-1))
    ############################################################
    # model is saved periodically, not only in the end
    save_interval = 100 #set episode interval to save models
    if (episode + 1) % save_interval == 0:
        agent.save_model(f'models/model_{filename}_{episode + 1}')
        print(f'model saved at episode {episode + 1}')

    plt.close()
    gc.collect()
Output after 26 episodes of training:

Episode: 26/100   Action: 1/11    Phase: 3/3    Measurement Count: 231/234   THD fake slack: 0.09487   Psoll: [0.02894068 0.00046048 0.         0.        ]    Laptime: 0.181
Episode: 26/100   Action: 1/11    Phase: 3/3    Measurement Count: 232/234   THD fake slack: 0.09488   Psoll: [0.02894068 0.00046048 0.         0.        ]    Laptime: 0.181
Episode: 26/100   Action: 1/11    Phase: 3/3    Measurement Count: 233/234   THD fake slack: 0.09489   Psoll: [0.02894068 0.00046048 0.         0.        ]    Laptime: 0.179
Traceback (most recent call last):
  File "C:/Users/Artur/Desktop/RL_framework/train.py", line 87, in <module>
    main()
  File "C:/Users/Artur/Desktop/RL_framework/train.py", line 77, in main
    duration = cf.training(episodes, env, agent, filename, topology=topology, multi_processing=multi_processing, CPUs_used=CPUs_used)
  File "C:\Users\Artur\Desktop\RL_framework\help_functions\custom_functions.py", line 166, in training
    save_interval = parallel_training(range(episodes), env, agent, log_data_qvalues, log_data, filename, CPUs_used)
  File "C:\Users\Artur\Desktop\RL_framework\help_functions\custom_functions.py", line 81, in parallel_training
    mp.Pool().map(func_part, range(env.steplength-1))
  File "C:\Users\Artur\Anaconda\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\Artur\Anaconda\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
  File "C:\Users\Artur\Anaconda\lib\multiprocessing\pool.py", line 431, in _handle_tasks
    put(task)
  File "C:\Users\Artur\Anaconda\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\Artur\Anaconda\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
MemoryError

Is there a way to fix this?

Since you create the processes inside the loop, I believe your memory is getting clogged because the processes you spawn stick around after they finish running.

Warning: multiprocessing.pool objects have internal resources that need to be properly managed (like any other resource) by using the pool as a context manager or by calling close() and terminate() manually. Failure to do this can lead to the process hanging on finalization. Note that it is not correct to rely on the garbage collector to destroy the pool as CPython does not assure that the finalizer of the pool will be called (see object.__del__() for more information).

I suggest you try refactoring your code slightly:

# set the CPUs_used to a desired number or None to use all available CPUs
with mp.Pool(processes=CPUs_used) as p:
    p.map(func_part, range(env.steplength-1))
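
When the with block exits, the pool's worker processes are terminated and their resources released, so they no longer accumulate from one episode to the next.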

Or you can call .close() and .join() manually, whichever suits your coding style best.
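
For reference, a minimal sketch of the manual variant, reusing the CPUs_used, func_part and env names from the question (pass None as the process count to use all available CPUs):

pool = mp.Pool(processes=CPUs_used)
try:
    pool.map(func_part, range(env.steplength-1))
finally:
    pool.close()  # stop accepting new tasks
    pool.join()   # wait for the workers to exit and release their memory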

Does this answer your question?