Python multiprocessing.Pool.map throws MemoryError
I am rewriting a reinforcement learning framework from serial code execution to parallel (multiprocessing) to reduce training time. It works, but after a few hours of training a MemoryError is thrown. I tried adding gc.collect after every loop iteration, with no change.
Here is the for loop that uses multiprocessing:
for episode in episodes:
    env.episode = episode
    flex_list = [0, 1, 2]
    for machine in env.list_of_machines:
        flex_plan = []
        for time_step in range(0, env.steplength):
            flex_plan.append(random.choice(flex_list))
        machine.flex_plan = flex_plan
    env.current_step = 0
    steps = []
    state = env.reset(restricted=True)
    steps.append(state)
    # multiprocessing part, has condition to use a specific amount of CPUs or 'all' of them
    ####################################################
    func_part = partial(parallel_pool, episode=episode, episodes=episodes, env=env, agent=agent, state=state, log_data_qvalues=log_data_qvalues, log_data=log_data, steps=steps)
    if CPUs_used == 'all':
        mp.Pool().map(func_part, range(env.steplength-1))
    else:
        mp.Pool(CPUs_used).map(func_part, range(env.steplength-1))
    ############################################################
    # model is saved periodically, not only in the end
    save_interval = 100  # set episode interval to save models
    if (episode + 1) % save_interval == 0:
        agent.save_model(f'models/model_(unknown)_{episode + 1}')
        print(f'model saved at episode {episode + 1}')
    plt.close()
    gc.collect()
Output after 26 episodes of training:
Episode: 26/100 Action: 1/11 Phase: 3/3 Measurement Count: 231/234 THD fake slack: 0.09487 Psoll: [0.02894068 0.00046048 0. 0. ] Laptime: 0.181
Episode: 26/100 Action: 1/11 Phase: 3/3 Measurement Count: 232/234 THD fake slack: 0.09488 Psoll: [0.02894068 0.00046048 0. 0. ] Laptime: 0.181
Episode: 26/100 Action: 1/11 Phase: 3/3 Measurement Count: 233/234 THD fake slack: 0.09489 Psoll: [0.02894068 0.00046048 0. 0. ] Laptime: 0.179
Traceback (most recent call last):
File "C:/Users/Artur/Desktop/RL_framework/train.py", line 87, in <module>
main()
File "C:/Users/Artur/Desktop/RL_framework/train.py", line 77, in main
duration = cf.training(episodes, env, agent, filename, topology=topology, multi_processing=multi_processing, CPUs_used=CPUs_used)
File "C:\Users\Artur\Desktop\RL_framework\help_functions\custom_functions.py", line 166, in training
save_interval = parallel_training(range(episodes), env, agent, log_data_qvalues, log_data, filename, CPUs_used)
File "C:\Users\Artur\Desktop\RL_framework\help_functions\custom_functions.py", line 81, in parallel_training
mp.Pool().map(func_part, range(env.steplength-1))
File "C:\Users\Artur\Anaconda\lib\multiprocessing\pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\Artur\Anaconda\lib\multiprocessing\pool.py", line 657, in get
raise self._value
File "C:\Users\Artur\Anaconda\lib\multiprocessing\pool.py", line 431, in _handle_tasks
put(task)
File "C:\Users\Artur\Anaconda\lib\multiprocessing\connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "C:\Users\Artur\Anaconda\lib\multiprocessing\reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
MemoryError
Is there a way to fix this?

I believe your memory is getting clogged because the pools you create inside the loop stick around after they finish running. From the multiprocessing docs:

Warning: multiprocessing.pool objects have internal resources that need to be properly managed (like any other resource) by using the pool as a context manager or by calling close() and terminate() manually. Failure to do this can lead to the process hanging on finalization. Note that it is not correct to rely on the garbage collector to destroy the pool as CPython does not assure that the finalizer of the pool will be called (see object.__del__() for more information).

I suggest you try refactoring your code slightly:
# set CPUs_used to a desired number, or to None to use all available CPUs
with mp.Pool(processes=CPUs_used) as p:
    p.map(func_part, range(env.steplength-1))
Or you can call .close() and .join() manually, whichever fits your coding style best.