Multiple CPU producers with a few GPUs never reach 100% GPU utilization in Python (PyTorch)

Tags: python, parallel-processing, deep-learning, pytorch, reinforcement-learning
I am trying to implement board-game self-play data generation in parallel across multiple CPUs, so that many games run simultaneously. In the parent process I create 4 NN models for 30 CPUs (one model per 10 CPUs, plus one model for training), each placed on a different GPU (the model is a 20-block resnet-like architecture with batchnorm). The pseudocode is as follows:
nnet = NN(gpu_num=0)
nnet1 = NN(gpu_num=1)
nnet2 = NN(gpu_num=2)
nnet3 = NN(gpu_num=3)
for i in range(num_iteration):
    nnet1.load_state_dict(nnet.state_dict())
    nnet2.load_state_dict(nnet.state_dict())
    nnet3.load_state_dict(nnet.state_dict())
    samples = parallel_self_play()
    nnet.train(samples)
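For reference, the cross-GPU weight copy in the outer loop relies on `load_state_dict`, which copies each tensor onto the destination model's existing device, so it works even when the source and target models sit on different GPUs. A minimal runnable sketch, where `TinyNet` is a hypothetical stand-in for the question's 20-block resnet-like model (kept on CPU so the sketch runs anywhere):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the question's resnet-like model with batchnorm.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)
        self.bn = nn.BatchNorm1d(8)

    def forward(self, x):
        return self.bn(self.fc(x))

# Trainer model plus self-play replicas; in the question each would be
# moved to its own GPU with .to(f"cuda:{n}"), which does not change the copy.
trainer = TinyNet()
replicas = [TinyNet() for _ in range(3)]

# load_state_dict copies parameters and buffers (including batchnorm
# running stats) in place onto each replica's own device.
for replica in replicas:
    replica.load_state_dict(trainer.state_dict())

print(torch.equal(replicas[0].fc.weight, trainer.fc.weight))  # True
```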
The implementation of parallel_self_play() is as follows:
pool = mp.Pool(processes=num_cpu)  # 30
results = []  # keep handles across all episodes, not reset per episode
for i in range(self.args.numEps):
    # round-robin episodes over the three self-play models
    if i % 3 == 0:
        net = self.nnet1
    elif i % 3 == 1:
        net = self.nnet2
    else:
        net = self.nnet3
    results.append(pool.apply_async(AsyncSelfPlay, args=(net,)))
# get results from results array then return it
return results
With this code, GPU utilization is nearly 100% during the first round of self-play (each iteration takes under 10 minutes), but after the first iteration (training), once I load the new weights into nnet1-3, GPU utilization never reaches 80% again (each iteration takes roughly 30 minutes to 1 hour). I noticed a few things while working through the code