Python: How can I improve the performance of my DQN?
I created a deep Q network to play Snake. The code runs fine, except that performance doesn't really improve over the course of training. By the end, the agent is barely distinguishable from one that takes random actions. Here is the training code:
def train(self):
    self.build_model()
    for episode in range(self.max_episodes):
        self.current_episode = episode
        env = SnakeEnv(self.screen)
        episode_reward = 0
        for timestep in range(self.max_steps):
            env.render(self.screen)
            state = env.get_state()
            action = None
            epsilon = self.current_eps
            if epsilon > random.random():
                action = np.random.choice(env.action_space)  # explore
            else:
                values = self.policy_model.predict(env.get_state())  # exploit
                action = np.argmax(values)
            experience = env.step(action)
            if experience['done']:
                episode_reward += 5 * (len(env.snake.List) - 1)
                episode_reward += experience['reward']
                break
            episode_reward += experience['reward']
            if len(self.memory) < self.memory_size:
                self.memory.append(Experience(experience['state'], experience['action'], experience['reward'], experience['next_state']))
            else:
                self.memory[self.push_count % self.memory_size] = Experience(experience['state'], experience['action'], experience['reward'], experience['next_state'])
            self.push_count += 1
            self.decay_epsilon(episode)
            if self.can_sample_memory():
                memory_sample = self.sample_memory()
                for memory in memory_sample:
                    memstate = memory.state
                    action = memory.action
                    next_state = memory.next_state
                    reward = memory.reward
                    max_q = reward + self.discount_rate * self.replay_model.predict(next_state)
                    self.policy_model.fit(memstate, max_q, epochs=1, verbose=0)
        print("Episode: ", episode, " Total Reward: ", episode_reward)
        if episode % self.target_update == 0:
            self.replay_model.set_weights(self.policy_model.get_weights())
            self.policy_model.save_weights('weights.hdf5')
    pygame.quit()
Here is the network architecture:
model = models.Sequential()
model.add(Dense(500, activation = 'relu', kernel_initializer = 'random_uniform', bias_initializer = 'zeros', input_dim = 400))
model.add(Dense(500, activation = 'relu', kernel_initializer = 'random_uniform', bias_initializer = 'zeros'))
model.add(Dense(5, activation = 'tanh', kernel_initializer = 'random_uniform', bias_initializer = 'zeros')) #tanh for last layer because q value can be > 1
model.compile(loss='mean_squared_error', optimizer = 'adam')
For reference, the network outputs 5 values: 4 for the directions the snake can move in, plus 1 extra for taking no action. Also, instead of passing in a screenshot of the game like a traditional DQN, I pass in a 400-dimensional vector representing the 20 x 20 grid the game takes place on. The agent receives a reward of 1 for moving closer to the food or eating it, and a reward of -1 when it dies. How can I improve the performance?

I think the main problem is that your learning rate is too high. Try values below 0.001; the Atari DQN used 0.00025. Also set target_update higher than 10, e.g. 500 or more. To see any progress at all, the number of steps should be at least 10000, and the batch size should come down to 32 or 64 (see the sketches after this answer). Have you considered implementing some of the other improvements, like PER (prioritized experience replay) or Dueling DQN?

Take a look at this:

Maybe you don't want to reimplement the wheel; consider

Finally, you can check similar projects:
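A minimal sketch of those hyperparameter suggestions, assuming TensorFlow 2's bundled Keras; the names and values here are illustrative starting points, not tuned results:

from tensorflow.keras import optimizers

# Illustrative values based on the suggestions above
LEARNING_RATE = 0.00025   # Atari DQN value; the 'adam' string defaults to 0.001
TARGET_UPDATE = 500       # sync replay_model with policy_model much less often
BATCH_SIZE = 64           # smaller replay batches
MAX_STEPS = 10000         # give each run enough steps to show any learning

# Pass a configured optimizer instead of the 'adam' string, which
# silently keeps the default learning rate of 0.001
model.compile(loss='mean_squared_error',
              optimizer=optimizers.Adam(learning_rate=LEARNING_RATE))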
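For PER, a minimal proportional-priority replay buffer might look like the following. PrioritizedMemory is a hypothetical class name, and this sketch omits the importance-sampling weights a full implementation would also apply to the loss:

import numpy as np

# Hypothetical minimal proportional prioritized replay buffer
class PrioritizedMemory:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling
        self.data = []
        self.priorities = []

    def push(self, experience, priority=1.0):
        # Drop the oldest entry once the buffer is full
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(experience)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample proportionally to priority^alpha
        p = np.array(self.priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        return [self.data[i] for i in idx], idx

    def update(self, indices, td_errors, eps=1e-6):
        # Re-prioritize sampled transitions by their new TD errors
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + eps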
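And a minimal sketch of a Dueling DQN head for the same 400-dimensional input and 5 actions, again assuming TensorFlow 2's Keras; note the linear output layer, since Q-values are not bounded to (-1, 1) the way a tanh output is:

import tensorflow as tf
from tensorflow.keras import layers, models

# Shared trunk, then separate value and advantage streams
inputs = layers.Input(shape=(400,))
x = layers.Dense(500, activation='relu')(inputs)
x = layers.Dense(500, activation='relu')(x)
value = layers.Dense(1)(x)        # V(s): how good the state is
advantage = layers.Dense(5)(x)    # A(s, a): relative worth of each action
# Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
q_values = layers.Lambda(
    lambda va: va[0] + va[1] - tf.reduce_mean(va[1], axis=1, keepdims=True)
)([value, advantage])
model = models.Model(inputs, q_values)
model.compile(loss='mean_squared_error', optimizer='adam')

Splitting the estimate this way lets the network learn how good a state is without having to learn the effect of every action in that state, which often stabilizes DQN training.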