Python - Why doesn't my DQN agent learn in Breakout-v0?


I am using ChainerRL and trying Breakout-v0.

I ran this code. It does run, but my agent cannot earn rewards (the reward per episode is always below 5).

Python 2.7, Ubuntu 14.04

Please tell me why my agent does not learn.

I also don't understand where the number 972 comes from in l5 = L.Linear(972, 512).
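(A general Chainer note on that line, separate from the answer below: the input size of a Linear link can be left as None, and Chainer will infer it from the first forward pass, which avoids hand-computing sizes such as 972. A minimal illustration, not taken from the original script:)

import numpy as np
import chainer.links as L

# in_size=None lets Chainer infer the input dimension
# from the first batch passed through the link.
l5 = L.Linear(None, 512)
x = np.zeros((1, 972), dtype=np.float32)  # any flattened feature vector
y = l5(x)       # in_size is fixed to 972 at this point
print(y.shape)  # (1, 512)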


As the author of ChainerRL, my suggestion for working with Atari environments is to start from
examples/ale/train_*.py
and customize it step by step. Deep reinforcement learning is very sensitive to hyperparameters and network architecture, so if you introduce many changes at once it is hard to tell which change caused training to fail.

I tried running your script while printing statistics via agent.get_statistics(), and the Q values become far too large, which suggests training is not going well:

$ python yourscript.py
[2017-07-10 18:14:45,309] Making new env: Breakout-v0
observation space   : Box(210, 160, 3)
action space        : Discrete(6)
episode: 1 reward: 0
[('average_q', 0.0), ('average_loss', 0.0)]
episode: 2 reward: 1.0
[('average_q', 0.0), ('average_loss', 0.0)]
episode: 3 reward: 0
[('average_q', 0.0), ('average_loss', 0.0)]
episode: 4 reward: 0
[('average_q', 0.0), ('average_loss', 0.0)]
episode: 5 reward: 2.0
[('average_q', 0.0), ('average_loss', 0.0)]
episode: 6 reward: 0
[('average_q', 0.0), ('average_loss', 0.0)]
episode: 7 reward: 1.0
[('average_q', 0.0), ('average_loss', 0.0)]
episode: 8 reward: 2.0
[('average_q', 0.0), ('average_loss', 0.0)]
episode: 9 reward: 1.0
[('average_q', 0.0), ('average_loss', 0.0)]
episode: 10 reward: 2.0
[('average_q', 0.05082079044988309), ('average_loss', 0.0028927958279822935)]
episode: 11 reward: 4.0
[('average_q', 7.09331367665307), ('average_loss', 0.0706595716528489)]
episode: 12 reward: 0
[('average_q', 17.418094266218915), ('average_loss', 0.251431955409951)]
episode: 13 reward: 1.0
[('average_q', 40.903169833428954), ('average_loss', 1.0959175910071859)]
episode: 14 reward: 2.0
[('average_q', 115.25579476118122), ('average_loss', 2.513677824600575)]
episode: 15 reward: 2.0
[('average_q', 258.7392539556941), ('average_loss', 6.20968827451279)]
episode: 16 reward: 1.0
[('average_q', 569.6735852049942), ('average_loss', 19.295426012437833)]
episode: 17 reward: 4.0
[('average_q', 1403.8461185742353), ('average_loss', 32.6092646561004)]
episode: 18 reward: 1.0
[('average_q', 2138.438909199657), ('average_loss', 44.90832410172697)]
episode: 19 reward: 1.0
[('average_q', 3112.752923036582), ('average_loss', 88.50687458947431)]
episode: 20 reward: 1.0
[('average_q', 4138.601621651058), ('average_loss', 106.09160137599618)]
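For reference, per-episode statistics like the ones above can be printed from a plain ChainerRL training loop. Below is a minimal sketch of such a loop; it assumes env (a Gym environment such as Breakout-v0) and agent (a ChainerRL DQN agent) are already constructed as in your script, and it is not your exact code:

n_episodes = 20
for i in range(1, n_episodes + 1):
    obs = env.reset()
    reward = 0
    done = False
    episode_reward = 0
    while not done:
        # act_and_train() both selects an action and updates the agent
        action = agent.act_and_train(obs, reward)
        obs, reward, done, info = env.step(action)
        episode_reward += reward
    # Tell the agent the episode has ended so it can do a final update
    agent.stop_episode_and_train(obs, reward, done)
    print('episode:', i, 'reward:', episode_reward)
    # get_statistics() returns pairs such as ('average_q', ...) and
    # ('average_loss', ...); rapidly growing average_q is a sign of divergence.
    print(agent.get_statistics())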

Please tell me how to use your example. I have installed ALE. The output of running "python train_dqn_ale.py rom [breakout.bin]" is:

Traceback (most recent call last):
  File "train_dqn_ale.py", line 13, in <module>
    from chainerrl.envs import ale
  File "/usr/local/lib/python2.7/dist-packages/chainerrl/envs/ale.py", line 17, in <module>
    from ale_python_interface import ALEInterface
ImportError: No module named ale_python_interface

You need to install ALE and its Python interface. Please refer to the installation instructions.

Thanks. I installed ALE. Then I ran "python train_dqn_ale.py rom [breakout]".