### PyTorch Implementation of Rainbow

This repo is a partial implementation of the [Rainbow](https://arxiv.org/pdf/1710.02298.pdf)
agent published by researchers from DeepMind.
The implementation is efficient and of high quality: it trains at roughly
350 frames/s on my personal PC with a 3.5GHz CPU and a GTX 1080 GPU.

Rainbow is a deep Q-learning agent that combines a number of existing techniques,
such as Dueling DQN and Distributional DQN. This repo currently implements the
following DQN variants:

* DQN (often called Nature DQN)
* [Double DQN](https://arxiv.org/abs/1509.06461)
* [Dueling DQN](https://arxiv.org/abs/1511.06581)
* [Distributional DQN](https://arxiv.org/pdf/1707.06887.pdf)
* [Noisy Net](https://arxiv.org/abs/1706.10295) for exploration

To become a full "Rainbow" agent, it still needs the following extensions:

* Multi-step learning
* Prioritized replay
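
As a concrete illustration of one of the implemented variants, below is a minimal
PyTorch sketch of a dueling network head. It is not the exact module from this repo;
the class and argument names are placeholders.

```python
import torch.nn as nn


class DuelingHead(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, in_features, num_actions, hidden=512):
        super().__init__()
        # Two separate streams on top of the shared convolutional torso.
        self.value = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(), nn.Linear(hidden, num_actions))

    def forward(self, features):
        v = self.value(features)                    # [batch, 1]
        a = self.advantage(features)                # [batch, num_actions]
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)  # [batch, num_actions]
```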

## Hyperparameters

The hyperparameters in this repo follow the ones described in the
[Rainbow](https://arxiv.org/pdf/1710.02298.pdf) paper as closely as possible.
However, there may still be some differences due to misinterpretation.
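
For reference, the key values from the Rainbow paper itself are listed below; this
dictionary is purely illustrative, and the values actually used in this repo may differ.

```python
# Headline hyperparameters from the Rainbow paper (Hessel et al., 2017).
# The key names are made up for readability; they do not match this repo's code.
RAINBOW_PAPER_HYPERPARAMS = {
    "discount": 0.99,
    "batch_size": 32,
    "adam_lr": 6.25e-5,
    "adam_eps": 1.5e-4,
    "replay_capacity": 1000000,     # transitions
    "min_replay_history": 80000,    # frames before learning starts
    "target_update_period": 32000,  # frames between target-network syncs
    "num_atoms": 51,                # size of the distributional (C51) support
    "v_min": -10.0,
    "v_max": 10.0,
    "noisy_sigma0": 0.5,
    "multi_step_n": 3,              # not yet implemented here
    "priority_exponent": 0.5,       # not yet implemented here
}
```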

## Performance

A DQN agent often takes days to train. As a sanity check, we can
train an agent on the simple game "Boxing". The following is the learning curve
of a Dueling Double DQN trained on Boxing.

![](figs/boxing.png)

The agent nearly solves Boxing after around 12M frames, which is a good sign
that the implementation is working.
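
For readers unfamiliar with the Double DQN part of that agent, here is a minimal sketch
of how its bootstrap target is typically computed; the function and argument names are
illustrative and do not correspond to this repo's API.

```python
import torch


@torch.no_grad()
def double_dqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double DQN: the online net picks the next action, the target net evaluates it."""
    next_action = online_net(next_state).argmax(dim=1, keepdim=True)   # [batch, 1]
    next_q = target_net(next_state).gather(1, next_action).squeeze(1)  # [batch]
    return reward + gamma * (1.0 - done) * next_q
```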

To test the distributional DQN and the Noisy Net, the agent is trained on "Breakout",
since the distributional DQN performs significantly better than the other variants on
this game, rapidly reaching scores above 400 while the other DQN methods struggle to do so.

![](figs/breakout.png)

From the figure we can see that the agent reaches scores above 400 quickly and steadily.
Note that the numbers reported in the papers are produced by training the agent for
200M frames, whereas the agent here is trained for only 50M frames due to the
computational cost.

The learning curves shown here are smoothed.
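
To make the distributional part concrete, the sketch below shows how a C51-style
categorical head is typically converted into Q-values for greedy action selection.
This is an assumption about the usual structure, not the exact code in this repo.

```python
import torch


def greedy_action_from_distribution(probs, v_min=-10.0, v_max=10.0):
    """probs: [batch, num_actions, num_atoms] categorical output of a C51-style head."""
    num_atoms = probs.size(-1)
    support = torch.linspace(v_min, v_max, num_atoms, device=probs.device)  # atom values z_i
    q_values = (probs * support).sum(dim=-1)  # E[Z(s, a)] for every action
    return q_values.argmax(dim=1)
```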

## Future Work

We plan to implement multi-step learning and prioritized replay. Also, the current
implementation uses a simple wrapper on the [Arcade Learning Environment](https://github.com/mgbellemare/Arcade-Learning-Environment).
We may want to switch to OpenAI Gym for better visualization and video recording.
On top of Rainbow, it would also be interesting to include other new techniques,
such as [Distributional RL with Quantile Regression](https://arxiv.org/pdf/1710.10044.pdf).

Contributions and bug reports are welcome!