### PyTorch Implementation of Rainbow

This repo is a partial implementation of the [Rainbow](https://arxiv.org/pdf/1710.02298.pdf)
agent published by researchers from DeepMind.
The implementation is efficient and of high quality. It trains at
350 frames/s on my personal PC with a 3.5GHz CPU and a GTX 1080 GPU.

Rainbow is a deep Q-learning agent that combines several existing techniques,
such as dueling DQN and distributional DQN. This repo currently implements the
following DQN variants (a minimal sketch of the dueling head follows the list):
* [DQN](https://www.nature.com/articles/nature14236) (often called Nature DQN)
* [Double DQN](https://arxiv.org/abs/1509.06461)
* [Dueling DQN](https://arxiv.org/abs/1511.06581)
* [Distributional DQN](https://arxiv.org/pdf/1707.06887.pdf)
* [Noisy Net](https://arxiv.org/abs/1706.10295) for exploration
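
As an illustration, here is a minimal, hypothetical sketch of the dueling head's
value/advantage decomposition; the class and layer names are illustrative, not
this repo's actual code:

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling DQN head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    def __init__(self, feature_dim, num_actions):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)                # state value V(s)
        self.advantage = nn.Linear(feature_dim, num_actions)  # advantages A(s, a)

    def forward(self, features):
        v = self.value(features)      # [batch, 1]
        a = self.advantage(features)  # [batch, num_actions]
        # Subtract the mean advantage so V and A are identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```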

and it will need the following extensions to become a full "Rainbow" (a sketch
of the multi-step return follows the list):
* Multi-step learning
* Prioritized experience replay
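
Neither extension is implemented yet. As a rough sketch of what multi-step
learning changes, the one-step bootstrap target is replaced by an n-step return
(illustrative code, assuming per-step rewards and a bootstrapped value are given):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step return: sum_k gamma^k * r_k + gamma^n * V(s_n).

    rewards: the n rewards collected after the initial action.
    bootstrap_value: value estimate at the n-th next state,
        e.g. max_a Q_target(s_n, a).
    """
    ret = bootstrap_value
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret
```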

## Hyperparameters
The hyperparameters in this repo follow the ones described in the
[Rainbow](https://arxiv.org/pdf/1710.02298.pdf)
paper as closely as possible. However, there may still be some differences due to
misunderstanding.
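
For reference, a few headline values from the Rainbow paper are listed below;
treat this as a reading aid rather than this repo's exact configuration, which
may differ:

```python
# Selected hyperparameters as quoted in the Rainbow paper (Hessel et al., 2017).
# Not necessarily the exact values used in this repo.
RAINBOW_PAPER_HYPERPARAMS = {
    "adam_learning_rate": 6.25e-5,
    "adam_epsilon": 1.5e-4,
    "batch_size": 32,
    "discount_gamma": 0.99,
    "replay_capacity": 1_000_000,  # transitions
    "num_atoms": 51,               # distributional (C51) support size
    "v_min": -10.0,                # support range [v_min, v_max]
    "v_max": 10.0,
    "multi_step_n": 3,
    "noisy_nets_sigma0": 0.5,
}
```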

## Performance
A DQN agent often takes days to train. As a sanity check, we can
train an agent to play a simple game, "boxing". The following is the learning curve
of a dueling double DQN trained on boxing.
![](figs/boxing.png)
The agent almost solves boxing after around 12M frames, which is a good sign
that the implementation is working.

To test the distributional DQN and Noisy Net, the agent is trained on "breakout", since
distributional DQN performs significantly better than the others on this game,
rapidly reaching scores above 400 while other DQN methods struggle to do so.
![](figs/breakout.png)
From the figure we see that the agent reaches scores above 400 quickly and steadily.
Note that the numbers reported in papers are produced by training the agent for
200M frames, while here it trains for only 50M frames due to computation cost.

The figures here are smoothed.
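
As context for the distributional results above: instead of a scalar Q-value per
action, a distributional DQN outputs a probability distribution over a fixed
support of returns, and the greedy action maximizes the distribution's mean. A
minimal, illustrative sketch (the names are assumptions, not this repo's code):

```python
import torch

def greedy_action(probs, support):
    """Greedy action from a categorical return distribution.

    probs:   [batch, num_actions, num_atoms], per-action softmax output.
    support: [num_atoms], fixed atom locations (e.g. 51 points in [-10, 10]).
    """
    q_values = (probs * support).sum(dim=-1)  # expected return per action
    return q_values.argmax(dim=-1)
```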

## Future Work
We plan to implement multi-step learning and prioritized experience replay. The current
implementation uses a simple wrapper over the [Arcade Learning Environment](https://github.com/mgbellemare/Arcade-Learning-Environment);
we may want to switch to OpenAI Gym for better visualization and video recording.
On top of Rainbow, it would also be interesting to include other new techniques,
such as [Distributional RL with Quantile Regression](https://arxiv.org/pdf/1710.10044.pdf).

Contributions and bug reports are welcome!
