Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

This repository contains an implementation of "Importance Weighted Actor-Learner Architectures", along with a dynamic batching module. This is not an officially supported Google product.
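
As an illustration, below is a minimal sketch of how the dynamic batching module is typically used. It assumes the batch_fn decorator exported by dynamic_batching.py in this repository; calls issued from separate session.run() threads are transparently collected and executed as one batched computation:

import tensorflow as tf
import dynamic_batching

@dynamic_batching.batch_fn
def fn(a, b):
  # Runs once per batch of queued calls; inputs arrive stacked along the
  # leading dimension, and outputs are split back to the individual callers.
  return a + b

output0 = fn(tf.constant([1]), tf.constant([2]))  # Queued, then batched ...
output1 = fn(tf.constant([3]), tf.constant([4]))  # ... together with this call.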

For a detailed description of the architecture, please read our paper. Please cite the paper if you use the code from this repository in your work.

Bibtex

@inproceedings{impala2018,
  title={IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures},
  author={Espeholt, Lasse and Soyer, Hubert and Munos, Remi and Simonyan, Karen and Mnih, Volodymyr and Ward, Tom and Doron, Yotam and Firoiu, Vlad and Harley, Tim and Dunning, Iain and others},
  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
  year={2018}
}
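
At the heart of the method is the V-trace off-policy correction (Section 4.1 of the paper). The following NumPy sketch computes the V-trace targets v_s via the paper's backward recursion; it is a simplified illustration, not the repository's actual vtrace.py implementation:

import numpy as np

def vtrace_targets(log_rhos, discounts, rewards, values, bootstrap_value,
                   clip_rho_threshold=1.0, clip_c_threshold=1.0):
    # log_rhos:   log(pi(a|x) / mu(a|x)) per step, shape [T].
    # discounts:  gamma * (1 - done) per step, shape [T].
    # values:     V(x_0 .. x_{T-1}), shape [T]; bootstrap_value: V(x_T).
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho_threshold, rhos)  # rho_bar clipping
    cs = np.minimum(clip_c_threshold, rhos)              # c_bar clipping
    values_t_plus_1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discounts * values_t_plus_1 - values)

    # Backward recursion:
    #   v_s - V(x_s) = delta_s + gamma_s * c_s * (v_{s+1} - V(x_{s+1})).
    acc = 0.0
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    return vs_minus_v + values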

Setup instructions

  • Build the conda environment and install the DeepMind Lab wheel:
# make sure you are in the repository root
conda env create -f environment.yml
conda activate scalable_agent
pip install deepmind_lab-1.0-py3-none-any.whl
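
To check that the wheel installed correctly, a quick smoke test such as the following should run without errors (the level name 'seekavoid_arena_01' is just an example; any bundled DMLab level works):

import deepmind_lab

env = deepmind_lab.Lab('seekavoid_arena_01', ['RGB_INTERLEAVED'],
                       config={'width': '96', 'height': '72'})
env.reset()
print(env.observation_spec())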

Running the Code

Single Machine Training on a Single Level

We support three environments: Atari, VizDoom, and DMLab-30.

# Atari (BreakoutNoFrameskip-v4)
python -m experiment --level_name=atari_breakout --num_actors=4 --logdir=logdir/atari_4_1
# Vizdoom (doom_battle)
python -m experiment --level_name=doom_benchmark --num_actors=4 --logdir=logdir/doom_4_1
# DMLab (rooms_collect_good_objects_train)
python -m experiment --level_name=rooms_collect_good_objects_train --num_actors=4 --logdir=logdir/dmlab_4_1

Distributed Training on DMLab-30

Training on the full DMLab-30 suite. Across 10 runs with different seeds (--seed=[seed]) but identical hyperparameters, we observed capped human-normalized training scores between 45 and 50. Test scores are usually ~2% (absolute) lower.

Learner

python experiment.py --job_name=learner --task=0 --num_actors=150 \
    --level_name=dmlab30 --batch_size=32 --entropy_cost=0.0033391318945337044 \
    --learning_rate=0.00031866995608948655 \
    --total_environment_frames=10000000000 --reward_clipping=soft_asymmetric

Actor(s)

for i in $(seq 0 149); do
  python experiment.py --job_name=actor --task=$i \
      --num_actors=150 --level_name=dmlab30 --dataset_path=[...] &
done;
wait
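
For context, here is a hedged sketch of how --job_name and --task typically map onto a TF1 cluster with one learner and many actors; the host addresses and ports below are placeholders, not the repository's exact configuration:

import tensorflow as tf  # TF1.x distributed API

NUM_ACTORS = 150
cluster = tf.train.ClusterSpec({
    'learner': ['localhost:8000'],
    'actor': ['localhost:%d' % (8001 + i) for i in range(NUM_ACTORS)],
})
# Each process starts one server; --job_name/--task select its slot.
server = tf.train.Server(cluster, job_name='actor', task_index=0)
# Actors generate trajectories and enqueue them; the single learner
# dequeues batches and applies the V-trace update to shared variables.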

Test Score

python experiment.py --mode=test --level_name=dmlab30 --dataset_path=[...] \
    --test_num_episodes=10
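
The capped human-normalized score reported above follows the paper's definition: each level's human-normalized score is capped at 100 before averaging over the 30 levels. A hedged sketch of that aggregation, where the dict-based inputs are purely illustrative:

import numpy as np

def capped_human_normalised_score(agent, human, random):
    # agent/human/random: dicts mapping level name -> mean episode return.
    scores = [100.0 * (agent[k] - random[k]) / (human[k] - random[k])
              for k in agent]
    return float(np.mean(np.minimum(scores, 100.0)))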
