Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

This repository contains an implementation of "Importance Weighted Actor-Learner Architectures", along with a dynamic batching module. This is not an officially supported Google product.
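
As an illustration, below is a minimal sketch of how the dynamic batching module is typically used. It assumes the batch_fn decorator exported by dynamic_batching.py in this repository; calls issued from separate session.run() threads are transparently collected and executed as one batched computation:

import tensorflow as tf
import dynamic_batching

@dynamic_batching.batch_fn
def fn(a, b):
  # Runs once per batch of queued calls; inputs arrive stacked along the
  # leading dimension, and outputs are split back to the individual callers.
  return a + b

output0 = fn(tf.constant([1]), tf.constant([2]))  # Queued, then batched ...
output1 = fn(tf.constant([3]), tf.constant([4]))  # ... together with this call.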

For a detailed description of the architecture, please read our paper. Please cite the paper if you use the code from this repository in your work.

Bibtex

@inproceedings{impala2018,
  title={IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures},
  author={Espeholt, Lasse and Soyer, Hubert and Munos, Remi and Simonyan, Karen and Mnih, Volodymyr and Ward, Tom and Doron, Yotam and Firoiu, Vlad and Harley, Tim and Dunning, Iain and others},
  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
  year={2018}
}
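
At the heart of the method is the V-trace off-policy correction (Section 4.1 of the paper). The following NumPy sketch computes the V-trace targets v_s via the paper's backward recursion; it is a simplified illustration, not the repository's actual vtrace.py implementation:

import numpy as np

def vtrace_targets(log_rhos, discounts, rewards, values, bootstrap_value,
                   clip_rho_threshold=1.0, clip_c_threshold=1.0):
    # log_rhos:   log(pi(a|x) / mu(a|x)) per step, shape [T].
    # discounts:  gamma * (1 - done) per step, shape [T].
    # values:     V(x_0 .. x_{T-1}), shape [T]; bootstrap_value: V(x_T).
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho_threshold, rhos)  # rho_bar clipping
    cs = np.minimum(clip_c_threshold, rhos)              # c_bar clipping
    values_t_plus_1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discounts * values_t_plus_1 - values)

    # Backward recursion:
    #   v_s - V(x_s) = delta_s + gamma_s * c_s * (v_{s+1} - V(x_{s+1})).
    acc = 0.0
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    return vs_minus_v + values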

Setup instructions

  • Build the conda environment and install the DeepMind Lab wheel:
# make sure you are in the repository root
conda env create -f environment.yml
conda activate scalable_agent
pip install deepmind_lab-1.0-py3-none-any.whl
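
To check that the wheel installed correctly, a quick smoke test such as the following should run without errors (the level name 'seekavoid_arena_01' is just an example; any bundled DMLab level works):

import deepmind_lab

env = deepmind_lab.Lab('seekavoid_arena_01', ['RGB_INTERLEAVED'],
                       config={'width': '96', 'height': '72'})
env.reset()
print(env.observation_spec())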

Running the Code

Single Machine Training on a Single Level

We support three environments: Atari, VizDoom, and DMLab-30.

# Atari (BreakoutNoFrameskip-v4)
python -m experiment --level_name=atari_breakout --num_actors=4 --logdir=logdir/atari_4_1
# Vizdoom (doom_battle)
python -m experiment --level_name=doom_benchmark --num_actors=4 --logdir=logdir/doom_4_1
# DMLab (rooms_collect_good_objects_train)
python -m experiment --level_name=rooms_collect_good_objects_train --num_actors=4 --logdir=logdir/dmlab_4_1

Distributed Training on DMLab-30

Training on the full DMLab-30 suite. Across 10 runs with different seeds (--seed=[seed]) but identical hyperparameters, we observed capped human-normalized training scores between 45 and 50. Test scores are usually ~2% (absolute) lower.

Learner

python experiment.py --job_name=learner --task=0 --num_actors=150 \
    --level_name=dmlab30 --batch_size=32 --entropy_cost=0.0033391318945337044 \
    --learning_rate=0.00031866995608948655 \
    --total_environment_frames=10000000000 --reward_clipping=soft_asymmetric

Actor(s)

for i in $(seq 0 149); do
  python experiment.py --job_name=actor --task=$i \
      --num_actors=150 --level_name=dmlab30 --dataset_path=[...] &
done;
wait
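
For context, here is a hedged sketch of how --job_name and --task typically map onto a TF1 cluster with one learner and many actors; the host addresses and ports below are placeholders, not the repository's exact configuration:

import tensorflow as tf  # TF1.x distributed API

NUM_ACTORS = 150
cluster = tf.train.ClusterSpec({
    'learner': ['localhost:8000'],
    'actor': ['localhost:%d' % (8001 + i) for i in range(NUM_ACTORS)],
})
# Each process starts one server; --job_name/--task select its slot.
server = tf.train.Server(cluster, job_name='actor', task_index=0)
# Actors generate trajectories and enqueue them; the single learner
# dequeues batches and applies the V-trace update to shared variables.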

Test Score

python experiment.py --mode=test --level_name=dmlab30 --dataset_path=[...] \
    --test_num_episodes=10
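
The capped human-normalized score reported above follows the paper's definition: each level's human-normalized score is capped at 100 before averaging over the 30 levels. A hedged sketch of that aggregation, where the dict-based inputs are purely illustrative:

import numpy as np

def capped_human_normalised_score(agent, human, random):
    # agent/human/random: dicts mapping level name -> mean episode return.
    scores = [100.0 * (agent[k] - random[k]) / (human[k] - random[k])
              for k in agent]
    return float(np.mean(np.minimum(scores, 100.0)))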
