PyTorch Distributed Training

Prerequisites

See requirements.txt.

Files

  • test_data_parallelism.py
    • Nearly identical to Accelerate's example, but uses a larger model and changes the default batch_size setting.
    • Launch dual-GPU training on a single node with mixed precision (a sketch of the training loop appears after this list):
      python -m torch.distributed.run --nproc_per_node 2 --use_env test_data_parallelism.py --fp16=True
      
  • test_model_parallelism.py
    • Data parallelism & model parallelism without relying on the Accelerate library (see the second sketch after this list).
    • Supports only single-node dual-GPU training without mixed precision.
    • Launch dual-GPU training on a single node without mixed precision:
      python test_model_parallelism.py
      
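For orientation, here is a minimal sketch of the kind of Accelerate-based data-parallel loop that test_data_parallelism.py follows. The model, dataset, and hyperparameters below are placeholders, not the ones used in the actual script; the mixed_precision="fp16" setting stands in for the --fp16=True flag.

```python
# Minimal sketch of an Accelerate data-parallel training loop (illustrative only;
# the model, data, and hyperparameters below are placeholders, not the script's own).
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

def main():
    # mixed_precision="fp16" plays the role of the --fp16=True flag.
    accelerator = Accelerator(mixed_precision="fp16")

    model = torch.nn.Linear(128, 2)  # placeholder model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
    dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

    # prepare() wraps the model for distributed data parallelism and shards the
    # dataloader across the launched processes.
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for epoch in range(3):
        for inputs, targets in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            accelerator.backward(loss)  # handles fp16 gradient scaling
            optimizer.step()

if __name__ == "__main__":
    main()
```

When launched with torch.distributed.run --nproc_per_node 2, each of the two processes runs this loop on its own GPU and gradients are synchronized automatically.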

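The second sketch shows one common way to split a model across two GPUs in a single process, which is the general pattern behind manual model parallelism. The layer shapes and the split point are placeholders and not the scheme used by test_model_parallelism.py.

```python
# Illustrative sketch of two-GPU model parallelism in one process
# (layer shapes and the split point are placeholders, not those of
# test_model_parallelism.py).
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the network lives on GPU 0, second half on GPU 1.
        self.part1 = nn.Sequential(nn.Linear(128, 256), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(256, 2)).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # Move activations to the second GPU before running the second half.
        return self.part2(x.to("cuda:1"))

def main():
    model = TwoGPUModel()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(10):
        inputs = torch.randn(32, 128)
        targets = torch.randint(0, 2, (32,), device="cuda:1")  # labels on the output GPU
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()  # autograd routes gradients back across devices
        optimizer.step()

if __name__ == "__main__":
    main()
```

Because the split happens inside a single process, this script is launched with plain python rather than torch.distributed.run.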
License

Apache License 2.0
