
move fsdp handling to accelerate #23158

Merged — 13 commits merged into main on May 31, 2023

Conversation

@pacman100 (Contributor) commented May 5, 2023

What does this PR do?

  1. Moves PyTorch FSDP handling to Accelerate.
  2. Should be merged after the Accelerate DDP integration (#23151).
  3. No user-facing change. Users can now use `accelerate launch` for FSDP with the Trainer, e.g. (an illustrative sketch of the Accelerate-side setup follows the command):
accelerate launch --num_processes=2 --use_fsdp --mixed_precision=bf16 \
  --fsdp_auto_wrap_policy=TRANSFORMER_BASED_WRAP --fsdp_transformer_layer_cls_to_wrap="BertLayer" \
  --fsdp_sharding_strategy=1 --fsdp_state_dict_type=FULL_STATE_DICT \
  ./examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path bert-base-cased --task_name $TASK_NAME --do_train --do_eval \
  --max_seq_length 128 --per_device_train_batch_size 16 --learning_rate 5e-5 \
  --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir
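
For orientation, the handling that is delegated to Accelerate looks roughly like the following. This is an illustrative sketch, not the Trainer's actual internal code path: `FullyShardedDataParallelPlugin` and the `fsdp_plugin` argument are existing Accelerate APIs, and the script is assumed to be started with `accelerate launch --use_fsdp` on at least two processes.

```python
# Illustrative sketch only: how FSDP is driven through Accelerate.
# Run under `accelerate launch --use_fsdp --num_processes=2 ...`.
import torch
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# With no arguments, the plugin reads the FSDP settings exported by the launcher
# flags (sharding strategy, auto-wrap policy, state dict type, ...).
fsdp_plugin = FullyShardedDataParallelPlugin()
accelerator = Accelerator(mixed_precision="bf16", fsdp_plugin=fsdp_plugin)

model = torch.nn.Linear(128, 2)  # stand-in for the real transformer model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# `prepare` wraps the model in FSDP; the optimizer is then set up against the
# sharded parameters.
model, optimizer = accelerator.prepare(model, optimizer)
```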

Alternatively, users can continue to use torchrun with the Trainer arguments as usual (a TrainingArguments sketch follows the command):

torchrun --nnodes 1 --nproc-per-node 2 ./examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path bert-base-cased --task_name $TASK_NAME --do_train --do_eval \
  --max_seq_length 128 --per_device_train_batch_size 16 --learning_rate 5e-5 \
  --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir \
  --fsdp "full_shard auto_wrap" --fsdp_transformer_layer_cls_to_wrap BertLayer --bf16
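
The same FSDP options can also be set programmatically. Below is a minimal sketch (not taken from this PR) using the existing `fsdp`, `fsdp_config`, and `bf16` fields of `TrainingArguments`; the `fsdp_config` key name follows the Trainer's convention at the time of this PR, and the output path is a placeholder.

```python
# Minimal sketch: the torchrun example's FSDP flags expressed via TrainingArguments.
# Hand `args` to Trainer(model=..., args=args, train_dataset=...) and launch with
# torchrun as above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="/tmp/mrpc",        # placeholder, same role as --output_dir
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
    bf16=True,                     # --bf16
    fsdp="full_shard auto_wrap",   # --fsdp "full_shard auto_wrap"
    # key name as accepted by the Trainer's fsdp_config at the time of this PR
    fsdp_config={"fsdp_transformer_layer_cls_to_wrap": "BertLayer"},
)
```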

@HuggingFaceDocBuilderDev commented May 5, 2023

The documentation is not available anymore as the PR was closed or merged.

@pacman100 pacman100 marked this pull request as ready for review May 5, 2023 12:06
@muellerzr (Contributor) left a comment:

Thanks! A few questions I had below

Review threads on src/transformers/trainer.py and src/transformers/training_args.py (resolved).
@sgugger (Collaborator) left a comment:

Should we deprecate the entire fsdp_config and tell users to use the Accelerate config instead? We wouldn't be able to actually remove it before a major release, but we could remove the documentation, for instance.

@pacman100 (Contributor, Author) replied:

Hello Sylvain, we can't do that, as the FSDP-XLA integration uses it and that isn't yet supported in Accelerate.

@pacman100 pacman100 changed the base branch from smangrul/accelerate-ddp-integrate to main May 10, 2023 04:47
@pacman100 pacman100 changed the base branch from main to smangrul/accelerate-ddp-integrate May 10, 2023 04:47
Base automatically changed from smangrul/accelerate-ddp-integrate to main May 31, 2023 08:12
@pacman100 pacman100 merged commit 0b77407 into main May 31, 2023
@pacman100 pacman100 deleted the smangrul/accelerate-fsdp-integrate branch May 31, 2023 08:40
sheonhan pushed a commit to sheonhan/transformers that referenced this pull request Jun 1, 2023
* mixed precision support via accelerate

* fix issues

* fix for the sharded ddp case

* fix flax and tf failing tests

* refactor the place to create `Accelerator` object

* move ddp prep to accelerate

* fix 😅

* resolving comments

* move fsdp handling to accelerate

* fixex

* fix saving
gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023 (same commit list as above)
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023 (same commit list as above)