
move fsdp handling to accelerate #23158

Merged — 13 commits merged into main on May 31, 2023

Conversation

@pacman100 (Contributor) commented May 5, 2023

What does this PR do?

  1. Moves PyTorch FSDP handling to Accelerate.
  2. Should be merged after the Accelerate DDP integration (#23151).
  3. No user-facing change. Users can now use `accelerate launch` for FSDP with the Trainer, e.g. (an illustrative sketch of the Accelerate-side setup follows the command):
accelerate launch --num_processes=2 --use_fsdp --mixed_precision=bf16 \
  --fsdp_auto_wrap_policy=TRANSFORMER_BASED_WRAP --fsdp_transformer_layer_cls_to_wrap="BertLayer" \
  --fsdp_sharding_strategy=1 --fsdp_state_dict_type=FULL_STATE_DICT \
  ./examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path bert-base-cased --task_name $TASK_NAME --do_train --do_eval \
  --max_seq_length 128 --per_device_train_batch_size 16 --learning_rate 5e-5 \
  --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir
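
For orientation, the handling that is delegated to Accelerate looks roughly like the following. This is an illustrative sketch, not the Trainer's actual internal code path: `FullyShardedDataParallelPlugin` and the `fsdp_plugin` argument are existing Accelerate APIs, and the script is assumed to be started with `accelerate launch --use_fsdp` on at least two processes.

```python
# Illustrative sketch only: how FSDP is driven through Accelerate.
# Run under `accelerate launch --use_fsdp --num_processes=2 ...`.
import torch
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# With no arguments, the plugin reads the FSDP settings exported by the launcher
# flags (sharding strategy, auto-wrap policy, state dict type, ...).
fsdp_plugin = FullyShardedDataParallelPlugin()
accelerator = Accelerator(mixed_precision="bf16", fsdp_plugin=fsdp_plugin)

model = torch.nn.Linear(128, 2)  # stand-in for the real transformer model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# `prepare` wraps the model in FSDP; the optimizer is then set up against the
# sharded parameters.
model, optimizer = accelerator.prepare(model, optimizer)
```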

Alternatively, users can continue to use torchrun with the Trainer arguments as usual (a TrainingArguments sketch follows the command):

torchrun --nnodes 1 --nproc-per-node 2 ./examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path bert-base-cased --task_name $TASK_NAME --do_train --do_eval \
  --max_seq_length 128 --per_device_train_batch_size 16 --learning_rate 5e-5 \
  --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir \
  --fsdp "full_shard auto_wrap" --fsdp_transformer_layer_cls_to_wrap BertLayer --bf16
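
The same FSDP options can also be set programmatically. Below is a minimal sketch (not taken from this PR) using the existing `fsdp`, `fsdp_config`, and `bf16` fields of `TrainingArguments`; the `fsdp_config` key name follows the Trainer's convention at the time of this PR, and the output path is a placeholder.

```python
# Minimal sketch: the torchrun example's FSDP flags expressed via TrainingArguments.
# Hand `args` to Trainer(model=..., args=args, train_dataset=...) and launch with
# torchrun as above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="/tmp/mrpc",        # placeholder, same role as --output_dir
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
    bf16=True,                     # --bf16
    fsdp="full_shard auto_wrap",   # --fsdp "full_shard auto_wrap"
    # key name as accepted by the Trainer's fsdp_config at the time of this PR
    fsdp_config={"fsdp_transformer_layer_cls_to_wrap": "BertLayer"},
)
```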

@HuggingFaceDocBuilderDev commented May 5, 2023

The documentation is not available anymore as the PR was closed or merged.

@pacman100 pacman100 marked this pull request as ready for review May 5, 2023 12:06
@muellerzr (Contributor) left a comment:

Thanks! A few questions I had below

Review threads on src/transformers/trainer.py and src/transformers/training_args.py (resolved).
@sgugger (Collaborator) left a comment:

Should we deprecate the entire fsdp_config and tell users to use the Accelerate config instead? We wouldn't be able to actually remove it before a major release, but we could remove the documentation, for instance.

@pacman100 (Contributor, Author) replied:

Hello Sylvain, we can't do that, as the FSDP-XLA integration uses it and that isn't yet supported in Accelerate.

@pacman100 pacman100 changed the base branch from smangrul/accelerate-ddp-integrate to main May 10, 2023 04:47
@pacman100 pacman100 changed the base branch from main to smangrul/accelerate-ddp-integrate May 10, 2023 04:47
Base automatically changed from smangrul/accelerate-ddp-integrate to main May 31, 2023 08:12
@pacman100 pacman100 merged commit 0b77407 into main May 31, 2023
@pacman100 pacman100 deleted the smangrul/accelerate-fsdp-integrate branch May 31, 2023 08:40
sheonhan pushed a commit to sheonhan/transformers that referenced this pull request Jun 1, 2023
* mixed precision support via accelerate

* fix issues

* fix for the sharded ddp case

* fix flax and tf failing tests

* refactor the place to create `Accelerator` object

* move ddp prep to accelerate

* fix 😅

* resolving comments

* move fsdp handling to accelerate

* fixex

* fix saving
gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023 (same commit list as above)
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023 (same commit list as above)