
GPU and CPU memory utilization while running the peft_lora_clm_accelerate_ds_zero3_offload.py script #76

Closed

karthikmurugadoss opened this issue Feb 13, 2023 · 2 comments

karthikmurugadoss commented Feb 13, 2023
Hi! Thanks a lot for this fantastic package!

I was running the examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py script with the bloomz-7b1 model. Per the README, I expected ~18.1GB of GPU memory and ~35GB of CPU memory to be used. However, the generated logs (see below; logs for the 15th epoch) show much higher GPU memory consumption, close to 32GB, while CPU memory usage is much lower than expected.

Edit: I think I missed some setup steps required for DeepSpeed offloading, since the is_ds_zero_3 variable on line 238 of the script is always False. Please let me know! Thank you.

Note: I'm running this on a Ubuntu 18.04 x86_64 machine with a single 40GB A100 GPU.

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00,  1.11it/s]
GPU Memory before entering the train : 27026
GPU Memory consumed at the end of the train (end-begin): 242
GPU Peak Memory consumed during the train (max-begin): 5011
GPU Total Peak Memory consumed during the train (max): 32037
CPU Memory before entering the train : 4080
CPU Memory consumed at the end of the train (end-begin): 0
CPU Peak Memory consumed during the train (max-begin): 0
CPU Total Peak Memory consumed during the train (max): 4080
epoch=15: train_ppl=tensor(2.0908, device='cuda:0') train_epoch_loss=tensor(0.7375, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:08<00:00,  1.26s/it]
GPU Memory before entering the eval : 27268
GPU Memory consumed at the end of the eval (end-begin): -242
GPU Peak Memory consumed during the eval (max-begin): 1465
GPU Total Peak Memory consumed during the eval (max): 28733
CPU Memory before entering the eval : 4080
CPU Memory consumed at the end of the eval (end-begin): 0
CPU Peak Memory consumed during the eval (max-begin): 0
CPU Total Peak Memory consumed during the eval (max): 4080
accuracy=84.0
eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'no complaint']
dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']
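For context on the Edit above: the script's is_ds_zero_3 flag is derived from the Accelerate state, and it stays False whenever the process was not launched under a DeepSpeed config. A minimal sketch of that check (not the script verbatim; SimpleNamespace stubs stand in for the real Accelerator objects so the snippet runs without accelerate installed):

```python
from types import SimpleNamespace

def detect_ds_zero_3(accelerator_state):
    """Return True only when a DeepSpeed plugin is configured with ZeRO stage 3."""
    # accelerator.state.deepspeed_plugin is only populated when the script is
    # launched via `accelerate launch` with a DeepSpeed config; a plain
    # `python script.py` run leaves it unset, so the flag is always False.
    plugin = getattr(accelerator_state, "deepspeed_plugin", None)
    if plugin is None:
        return False
    return getattr(plugin, "zero_stage", None) == 3

# No DeepSpeed plugin -> False, so the script skips the ZeRO-3 offload path.
print(detect_ds_zero_3(SimpleNamespace(deepspeed_plugin=None)))  # False
# ZeRO stage 3 configured -> True.
print(detect_ds_zero_3(SimpleNamespace(deepspeed_plugin=SimpleNamespace(zero_stage=3))))  # True
```

This matches the symptom in the issue: without the DeepSpeed/Accelerate setup, the model sits entirely in GPU memory, which explains the ~32GB GPU / ~4GB CPU numbers in the logs.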
pacman100 (Contributor) commented Feb 13, 2023

Hello @karthikmurugadoss, to run the script using DeepSpeed ZeRO stage 3 with CPU offloading, please follow the steps described here:
https://github.com/huggingface/peft#example-of-peft-model-training-using--accelerates-deepspeed-integration
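For reference, the relevant part of a ZeRO stage-3 DeepSpeed configuration with CPU offloading of both parameters and optimizer state looks roughly like the fragment below (field names follow the DeepSpeed config schema; the pin_memory values are illustrative, and the README numbers assume both offload sections are enabled):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

With such a config selected during `accelerate config`, the script must then be started with `accelerate launch` rather than plain `python` for the plugin to take effect.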

karthikmurugadoss (Author) commented

Thanks a lot @pacman100 - there was an issue in my DeepSpeed config that I didn't catch earlier.

Thanks for the quick response!
