
GPU and CPU memory utilization while running the peft_lora_clm_accelerate_ds_zero3_offload.py script #76

Closed

karthikmurugadoss opened this issue Feb 13, 2023 · 2 comments

karthikmurugadoss commented Feb 13, 2023
Hi! Thanks a lot for this fantastic package!

I was running the examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py script with the bloomz-7b1 model. Per the README, I expected ~18.1GB of GPU memory and ~35GB of CPU memory to be used. However, the generated logs (see below; logs for the 15th epoch) show much higher GPU memory consumption, close to 32GB, while CPU memory usage is much lower than expected.

Edit: I think I missed some setup steps required for DeepSpeed offloading, since the is_ds_zero_3 variable on line 238 of the script is always False. Please let me know! Thank you.

Note: I'm running this on a Ubuntu 18.04 x86_64 machine with a single 40GB A100 GPU.

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00,  1.11it/s]
GPU Memory before entering the train : 27026
GPU Memory consumed at the end of the train (end-begin): 242
GPU Peak Memory consumed during the train (max-begin): 5011
GPU Total Peak Memory consumed during the train (max): 32037
CPU Memory before entering the train : 4080
CPU Memory consumed at the end of the train (end-begin): 0
CPU Peak Memory consumed during the train (max-begin): 0
CPU Total Peak Memory consumed during the train (max): 4080
epoch=15: train_ppl=tensor(2.0908, device='cuda:0') train_epoch_loss=tensor(0.7375, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:08<00:00,  1.26s/it]
GPU Memory before entering the eval : 27268
GPU Memory consumed at the end of the eval (end-begin): -242
GPU Peak Memory consumed during the eval (max-begin): 1465
GPU Total Peak Memory consumed during the eval (max): 28733
CPU Memory before entering the eval : 4080
CPU Memory consumed at the end of the eval (end-begin): 0
CPU Peak Memory consumed during the eval (max-begin): 0
CPU Total Peak Memory consumed during the eval (max): 4080
accuracy=84.0
eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'no complaint']
dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']
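For context on the Edit above: the script's is_ds_zero_3 flag is derived from the Accelerate state, and it stays False whenever the process was not launched under a DeepSpeed config. A minimal sketch of that check (not the script verbatim; SimpleNamespace stubs stand in for the real Accelerator objects so the snippet runs without accelerate installed):

```python
from types import SimpleNamespace

def detect_ds_zero_3(accelerator_state):
    """Return True only when a DeepSpeed plugin is configured with ZeRO stage 3."""
    # accelerator.state.deepspeed_plugin is only populated when the script is
    # launched via `accelerate launch` with a DeepSpeed config; a plain
    # `python script.py` run leaves it unset, so the flag is always False.
    plugin = getattr(accelerator_state, "deepspeed_plugin", None)
    if plugin is None:
        return False
    return getattr(plugin, "zero_stage", None) == 3

# No DeepSpeed plugin -> False, so the script skips the ZeRO-3 offload path.
print(detect_ds_zero_3(SimpleNamespace(deepspeed_plugin=None)))  # False
# ZeRO stage 3 configured -> True.
print(detect_ds_zero_3(SimpleNamespace(deepspeed_plugin=SimpleNamespace(zero_stage=3))))  # True
```

This matches the symptom in the issue: without the DeepSpeed/Accelerate setup, the model sits entirely in GPU memory, which explains the ~32GB GPU / ~4GB CPU numbers in the logs.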
pacman100 (Contributor) commented Feb 13, 2023

Hello @karthikmurugadoss, to run the script using DeepSpeed ZeRO stage 3 with CPU offloading, please follow the steps described here:
https://github.com/huggingface/peft#example-of-peft-model-training-using--accelerates-deepspeed-integration
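For reference, the relevant part of a ZeRO stage-3 DeepSpeed configuration with CPU offloading of both parameters and optimizer state looks roughly like the fragment below (field names follow the DeepSpeed config schema; the pin_memory values are illustrative, and the README numbers assume both offload sections are enabled):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

With such a config selected during `accelerate config`, the script must then be started with `accelerate launch` rather than plain `python` for the plugin to take effect.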

karthikmurugadoss (Author) commented

Thanks a lot @pacman100 - there was an issue in my DeepSpeed config that I didn't catch earlier.

Thanks for the quick response!
