Currently Unsloth offers a customized version of gradient checkpointing that it claims is more memory-efficient. The only way I'm aware of enabling it is with the code below.
from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,
    use_gradient_checkpointing = "unsloth",  # enables Unsloth's custom gradient checkpointing
)
But using FastLanguageModel.get_peft_model will patch the model with LoRA adapters. Is there any way to use Unsloth's customized gradient checkpointing without LoRA? Or does it even make sense to use it without LoRA? Are the customized tricks specific to PEFT?
We'll be adding support for all models in a future release, which will enable Unsloth GC for other models! I'm unsure about normal full fine-tuning or pretraining - for that I would suggest using DeepSpeed to offload states, and not Unsloth.
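For reference, a minimal sketch of a DeepSpeed setup along those lines, passed through the Hugging Face Trainer - the ZeRO stage, precision, and output directory here are placeholder assumptions, not a tuned configuration:

from transformers import TrainingArguments

# Hypothetical minimal DeepSpeed config: ZeRO stage 2 with CPU optimizer
# offload. All values are placeholders, not a tested setup.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "zero_optimization": {
        "stage": 2,                               # shard optimizer states and gradients
        "offload_optimizer": {"device": "cpu"},   # keep optimizer states in CPU RAM
    },
    "bf16": {"enabled": True},
}

# TrainingArguments accepts either a dict or a path to a JSON file for `deepspeed`.
args = TrainingArguments(output_dir="out", deepspeed=ds_config)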
Great to know it's on the todo list. I'm not looking for offloading techniques, as the performance drop is quite significant; I'm rather trying to do gradient checkpointing during pretraining. The PyTorch implementation should be good enough for the time being.
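For anyone landing here, a minimal sketch of that stock gradient checkpointing path in Transformers, with no PEFT/LoRA patching involved (the model name is only an example):

from transformers import AutoModelForCausalLM

# Standard Transformers/PyTorch gradient checkpointing, no LoRA required.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model.gradient_checkpointing_enable()  # recompute activations during backward
model.config.use_cache = False         # the KV cache is incompatible with checkpointing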