
How to change to PEFT model dynamically? #1829

Closed
whr819987540 opened this issue Jun 5, 2024 · 4 comments

@whr819987540

python==3.7.12
PEFT==0.3.0

@BenjaminBossan

I fine-tune transformer layer 11 of BERT as shown below:

from peft import LoraConfig

target_modules = [
    "11.attention.self.query",
    "11.attention.self.value",
]

lora_config = LoraConfig(
    r=self.args.lora_rank,
    lora_alpha=self.args.lora_alpha,
    target_modules=target_modules,
    lora_dropout=0.05,
    bias="none",
)
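
For reference, a minimal sketch of how this config is applied to a base model (bert-base-uncased is only an illustrative checkpoint):

from transformers import AutoModel
from peft import get_peft_model

# Illustrative base model; any BERT checkpoint with the standard module names works the same way.
base_model = AutoModel.from_pretrained("bert-base-uncased")

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the layer-11 query/value LoRA weights should be trainable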

After training for a few epochs, I also want to fine-tune the first transformer layer (layer 0). How can I achieve this?

@BenjaminBossan
Member

This is not directly possible. What you could try is to add all the layers you eventually want to train to target_modules. Then, go through the modules and manually disable the gradients of those you don't want to train yet:

target_modules = [
    "0.attention.self.query", "0.attention.self.value",
    "11.attention.self.query", "11.attention.self.value",
]
...
model = get_peft_model(...)
for name, module in model.named_modules():
    if name ...:  # match the LoRA modules you don't want to train yet (e.g. layer 0)
        module.requires_grad_(False)

I've never tried this, so I'm not 100% sure it'll work, but it's worth a try.
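
Fleshed out a bit, that idea might look like the following untested sketch. The layer-qualified module names, the "lora_" / "layer.0." string checks, and the set_layer0_lora_trainable helper are only illustrative and assume the standard BERT module names, so verify them against model.named_parameters() on your side:

from transformers import AutoModel
from peft import LoraConfig, get_peft_model

base_model = AutoModel.from_pretrained("bert-base-uncased")  # illustrative checkpoint

target_modules = [
    "layer.0.attention.self.query", "layer.0.attention.self.value",
    "layer.11.attention.self.query", "layer.11.attention.self.value",
]
lora_config = LoraConfig(
    r=8, lora_alpha=16, target_modules=target_modules,
    lora_dropout=0.05, bias="none",
)
model = get_peft_model(base_model, lora_config)

def set_layer0_lora_trainable(model, trainable):
    # Freeze or unfreeze only the LoRA parameters that belong to layer 0.
    for name, param in model.named_parameters():
        if "lora_" in name and "layer.0." in name:
            param.requires_grad = trainable

# Phase 1: train only layer 11.
set_layer0_lora_trainable(model, False)
# ... train for a few epochs ...

# Phase 2: now also fine-tune layer 0.
set_layer0_lora_trainable(model, True)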

@whr819987540
Author

Actually, that's what I am doing. But a model created this way (fine-tuning just the last transformer layer) performs differently from a model created directly by passing only that layer's modules in the target_modules argument. I guess it's because the initialization of the additional LoRA layers changes the random number sequence.

@BenjaminBossan
Member

Yes, if you create more LoRA layers, the random number generator is advanced more often. But I don't understand what the big issue with that would be. Maybe you could show your code and explain what you expect versus what actually happens.
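
To illustrate the point about the random number sequence, here is a rough, untested sketch that builds both variants from the same seed and compares the initial lora_A weights of layer 11 (make_base_model and the parameter-name filter are only illustrative and assume standard BERT names):

import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

def make_base_model():
    # Illustrative helper: returns a fresh base model with identical pretrained weights each time.
    return AutoModel.from_pretrained("bert-base-uncased")

def build(target_modules, seed=0):
    # Create a PEFT model from a fixed seed; the random LoRA weight init consumes RNG draws.
    torch.manual_seed(seed)
    config = LoraConfig(r=8, lora_alpha=16, target_modules=target_modules,
                        lora_dropout=0.05, bias="none")
    return get_peft_model(make_base_model(), config)

only_layer_11 = build(["layer.11.attention.self.query", "layer.11.attention.self.value"])
both_layers = build([
    "layer.0.attention.self.query", "layer.0.attention.self.value",
    "layer.11.attention.self.query", "layer.11.attention.self.value",
])

def layer11_query_lora_A(model):
    # Grab the initial lora_A weight of layer 11's query projection.
    for name, param in model.named_parameters():
        if "layer.11." in name and "lora_A" in name and "query" in name:
            return param.detach().clone()

# If this prints False, the extra layer-0 initialization shifted the random number
# sequence, so layer 11 starts from different LoRA weights in the two setups.
print(torch.allclose(layer11_query_lora_A(only_layer_11), layer11_query_lora_A(both_layers)))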

@whr819987540
Author

I just wanted to know whether the random seed was the reason.

Thanks a lot for your reply.
