Use `peft` for RLHF #71

younesbelkada · 2023-02-11T08:37:44Z

Feature request

We should leverage trl (https://github.com/lvwerra/trl), the recent Hugging Face library for RLHF, to apply PPO with peft and LoRA.

I think peft should just work out of the box. A first step could be adapting the gpt2-sentiment.py script (https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt2-sentiment.py) to use peft.
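For context on why LoRA makes PPO cheaper, here is a minimal pure-Python sketch of the LoRA idea, assuming the standard formulation h = Wx + (alpha / r) · B·A·x, where W stays frozen and only the low-rank matrices A and B are trained. The names `LoRALinear`, `matvec` and `lora_param_count` are hypothetical illustrations, not peft's API or implementation.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters of a rank-r adapter on a d_out x d_in matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, r=4, alpha=8, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen, shape (d_out, d_in); never updated
        self.d_out, self.d_in = len(weight), len(weight[0])
        self.r = r
        self.scaling = alpha / r
        # A gets a small random init; B starts at zero, so the adapter is a
        # no-op at first and the wrapped layer reproduces the base layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(self.d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)                # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))    # low-rank path: B (A x)
        return [b + self.scaling * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would receive gradients (e.g. from the PPO loss).
        return lora_param_count(self.d_out, self.d_in, self.r)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=2)
    print(layer.forward([1.0, 1.0]))  # B is zero, so this equals W x
    print(lora_param_count(2304, 768, 16))
```

As a rough size check: GPT-2's fused attention projection `c_attn` maps 768 features to 2304, so a rank-16 adapter on it trains 16 × (768 + 2304) = 49,152 parameters instead of 768 × 2304 = 1,769,472, a 36× reduction for that matrix.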


BirgerMoell · 2023-02-15T09:31:29Z

@pacman100 @younesbelkada I could work on this if there is any interest. What would be the first steps?

younesbelkada · 2023-02-15T09:42:47Z

Thanks for your interest @BirgerMoell!! That would definitely be helpful.
We first need to check the compatibility of peft and trl (i.e. it might not work directly out of the box, so we need to double-check). We will most likely adapt the gpt2 script ourselves to make sure we fix whatever needs fixing; then maybe you can help us convert the t5 script? Let me know what you think!
If you want to take on the challenge and start converting the gpt2 script, feel free to do it and we'll be happy to review your Pull Request / code changes! 🔥

BirgerMoell · 2023-02-15T09:58:48Z

I think it's a good idea if you start with the gpt2 script and then I can do the t5 script. That seems like a nice way to split the work, and I can learn much more by following along with yours.
Btw, I'm on the Hugging Face Discord if you want to message me; my username is birger.

younesbelkada · 2023-03-02T13:53:32Z

The PR huggingface/trl#163 should add peft support to trl. Once this PR is merged, we may need to convert the other scripts as well (t5 summarization, etc.)!

younesbelkada · 2023-06-21T15:20:20Z

Hi everyone,
Closing this issue as we now officially support PEFT in TRL, making it possible to apply RLHF + PEFT.

pacman100 added the wip label Feb 24, 2023

younesbelkada closed this as completed Jun 21, 2023
