
Mixed-task batching #759

Closed

einarbmag opened this issue Jul 27, 2023 · 5 comments

Comments

@einarbmag

Feature request

Ability to load a PEFT model with multiple adapters and perform mixed-task batch inference. This should be feasible with IA^3 and prompt/prefix tuning; I'm not sure about LoRA.
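For illustration, the requested behavior might look roughly like the sketch below. The per-sample adapter_names argument, along with the model and adapter paths, is purely hypothetical here and not an existing PEFT interface in this discussion.

```python
# Hypothetical sketch of mixed-task batch inference. The per-sample
# `adapter_names` argument to generate() is imagined for illustration;
# model and adapter paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model")
tokenizer = AutoTokenizer.from_pretrained("base-model")

# One base model, several task-specific adapters loaded side by side.
model = PeftModel.from_pretrained(base, "adapters/summarize", adapter_name="summarize")
model.load_adapter("adapters/classify", adapter_name="classify")

prompts = ["Summarize: ...", "Classify: ..."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

# Route each row of the batch through a different adapter (hypothetical API).
outputs = model.generate(**inputs, adapter_names=["summarize", "classify"])
```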

Motivation

In the IA^3 paper (Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning), it is mentioned that one of the requirements for a PEFT method to be compared with few-shot ICL is that it allows mixed-task batches.

This would allow building a very resource-efficient LLM application: we could serve a single PEFT model with a large number of adapters covering the different tasks the application needs to perform, and handle concurrent requests from multiple users, each performing a different task.

Your contribution

I could possibly get people in my team to look at this if we can get some advice on how to structure the change.

@BenjaminBossan
Member

I think that right now, it would not be possible to have different tasks/adapters within a single batch. Moreover, I don't think it can be easily supported (not impossible, but would require serious refactoring).

What does work already is to have a single model with multiple adapters and then switch adapters by calling set_adapter(<adapter-name>) between batches (this should also work with LoRA). That way, you would still have a rather resource-efficient deployment.

In practice, this approach would probably mean that you have to buffer prediction calls from different adapters to have any benefit from batching and to amortize the cost of switching adapters. Depending on your use case, this may or may not be feasible.
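For concreteness, here is a minimal sketch of that buffer-and-switch approach, assuming a causal LM; the model/adapter paths and the request-grouping logic are illustrative placeholders, not part of PEFT.

```python
# Minimal sketch: buffer requests per adapter, then switch adapters between
# batches with set_adapter(). Paths and the grouping helper are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model")
tokenizer = AutoTokenizer.from_pretrained("base-model")
if tokenizer.pad_token is None:
    # Many causal-LM tokenizers need an explicit pad token for batching.
    tokenizer.pad_token = tokenizer.eos_token

# Load one PEFT model and attach several task-specific adapters.
model = PeftModel.from_pretrained(base, "adapters/task_a", adapter_name="task_a")
model.load_adapter("adapters/task_b", adapter_name="task_b")

def run_buffered(requests):
    """requests: list of (adapter_name, prompt) tuples."""
    results = {}
    # Group requests so each adapter is activated only once per pass,
    # amortizing the cost of switching adapters.
    by_adapter = {}
    for adapter_name, prompt in requests:
        by_adapter.setdefault(adapter_name, []).append(prompt)
    for adapter_name, prompts in by_adapter.items():
        model.set_adapter(adapter_name)  # switch adapters between batches
        inputs = tokenizer(prompts, return_tensors="pt", padding=True)
        with torch.no_grad():
            outputs = model.generate(**inputs)
        results[adapter_name] = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return results
```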

@einarbmag
Author

Thanks, I suspected that it would be a big refactor... I will probably do as you say initially: run the base model as one instance and the PEFT model with switchable adapters as a second instance for specific tasks.

I will ask the IA^3 authors as well, but do you know of any other implementation that supports mixed-task batches?

@BenjaminBossan
Member

I will ask the IA^3 authors as well, but do you know of any other implementation that supports mixed-task batches?

No, I'm not aware of any, but please let us know if you find (or build) something.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this as completed Sep 3, 2023
@ruirui-zhang

Training or serving multiple LoRA modules in a single batch is possible with the following community implementations:
[https://github.com/S-LoRA/S-LoRA]
[https://github.com/sabetAI/BLoRA]
[https://github.com/TUDB-Labs/multi-lo...
