Mixed-task batching #759
I think that right now, it would not be possible to have different tasks/adapters within a single batch. Moreover, I don't think it can be easily supported (not impossible, but it would require serious refactoring). What does work already is to have a single model with multiple adapters and then switch between them by calling `set_adapter`. In practice, this approach would probably mean that you have to buffer prediction calls from different adapters to get any benefit from batching and to amortize the cost of switching adapters. Depending on your use case, this may or may not be feasible.
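The buffering idea above can be sketched in a few lines. This is a hypothetical illustration, not real PEFT code: `StubPeftModel`, its `generate` method, and the request format are all stand-ins, with only the `set_adapter`-style switching borrowed from the discussion.

```python
from collections import defaultdict

# Stub standing in for a real PEFT model; it only tracks which adapter is
# active and how often we had to switch (the cost we want to amortize).
class StubPeftModel:
    def __init__(self):
        self.active_adapter = None
        self.switches = 0

    def set_adapter(self, name):
        if name != self.active_adapter:
            self.active_adapter = name
            self.switches += 1

    def generate(self, batch):
        # Pretend inference: tag each input with the adapter that handled it.
        return [(self.active_adapter, x) for x in batch]

def run_buffered(model, requests):
    """requests: list of (adapter_name, input) pairs from mixed tasks.
    Group by adapter, run one batch per group, restore original order."""
    buckets = defaultdict(list)
    for i, (adapter, x) in enumerate(requests):
        buckets[adapter].append((i, x))
    results = [None] * len(requests)
    for adapter, items in buckets.items():
        model.set_adapter(adapter)  # one switch per task group, not per request
        outputs = model.generate([x for _, x in items])
        for (i, _), out in zip(items, outputs):
            results[i] = out
    return results

model = StubPeftModel()
reqs = [("task_a", 1), ("task_b", 2), ("task_a", 3), ("task_b", 4)]
outs = run_buffered(model, reqs)
```

With four interleaved requests across two adapters, the model switches adapters only twice instead of four times, which is the amortization the comment describes.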
Thanks, I suspected that it would be a big refactor... I will probably do as you say initially. Might run the base model as one instance and then the PEFT model with changeable adapters as a second instance for specific tasks. I will ask the IA^3 authors as well, but do you know of any other implementation that supports mixed-task batches?
No, I'm not aware, but please let us know if you find (or build) something.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
Training or serving multiple LoRA modules in a single batch is possible with the following community implementations:
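For intuition, what "multiple LoRA modules in a single batch" means mathematically can be sketched in plain Python. Every name and shape below is illustrative (not taken from any of the community implementations): each example `i` gets the base projection `W @ x` plus the low-rank update `B_t (A_t x)` of its own task `t`.

```python
def matvec(M, v):
    # Plain-Python matrix-vector product, to keep the sketch dependency-free.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def mixed_lora_batch(W, loras, task_ids, xs):
    """W: base weight (d_out x d_in); loras: {task: (A, B)} with
    A (r x d_in) and B (d_out x r); task_ids[i] picks the LoRA for xs[i]."""
    outs = []
    for t, x in zip(task_ids, xs):
        A, B = loras[t]
        delta = matvec(B, matvec(A, x))   # low-rank update B(Ax)
        outs.append(vadd(matvec(W, x), delta))
    return outs
```

Real implementations vectorize this by gathering the per-example `A`/`B` matrices into batched tensors instead of looping, but the per-row adapter selection is the same idea.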
Feature request
Ability to load a PEFT model with multiple adapters, and perform mixed-task batch inference. This should be feasible with IA^3 and prompt/prefix tuning, not sure about LoRA.
Motivation
In the IA^3 paper (Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning), it is mentioned that one of the requirements for a PEFT method to be comparable with few-shot ICL is that it should allow mixed-task batches.
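A minimal sketch of why IA^3 lends itself to this: each task's adapter is just a learned elementwise scaling vector, so a batch can apply a different vector to each row. The function name and data layout here are hypothetical, purely for illustration.

```python
def ia3_mixed_batch(activations, scalings, task_ids):
    """activations: list of per-example activation vectors;
    scalings: {task: IA^3 scaling vector};
    task_ids[i] selects the scaling vector for activations[i]."""
    return [
        [a * s for a, s in zip(act, scalings[t])]  # elementwise rescale per row
        for act, t in zip(activations, task_ids)
    ]
```

Because the per-task parameters enter only as an elementwise multiply, selecting a different vector per batch row adds no extra matrix multiplies, unlike methods whose adapters change the weight matrices themselves.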
This would allow building a very resource-efficient LLM application: serve a single PEFT model with a large number of adapters covering the different tasks the application needs to perform, while handling concurrent requests from multiple users, each performing a different task.
Your contribution
I could possibly get people in my team to look at this if we can get some advice on how to structure the change.