Mixed-task batching #759
I think that right now, it would not be possible to have different tasks/adapters within a single batch. Moreover, I don't think it can be easily supported (not impossible, but it would require serious refactoring). What does work already is to have a single model with multiple adapters and then switch between them by calling `set_adapter`. In practice, this approach would probably mean that you have to buffer prediction calls from different adapters to get any benefit from batching and to amortize the cost of switching adapters. Depending on your use case, this may or may not be feasible.
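The buffering idea above can be sketched in a few lines. This is a hypothetical illustration, not real PEFT code: `StubPeftModel`, its `generate` method, and the request format are all stand-ins, with only the `set_adapter`-style switching borrowed from the discussion.

```python
from collections import defaultdict

# Stub standing in for a real PEFT model; it only tracks which adapter is
# active and how often we had to switch (the cost we want to amortize).
class StubPeftModel:
    def __init__(self):
        self.active_adapter = None
        self.switches = 0

    def set_adapter(self, name):
        if name != self.active_adapter:
            self.active_adapter = name
            self.switches += 1

    def generate(self, batch):
        # Pretend inference: tag each input with the adapter that handled it.
        return [(self.active_adapter, x) for x in batch]

def run_buffered(model, requests):
    """requests: list of (adapter_name, input) pairs from mixed tasks.
    Group by adapter, run one batch per group, restore original order."""
    buckets = defaultdict(list)
    for i, (adapter, x) in enumerate(requests):
        buckets[adapter].append((i, x))
    results = [None] * len(requests)
    for adapter, items in buckets.items():
        model.set_adapter(adapter)  # one switch per task group, not per request
        outputs = model.generate([x for _, x in items])
        for (i, _), out in zip(items, outputs):
            results[i] = out
    return results

model = StubPeftModel()
reqs = [("task_a", 1), ("task_b", 2), ("task_a", 3), ("task_b", 4)]
outs = run_buffered(model, reqs)
```

With four interleaved requests across two adapters, the model switches adapters only twice instead of four times, which is the amortization the comment describes.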
Thanks, I suspected that it would be a big refactor... I will probably do as you say initially. Might run the base model as one instance and then the PEFT model with changeable adapters as a second instance for specific tasks. I will ask the IA^3 authors as well, but do you know of any other implementation that supports mixed-task batches?
No, I'm not aware, but please let us know if you find (or build) something.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
Training or serving multiple LoRA modules in a single batch is possible with the following community implementations:
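For intuition, what "multiple LoRA modules in a single batch" means mathematically can be sketched in plain Python. Every name and shape below is illustrative (not taken from any of the community implementations): each example `i` gets the base projection `W @ x` plus the low-rank update `B_t (A_t x)` of its own task `t`.

```python
def matvec(M, v):
    # Plain-Python matrix-vector product, to keep the sketch dependency-free.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def mixed_lora_batch(W, loras, task_ids, xs):
    """W: base weight (d_out x d_in); loras: {task: (A, B)} with
    A (r x d_in) and B (d_out x r); task_ids[i] picks the LoRA for xs[i]."""
    outs = []
    for t, x in zip(task_ids, xs):
        A, B = loras[t]
        delta = matvec(B, matvec(A, x))   # low-rank update B(Ax)
        outs.append(vadd(matvec(W, x), delta))
    return outs
```

Real implementations vectorize this by gathering the per-example `A`/`B` matrices into batched tensors instead of looping, but the per-row adapter selection is the same idea.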
Feature request
Ability to load a PEFT model with multiple adapters, and perform mixed-task batch inference. This should be feasible with IA^3 and prompt/prefix tuning, not sure about LoRA.
Motivation
In the IA^3 paper (Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning), it is mentioned that one of the requirements for a PEFT method to be comparable with few-shot ICL is that it should allow mixed-task batches.
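A minimal sketch of why IA^3 lends itself to this: each task's adapter is just a learned elementwise scaling vector, so a batch can apply a different vector to each row. The function name and data layout here are hypothetical, purely for illustration.

```python
def ia3_mixed_batch(activations, scalings, task_ids):
    """activations: list of per-example activation vectors;
    scalings: {task: IA^3 scaling vector};
    task_ids[i] selects the scaling vector for activations[i]."""
    return [
        [a * s for a, s in zip(act, scalings[t])]  # elementwise rescale per row
        for act, t in zip(activations, task_ids)
    ]
```

Because the per-task parameters enter only as an elementwise multiply, selecting a different vector per batch row adds no extra matrix multiplies, unlike methods whose adapters change the weight matrices themselves.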
This would allow building a very resource-efficient LLM application: serve a single PEFT model with a large number of adapters covering the different tasks the application needs to perform, while handling concurrent requests from multiple users, each performing a different task.
Your contribution
I could possibly get people in my team to look at this if we can get some advice on how to structure the change.