
FEAT / Bitsandbytes: Add dequantize API for bitsandbytes quantized models #30806

Merged: 11 commits merged into huggingface:main on May 15, 2024

Conversation

@younesbelkada (Contributor) commented May 14, 2024

What does this PR do?

Fixes #30177

This PR adds a new dequantize feature in order to de-quantize models for interesting use cases such as the one described in #30177.

The API is very simple:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer

model_id = "facebook/opt-125m"

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.dequantize()

text = tokenizer("Hello my name is", return_tensors="pt").to(0)

out = model.generate(**text)
print(tokenizer.decode(out[0]))

Users just need to make sure they have enough GPU memory to store the de-quantized model, otherwise they might face unexpected behaviour.

Added support for both 4-bit and 8-bit models, along with tests and docs to show users how to use this new API.
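As a rough sketch, the 8-bit path follows the same pattern (assuming the same model id as the example above; saving the result afterwards with save_pretrained is optional, and the output directory name is just an example):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "facebook/opt-125m"

# Load the model quantized to 8-bit with bitsandbytes
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True))

# De-quantize the weights back to a dense dtype in place
model.dequantize()

# Optionally persist the de-quantized checkpoint ("opt-125m-dequantized" is an example path)
model.save_pretrained("opt-125m-dequantized")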

cc @amyeroberts @SunMarc

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc (Member) left a comment

Thanks for adding this new method to the quantizer! This will make fine-tuning with quantized models way easier! I left a few minor comments.

Resolved review threads on:

  • docs/source/en/quantization.md
  • src/transformers/integrations/__init__.py
  • src/transformers/modeling_utils.py
  • src/transformers/quantizers/quantizer_bnb_4bit.py
  • src/transformers/quantizers/quantizer_bnb_8bit.py
Comment on lines 346 to 347:

if cls_name == "Params4bit":
    return bnb.functional.dequantize_4bit(weight.data, weight.quant_state)
@SunMarc (Member) commented May 15, 2024

The user might want to know which precision the model was dequantized to, since they have no way to control that. I think it would be great to give that information, since there is no default value (as opposed to from_pretrained, which loads the model in fp32).
Two ways to get it:

  • just check the dtype of the weights at the end (potentially the easiest way; see the sketch below)
  • check what happens in dequantize_4bit: in that method, the output dtype comes from weight.quant_state.dtype

We can potentially add a torch_dtype attribute in the future if it makes sense.
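A minimal sketch of the first option (checking the dtype at the end), assuming the dequantize API added in this PR and the opt-125m example from the description:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=BitsAndBytesConfig(load_in_4bit=True))
model.dequantize()

# The precision the model was de-quantized to can be read off any weight afterwards
print(next(model.parameters()).dtype)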

(Collaborator)

+1

@younesbelkada (Contributor, Author)

Nice catch! The output dtype should be correctly inferred here: https://github.com/TimDettmers/bitsandbytes/blob/b891f80ba514833f41f0e9226983b02a9fb5c44b/bitsandbytes/functional.py#L1349 through the compute_dtype, so it should be accurate. I added a warning_once statement to inform users of the de-quantized dtype: 1a4a906
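For reference, a hedged sketch of the second option: before dequantize() is called, the target dtype can be read from the 4-bit parameters' quant state (the attribute names here follow the bitsandbytes version linked above and are an assumption, not part of this PR):

import bitsandbytes as bnb

# Assumes `model` was loaded with load_in_4bit=True and dequantize() has not been called yet
for name, param in model.named_parameters():
    if isinstance(param, bnb.nn.Params4bit):
        # quant_state.dtype is the dtype the weight will be de-quantized back to
        print(name, param.quant_state.dtype)
        break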

@amyeroberts (Collaborator) left a comment

Thanks for adding this! +1 on all of @SunMarc's comments.

Resolved review thread: tests/quantization/bnb/test_mixed_int8.py
Comment on lines 346 to 347:

if cls_name == "Params4bit":
    return bnb.functional.dequantize_4bit(weight.data, weight.quant_state)
(Collaborator)

+1


Returns the converted model and a boolean that indicates if the conversion has been successful or not.
"""
import bitsandbytes as bnb
(Collaborator)

This is already imported at the top of the module

@younesbelkada (Contributor, Author)

Nice catch! Should be fixed now.

younesbelkada and others added 5 commits May 15, 2024 13:02
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
@amyeroberts (Collaborator) left a comment

Thanks for adding this feature and iterating!

)
# Remove the last key for recursion
current_key_name.pop(-1)
return model, has_been_replaced
@amyeroberts (Collaborator) commented May 15, 2024

One general comment: if you instead have a private method _dequantize_and_replace that handles the recursion, you don't need to return has_been_replaced here. When someone calls dequantize_and_replace, I don't think has_been_replaced is ever used, and it could be confusing, e.g.:

# This is just dequantize_and_replace from before
def _dequantize_and_replace(
    model,
    modules_to_not_convert=None,
    current_key_name=None,
    quantization_config=None,
    has_been_replaced=False,
):
    ...
    return model, has_been_replaced

def dequantize_and_replace(
    model,
    modules_to_not_convert=None,
    current_key_name=None,
    quantization_config=None,
    has_been_replaced=False,
):
    model, has_been_replaced = _dequantize_and_replace(...)
    return model 

@younesbelkada (Contributor, Author)

Makes sense! Will do!

@younesbelkada (Contributor, Author)

Done in 8b904f7!
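For readers following along, here is a hedged sketch of what that public/private split could look like once combined with the warning shown in the later diff; the actual code in 8b904f7 may differ in names and details:

import logging

logger = logging.getLogger(__name__)  # the real module uses its own logging utilities

# Private helper: keeps the recursion and the has_been_replaced flag for internal use
def _dequantize_and_replace(model, modules_to_not_convert=None, current_key_name=None, quantization_config=None, has_been_replaced=False):
    # ... recursive module replacement, as in the reviewer's sketch above ...
    return model, has_been_replaced

# Public entry point: hides the flag and only warns when nothing was replaced
def dequantize_and_replace(model, modules_to_not_convert=None, quantization_config=None):
    model, has_been_replaced = _dequantize_and_replace(
        model,
        modules_to_not_convert=modules_to_not_convert,
        quantization_config=quantization_config,
    )
    if not has_been_replaced:
        logger.warning("...")  # placeholder; not the library's actual message
    return model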

Resolved review thread: src/transformers/integrations/bitsandbytes.py
younesbelkada and others added 2 commits May 15, 2024 16:22
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
)

if not has_been_replaced:
    logger.warning(
(Collaborator)

Nice :)

@younesbelkada merged commit 3f43582 into huggingface:main on May 15, 2024 (22 checks passed)
@younesbelkada deleted the add-dequant branch on May 15, 2024 at 15:17
@RonanKMcGovern

Yeah this is great, thanks

@younesbelkada (Contributor, Author)

Great, thanks @RonanKMcGovern! Let us know how it goes.

itazap pushed a commit that referenced this pull request May 24, 2024
FEAT / Bitsandbytes: Add dequantize API for bitsandbytes quantized models (#30806)

* add  method

* change method name

* more comments

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fixup

* add docstrings and fix comment

* warn users on the de-quantized dtype

* Update src/transformers/quantizers/base.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/integrations/bitsandbytes.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* final suggestion - use private method

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jun 11, 2024
FEAT / Bitsandbytes: Add dequantize API for bitsandbytes quantized models (huggingface#30806)


Successfully merging this pull request may close these issues:

  • Load nf4 weights/model in bfloat16 (#30177)
5 participants