partial implementation of lqlora #8324
base: develop
Conversation
Thanks for your contribution!
Codecov Report

Attention: Patch coverage is 0.00% — none of the 41 added lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop    #8324      +/-  ##
===========================================
- Coverage    54.41%   54.39%    -0.03%
===========================================
  Files          632      633        +1
  Lines        99475    99516       +41
===========================================
  Hits         54127    54127
- Misses       45348    45389       +41
```
```python
lora_A = Ur @ paddle.diag(paddle.sqrt(Sr))
lora_B = paddle.diag(paddle.sqrt(Sr)) @ Vhr

Q = qlora_weight_quantize_dequantize(W - lora_A @ lora_B, double_quant=True)
```
double_quant=True should be an adjustable parameter, and the same applies to the other arguments of qlora_weight_quantize_dequantize.
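A minimal sketch of that suggestion, assuming a hypothetical `quant_kwargs` argument threaded through from the caller (the helper name `quantize_residual` is illustrative, not part of the PR):

```python
import paddle
from paddlenlp.quantization.qlora import qlora_weight_quantize_dequantize

# Sketch: expose the quantization options instead of hardcoding double_quant=True.
# `quant_kwargs` is a hypothetical parameter; the default keeps the PR's behavior.
def quantize_residual(W, lora_A, lora_B, quant_kwargs=None):
    if quant_kwargs is None:
        quant_kwargs = {"double_quant": True}
    # Quantize-dequantize the residual left after removing the low-rank part.
    return qlora_weight_quantize_dequantize(W - lora_A @ lora_B, **quant_kwargs)
```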
```python
Sr = S[:num_ranks]
Vhr = Vh[:num_ranks]

lora_A = Ur @ paddle.diag(paddle.sqrt(Sr))
```
The configuration needs to take the LoRA scaling into account; as written, it looks like the LoRA scaling can only be forced to 1.
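One way to handle this, sketched under the assumption that the layer computes `x @ W + scaling * (x @ lora_A @ lora_B)` with `scaling = lora_alpha / r`: fold `1 / scaling` into one factor so the effective update still equals the rank-r part of W (the function name is illustrative):

```python
import paddle

# Sketch: make the SVD initialization compatible with an arbitrary LoRA scaling
# by cancelling it inside lora_B. The PR as written implicitly assumes scaling == 1.
def svd_init_with_scaling(W, num_ranks, scaling):
    U, S, Vh = paddle.linalg.svd(W, full_matrices=False)
    Ur, Sr, Vhr = U[:, :num_ranks], S[:num_ranks], Vh[:num_ranks]
    lora_A = Ur @ paddle.diag(paddle.sqrt(Sr))
    lora_B = (paddle.diag(paddle.sqrt(Sr)) @ Vhr) / scaling  # cancels the layer's scaling
    return lora_A, lora_B
```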
```python
if W.dtype in [paddle.float16]:
    old_dtype = W.dtype
    W = paddle.cast(W, dtype=paddle.float32)
```
What is the reason for casting to fp32?
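For context, a plausible rationale (an assumption, not confirmed in the thread): SVD kernels generally require fp32/fp64 inputs, and half-precision SVD would be numerically unstable in any case, so the usual pattern is to factorize in fp32 and cast the factors back afterwards:

```python
import paddle

# Sketch of the cast-and-restore pattern around the SVD (assumed rationale:
# paddle.linalg.svd expects fp32/fp64, and fp16 SVD would be unstable anyway).
def svd_in_fp32(W, num_ranks):
    old_dtype = W.dtype
    if old_dtype == paddle.float16:
        W = paddle.cast(W, dtype=paddle.float32)
    U, S, Vh = paddle.linalg.svd(W, full_matrices=False)
    lora_A = U[:, :num_ranks] @ paddle.diag(paddle.sqrt(S[:num_ranks]))
    lora_B = paddle.diag(paddle.sqrt(S[:num_ranks])) @ Vh[:num_ranks]
    # Restore the original dtype so the factors match the rest of the model.
    return paddle.cast(lora_A, old_dtype), paddle.cast(lora_B, old_dtype)
```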
Are there any experimental results we can reference to see the effect?
Please fix the formatting issues before submitting.
```python
import paddle
from paddlenlp.quantization.qlora import qlora_weight_quantize_dequantize
```
Suggest wrapping the lqlora initialization process into an lqlora_init function, passing whether to use lqlora in via lora_config, and considering applying this lqlora_init to lora_module before line 621: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/peft/lora/lora_model.py#L621
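A rough sketch of that refactor, assuming PaddleNLP-style LoRA layers with `lora_A` of shape `[in_features, r]` and `lora_B` of shape `[r, out_features]`; the function name and the config-driven gating follow the reviewer's suggestion, while the body mirrors the PR's logic:

```python
import paddle
from paddlenlp.quantization.qlora import qlora_weight_quantize_dequantize

def lqlora_init(module):
    # Only touch layers that actually carry LoRA parameters.
    if not hasattr(module, "lora_A"):
        return
    W = module.weight
    old_dtype = W.dtype
    if old_dtype == paddle.float16:
        W = paddle.cast(W, dtype=paddle.float32)
    U, S, Vh = paddle.linalg.svd(W, full_matrices=False)
    r = module.lora_A.shape[-1]
    lora_A = U[:, :r] @ paddle.diag(paddle.sqrt(S[:r]))
    lora_B = paddle.diag(paddle.sqrt(S[:r])) @ Vh[:r]
    # Quantize-dequantize the residual so that weight ≈ Q + lora_A @ lora_B.
    Q = qlora_weight_quantize_dequantize(W - lora_A @ lora_B, double_quant=True)
    module.weight.set_value(paddle.cast(Q, old_dtype))
    module.lora_A.set_value(paddle.cast(lora_A, old_dtype))
    module.lora_B.set_value(paddle.cast(lora_B, old_dtype))
```

Before line 621 this could then be applied with something like `if self.lora_config.lqlora: self.apply(lqlora_init)`, where the `lqlora` flag is the hypothetical config field being requested.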
```diff
@@ -477,6 +478,9 @@ def neft_post_hook(module, input, output):
     else:
         model = LoRAModel.from_pretrained(model=model, lora_path=model_args.lora_path)

+    if model_args.lqlora:
+        transform_lora_layers(model)
```
Control this through an lqlora option passed in lora_config instead.
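For illustration, usage under that suggestion might look like the following; note the `lqlora` field on LoRAConfig does not exist yet and is exactly the change being requested:

```python
from paddlenlp.peft import LoRAConfig, LoRAModel

# Hypothetical: `lqlora` as a LoRAConfig field, replacing model_args.lqlora.
lora_config = LoRAConfig(
    target_modules=[".*q_proj.*", ".*v_proj.*"],
    r=8,
    lora_alpha=16,
    lqlora=True,  # would trigger the LQ-LoRA initialization inside LoRAModel
)
model = LoRAModel(model, lora_config)
```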
PR types
PR changes
Description