Segmentation Fault (Core Dumped) when converting Whisper with int8 quantization #61695

Open
SantiagoMoreno-UdeA opened this issue Aug 25, 2023 · 12 comments
Assignees
Labels
comp:lite TF Lite related issues ModelOptimizationToolkit TF Model Optimization Toolkit stat:awaiting tensorflower Status - Awaiting response from tensorflower TF 2.12 For issues related to Tensorflow 2.12 TFLiteConverter For issues related to TFLite converter type:bug Bug

Comments

@SantiagoMoreno-UdeA
SantiagoMoreno-UdeA commented Aug 25, 2023

System information
Ubuntu 20.04
TensorFlow 2.12.0 (installed via pip)
transformers WhisperForConditionalGeneration

I'm trying to convert Whisper from TF to TFLite and quantize it to int8, using the Whisper model from transformers (WhisperForConditionalGeneration). At some point the conversion crashes.
Here is the Colab for more details: https://colab.research.google.com/drive/1oAVoUxRFZLkS1uqqFN8HdgRVk0IWAlsN?usp=sharing

I'm also attaching the error traces from my server, running on CPU and on GPU (TITAN RTX, 24 GB).
CPU: TraceTflite.txt

GPU: TraceTflite_GPU.txt
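
For readers without access to the Colab, here is a minimal sketch of the kind of conversion path that triggers this crash. The checkpoint name, input shape, generate wrapper, and saved-model path below are illustrative assumptions, not copied from the notebook:

```python
import numpy as np
import tensorflow as tf
from transformers import TFWhisperForConditionalGeneration

# Assumed checkpoint; the Colab may use a different Whisper size.
model = TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

class GenerateModel(tf.Module):
    """Wraps generate() so the converter sees a single concrete function."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    @tf.function(
        input_signature=[tf.TensorSpec((1, 80, 3000), tf.float32, name="input_features")]
    )
    def serving(self, input_features):
        # Whisper expects (batch, 80 mel bins, 3000 frames) log-mel features.
        return {"sequences": self.model.generate(input_features, max_new_tokens=128)}

wrapper = GenerateModel(model)
tf.saved_model.save(wrapper, "whisper_saved_model",
                    signatures={"serving_default": wrapper.serving})

def representative_dataset():
    # Random tensors stand in for real log-mel features during calibration.
    for _ in range(10):
        yield [np.random.randn(1, 80, 3000).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("whisper_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Forcing full integer quantization of all ops is where the segfault occurs.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()
```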

@SantiagoMoreno-UdeA SantiagoMoreno-UdeA added the TFLiteConverter For issues related to TFLite converter label Aug 25, 2023
@tilakrayal tilakrayal added comp:lite TF Lite related issues type:bug Bug TF 2.12 For issues related to Tensorflow 2.12 labels Aug 28, 2023
@tilakrayal tilakrayal assigned pjpratik and unassigned tilakrayal Aug 28, 2023
@SantiagoMoreno-UdeA SantiagoMoreno-UdeA changed the title from "Core Dumped when convert whisper with int8 quantization" to "Segmentation Fault (Core Dumped) when converting Whisper with int8 quantization" Aug 28, 2023
@pjpratik
Contributor

Hi @SantiagoMoreno-UdeA

I was able to reproduce this issue in TF Nightly as well. Please find the gist here.

A similar issue is being tracked in #59716

Does dynamic range quantization work for your case?

Thanks.
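
For reference, dynamic range quantization converts only the weights to int8 and keeps activations in float, so it needs neither a representative dataset nor the int8-only op restriction. A minimal sketch, reusing the assumed saved-model path from the repro sketch above:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("whisper_saved_model")
# Weights are stored as int8; activations are computed in float at runtime.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
```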

@pjpratik pjpratik added the stat:awaiting response Status - Awaiting response from author label Aug 28, 2023
@SantiagoMoreno-UdeA
Author

Hi @pjpratik!

Thanks for answering.

I need the whole model in int8 because I'm attempting to run Whisper inference on an NPU, and it only supports the int8 data type.
So dynamic range quantization is not an option for me :/.

Looking forward to your answer.

Cheers!
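
For context, "the whole model in int8" in converter terms usually means restricting ops to TFLITE_BUILTINS_INT8 and, for accelerators that cannot dequantize at the graph boundaries, also forcing int8 input/output tensors. A hedged sketch of those settings (whether this particular NPU also needs the I/O flags is an assumption):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("whisper_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration data as in the repro sketch
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Many int8-only accelerators also expect quantized I/O tensors.
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```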

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 28, 2023
@pjpratik
Contributor

@SantiagoMoreno-UdeA Thanks for the information.

@pkgoogle Could you please look into this issue?

Thanks.

@pjpratik pjpratik assigned pkgoogle and unassigned pjpratik Aug 28, 2023
@pkgoogle pkgoogle added the ModelOptimizationToolkit TF Model Optimization Toolkit label Aug 28, 2023
@pkgoogle

I was able to reproduce from @pjpratik's gist.

@abattery, can you please take a look?

@pkgoogle pkgoogle added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Aug 28, 2023
@SantiagoMoreno-UdeA
Author

Hello @abattery, have you had time to take a look at this?

@nyadla-sys
Member
nyadla-sys commented Sep 14, 2023

@SantiagoMoreno-UdeA I suspect the MUL op used in this model requires 16-bit activations in order to preserve its accuracy. I am still not sure what is going on with the TFLite converter.
Here is a similar issue I raised a long time ago that no one has addressed:
#58451
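
If the problem really is that MUL needs wider activations, the converter does expose a 16x8 mode (int16 activations with int8 weights) that covers MUL. A sketch of that mode follows, though note it would not satisfy a strictly int8-only NPU:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("whisper_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration data as in the repro sketch
# 16-bit activations with 8-bit weights; MUL is supported in this mode.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()
```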

@nyadla-sys
Member
nyadla-sys commented Sep 14, 2023

When I analyzed it, I observed the seg fault here:
#0 0x00007f623573d7d3 in mlir::quant::QuantizedType::getExpressedType() const () from /usr/local/lib/python3.9/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#1 0x00007f623573e1ac in mlir::quant::QuantizedType::castFromExpressedType(mlir::Type) () from /usr/local/lib/python3.9/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so

@SantiagoMoreno-UdeA
Author

@nyadla-sys It seems that quantizing Whisper is very tricky so far. Thank you for the information, I'll take a look.

@emirkin
emirkin commented Oct 22, 2023

Related to #29829

@James-Shared-Studios

Hi there, I am facing the same issue when trying to convert Whisper to int8 to run it on a TPU. Is there any update, please? Thank you.

@SantiagoMoreno-UdeA
Author

Hi @James-Shared-Studios, no, the error remains. It seems to be a very low-level error.

@6nl
6nl commented Mar 8, 2024

I found that tflite versions of Whisper generate NaN values when processing the -float("inf") values that are used in one part of the transformers codebase (specifically, the logits processor that kicks in when you call generate with forced tokens). Perhaps those NaNs make the int8 quantization crash too. I made a crude patch here, which has worked for me to stop the NaNs from happening: nyadla-sys/whisper.tflite#15
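
The linked patch is the working reference; purely as an illustration of the idea, replacing infinite mask values with a large finite penalty before export looks roughly like this (the helper name and penalty value are made up):

```python
import tensorflow as tf

def replace_inf_logits(logits, penalty=-1e9):
    # Hypothetical helper, not the actual patch in whisper.tflite#15.
    # Masked positions keep a huge negative score, so softmax still
    # suppresses them, but no inf arithmetic can produce NaNs.
    finite = tf.cast(penalty, logits.dtype)
    return tf.where(tf.math.is_inf(logits), finite, logits)
```

Hooking something like this into the forced-token logits processing before conversion is one way to keep the exported graph NaN-free; where exactly to hook it depends on the transformers version.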


9 participants