Segmentation Fault (Core Dumped) when convert whisper with int8 quantization #61695
Comments
Hi @pjpratik! Thanks for answering. I need the whole model in int8 because I'm attempting to run Whisper inference on an NPU, and it only supports the int8 data type. Looking forward to your answer. Cheers!
@SantiagoMoreno-UdeA Thanks for the information. @pkgoogle, could you please look into this issue? Thanks.
Hello @abattery, have you had time to take a look at this?
@SantiagoMoreno-UdeA I guess the MUL op used in this model requires 16-bit activations in order to preserve its accuracy. I am still not sure what is going on with the TFLite converter.
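For ops that lose accuracy with 8-bit activations, TFLite exposes an experimental 16-bit-activation / 8-bit-weight quantization mode. A minimal sketch of how that mode is selected (the `saved_model_dir` path and `representative_dataset` function are placeholders, not from this issue):

```python
import tensorflow as tf

# Sketch: 16x8 quantization keeps activations in int16 while weights
# stay int8, which can help accuracy-sensitive ops such as MUL.
# "saved_model_dir" and "representative_dataset" are assumed to exist.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
converter.representative_dataset = representative_dataset  # calibration data
tflite_model = converter.convert()
```

Note that this mode produces int16 activations, so it may not satisfy an NPU that requires a pure int8 model; it is mainly useful for checking whether 8-bit activations are the source of the accuracy problem.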
When I analyzed it, I observed the seg fault here ...
@nyadla-sys It seems that quantizing Whisper is very tricky so far. Thank you for the information, I'll take a look.
Related to #29829 |
Hi there, I am facing the same issue when trying to convert Whisper to int8 for running on a TPU. Is there any update, please? Thank you.
Hi @James-Shared-Studios, no, the error remains. It seems to be a very low-level error.
I found that TFLite versions of Whisper generate NaN values when processing the `-float("inf")` values used in one part of the transformers codebase (specifically, the logits processor that kicks in when you call `generate` with forced tokens). Perhaps those NaNs make the int8 quantization crash too. I made a crude patch here, which has worked for me to stop the NaNs from happening: nyadla-sys/whisper.tflite#15
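The idea behind such a patch can be sketched in plain Python: instead of masking disallowed tokens with `-float("inf")`, use a large finite negative value so quantization calibration never sees non-finite numbers. The `mask_logits` helper below is illustrative only, not code from the linked PR:

```python
# Large finite stand-in for float("-inf"); keeps calibration finite.
NEG_FILL = -1e9

def mask_logits(logits, forced_token_id):
    """Force one token by suppressing all others with a finite penalty
    instead of -inf, so downstream math cannot produce NaN (inf - inf)."""
    return [0.0 if i == forced_token_id else NEG_FILL
            for i in range(len(logits))]

masked = mask_logits([0.1, 0.2, 0.3, 0.4], forced_token_id=2)
# masked[2] is 0.0; every other entry is -1e9, and all values are finite.
```

After a softmax, `-1e9` behaves indistinguishably from `-inf` (probability effectively zero), but it avoids the `inf - inf = NaN` pitfall during intermediate arithmetic.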
System information
- OS: Ubuntu 20.04
- TensorFlow 2.12.0 (installed via pip)
- Model: transformers WhisperForConditionalGeneration
I'm trying to convert the Whisper model (transformers WhisperForConditionalGeneration) from TF to TFLite and quantize it to int8. At some point the conversion crashes.
Here is the Colab for more details:
Colab: https://colab.research.google.com/drive/1oAVoUxRFZLkS1uqqFN8HdgRVk0IWAlsN?usp=sharing
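For reference, a full-int8 conversion along these lines typically looks like the sketch below. It assumes the model has been exported as a SavedModel at a placeholder path `whisper_saved_model` and that its input is a 1x80x3000 log-mel spectrogram; real calibration should use actual mel features, not random data:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    """Calibration generator; random data here as a placeholder only."""
    for _ in range(100):
        yield [np.random.randn(1, 80, 3000).astype(np.float32)]

# "whisper_saved_model" is an assumed export path, not from the issue.
converter = tf.lite.TFLiteConverter.from_saved_model("whisper_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to int8 ops and force int8 I/O for pure-int8 accelerators.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```

The crash reported here occurs during `converter.convert()`, i.e. inside the calibration/quantization pass rather than in user code.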
I also attach the error traces from my server, running on CPU and on GPU (TITAN RTX, 24 GB).
CPU: TraceTflite.txt
GPU: TraceTflite_GPU.txt