TFLite GPUv2: ADD(x, 1e-5) results in severely wrong output #67216
Comments
Hi @gustavla, I replicated your issue using Qualcomm AI Hub and got the same results as you. Let me verify the same through an Android app and I will get back to you.
Hi @gustavla, I tried reproducing your issue using an Android app, but I kept running into issues with passing the inputs to the tflite model.
@sawantkumar I do not have a stand-alone app that reproduces this. However, the original repro did run through an app using TFLite on a real Android device. Feel free to re-save the npy file in another format that you are more used to for repros like this; for instance, you can save it as a packed row-major buffer for the GPUv2 configuration:
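Something along these lines should work as a starting point (a minimal sketch, assuming float32 data; the file names are hypothetical):

```python
import numpy as np

# Hypothetical file names for illustration only; substitute the actual assets.
arr = np.load("input.npy").astype(np.float32)

# Write the values as a packed row-major (C-contiguous) raw binary blob,
# which is easier to feed to most standalone TFLite runners.
np.ascontiguousarray(arr).tofile("input_rowmajor.bin")
```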
Hi @gustavla, if possible, can you share the TensorFlow-to-TFLite model conversion script with me? The issue could lie there as well.
Sorry for the late turnaround. I took a look at the network. You are tapping into intermediate tensors, but we reuse intermediate tensors regardless of whether a tensor is also a graph output (sorry, that is a limitation that would require some engineering resources to fix, and we never prioritized it). If you really want to tap into an intermediate tensor, you have to add a small no-op. For example, if you want to read what you have named "variance" in the picture above (the output of MEAN), you have to add a small non-zero value to it, e.g.:
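A rough illustration of that workaround, assuming the graph is built in TensorFlow (identifiers are made up):

```python
import tensorflow as tf

# Rather than exposing the MEAN output ("variance") directly as a graph
# output, add a tiny non-zero constant so the runtime can no longer reuse
# the underlying buffer for later ops.
def expose_intermediate(variance):
    return variance + tf.constant(1e-6, dtype=variance.dtype)
```

The added constant changes the reported values only negligibly, but it forces a distinct output tensor that is safe to read back.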
System information
Assets: example data saved with numpy.savez

Please take a look at two outputs of this network in particular:
key = "model_13/featurefusion_network/encoder/query_layer/norm/LayerNormalization/moments/variance"
(variance)key2 = "model_13/featurefusion_network/encoder/query_layer/norm/LayerNormalization/batchnorm/add"
(add)The variable
variance
gets fed into ADD(x, 0.000009999999747378752) and comes out asadd
I ran this on the CPU (XNNPACK) and the GPU (GPUv2) and got totally different results. variance looks consistent across CPU and GPU so far, but add does not: on the GPU the values have gone completely off the rails. They do not look random, though, since there is a periodicity to the output (the error alternates between roughly 1.6 and 0.6).
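For what it's worth, the suspicious pattern can be isolated in a tiny model and converted on its own. This is only a sketch (the shapes and names are illustrative, not those of the real network):

```python
import tensorflow as tf


class VarianceEps(tf.Module):
    """Tiny stand-in for the moments/variance -> batchnorm/add pattern."""

    # Illustrative input shape; the real network's shapes differ.
    @tf.function(input_signature=[tf.TensorSpec([1, 64, 256], tf.float32)])
    def __call__(self, x):
        mean = tf.reduce_mean(x, axis=-1, keepdims=True)
        variance = tf.reduce_mean(tf.square(x - mean), axis=-1, keepdims=True)
        add = variance + 1e-5  # the ADD(x, ~1e-5) under suspicion
        return variance, add


module = VarianceEps()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [module.__call__.get_concrete_function()], module)
with open("variance_eps.tflite", "wb") as f:
    f.write(converter.convert())
```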
Standalone code to reproduce the issue
This should be simple to set up through the benchmark tool or any other way of running GPUv2 directly. I ran it through Qualcomm's AI Hub (https://aihub.qualcomm.com), so I'm attaching the script that I used as a reference. It also shows how the example inputs can be loaded into Python.
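For completeness, here is roughly how the saved inputs could be loaded and run through the TFLite Python interpreter to get CPU reference outputs. This is a sketch with assumed file names, and it assumes the npz keys match the model's input tensor names:

```python
import numpy as np
import tensorflow as tf

# Assumed file names; replace with the attached assets.
inputs = np.load("example_inputs.npz")

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

# Feed each saved array to the input tensor with the matching name.
for detail in interpreter.get_input_details():
    interpreter.set_tensor(detail["index"], inputs[detail["name"]])

interpreter.invoke()

# Print a few values of each output for a quick CPU-vs-GPU comparison.
for detail in interpreter.get_output_details():
    print(detail["name"], interpreter.get_tensor(detail["index"]).ravel()[:8])
```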