Numerical precision issue of operators selu, leakyRelu, softplus and their corresponding backward operators on Bfloat16 vs float32 #67440
Labels
comp:ops, stat:awaiting tensorflower, TF 2.16, type:bug
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
tf 2.16.1
Custom code
Yes
OS platform and distribution
No response
Mobile device
No response
Python version
3.10.9
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
I'd like to bring attention to an issue concerning the numerical precision of several operators (selu, leakyRelu, softplus) when operating on bfloat16 versus float32 data types. I compared the outputs of these operators on 20,000 random tensors, evaluating each in both bfloat16 and float32 and computing the discrepancies. My observations indicate that the differences produced by TensorFlow are generally more pronounced than those produced by PyTorch. Particularly noteworthy is the large error produced by the SeluGrad operator. The results are summarized in the table below:
In the standalone code to reproduce the issue, I provide illustrative inputs for the SeluGrad operator, where the output discrepancy between bfloat16 and float32 can be as high as 10.4.
Standalone code to reproduce the issue
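The reporter's original snippet is not shown here; a minimal sketch of such a comparison, assuming standard TensorFlow APIs (tf.nn.selu, tf.nn.leaky_relu, tf.math.softplus, and tf.GradientTape) and an arbitrary input distribution, might look like:

```python
# Hypothetical reproduction sketch (not the original report's script):
# compare selu / leaky_relu / softplus and their gradients in bfloat16 vs float32.
import tensorflow as tf

def max_abs_diff(op, x_f32):
    """Run `op` forward and backward in float32 and bfloat16 and return the
    maximum absolute output and gradient differences, measured in float32."""
    x_bf16 = tf.cast(x_f32, tf.bfloat16)

    with tf.GradientTape() as tape_f32:
        tape_f32.watch(x_f32)
        y_f32 = op(x_f32)
    g_f32 = tape_f32.gradient(y_f32, x_f32)

    with tf.GradientTape() as tape_bf16:
        tape_bf16.watch(x_bf16)
        y_bf16 = op(x_bf16)
    g_bf16 = tape_bf16.gradient(y_bf16, x_bf16)

    fwd_diff = tf.reduce_max(tf.abs(y_f32 - tf.cast(y_bf16, tf.float32)))
    bwd_diff = tf.reduce_max(tf.abs(g_f32 - tf.cast(g_bf16, tf.float32)))
    return float(fwd_diff), float(bwd_diff)

if __name__ == "__main__":
    tf.random.set_seed(0)
    # One random batch for illustration; the report sweeps 20,000 random tensors.
    # The stddev is an assumption: larger-magnitude inputs make bfloat16
    # rounding more visible in the gradient.
    x = tf.random.normal([20000, 64], stddev=10.0, dtype=tf.float32)
    for name, op in [("selu", tf.nn.selu),
                     ("leaky_relu", tf.nn.leaky_relu),
                     ("softplus", tf.math.softplus)]:
        fwd, bwd = max_abs_diff(op, x)
        print(f"{name}: forward max |diff| = {fwd:.4f}, grad max |diff| = {bwd:.4f}")
```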
Relevant log output