Failing Tensorflow unit tests for BF16 hardware #65988
Comments
@christinaburge The issue is present only in the nightly build, which is intended for testing new features and might contain bugs. Could you try the stable version with a minimal example and let us know?
This is an issue in our downstream CI. Would you be able to advise, please?
Hi, just wondering if anything has happened with this, please?
Ah sorry, this fell through the cracks. It's a bug in LLVM's lowering to the bf16 hardware, but I don't have access to a machine with that instruction set. Would you mind running the test with
bug_files.tar.gz
Thanks, I created a reduced reproducer at llvm/llvm-project#94951. I'm not exactly sure it's the same issue, as this one requires SVE and your original error message had no SVE types in it.
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
tf 2.17.0
Custom code
No
OS platform and distribution
Linux Ubuntu 22.04
Mobile device
No response
Python version
No response
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
The following unit tests fail on the TensorFlow GitHub nightly build:
//tensorflow/compiler/tests:conv3d_test_cpu
//tensorflow/compiler/tests:conv3d_test_cpu_mlir_bridge_test
//tensorflow/compiler/tests:stateful_random_ops_test_cpu
//tensorflow/compiler/tests:stateless_random_ops_test_cpu
//tensorflow/compiler/tests:stateless_random_ops_test_cpu_mlir_bridge_test
//tensorflow/compiler/tests:stateful_random_ops_test_cpu_mlir_bridge_test
//tensorflow/compiler/tests:stochastic_cast_op_test_cpu
On investigation, the first commit where this issue is present is a4d7e97; the tests pass at the commit immediately before it.
The tests do not fail in the upstream CI because it runs on N1 cores, which have no bf16 hardware.
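For anyone trying to confirm whether a given machine falls into the failing category, here is a minimal sketch. It assumes Linux on AArch64, where the kernel lists `bf16` and `sve` as tokens on the `Features` line of `/proc/cpuinfo`. The helper names `cpu_features` and `has_bf16` are made up for illustration and are not part of TensorFlow or its test suite.

```python
def cpu_features(cpuinfo_text):
    """Return the feature tokens from the first 'Features' line of cpuinfo text.

    Assumes the AArch64 Linux /proc/cpuinfo layout ("Features : fp asimd ...").
    Returns an empty set if no such line is found.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            return set(value.split())
    return set()


def has_bf16(cpuinfo_text):
    """True if the exact token 'bf16' is listed (so 'svebf16' alone does not count)."""
    return "bf16" in cpu_features(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read cpuinfo, returning an empty string where the file is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""
```

Calling `has_bf16(read_cpuinfo())` on the CI host should then indicate whether the bf16 code path can be exercised there; checking `"sve" in cpu_features(read_cpuinfo())` the same way may help when comparing against the SVE-dependent reproducer.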
Standalone code to reproduce the issue
Relevant log output