Failure to compile TF 2.15.0/2.16.1 (with Cuda support) using clang in Ubuntu 22.04 #62459
Comments
When nvcc is used as the compiler, compilation still fails. Configure and compile logs are attached.
The only way to have a successful compilation is to use gcc with NO CUDA support.
The original bug while using clang can be fixed by making sure the library
However, a new set of errors appears, which seems related to a confusing definition of
@feranick Could you please try uninstalling clang 14, installing clang 13, and reconfiguring TF? Please let us know if it helps.
Compilation still fails, although for a different reason. Log attached.
Actually, I tried it on a different machine with a similar configuration, and even using clang-13 the error seems to be the same as in the original post (using clang-14). Log attached.
As the issue seems to be related to CUDA, which version of CUDA does Google recommend for TF 2.15.0? TF 2.14.x works with CUDA 11.8, but this doesn't work (as per this bug) for TF 2.15.0.
We bumped the CUDA version to 12.2 starting with TensorFlow 2.15. Could you please try with that and let us know the outcome? Thanks!
Still getting the same error with CUDA 12.2. Log attached.
I just tried to compile TF 2.15.0 using CUDA 12.2 and
It would also be good to update ./configure to reflect that the minimum CUDA version should be 12.2 (it still lists 11), and to publish somewhere a list of the supported versions of each required piece of software.
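The kind of minimum-version check requested here could look roughly like the sketch below. It is not the logic in TensorFlow's configure script: the names `parse_cuda_release` and `meets_minimum` are hypothetical, the 12.2 minimum is taken from the maintainer's reply above, and the parsing assumes the usual `release X.Y` line that `nvcc --version` prints.

```python
import re

# Minimum CUDA version for TF 2.15, taken from the maintainer's reply above.
# This is an assumption of the sketch, not a value read from ./configure.
MIN_CUDA = (12, 2)


def parse_cuda_release(nvcc_output):
    """Return (major, minor) found in `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(nvcc_output, minimum=MIN_CUDA):
    """True only if a CUDA release was found and it is at least `minimum`."""
    version = parse_cuda_release(nvcc_output)
    # Tuples compare element by element, so (12, 10) >= (12, 2) holds.
    return version is not None and version >= minimum
```

In practice the text would come from running `nvcc --version`; a configure step could then stop early with a clear message when `meets_minimum` returns `False`, instead of failing deep inside the build.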
We have updated the document in the code; due to a sync issue, it is not yet published on the website.
I have the same problem on Rocky 8.8 compiling TensorFlow 2.15 with CUDA 12.2 and clang 16.0.1. Modules loaded for compilation (all compiled in-house on the same host):
Running ./configure results in the following .tf_configure.bazelrc (implying using clang as CUDA compiler):
The bazel build command:
If I rerun configure and choose not to use clang as CUDA compiler it results in a slightly
and the bazel build fails with a different error:
While the compilation itself apparently went further, it makes no sense to me that the bazel subcommand failed with Either way, with or without clang, bazel build fails.
This is a separate issue from the one reported here (which is specific to Ubuntu and related to the way CUDA is called within clang). I would recommend filing a separate issue for Rocky Linux and libstdc++.
I have also been trying to compile TF 2.15 (well, master actually) on Ubuntu 22.04 and have been running into some issues with clang. I have to say, I find the available documentation on compiling TF from source a tad contradictory and a tad confusing. My story so far is as follows.
I then tried to compile master, but it fails because, despite not configuring tensorrt, it is looking for tensorrt headers.
Compiling with clang gets me deep into the compilation, but it then bombs out:
I am now attempting the build with gcc ...
Still an issue with TF 2.16.1. Apparently compiling with clang supports only CUDA up to v11.5. See log below.
Have you tried it with Clang 17 on your Ubuntu? Here is the doc for installing Clang: https://www.tensorflow.org/install/source#install_clang_recommended_linux_only
I will. However, the updated clang 17 is not available in the standard Ubuntu repository (it's an external repo), which means that TF cannot be compiled with Clang using standard tools. gcc works fine. It would be great to mention that during
I was able to build 2.16.1 with CUDA 12.4 with clang 17 on Ubuntu 22.04 by manually adding the line "build:cuda --copt=-Wno-error=unused-command-line-argument" into .tf_configure.bazelrc after running configure and before running bazel build. I also had to "export TF_PYTHON_VERSION=3.10" before "bazel build --subcommands //tensorflow/tools/pip_package/v2:wheel --repo_env=WHEEL_NAME=tensorflow --config=cuda". I do have a separate build error (see issue 62047) if I add copt "-march=native" or "-mavx" though.
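For anyone who reruns configure often, the manual edit described above can be scripted so the line is added only once. This is a minimal sketch, not part of TensorFlow's tooling: the helper name `ensure_bazelrc_line` is hypothetical, and the only values taken from the comment are the option line and the file name .tf_configure.bazelrc.

```python
from pathlib import Path

# Option line quoted in the comment above.
WORKAROUND = "build:cuda --copt=-Wno-error=unused-command-line-argument"


def ensure_bazelrc_line(path, line=WORKAROUND):
    """Append `line` to the bazelrc at `path` unless it is already there.

    Returns True if the file was modified, False if the line was present.
    """
    rc = Path(path)
    text = rc.read_text() if rc.exists() else ""
    # Compare whole lines so a partial match elsewhere does not count.
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    # Make sure the new option starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    rc.write_text(text + line + "\n")
    return True
```

Calling `ensure_bazelrc_line(".tf_configure.bazelrc")` from the TensorFlow source root, after configure and before bazel build, mirrors the order described in the comment. Because configure regenerates that file, the call would need to be repeated after each reconfigure.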
Minor correction: the issue referenced in 62047 only happens with "-march=native". "-mavx" does build and run successfully. The other issue was blocked from comments since I added my note there. This was done with Intel 4410Y CPUs.
You don't have to modify the
I built v2.16.1 successfully with CUDA 12.3 + cuDNN 8.9.7 + tensorrt 8.6.1 with clang 17 on Ubuntu 22.04, with command line option
Still present on r2.17. Any chance of seeing this fixed?
Issue type
Build/Install
Have you reproduced the bug with TensorFlow Nightly?
No
Source
source
TensorFlow version
2.15.0/2.16.1
Custom code
No
OS platform and distribution
Linux Ubuntu 22.04
Mobile device
No response
Python version
3.10.12
Bazel version
6.1.0
GCC/compiler version
Clang 14.0.0-1ubuntu1.1
CUDA/cuDNN version
11.8-12.3
GPU model and memory
Quadro RTX 6000 24GB
Current behavior?
Compiling TF 2.15.0/2.16.1 in Ubuntu 22.04 using clang fails. Log is attached.
Standalone code to reproduce the issue
Relevant log output