
TF 2.17.0 RC0 Fails to work with GPUs (and TF 2.16 too) #63362

Open
JuanVargas opened this issue Mar 10, 2024 · 141 comments · Fixed by #70293
Assignees
Labels
2.17 (Issues related to 2.17 release) · awaiting review (Pull request awaiting review) · comp:gpu (GPU related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.16 · type:bug (Bug)

Comments

@JuanVargas

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

TF 2.16.1

Custom code

No

OS platform and distribution

Linux Ubuntu 22.04.4 LTS

Mobile device

No response

Python version

3.10.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.4

GPU model and memory

No response

Current behavior?

I created a Python venv in which I installed TF 2.16.1 following your instructions: pip install tensorflow.
When I run python, import tensorflow as tf, and issue tf.config.list_physical_devices('GPU'),
I get an empty list [].

I created another Python venv and installed TF 2.16.1, only this time with the instructions:

python3 -m pip install tensorflow[and-cuda]

When I run that version, import tensorflow as tf, and issue

tf.config.list_physical_devices('GPU')

I also get an empty list.

BTW, I have no problems running TF 2.15.1 with GPUs on my box. Julia also works just fine with GPUs, and so does PyTorch.

Standalone code to reproduce the issue

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-03-09 19:15:45.018171: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-09 19:15:50.412646: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> tf.__version__
'2.16.1'

tf.config.list_physical_devices('GPU') 
2024-03-09 19:16:28.923792: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-09 19:16:29.078379: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
>>>

Relevant log output

No response

@sh-shahrokhi commented Mar 10, 2024

It does not work with Python 3.12.2 either; same error. I installed TensorFlow with pip install tensorflow[and-cuda].

@damadorPL

The same error on bare Ubuntu and on WSL2; 2.15 works without any problems with Python 3.11.

@DiegoMont

I have the same problem with Ubuntu 22.04.4 with the following environment:

  • tensorflow==2.16.1
  • Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
  • cuDNN 8.6.0.163
  • gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

nvcc --version output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

@AlpriElse

I'm not sure if this is the root cause, but I resolved my own issue which also surfaced as a "Cannot dlopen some GPU libraries." error when trying to run python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

To resolve my issue, I followed the tested build versions here:
https://www.tensorflow.org/install/source#gpu

and I needed to update my existing installations from cuDNN 9 -> 8.9 and CUDA 12.4->12.3

When you're on an NVIDIA download page like this one for CUDA Toolkit, don't just download the latest version. See previous versions by hitting "Archive of Previous CUDA Releases"

@JuanVargas can you try moving your existing CUDA installation to a tested build configuration for TF 2.16 by uninstalling it and downgrading to CUDA 12.3?

I followed this post to uninstall my existing cuda installation:
https://askubuntu.com/questions/530043/removing-nvidia-cuda-toolkit-and-installing-new-one

@DiegoMont can you try upgrading your cuDNN to 8.9 and CUDA to 12.3?
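A quick way to compare an existing setup against the tested-build table is to ask TensorFlow itself which CUDA/cuDNN versions the wheel was compiled against. A small diagnostic sketch (the `get_build_info` keys shown are the ones recent releases expose; on a CPU-only wheel they may simply be absent):

```python
# Report the CUDA/cuDNN versions this TensorFlow binary was built against,
# for comparison with https://www.tensorflow.org/install/source#gpu
try:
    import tensorflow as tf
    build = tf.sysconfig.get_build_info()
    report = {
        "tensorflow": tf.__version__,
        "built_for_cuda": build.get("cuda_version"),
        "built_for_cudnn": build.get("cudnn_version"),
    }
except ImportError:
    report = {"tensorflow": "not installed in this environment"}
print(report)
```

If the reported versions differ from what `nvcc --version` and your cuDNN packages show, that mismatch is the first thing to fix.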

@Gwyki commented Mar 11, 2024

I am having the same issue. Brand new Ubuntu 22.04 WSL2 image. A blank conda environment with either Python 3.12.* or 3.11.* fails to correctly set up TensorFlow for GPU use when following the recommended:
pip install tensorflow[and-cuda]

Trying to list the physical devices results in:

2024-03-11 02:00:00.294704: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 02:00:00.709325: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-11 02:00:01.180225: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2d:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 02:00:01.180445: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
cuDNN 8.9.*
Cuda 12.3
Tensorflow 2.16.1
TensorRT 8.6.1

Is this a new issue caused by the fact that no system CUDA appears to need separate installation in WSL2 anymore? I certainly didn't install one manually, yet nvidia-smi happily reports CUDA version 12.3. It probably comes down to some env paths not being set correctly, but playing around with $CUDA_PATH and guessing the location within the conda environment has not resolved anything. TensorRT doesn't seem to be picked up either, yet it is definitely installed in the conda environment. PyTorch GPU visibility works as expected.
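The guessing at library locations can be scripted: `pip install tensorflow[and-cuda]` drops the CUDA libraries into `site-packages/nvidia/<pkg>/lib` (a layout observed in the current wheels, not a stable contract), so a sketch like this collects the directories an `LD_LIBRARY_PATH` would need. The demo runs against a fake site-packages tree so it works anywhere:

```python
import os
import tempfile

def nvidia_lib_dirs(site_packages):
    """Collect every nvidia/<pkg>/lib directory under a site-packages tree.

    Mirrors the layout the nvidia-*-cu12 wheels currently install into;
    treat that layout as an assumption, not a stable interface.
    """
    root = os.path.join(site_packages, "nvidia")
    if not os.path.isdir(root):
        return []
    return sorted(
        os.path.join(root, pkg, "lib")
        for pkg in os.listdir(root)
        if os.path.isdir(os.path.join(root, pkg, "lib"))
    )

# Demo on a fake site-packages tree (stands in for the real environment):
with tempfile.TemporaryDirectory() as fake_site:
    for pkg in ("cudnn", "cublas", "cuda_runtime"):
        os.makedirs(os.path.join(fake_site, "nvidia", pkg, "lib"))
    lib_dirs = nvidia_lib_dirs(fake_site)
    ld_library_path = ":".join(lib_dirs)
    print(len(lib_dirs), "lib directories collected")
```

In a real environment the starting point would be `sysconfig.get_paths()["purelib"]` rather than a temporary directory, and `ld_library_path` would be exported before launching Python.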

@SuryanarayanaY SuryanarayanaY added comp:gpu GPU related issues TF 2.16 labels Mar 11, 2024
@SuryanarayanaY (Collaborator)

Hi @JuanVargas ,

For the GPU package you need to ensure the CUDA driver is installed, which can be verified with the nvidia-smi command. Then install the TF CUDA package with pip install tensorflow[and-cuda], which automatically installs the required CUDA/cuDNN libraries.

I have checked in Colab and was able to detect the GPU. Please refer to the attached gist.

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Mar 11, 2024
@damadorPL commented Mar 11, 2024

Double quotes in pip install because of zsh:

pip install "tensorflow[and-cuda]==2.16.1"                                                                       
 

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: tensorflow==2.16.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (2.16.1)
Requirement already satisfied: absl-py>=1.0.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.1.0)
Requirement already satisfied: astunparse>=1.6.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.6.3)
Requirement already satisfied: flatbuffers>=23.5.26 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (24.3.7)
Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.5.4)
Requirement already satisfied: google-pasta>=0.1.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.2.0)
Requirement already satisfied: h5py>=3.10.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.10.0)
Requirement already satisfied: libclang>=13.0.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (16.0.6)
Requirement already satisfied: ml-dtypes~=0.3.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.3.2)
Requirement already satisfied: opt-einsum>=2.3.2 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.3.0)
Requirement already satisfied: packaging in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (24.0)
Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (4.25.3)
Requirement already satisfied: requests<3,>=2.21.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.31.0)
Requirement already satisfied: setuptools in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (69.1.1)
Requirement already satisfied: six>=1.12.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.16.0)
Requirement already satisfied: termcolor>=1.1.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.4.0)
Requirement already satisfied: typing-extensions>=3.6.6 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (4.10.0)
Requirement already satisfied: wrapt>=1.11.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.16.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.62.1)
Requirement already satisfied: tensorboard<2.17,>=2.16 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.16.2)
Requirement already satisfied: keras>=3.0.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.0.5)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.36.0)
Requirement already satisfied: numpy<2.0.0,>=1.23.5 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.26.4)
Requirement already satisfied: nvidia-cublas-cu12==12.3.4.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.4.1)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.3.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.101)
Requirement already satisfied: nvidia-cuda-nvcc-cu12==12.3.107 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.107)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.3.107 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.107)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.3.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.101)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.7.29 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (8.9.7.29)
Requirement already satisfied: nvidia-cufft-cu12==11.0.12.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (11.0.12.1)
Requirement already satisfied: nvidia-curand-cu12==10.3.4.107 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (10.3.4.107)
Requirement already satisfied: nvidia-cusolver-cu12==11.5.4.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (11.5.4.101)
Requirement already satisfied: nvidia-cusparse-cu12==12.2.0.103 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.2.0.103)
Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (2.19.3)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.3.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.101)
Requirement already satisfied: wheel<1.0,>=0.23.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from astunparse>=1.6.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.42.0)
Requirement already satisfied: rich in ./miniconda3/envs/tf/lib/python3.11/site-packages (from keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (13.7.1)
Requirement already satisfied: namex in ./miniconda3/envs/tf/lib/python3.11/site-packages (from keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.0.7)
Requirement already satisfied: dm-tree in ./miniconda3/envs/tf/lib/python3.11/site-packages (from keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.1.8)
Requirement already satisfied: charset-normalizer<4,>=2 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2024.2.2)
Requirement already satisfied: markdown>=2.6.8 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.5.2)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.0.1)
Requirement already satisfied: MarkupSafe>=2.1.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from werkzeug>=1.0.1->tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.1.5)
Requirement already satisfied: markdown-it-py>=2.2.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from rich->keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from rich->keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.17.2)
Requirement already satisfied: mdurl~=0.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from markdown-it-py>=2.2.0->rich->keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.1.2)
nvidia-smi             
                                                                                           
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   39C    P5             10W /  285W |    4334MiB /  12282MiB |     13%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        41      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

python3

Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-03-11 09:36:29.601060: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-11 09:36:29.921637: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 09:36:30.793353: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> print(tf.config.list_physical_devices('GPU'))
2024-03-11 09:36:33.878560: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 09:36:33.980099: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
>>>

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 11, 2024
@damadorPL
nvcc -V 
                                                                                                          
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

@damadorPL commented Mar 11, 2024

Got it to work :) First, go to
https://developer.nvidia.com/rdp/cudnn-archive

then download the Local Installer for Ubuntu22.04 x86_64 (Deb),

unpack and install libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb:

sudo dpkg -i libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb   
                                                           
Selecting previously unselected package libcudnn8.
(Reading database ... 47318 files and directories currently installed.)
Preparing to unpack libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb ...
Unpacking libcudnn8 (8.9.7.29-1+cuda12.2) ...
Setting up libcudnn8 (8.9.7.29-1+cuda12.2) ...

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  

                             
2024-03-11 10:27:47.879686: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-11 10:27:47.909157: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 10:27:48.316717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-11 10:27:48.664469: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688059: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688111: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

@JuanVargas (Author) commented Mar 11, 2024 via email

@sh-shahrokhi commented Mar 11, 2024 via email

@JuanVargas (Author) commented Mar 11, 2024 via email

@JuanVargas (Author) commented Mar 11, 2024 via email

@sh-shahrokhi commented Mar 11, 2024 via email

@damadorPL commented Mar 11, 2024

You can get the .deb file directly from https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/

@Gwyki commented Mar 11, 2024

Thanks @sh-shahrokhi. I thought it was path-related. I modified it slightly to make it Python-version independent if you put it in your conda environment activation ([environment]/etc/activate.d/env_vars.sh):

# Prepend the lib/ directory of every pip-installed nvidia package
NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))
for dir in "$NVIDIA_DIR"/*; do
    if [ -d "$dir/lib" ]; then
        export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
    fi
done

This is not a resolution, as this post-install step should not be necessary.

W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

I can't seem to do similar tricks to resolve the TensorRT issue when TensorRT is installed similarly into the conda environment. Any ideas?

@sh-shahrokhi commented Mar 11, 2024

> I can't seem to do similar tricks to resolve the TensorRT issues when installed similarly into the conda environment. Any ideas?

I don't actually use TensorRT, but I would check whether the required .so file for it is visible to TensorFlow. You may need to find the name of the required file in the TensorFlow source code.
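One way to do that check without reading the source is to try to dlopen the TensorRT runtime the way TF-TRT would. A minimal sketch (the soname `libnvinfer.so.8` matches the TensorRT 8.x series that TF 2.16 was tested against; adjust it for other versions):

```python
import ctypes

# Attempt to load the TensorRT runtime libraries through the dynamic
# loader; this succeeds only if they are on the default search path or
# on LD_LIBRARY_PATH, which is exactly what the TF-TRT warning is about.
results = {}
for soname in ("libnvinfer.so.8", "libnvinfer_plugin.so.8"):
    try:
        ctypes.CDLL(soname)
        results[soname] = "found"
    except OSError:
        results[soname] = "not visible to the dynamic loader"

for soname, status in results.items():
    print(soname, "->", status)
```

If both report "not visible", adding the directory containing them to LD_LIBRARY_PATH (as done above for cuDNN) is the same class of workaround.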

This doesn't change the fact that new TensorFlow versions should be tested by the Google team before release, or the bugs should be fixed. It seems they only care about having a working Docker image, not anything else.

@Gwyki commented Mar 12, 2024

I have given up on TensorRT. I guess I won't be using it either.

> This actually doesn't change the fact that the new tensorflow version should be tested by google team before release, or the bugs should be fixed. It seems they only care about having a working docker image, not anything else.

Agreed. Installing TF has always been hit or miss and it seems that in the many years since I last used TF that hasn't changed one bit.

@moozoo64

Well, I wasted 8 hours of my Sunday on this, setting up another PC from scratch, before reverting to the old version. Now looking to move off TensorFlow.

@mihaimaruseac (Collaborator) commented Mar 12, 2024

In general, we used to test RC versions before release. For example, we used to have RC0, RC1 and RC2 for TF 2.9. This gave people and downstream teams enough time to test and report issues.

It seems that 2.16.1 only had an RC0 (for 2.16.0).

The release process is (was?) like this:

  • cut the release branch (e.g., r2.17)
  • immediately trigger the release pipeline. This would create a few PRs to update version numbers, release notes, but after this step RC0 should be as close as possible to the version on master branch at the time the release branch has been cut. There should not be any code changes to the release branch at this point (except to maybe cherrypick fixes from master from hard bugs caused by cutting the branch at a wrong commit)
  • have at least a week of testing for downstream teams to test RC0
  • get fixes to discovered bugs landed on master, cherrypick them to release branch, after they are already tested on nightly releases
  • trigger RC1 pipeline. Again, no other code changes should occur now, except to fix bugs discovered during building
  • wait a week for downstream teams to test. If there are bugs, repeat the steps above for another RC, otherwise repeat the steps above for the final version.

Overall, this process would take number_of_RCs + 1 weeks with a possibility of a few more weeks of delay.

However, for the 2.16 release, although the branch was cut on Feb 8th, there has been only one RC. Most likely the issues can be solved by a patch release.


@JuanVargas (Author)

I am closing this (unresolved issue) because I am told by the Keras/TF team that the issue is related to TF.

@eabase commented Jun 14, 2024

Can someone explain why TF >2.10 cannot be run with a GPU on native Windows?
This makes no sense whatsoever, as everything else (other hardware, WSL, Conda) works with the GPU, including other Python packages such as Torch. So what is going on?

I.e., what is the problem, and why is it not being addressed by the community?

@sh-shahrokhi commented Jun 14, 2024

> Can someone care to explain why TF >2.10 cannot be run with GPU in native windows? This totally makes no sense whatsoever, as all other HW, WSL, and Conda works with GPU. Including other python packages, such as Torch. So what is going on?
>
> I.e. What is the problem and why is it not being addressed by the community?

Google removed the native Windows CUDA build starting with TF 2.11.
There is nothing you can do about it; building from source with CUDA will also fail on Windows.

@mihaimaruseac (Collaborator)

Everyone who cared about full support of TF is no longer on the team. See the comments above for more details and differences.

@eabase commented Jun 15, 2024

@sh-shahrokhi

> Google removed the native windows cuda build starting TF 2.11

Unfortunately that doesn't say anything. I don't see how you can "remove" any of that, apart from breaking the build scripts. Whatever you "remove" must still be present for all other *nix builds. WSL is not that different from MSYS or MinGW, which is no longer too far from VS C/C++ builds.

@sh-shahrokhi commented Jun 15, 2024

> Unfortunately that doesn't say anything. I don't see how you can "remove" any of that, apart from breaking the build scripts.

#58629
Also:
#59918

@ben-jy commented Jun 17, 2024

Also kindly note that the current issue opened "TF 2.16.1 Fails to work with GPUs" involves Linux Operating Systems and potentially the additional steps to be specified in the official TensorFlow documentation in order to utilize GPUs locally.

I started a not very pleasant acquaintance with TensorFlow with this version. As I understand it, the specific reason is 2.16.1, and it does not work in WSL, because nothing worked for me. The question is which version can be installed so that it works normally in WSL.
Also, for the future, I will say that installing Anaconda does not help either; you can install at most version 2.10 on it.

@MrOxMasTer I totally understand your frustration, but I assure you that TensorFlow 2.16.1 can actually work with your CUDA-enabled GPU.

You can try the following:

  1. Create a fresh conda virtual environment in WSL and activate it, like this:
conda create --name tf python=3.11
conda activate tf
  2. Within the fresh conda virtual environment tf created in the previous step, run the following commands sequentially:
pip install --upgrade pip
pip install tensorflow[and-cuda]
  3. Set environment variables:

Note: This step is required in order to utilize your GPU but is not yet included in the official TensorFlow documentation. All NVIDIA libs are installed with TensorFlow because you ran pip install tensorflow[and-cuda] in the previous step!

Locate the directory for the conda environment in your terminal window by running in the terminal:

echo $CONDA_PREFIX

Enter that directory and create these subdirectories and files:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh

# Store the original LD_LIBRARY_PATH and PATH so they can be restored on deactivation
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
export ORIGINAL_PATH="${PATH}"

# Get the cuDNN directory
CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))

# Prepend the cuDNN library directories to LD_LIBRARY_PATH (the sed strips
# find's trailing colon so no empty path entry is introduced)
export LD_LIBRARY_PATH="$(find ${CUDNN_DIR}/*/lib/ -type d -printf '%p:' | sed 's/:$//')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Get the ptxas directory
PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))

# Prepend the directory containing ptxas to PATH
export PATH="$(find ${PTXAS_DIR}/*/bin/ -type d -printf '%p:' | sed 's/:$//')${PATH:+:${PATH}}"
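The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` expansion in the script above is what prevents a dangling separator when the variable starts out empty. A minimal standalone sketch of that shell idiom (the `/opt/new/lib` path is just an example value):

```shell
# Sketch of the ${VAR:+...} expansion used above: the ":old-value" suffix is
# appended only when VAR is already set and non-empty.
VAR=""
RESULT="/opt/new/lib${VAR:+:${VAR}}"
echo "$RESULT"    # /opt/new/lib  (no dangling colon)

VAR="/usr/lib:/usr/local/lib"
RESULT="/opt/new/lib${VAR:+:${VAR}}"
echo "$RESULT"    # /opt/new/lib:/usr/lib:/usr/local/lib
```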

Edit ./etc/conda/deactivate.d/env_vars.sh as follows:

#!/bin/sh

# Restore the original LD_LIBRARY_PATH and PATH (the ${VAR-fallback} form
# keeps the current value if the original was never saved)
export LD_LIBRARY_PATH="${ORIGINAL_LD_LIBRARY_PATH-$LD_LIBRARY_PATH}"
export PATH="${ORIGINAL_PATH-$PATH}"

# Clean up helper variables
unset ORIGINAL_LD_LIBRARY_PATH ORIGINAL_PATH CUDNN_DIR PTXAS_DIR

Verify the GPU setup: python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
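The activation script locates the package directory with a nested `dirname` of the module's `__file__`. The same pattern can be tried standalone with a package that is always available (`json` here merely stands in for `nvidia.cudnn`, which exists only after the pip install above):

```shell
# Same dirname/dirname pattern as CUDNN_DIR above, applied to the stdlib
# 'json' package so the sketch runs in any environment.
PKG_DIR=$(dirname $(dirname $(python3 -c "import json; print(json.__file__)")))
echo "$PKG_DIR"
test -d "$PKG_DIR" && echo "directory exists"
```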

Additionally, as I was informed, the next version of TensorFlow will hopefully arrive within the next few days!

I hope it helps!

Doesn't work for me :/ I even completely reinstalled WSL, but I still get an empty list when listing the available devices... Should CUDA be uninstalled on the Windows side? When I use "nvidia-smi", it says that I have CUDA version 12.5, even though I didn't install anything in WSL... Is that normal?

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01              Driver Version: 555.99          CUDA Version: 12.5    |
|-----------------------------------------+------------------------+----------------------+

@sgkouzias

@ben-jy frankly, I have no clue. Did you check the official documentation? Does your setup meet the technical requirements? What is the Python version in WSL2, and is it compatible with TensorFlow 2.16.1? Which NVIDIA GPU do you have? The output of the command nvidia-smi in WSL2 seems normal, since your GPU driver is installed on the Windows side. However, you could try reinstalling everything (a compatible GPU driver, afterwards WSL2, and then TensorFlow)...

@sh-shahrokhi
sh-shahrokhi commented Jun 17, 2024 via email

@ben-jy
ben-jy commented Jun 18, 2024

@sgkouzias I checked the official documentation, but I find it not very clear, and it seems a bit contradictory: the software requirements state that CUDA and cuDNN should be installed on the machine, but the pip package should install them automatically with TensorFlow, right? Besides, this Medium tutorial explains that CUDA should not be installed on the Windows side, nor on the WSL side, but instead via the pip package. Maybe I should try to uninstall everything CUDA-related on Windows...
Concerning your other questions:

  1. I have an RTX 3070 Ti, which is in the list of CUDA-enabled product.
  2. I use conda and I tried the install with Python 3.10 and 3.11, which are in the software requirements of the official documentation. Those versions are said to be compatible with TensorFlow 2.16.1, according to the PyPI package tags.

I will try a clean reinstall of my GPU driver, as well as uninstalling CUDA on the Windows side. If it doesn't work, I think it is better to install CUDA and cuDNN manually, along with an older TensorFlow version. It is still a shame that the official documentation of such a large and important library is so unclear.

@tilakrayal
Contributor

@learning-to-play

@mihaimaruseac
Collaborator

Can you test the 2.17.0 RC0, please? It is too late to update 2.16, but if 2.17 RC0 doesn't work, maybe there will be a chance to fix by RC1/final

@sgkouzias

Can you test the 2.17.0 RC0, please? It is too late to update 2.16, but if 2.17 RC0 doesn't work, maybe there will be a chance to fix by RC1/final

@mihaimaruseac I just tested but unfortunately it has the same issue.

@mihaimaruseac mihaimaruseac changed the title TF 2.16.1 Fails to work with GPUs TF 2.17.0 RC0 Fails to work with GPUs (and TF 2.16 too) Jun 19, 2024
@mihaimaruseac
Collaborator

@learning-to-play maybe this can get fixed before final release? TF does not work with GPUs, started failing since TF 2.16 release.

@sgkouzias
sgkouzias commented Jun 19, 2024

A tested workaround to utilize GPU for Linux users:

  1. Create a virtual environment with venv:
    python3 -m venv tf

  2. Activate the environment
    source tf/bin/activate

  3. Upgrade pip
    pip install --upgrade pip

  4. Install TensorFlow 2.17.0rc0
    pip install tensorflow[and-cuda]==2.17.0rc0

  5. Create symbolic links to NVIDIA shared libraries

pushd $(dirname $(python -c 'print(__import__("tensorflow").__file__)'))
ln -svf ../nvidia/*/lib/*.so* .
popd
  6. Verify installation
    python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
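A hedged dry-run sketch of step 5: list the NVIDIA shared objects the `ln` command would pick up, without creating anything. The `nvidia/*/lib` layout is the one produced by `pip install tensorflow[and-cuda]`; the fallback branch simply reports when TensorFlow is absent.

```shell
# Dry run of step 5: show which .so files the symlink command would target.
TF_DIR=$(python3 - <<'EOF'
import importlib.util, os
spec = importlib.util.find_spec("tensorflow")
print(os.path.dirname(spec.origin) if spec and spec.origin else "")
EOF
)
if [ -n "$TF_DIR" ]; then
  for lib in "$TF_DIR"/../nvidia/*/lib/*.so*; do
    if [ -e "$lib" ]; then
      echo "would link: $(basename "$lib")"
    fi
  done
else
  echo "tensorflow is not installed in this environment"
fi
```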

I have created a respective pull request, still pending review, in good faith and for the sake of all users, as TensorFlow is "An Open Source Machine Learning Framework for Everyone".

@learning-to-play
Collaborator
learning-to-play commented Jun 19, 2024

@SeeForTwo @poulsbo Could you please take a look? If there is a fix that needs to be cherry picked to 2.16.2 or 2.17.0, please follow these steps:

  • Submit a fix to TensorFlow HEAD
  • Ensure nightly builds are green.
  • Create a cherry pick PR to the corresponding release branches r2.16 and r2.17 and assign to @rtg0795

@sgkouzias Does this issue happen for both TF 2.16.1 and 2.17.0rc0?

@sgkouzias
sgkouzias commented Jun 19, 2024

@SeeForTwo @poulsbo Could you please take a look? If there is a fix that needs to be cherry picked to 2.16.2 or 2.17.0, please follow these steps:

  • Submit a fix to TensorFlow HEAD
  • Ensure nightly builds are green.
  • Create a cherry pick PR to the corresponding release branches r2.16 and r2.17 and assign to @rtg0795

@sgkouzias Does this issue happen for both TF 2.16.1 and 2.17.0rc0?

@learning-to-play yes indeed. The only difference is that on version 2.17.0rc0 you only need the symlinks to the NVIDIA libs in order to utilize GPUs, while on version 2.16.1 you must, in addition to creating the symlinks to the NVIDIA libs, create a symlink to ptxas.
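For 2.16.1 specifically, the extra ptxas step could be sketched like this, assuming the `nvidia.cuda_nvcc` wheel layout installed by `pip install tensorflow[and-cuda]` (the `bin/ptxas` location follows the pattern used in the activation script earlier in this thread; the fallback branch just reports when the wheel is absent):

```shell
# Sketch for TF 2.16.1: put the pip-installed ptxas on PATH in addition to
# the library symlinks. Wheel layout (nvidia/cuda_nvcc/bin/ptxas) is assumed.
NVCC_DIR=$(python3 - <<'EOF'
try:
    import os, nvidia.cuda_nvcc
    print(os.path.dirname(nvidia.cuda_nvcc.__file__))
except ImportError:
    print("")
EOF
)
if [ -n "$NVCC_DIR" ]; then
  export PATH="$NVCC_DIR/bin:$PATH"
  command -v ptxas || echo "ptxas not found under $NVCC_DIR/bin"
else
  echo "nvidia.cuda_nvcc is not installed in this environment"
fi
```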

copybara-service bot pushed a commit that referenced this issue Jun 24, 2024
…_deps`.

Should fix #63362

Reverts changelist 582804278

PiperOrigin-RevId: 646146985
@belitskiy belitskiy reopened this Jun 24, 2024
@tensorflow tensorflow deleted a comment from google-ml-butler bot Jun 24, 2024
@learning-to-play learning-to-play added the 2.17 Issues related to 2.17 release label Jun 25, 2024
tensorflow-jenkins pushed a commit that referenced this issue Jun 25, 2024
…_deps`.

Should fix #63362

Reverts changelist 582804278

PiperOrigin-RevId: 646182849