
WSL2 - TensorFlow Install Issue Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered #63109

Open
vatsalraicha opened this issue Mar 4, 2024 · 26 comments
Assignees
Labels
comp:gpu GPU related issues subtype:windows Windows Build/Installation Issues TF 2.15 For issues related to 2.15.x

Comments

@vatsalraicha

Facing these error messages when trying to use TensorFlow on WSL2 (Ubuntu):

Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered.
TF-TRT Warning: Could not find TensorRT
could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.

Environment
TensorRT Version: 8.6.1.6
GPU Type: 4080 Laptop GPU
Nvidia Driver Version: NVIDIA-SMI 546.17 Driver Version: 546.17
CUDA Version: 12.3, CUDA Toolkit Version - 11.8
CUDNN Version: v8.6
Operating System + Version: Windows 11
Python Version (if applicable): Python 3.10.13
TensorFlow Version (if applicable): 2.15.0
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): N/A
pyenv Version - 2.3.35-11-g9908daf8
WSL version: 2.0.14.0
Kernel version: 5.15.133.1-1

tensorFlowErrors.txt

Steps to reproduce -
Followed the exact steps as mentioned in https://www.tensorflow.org/install/pip?hl=pt

@Venkat6871 Venkat6871 added TF 2.15 For issues related to 2.15.x subtype:windows Windows Build/Installation Issues comp:gpu GPU related issues labels Mar 5, 2024
@Venkat6871
Contributor

Hi @vatsalraicha ,

Ensure Correct Installation of CUDA, cuDNN, and TensorRT:
CUDA and cuDNN: Make sure that CUDA and cuDNN are correctly installed and that TensorFlow can detect them. You have mentioned using CUDA 12.3 and cuDNN v8.6, which should be compatible with TensorFlow 2.15.0. Ensure that the CUDA and cuDNN paths are correctly added to your PATH and LD_LIBRARY_PATH environment variables.

export PATH=/usr/local/cuda-12.3/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH

Reinstall TensorFlow:
Consider creating a fresh virtual environment and reinstalling TensorFlow within it. This can help resolve any conflicts or issues with previous installations:

python -m venv tf-venv
source tf-venv/bin/activate
pip install --upgrade pip
pip install tensorflow==2.15.0

Test Your Setup:
After ensuring all configurations and installations are correct, test your TensorFlow setup to see if it can access the GPU:

import tensorflow as tf

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Thank you!
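A slightly fuller diagnostic than the GPU count alone: `tf.sysconfig.get_build_info()` reports the CUDA/cuDNN versions the installed wheel was built against, which can then be compared with what is actually on the system. A minimal sketch (the `summarize_build` helper is my own name, and the import is guarded so the snippet degrades gracefully when TensorFlow is absent):

```python
def summarize_build(info):
    """Format the CUDA/cuDNN versions a TensorFlow wheel was built against."""
    return "built with CUDA {}, cuDNN {}".format(
        info.get("cuda_version", "?"), info.get("cudnn_version", "?"))

try:
    import tensorflow as tf
    # get_build_info() returns a dict with keys such as
    # 'cuda_version' and 'cudnn_version' on GPU-enabled wheels.
    print(summarize_build(tf.sysconfig.get_build_info()))
    print("Num GPUs Available:", len(tf.config.list_physical_devices("GPU")))
except ImportError:
    print("TensorFlow is not installed in this environment")
```

If the versions reported here differ from the CUDA toolkit and cuDNN installed on the machine, that mismatch is the first thing to fix.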

@Venkat6871 Venkat6871 added the stat:awaiting response Status - Awaiting response from author label Mar 5, 2024
@petervaneijk

Same error.

WSL version: 2.0.14.0
Kernel version: 5.15.133.1-1
WSLg version: 1.0.59
MSRDC version: 1.2.4677
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.3235

NVIDIA GeForce RTX 4050
Driver Version: 551.61 CUDA Version: 12.4
CUDA sample functions run without errors

python 3.10.12
cuda_12.3.2_545.23.08
tensorrt 8.6.1

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import tensorrt
print(tensorrt.__version__)
8.6.1
import tensorflow as tf
2024-03-05 16:13:59.938649: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-03-05 16:13:59.960210: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-05 16:13:59.960255: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-05 16:13:59.960774: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-05 16:13:59.964109: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-05 16:14:00.340405: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
print(tf.config.list_physical_devices('GPU'))
2024-03-05 16:14:19.162569: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-05 16:14:19.166217: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-05 16:14:19.166261: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Num GPUs Available: 1
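Note that the GPU is detected here despite the factory-registration messages, so in this case they appear to be log noise rather than a functional failure. If desired, TensorFlow's C++ logging can be reduced with the documented `TF_CPP_MIN_LOG_LEVEL` environment variable, which must be set before the import; a sketch:

```python
import os

# Must be set *before* tensorflow is imported:
# "0" = all logs, "1" = hide INFO, "2" = hide INFO and WARNING,
# "3" = hide ERROR messages as well.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# import tensorflow as tf  # import only after the variable is set
print(os.environ["TF_CPP_MIN_LOG_LEVEL"])
```

This only suppresses the messages; it does not change whether the GPU is usable.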

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 5, 2024
@vatsalraicha
Author

@Venkat6871 On WSL2, I don't have a CUDA 12.3 directory in /usr/local.
So this path is invalid for me: /usr/local/cuda-12.3/
Per NVIDIA's documentation, I don't need to install any drivers on WSL2; it should automatically pick them up from native Windows.
What should my next step be?
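One possible reason for the missing /usr/local/cuda directory: when TensorFlow is installed via pip as `tensorflow[and-cuda]`, the CUDA/cuDNN libraries arrive as `nvidia-*` wheels inside site-packages rather than under /usr/local, so the absence of /usr/local/cuda-12.3 is not necessarily a problem. A stdlib-only sketch to list which NVIDIA wheels are present (the function name is mine):

```python
from importlib.metadata import distributions

def nvidia_wheels():
    """Return installed pip distributions that ship NVIDIA CUDA components."""
    names = {d.metadata["Name"] for d in distributions() if d.metadata["Name"]}
    return sorted(n for n in names if n.lower().startswith("nvidia-"))

# Typically names like 'nvidia-cublas-cu12', 'nvidia-cudnn-cu12', ...
# when tensorflow[and-cuda] is installed; an empty list otherwise.
print(nvidia_wheels())
```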

@Amit30swgoh

Same problem

@jomyp220056cs

I'm getting "Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered" on Google Colab. How do I resolve it?

@PUNKDONG

same
2024-04-26 16:37:38.371348: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-26 16:37:38.371404: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-26 16:37:38.371421: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

@garryyan2

I have the same problem. I am using WSL2.
2024-04-25 16:44:46.357075: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-25 16:44:46.357110: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-25 16:44:46.357608: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

@garryyan2

After several days of struggle, I was able to get rid of these errors.
It's a version compatibility issue. You need to install the package versions listed on the TensorFlow install page.
I struggled with installing a specific CUDA toolkit version through deb packages, because the version was always updated to the latest (12.4). In the end, I used the runfiles to install the toolkit.
The other problem I had was that I started with TensorFlow 2.15.1, which I could not get to work.
The versions that work for me are:
CUDA toolkit: 12.2
cuDNN: 8.9.7.29
TensorFlow: 2.16.1
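One way to sanity-check such an install is to confirm the dynamic linker can actually resolve the cuDNN shared library TensorFlow loads at runtime. A stdlib-only sketch (the `libcudnn.so.8` soname is an assumption based on the cuDNN 8.9 series; adjust for other versions):

```python
import ctypes

def can_load(libname):
    """Return True if the dynamic linker can resolve the given shared library."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# If this prints False, check LD_LIBRARY_PATH / ldconfig: TensorFlow will
# fail to find cuDNN at runtime for the same reason.
print(can_load("libcudnn.so.8"))
```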

@vatsalraicha
Author

@garryyan2 Did you achieve this on WSL2 or on native Linux?
What was your CUDA version?
Could you share the version info for the setup that works for you?
TensorRT Version:
Nvidia Driver Version:
NVIDIA-SMI Driver Version: 546.17
CUDA Version:
CUDNN Version:
Python Version (if applicable):
WSL version:

@garryyan2

@vatsalraicha I am using WSL2.

$nvcc --version
has output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

$nvidia-smi
has output
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01 Driver Version: 552.22 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
Notice that the CUDA version here (12.4) differs from the one reported by `nvcc --version` (12.2): `nvidia-smi` shows the highest CUDA version the driver supports, while `nvcc` shows the installed toolkit version.

cudnn version: 8.9.7.29
Python version: Python 3.10.12
WSL version: 5.15.146.1-microsoft-standard-WSL2

I am not sure about the TensorRT version.

@gurzelai
gurzelai commented May 16, 2024

Same:
2024-04-26 16:37:38.371348: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-26 16:37:38.371404: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-26 16:37:38.371421: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

how did you solve it? I have the same problem in Colab

@garryyan2

@gurzelai, for me, I just tried different version combinations of the tools/packages. I am not familiar with Colab; you probably can't choose the software versions there.

@lapitskiy

What worked for me:
Tesla P100, X9DRI, Ubuntu 22.04

conda
python 3.10

sudo mkdir -p /usr/lib/xorg/modules
sudo apt-get update
sudo apt-get install pkg-config xorg-dev
sudo apt install libvulkan1
sudo apt install dkms

NVIDIA driver 535.183.06 for CUDA 12.2

CUDA Toolkit 12.3 runfile local

cuDNN v8.9.7.29 tar file (instructions)

tensorflow 2.16.1 for gpu for python 3.10 whl (https://storage.googleapis.com/tensorflow/versions/2.16.1/tensorflow-2.16.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl)

echo '/usr/local/cuda-12.3/lib64' | sudo tee -a /etc/ld.so.conf.d/cuda.conf
sudo ldconfig

.bashrc
export PATH=/usr/local/cuda-12.3/bin:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.3/lib64:/usr/local/cuda-12.3/extras/CUPTI/lib64
export CUDA_HOME=/usr/local/cuda-12.3

sudo reboot
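The working setups reported so far in this thread can be captured as a small lookup. To be clear, these are anecdotal combinations from individual commenters, not an official compatibility matrix (the authoritative table lives on tensorflow.org/install/source):

```python
# Known-good (TF, CUDA toolkit, cuDNN) combinations reported in this thread.
WORKING_COMBOS = {
    ("2.16.1", "12.2", "8.9.7.29"),  # garryyan2, WSL2
    ("2.16.1", "12.3", "8.9.7.29"),  # lapitskiy, bare-metal Ubuntu 22.04
}

def is_reported_working(tf_version, cuda_version, cudnn_version):
    """Check a version triple against combos reported working in this issue."""
    return (tf_version, cuda_version, cudnn_version) in WORKING_COMBOS

print(is_reported_working("2.16.1", "12.2", "8.9.7.29"))  # True
print(is_reported_working("2.15.0", "12.3", "8.6"))       # False
```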

@pak-app
pak-app commented Jul 29, 2024

My algorithm uses PyTorch and gets the same error. Has anybody solved this?

@dlin95123

I have the same error messages with Docker Hub's Tensorflow image. Specifically, I issued the following command to start a Jupyter notebook with the latest Tensorflow: docker run -it --rm -p 8888:8888 --gpus all tensorflow/tensorflow:latest-gpu-jupyter

In the Jupyter notebook, I just did "import tensorflow as tf" and then saw these error messages. So apparently it is not a local configuration issue, as a Docker image should be self-contained.

@hanjifeng

Same issue for me.
CUDA 12.3, cuDNN 8.9.
Changing to TensorFlow 2.16.1 seems to work.

@BenIlias

After several days struggle, I am able to get rid of this errors. It's a version compatibility issue. You need to install the package versions according to the webpage. I was struggled on installing specific CUDA toolkit versions through deb packages, because the version was always updated to the latest version 12.4. In the end, I used the runfiles to install the toolkit. The other problem I had was that I started with tensorflow 2.15.1, which I could not get it to work. The versions that work for me are: CUDA toolkit: 12.2. cudnn: 8.9.7.29 Tensorflow: 2.16.1

Could you please do me a favour and convert a model for me? I need it urgently, if possible.

@BenIlias

Same issue found for me. CUDA12.3, CUDNN 8.9. change to tensorflow 2.16.1, seems working

Could you please do me a favour and convert a model for me? I need it urgently, if possible.


@BenIlias

@BenIlias Hello, what do you mean by converting a model for you? Do you just need a simple example with a model? I have one in #67033.

First, thanks for your reply.
I mean that since you fixed this error, you are able to use TensorFlow with TensorRT correctly.
I have a Python TensorFlow model and want to convert it to a JavaScript model.
Could you please help me? I need it urgently for university tonight. Thank you!

@BenIlias

@BenIlias Sorry, I don't know JavaScript coding.

No, it doesn't involve any coding: you just run a Python script to convert the model, and that's all.

@BenIlias

@BenIlias Sorry, I don't know JavaScript coding.

One line of Python code and that's it.

@garryyan2

@BenIlias You may want to try google's colab to run it.

@BenIlias

@BenIlias You may want to try google's colab to run it.

I tried, but I get the same error 😔 and today is the deadline 😢.
I have been trying Colab for a few days, but it doesn't work.

@BenIlias

@BenIlias You may want to try google's colab to run it.

If you could do me the favour, i would never forget it :)

@vatsalraicha
Author
vatsalraicha commented Sep 22, 2024

No one, whether from the NVIDIA or TensorRT team, really cares about this issue. Their documentation is terrible, leaving non-enterprise developers in the lurch.
