

tensorrt==8.5.3.1 from [and-cuda] not available in Python 3.11 #61986

Open
jonas-eschle opened this issue Sep 27, 2023 · 26 comments
Labels
comp:gpu:tensorrt (Issues specific to TensorRT)
stat:awaiting tensorflower (Status - Awaiting response from tensorflower)
subtype: ubuntu/linux (Ubuntu/Linux Build/Installation Issues)
TF2.14 (For issues related to Tensorflow 2.14.x)
type:build/install (Build and install issues)

Comments

@jonas-eschle
Contributor
jonas-eschle commented Sep 27, 2023

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.14.0

Custom code

No

OS platform and distribution

Ubuntu 22.04

Python version

3.11

Current behavior?

tensorrt==8.5.3.1, a pinned dependency of tensorflow[and-cuda], is only available for Python up to 3.10, so installing the extra fails with Python 3.11.

Standalone code to reproduce the issue

Installation issue
`pip install "tensorflow[and-cuda]>=2.14.0"` with Python 3.11+

Relevant log output

ERROR: Could not find a version that satisfies the requirement tensorrt==8.5.3.1; extra == "and-cuda" (from tensorflow[and-cuda]) (from versions: 0.0.1.dev5, 0.0.1, 8.6.1, 8.6.1.post1, 9.0.0.post11.dev1, 9.0.0.post12.dev1, 9.0.1.post11.dev4, 9.0.1.post12.dev4)
ERROR: No matching distribution found for tensorrt==8.5.3.1; extra == "and-cuda"
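The `(from versions: ...)` list in this log is what pip found for the running interpreter, so a quick way to confirm that the pinned 8.5.3.1 has no usable wheel is to parse that list. A minimal standard-library sketch; `versions_from_pip_error` is a hypothetical helper, and the regex assumes pip keeps the `(from versions: ...)` wording shown above.

```python
import re


def versions_from_pip_error(log):
    """Return the candidate versions pip listed in a
    'Could not find a version that satisfies the requirement' error."""
    # pip prints the candidates as "(from versions: a, b, c)"; use the last such group.
    groups = re.findall(r"\(from versions: ([^)]*)\)", log)
    if not groups:
        return []
    versions = [v.strip() for v in groups[-1].split(",") if v.strip()]
    # pip prints "(from versions: none)" when it found no candidate at all.
    return [] if versions == ["none"] else versions


# The first error line from the log above.
LOG = (
    'ERROR: Could not find a version that satisfies the requirement tensorrt==8.5.3.1; '
    'extra == "and-cuda" (from tensorflow[and-cuda]) (from versions: 0.0.1.dev5, 0.0.1, '
    '8.6.1, 8.6.1.post1, 9.0.0.post11.dev1, 9.0.0.post12.dev1, 9.0.1.post11.dev4, '
    '9.0.1.post12.dev4)'
)

available = versions_from_pip_error(LOG)
print(available)
print("8.5.3.1" in available)  # the pinned version is not among the candidates
```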
@google-ml-butler google-ml-butler bot added the type:build/install Build and install issues label Sep 27, 2023
@SuryanarayanaY SuryanarayanaY added TF2.14 For issues related to Tensorflow 2.14.x subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues comp:gpu:tensorrt Issues specific to TensorRT labels Sep 27, 2023
@SuryanarayanaY
Collaborator

Hi @jonas-eschle,

Thanks for reporting this. I have reproduced the error with Python 3.11: tensorrt fails to install with tensorflow[and-cuda], as shown in the attached gist (Py3.11).

With Python 3.10, however, the command works and installs tensorrt. The gist for Python 3.10 is attached for reference.

CC: @learning-to-play

@learning-to-play
Collaborator

@SuryanarayanaY Please assign to the TensorFlow GPU team.

@picobyte
picobyte commented Sep 30, 2023

When installing, pip prints "Obtaining dependency information for tensorflow[and-cuda]==2.14.0 from <url>". Download that file; it lists which packages are installed for the and-cuda extra.

Currently it seems to be:

pip install tensorflow==2.14.0 nvidia-cuda-runtime-cu11==11.8.89 nvidia-cublas-cu11==11.11.3.6 nvidia-cufft-cu11==10.9.0.58 nvidia-cudnn-cu11==8.7.0.84 nvidia-curand-cu11==10.3.0.86 nvidia-cusolver-cu11==11.4.1.48 nvidia-cusparse-cu11==11.7.5.86 nvidia-nccl-cu11==2.16.5 nvidia-cuda-cupti-cu11==11.8.87 nvidia-cuda-nvcc-cu11==11.8.89

# this one seems to fail still:
pip install tensorrt==8.5.3.1

You may want to check the metadata for version updates.
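Once the base `tensorflow` package is installed (without the extra), the same list can be read from its installed metadata instead of downloading the file by hand. A sketch using `importlib.metadata.requires`; the helper names are hypothetical, and the regex only looks for simple `extra == "..."` clauses rather than evaluating full environment markers (the `packaging` library's `Requirement` class would be the thorough option).

```python
import re
from importlib import metadata

# Matches an environment marker clause such as: extra == "and-cuda"
EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def requirements_for_extra(requires, extra):
    """Return the requirement specifiers (marker removed) gated on `extra`.

    Simplified: only looks for `extra == "..."` clauses and does not evaluate
    the rest of the marker (platform, python_version, ...).
    """
    selected = []
    for req in requires or []:
        spec, _, marker = req.partition(";")
        if extra in EXTRA_MARKER.findall(marker):
            selected.append(spec.strip())
    return selected


def installed_extra_requirements(dist_name, extra):
    """Read Requires-Dist entries from an installed distribution's metadata."""
    return requirements_for_extra(metadata.requires(dist_name), extra)


if __name__ == "__main__":
    try:
        for spec in installed_extra_requirements("tensorflow", "and-cuda"):
            print(spec)
    except metadata.PackageNotFoundError:
        print("tensorflow is not installed in this environment")
```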

@jonas-eschle
Contributor Author

> pip install tensorrt==8.5.3.1

This is not available for Python 3.11.

@picobyte
picobyte commented Oct 3, 2023

I think the hint about the metadata source is only shown by recent pip versions (I see it after upgrading pip). With Python 3.10, tensorflow[and-cuda]==2.13.1 also reports that there is no and-cuda extra; the rules for this extra are indeed missing there. See here: and-cuda rules

@RocketRider

> I think the metadata dependencies source hint is only shown in a recent pip version (I see it after upgrading pip). Now python 3.10 and tensorflow[and-cuda]==2.13.1 also indicates there is no and-cuda extra; the rules for this extra are indeed removed/missing. But here: and-cuda rules

That option is only available in 2.14.

@itcarroll

Can we see anywhere that the "TensorFlow GPU team" has acknowledged this issue? Can we expect 2.15.0 to fix?

@learning-to-play
Collaborator

Hi @poulsbo, could you please help triage this issue to the right person?

@stallam-unb

> Can we see anywhere that the "TensorFlow GPU team" has acknowledged this issue? Can we expect 2.15.0 to fix?

Doesn't seem like it, even though there have been multiple issues that have been raised that are related to this. I am a bit disappointed to see that it has been over a month since the release of TF2.14 and it is not possible to even install the latest version. I am not sure how it hasn't been caught in tests. This should be considered a high-priority issue, and we have yet to see any indication of an acknowledgement that the team is aware of the issue, or an estimated timeline for the issue to get fixed.

@poulsbo
Collaborator
poulsbo commented Nov 2, 2023

@pjannaty is your team aware of this?

@pjannaty
Copy link
Contributor
pjannaty commented Nov 3, 2023

Routing to @cliffwoolley who has purview now.

@RocketRider

The best workaround so far is to use "--extra-index-url https://pypi.nvidia.com".

@rstefek
rstefek commented Nov 13, 2023

> The best workaround so far is to use "--extra-index-url https://pypi.nvidia.com".

Currently even this does not work :(

pip install "tensorflow[and-cuda]>=2.14.0" --extra-index-url https://pypi.nvidia.com
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
ERROR: Could not find a version that satisfies the requirement tensorflow>=2.14.0 (from versions: 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0, 2.11.1, 2.12.0rc0, 2.12.0rc1, 2.12.0, 2.12.1, 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0, 2.13.1)
ERROR: No matching distribution found for tensorflow>=2.14.0

@cliffwoolley
Contributor
cliffwoolley commented Nov 14, 2023

I confirmed with our TensorRT team that TRT 8.5 did not support Python 3.11. This was a matter of timing of release dates: TensorRT 8.5 was first released in early November 2022, and Python 3.11 had only been out for a few days by then, so it didn't get onto that TensorRT release's support matrix in time.

TRT 8.6 does support Python 3.11, and our TensorRT releases generally maintain binary backward compatibility as per semantic versioning. As it happens, though, the Python packaging was redone between 8.5 and 8.6. So even if you could force pip to ignore the == dependency baked into TensorFlow (I didn't find any particularly straightforward way to do that), the TRT binaries are installed to a different path with 8.6 than with 8.5, and the RUNPATH settings baked into libtensorflow_cc.so would need updating to use the 8.6 pip package.

TensorFlow does actually act appropriately if the TRT library isn't present at runtime, even though pip is treating it as a hard dependency. (There's no such thing in pip as an 'optional dependency'.) It's possible to make a dummy package that tricks pip into moving forward without installing TRT, and then you can run TF (just without TF-TRT). Is that useful?


Cliff Woolley
DL Frameworks Engineering, NVIDIA

@okurman
okurman commented Nov 14, 2023

@cliffwoolley

> It's possible to make a dummy package that tricks pip into moving forward without installing TRT, and then you can run TF (just without TF-TRT).

Would you mind showing how to do this? Thanks!

@stallam-unb
stallam-unb commented Nov 14, 2023

As of the latest tensorflow release 2.15, I was able to successfully install it using:

python -m pip install "tensorflow[and-cuda]==2.15" --extra-index-url https://pypi.nvidia.com

I ran a quick test and so far, it seems like everything is working fine. I think that it would be nice to fix the package pinning so that the extra-index doesn't have to be used, but this is farther than I was able to get with TF2.14.

EDIT: This was tested on Python 3.11 under Linux. It appears that tensorrt was bumped to 8.6.x so it appears to work as intended.

@okurman
okurman commented Nov 14, 2023

@stallam-unb this is great! Thanks! Do you know which CUDA/cuDNN versions 2.15 uses? I don't see an entry for it on the webpage https://www.tensorflow.org/install/source

@stallam-unb
stallam-unb commented Nov 14, 2023

> @stallam-unb this is great! Thanks! Do you know which CUDA/cuDNN versions 2.15 uses? I don't see an entry for it on the webpage https://www.tensorflow.org/install/source

@okurman I think it uses CUDA 12.2 and cuDNN 8.9, apparently updated from the CUDA 11.x series (based on the output of conda list):

...
nvidia-cublas-cu12        12.2.5.6                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.2.142                 pypi_0    pypi
nvidia-cuda-nvcc-cu12     12.2.140                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.2.140                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.2.140                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.4.25                 pypi_0    pypi
nvidia-cufft-cu12         11.0.8.103               pypi_0    pypi
nvidia-curand-cu12        10.3.3.141               pypi_0    pypi
nvidia-cusolver-cu12      11.5.2.141               pypi_0    pypi
nvidia-cusparse-cu12      12.1.2.141               pypi_0    pypi
nvidia-nccl-cu12          2.16.5                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.2.140                 pypi_0    pypi
...

EDIT: Info about tensorrt in case anyone is curious:

...
tensorrt                  8.6.1.post1              pypi_0    pypi
tensorrt-bindings         8.6.1                    pypi_0    pypi
tensorrt-libs             8.6.1                    pypi_0    pypi
...
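For environments not managed by conda, roughly the same listing can be obtained from the metadata of the pip-installed distributions. A minimal sketch; `gpu_packages` is a hypothetical helper, and its default prefixes are simply chosen to match the package names shown above.

```python
from importlib import metadata


def gpu_packages(prefixes=("nvidia-", "tensorrt")):
    """Map distribution name -> version for installed distributions whose
    (lower-cased) name starts with one of `prefixes`."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith(prefixes):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in gpu_packages().items():
        print(f"{name:<26}{version}")
```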

@stallam-unb
stallam-unb commented Nov 14, 2023

Update:

So I ran a few more tests, and interestingly, given tensorrt is the focus of this topic, it doesn't actually appear to be detected correctly:

2023-11-14 17:21:00.148638: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-14 17:21:00.148688: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-14 17:21:00.149349: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-14 17:21:00.153296: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-14 17:21:00.684660: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

I am seeing all sorts of errors about CUDA and TensorRT that I didn't see in the 2.13.x series. nvidia-smi indicates that the GPUs are being used, but the logs point to problems. #62075 suggests that this is an old issue.

@okurman
okurman commented Nov 14, 2023

@stallam-unb the same is happening with my installation.

@cliffwoolley
Contributor
cliffwoolley commented Nov 16, 2023

For TensorRT 8.6 Python packages, TF's RUNPATH would need to have been updated; I'm not sure if that happened when updating the TRT version dependency in TF? This can be worked around with either symlinks or patchelf to update the runpath or just LD_LIBRARY_PATH -- definitely easier than the issue with TF 2.14 and TRT 8.5 and Py3.11.

Can we pick one or the other to go after here?

@stallam-unb

> For TensorRT 8.6 Python packages, TF's RUNPATH would need to have been updated; I'm not sure if that happened when updating the TRT version dependency in TF? This can be worked around with either symlinks or patchelf to update the runpath or just LD_LIBRARY_PATH -- definitely easier than the issue with TF 2.14 and TRT 8.5.
>
> Can we pick one or the other to go after here?

patchelf is unfortunately not available on my servers, and it would take some time to get it approved through the chain. LD_LIBRARY_PATH, on the other hand, can probably be updated with conda envs (or .bashrc), so I am leaning towards that. Symlinks are also an easy option and I don't mind them either. @cliffwoolley, can you provide some instructions for both of these? I am not fully familiar with TensorRT, so I don't know exactly which files need to be symlinked or added to LD_LIBRARY_PATH.

@cliffwoolley
Contributor

I'll gather up the symlinks for you as soon as I can get to it, but as far as LD_LIBRARY_PATH approach, all you need to do is to get the tensorrt-libs python package install dir into your library path.

What seems to have happened is that when the TRT dependency was bumped to 8.6 in 3de4416 , they recognized the addition of the tensorrt-libs package, but didn't follow the pattern of https://github.com/tensorflow/tensorflow/pull/59825/files and add tensorrt-libs to the search paths at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/tf2tensorrt/BUILD#L73 and https://github.com/tensorflow/tensorflow/blob/master/third_party/xla/third_party/tsl/tsl/cuda/BUILD.bazel#L166 . (@meena-at-work , FYI).
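The tensorrt-libs install directory can also be located from Python without importing the package, and turned into an `export` line for the shell. A minimal sketch, assuming the pip-installed `tensorrt_libs` module described above; `package_dir` and `prepend_to_path_var` are hypothetical helpers. The variable has to be exported before Python starts, because the dynamic loader reads `LD_LIBRARY_PATH` at process startup.

```python
import importlib.util
import os


def package_dir(module_name):
    """Return the directory of an installed package without importing it, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def prepend_to_path_var(value, directory):
    """Prepend `directory` to a colon-separated list such as LD_LIBRARY_PATH,
    dropping empty entries and any existing copy of `directory`."""
    parts = [p for p in (value or "").split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)


if __name__ == "__main__":
    trt_libs = package_dir("tensorrt_libs")
    if trt_libs is None:
        print("tensorrt_libs not found (is the pip tensorrt 8.6 package installed?)")
    else:
        new_value = prepend_to_path_var(os.environ.get("LD_LIBRARY_PATH"), trt_libs)
        # Print a line for the shell: changing os.environ here would not affect
        # the libraries this already-running interpreter can find.
        print(f'export LD_LIBRARY_PATH="{new_value}"')
```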

@stallam-unb

> I'll gather up the symlinks for you as soon as I can get to it, but as far as LD_LIBRARY_PATH approach, all you need to do is to get the tensorrt-libs python package install dir into your library path.
>
> What seems to have happened is that when the TRT dependency was bumped to 8.6 in 3de4416 , they recognized the addition of the tensorrt-libs package, but didn't follow the pattern of https://github.com/tensorflow/tensorflow/pull/59825/files and add tensorrt-libs to the search paths at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/tf2tensorrt/BUILD#L73 and https://github.com/tensorflow/tensorflow/blob/master/third_party/xla/third_party/tsl/tsl/cuda/BUILD.bazel#L166 . (@meena-at-work , FYI).

@cliffwoolley I've not been quite successful with the LD_LIBRARY_PATH approach. I have a conda environment, so I did the following:

echo 'TENSORRT_LIBS_PATH=$(dirname $(python -c "import tensorrt_libs;print(tensorrt_libs.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$TENSORRT_LIBS_PATH:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

Then sourced (also tried deactivate/activate). Can confirm that LD_LIBRARY_PATH has valid paths:

(tf215) stallam@lambda-scalar:~$ ls $TENSORRT_LIBS_PATH
drwxrwxr-x stallam stallam 4.0 KB Tue Nov 14 16:32:22 2023 .
drwxrwxr-x stallam stallam  16 KB Tue Nov 14 16:33:44 2023 ..
.rw-rw-r-- stallam stallam 1.0 KB Tue Nov 14 16:32:16 2023 __init__.py
drwxrwxr-x stallam stallam 4.0 KB Tue Nov 14 16:32:22 2023 __pycache__
.rw-rw-r-- stallam stallam 226 MB Tue Nov 14 16:32:17 2023 libnvinfer.so.8
.rw-rw-r-- stallam stallam 957 MB Tue Nov 14 16:32:22 2023 libnvinfer_builder_resource.so.8.6.1
.rw-rw-r-- stallam stallam  37 MB Tue Nov 14 16:32:22 2023 libnvinfer_plugin.so.8
.rw-rw-r-- stallam stallam 2.7 MB Tue Nov 14 16:32:22 2023 libnvonnxparser.so.8
.rw-rw-r-- stallam stallam 3.3 MB Tue Nov 14 16:32:22 2023 libnvparsers.so.8

Re-running my test scripts still returns the "TF-TRT Warning: Could not find TensorRT" message, unfortunately.
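One way to narrow this down is to ask the dynamic loader directly which TensorRT sonames it can resolve in the current environment, independently of TensorFlow. A sketch using `ctypes.CDLL`, which goes through `dlopen` and therefore honours `LD_LIBRARY_PATH`; it does not see the RUNPATH baked into TensorFlow's own libraries. The candidate names, including the fully versioned ones, are assumptions about what TensorFlow might request, not something read from its source.

```python
import ctypes


def can_dlopen(soname):
    """Return True if the dynamic loader can resolve and load `soname`.

    ctypes.CDLL goes through dlopen(), so it honours LD_LIBRARY_PATH (as set
    when this process started), the ld.so cache and the default directories.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


# Assumed candidates: the sonames shipped in tensorrt_libs (see the listing above)
# plus fully versioned variants, in case TensorFlow asks for those instead.
CANDIDATES = (
    "libnvinfer.so.8",
    "libnvinfer_plugin.so.8",
    "libnvinfer.so.8.6.1",
    "libnvinfer_plugin.so.8.6.1",
)

if __name__ == "__main__":
    for name in CANDIDATES:
        print(f"{name:<28}{'found' if can_dlopen(name) else 'NOT found'}")
```

If the `.so.8` names load but the versioned ones do not, the library path is fine and the problem is the file name being requested.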

@pijyoi
pijyoi commented Jan 8, 2024

Using something similar to the following:

$ strace python -c "import tensorflow" 2>&1 | grep libnvinfer

it appears that TF 2.15 is searching for libnvinfer.so.8.6.1 and libnvinfer_plugin.so.8.6.1.

When I manually created the symbolic links libnvinfer.so.8.6.1 -> libnvinfer.so.8 and libnvinfer_plugin.so.8.6.1 -> libnvinfer_plugin.so.8, the warning went away. (LD_LIBRARY_PATH still needs to be set)
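The same links can be created from Python, for example inside an environment setup script. A sketch mirroring the `ln -sr` approach; `link_versioned_sonames` is a hypothetical helper, and the `8`/`8.6.1` suffixes are the values reported in this comment, so they would need adjusting for other TensorRT versions. In real use, pass the directory holding the `.so.8` files and the directory TensorFlow searches; the demo only touches empty placeholder files in a temporary directory.

```python
import os
import tempfile


def link_versioned_sonames(lib_dir, target_dir, version,
                           names=("libnvinfer", "libnvinfer_plugin"), major="8"):
    """Create <target_dir>/<name>.so.<version> -> <lib_dir>/<name>.so.<major>.

    Mirrors `ln -srf`: links are relative and existing entries are replaced.
    Returns the paths of the created links.
    """
    created = []
    for name in names:
        src = os.path.join(lib_dir, f"{name}.so.{major}")
        dst = os.path.join(target_dir, f"{name}.so.{version}")
        if not os.path.exists(src):
            raise FileNotFoundError(src)
        if os.path.lexists(dst):
            os.remove(dst)  # replace a stale link or file, like ln -f
        os.symlink(os.path.relpath(src, target_dir), dst)
        created.append(dst)
    return created


def demo_in_scratch_dir(version="8.6.1"):
    """Exercise the function on empty placeholder files in a temporary directory."""
    with tempfile.TemporaryDirectory() as scratch:
        for name in ("libnvinfer", "libnvinfer_plugin"):
            open(os.path.join(scratch, f"{name}.so.8"), "w").close()
        link_versioned_sonames(scratch, scratch, version)
        links = link_versioned_sonames(scratch, scratch, version)  # rerun is safe
        return ([os.path.basename(p) for p in links],
                all(os.path.islink(p) and os.path.exists(p) for p in links))


if __name__ == "__main__":
    print(demo_in_scratch_dir())
```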

@srstsavage

Thanks all for the clues into this issue. I wrote a helper script that sets up symlinks so that tensorflow can discover the missing tensorrt files; this worked for me with tensorflow 2.15.1 and tensorrt 8.6.1.

#!/bin/bash
# Set up symlinks to allow tensorflow to find tensorrt library files
# https://github.com/tensorflow/tensorflow/issues/61986

echo "Getting linked tensorrt version"
TENSORRT_VERSION=$(python3 -c "import tensorflow.compiler as tf_cc; print('.'.join(map(str, tf_cc.tf2tensorrt._pywrap_py_utils.get_linked_tensorrt_version())))" 2> /dev/null)
if [ -z "$TENSORRT_VERSION" ]; then
  echo "Linked tensorrt version not detected" >&2
  exit 1
fi
echo $TENSORRT_VERSION

echo "Getting tensorrt lib dir (where tensorflow is looking)"
TENSORRT_FILE="$(python3 -c "import tensorrt; print(tensorrt.__file__)" 2>/dev/null)"
if [ -z "$TENSORRT_FILE" ]; then
  echo "tensorrt dir not found (is tensorrt installed?)" >&2
  exit 1
fi
TENSORRT_DIR="$(dirname "$TENSORRT_FILE")"
echo $TENSORRT_DIR

echo "Getting tensorrt_libs dir (where .so files actually are)"
TENSORRT_LIBS_FILE="$(python3 -c "import tensorrt_libs; print(tensorrt_libs.__file__)" 2>/dev/null)"
if [ -z "$TENSORRT_LIBS_FILE" ]; then
  echo "tensorrt_libs dir not found (is tensorrt installed?)" >&2
  exit 1
fi
TENSORRT_LIBS_DIR="$(dirname "$TENSORRT_LIBS_FILE")"
echo $TENSORRT_LIBS_DIR

echo "Creating links"
ln -srf "${TENSORRT_LIBS_DIR}/libnvinfer.so.8" "${TENSORRT_DIR}/libnvinfer.so.${TENSORRT_VERSION}"
ln -srf "${TENSORRT_LIBS_DIR}/libnvinfer_plugin.so.8" "${TENSORRT_DIR}/libnvinfer_plugin.so.${TENSORRT_VERSION}"

echo "tensorrt lib dir (${TENSORRT_DIR}) contents:"
ls -l "${TENSORRT_DIR}"
