[NVIDIA TF] Support building against CUDA 12.0 #58867

nluehr · 2022-12-13T00:48:44Z

This PR updates TensorFlow to build against CUDA 12.0. Most changes are minor, with the exception of replacing the csrGemmV2 APIs with SpGEMM, since the former were removed from cuSPARSE 12.0.
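The csrGemmV2 to SpGEMM change swaps one cuSPARSE entry-point family for another, but the underlying operation is still a CSR × CSR product. As a hedged, host-only illustration of that operation (not of the cuSPARSE API, which runs on the device and is split into cusparseSpGEMM_workEstimation, cusparseSpGEMM_compute and cusparseSpGEMM_copy steps), the sketch below uses Gustavson's row-by-row algorithm: a symbolic pass sizes the output, then a numeric pass fills it. The Csr struct and CsrSpGemm function are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical CSR container for illustration (not a TensorFlow or cuSPARSE type).
struct Csr {
  int rows = 0;
  int cols = 0;
  std::vector<int> row_ptr;  // size rows + 1
  std::vector<int> col_ind;  // size nnz
  std::vector<float> val;    // size nnz
};

// C = A * B for CSR inputs using Gustavson's row-wise algorithm.
// Phase 1 (symbolic) counts the nonzeros of each output row;
// phase 2 (numeric) accumulates values into the preallocated buffers.
Csr CsrSpGemm(const Csr& a, const Csr& b) {
  assert(a.cols == b.rows);
  Csr c;
  c.rows = a.rows;
  c.cols = b.cols;
  c.row_ptr.assign(a.rows + 1, 0);

  // marker[j] == i means column j has already been seen in output row i.
  std::vector<int> marker(b.cols, -1);

  // Symbolic phase: size of the sparsity pattern of each output row.
  for (int i = 0; i < a.rows; ++i) {
    int count = 0;
    for (int p = a.row_ptr[i]; p < a.row_ptr[i + 1]; ++p) {
      int k = a.col_ind[p];
      for (int q = b.row_ptr[k]; q < b.row_ptr[k + 1]; ++q) {
        int j = b.col_ind[q];
        if (marker[j] != i) {
          marker[j] = i;
          ++count;
        }
      }
    }
    c.row_ptr[i + 1] = c.row_ptr[i] + count;
  }

  int nnz = c.row_ptr[a.rows];
  c.col_ind.resize(nnz);
  c.val.resize(nnz);

  // Numeric phase: accumulate products in a dense per-row workspace.
  std::vector<float> acc(b.cols, 0.0f);
  std::fill(marker.begin(), marker.end(), -1);
  for (int i = 0; i < a.rows; ++i) {
    int out = c.row_ptr[i];
    for (int p = a.row_ptr[i]; p < a.row_ptr[i + 1]; ++p) {
      int k = a.col_ind[p];
      float a_ik = a.val[p];
      for (int q = b.row_ptr[k]; q < b.row_ptr[k + 1]; ++q) {
        int j = b.col_ind[q];
        if (marker[j] != i) {  // first hit of column j in row i
          marker[j] = i;
          c.col_ind[out++] = j;
          acc[j] = 0.0f;
        }
        acc[j] += a_ik * b.val[q];
      }
    }
    // Gather the accumulated values; columns stay in discovery order.
    for (int t = c.row_ptr[i]; t < c.row_ptr[i + 1]; ++t) {
      c.val[t] = acc[c.col_ind[t]];
    }
  }
  return c;
}
```

Output columns within a row are left in discovery order; a caller that needs sorted column indices would sort each row afterwards.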

Attn: @hawkinsp


tensorflow/core/kernels/sparse/sparse_mat_mul_op.cc

As of CUDA 12, CUSPARSE_MM_ALG_DEFAULT is replaced by CUSPARSE_SPMM_ALG_DEFAULT.
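Changes like this are typically guarded at compile time on the CUDA_VERSION macro from cuda.h, which encodes a toolkit release as major * 1000 + minor * 10 (11.8 is 11080, 12.0 is 12000). The sketch below mirrors that pattern so it compiles without CUDA headers: it defines CUDA_VERSION itself when missing (assumed here to be 12000) and selects the enumerator names as strings, whereas real kernel code would name the cusparseSpMMAlg_t values directly. DefaultSpmmAlgName and kSpmmAlg are hypothetical.

```cpp
#include <cassert>
#include <string>

// Stand-in for the macro normally provided by <cuda.h>; defined here only so
// the sketch compiles without a CUDA toolkit (assumption: building for 12.0).
#ifndef CUDA_VERSION
#define CUDA_VERSION 12000
#endif

// Decode the CUDA_VERSION encoding: major * 1000 + minor * 10.
constexpr int CudaMajor(int v) { return v / 1000; }
constexpr int CudaMinor(int v) { return (v % 1000) / 10; }

// Hypothetical runtime helper: which default SpMM algorithm enumerator
// would be chosen for a given toolkit version.
inline std::string DefaultSpmmAlgName(int cuda_version) {
  assert(cuda_version > 0);
  return cuda_version >= 12000 ? "CUSPARSE_SPMM_ALG_DEFAULT"
                               : "CUSPARSE_MM_ALG_DEFAULT";
}

// Compile-time form of the same guard, as it might appear in kernel code.
#if CUDA_VERSION >= 12000
constexpr const char* kSpmmAlg = "CUSPARSE_SPMM_ALG_DEFAULT";
#else
constexpr const char* kSpmmAlg = "CUSPARSE_MM_ALG_DEFAULT";
#endif
```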

gbaned · 2022-12-28T10:56:49Z

Hi @cantonios, can you please review this PR? Thank you!

Fix deepmodeling#2176. See also tensorflow/tensorflow#58867. Note that CUDA Toolkit 12.0 requires CUDA driver 525.60.13 or later. Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

Fixes an issue where the minor version was incorrectly included in the DSO name with CUDA 12.

nluehr · 2023-01-10T19:41:50Z

Added a fix to use cupti_version (major version only for CUDA 12 and later) rather than cuda_version (major.minor version) to load the libcupti DSO.
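The rule in this fix can be sketched as a small name-building helper. This assumes the Linux naming scheme libcupti.so.<suffix> and takes the version numbers as plain integers; CuptiDsoName is a hypothetical function, not TensorFlow's loader code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the CUPTI shared-library name for a CUDA
// toolkit version, following the rule described above. Before CUDA 12 the
// suffix is "major.minor"; from CUDA 12 on it is the major version only.
std::string CuptiDsoName(int major, int minor) {
  assert(major > 0 && minor >= 0);
  std::string suffix = std::to_string(major);
  if (major < 12) {
    suffix += "." + std::to_string(minor);
  }
  return "libcupti.so." + suffix;
}
```

Keying the suffix on the major version alone should let a build against CUDA 12.0 find CUPTI from a later 12.x toolkit, assuming that toolkit ships the libcupti.so.12 name.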

nluehr · 2023-01-13T23:13:53Z

@reedwm is this blocked? Anything I can do on my side to help?

reedwm · 2023-01-13T23:17:11Z

It was blocked, but now it just needs to be approved internally. It will probably be merged on Monday.


@hawkinsp

Imported from GitHub PR tensorflow/tensorflow#58867

This PR updates TensorFlow to build against CUDA 12.0. Most changes are minor, with the exception of replacing the csrGemmV2 APIs with SpGEMM, since the former were removed from cuSPARSE 12.0.

Attn: @hawkinsp

Copybara import of the project:

-- be94c459eaffd51e9cf1f96c13385f6fed9d6752 by Nathan Luehr <nluehr@nvidia.com>:
Use major version for cupti lib with CUDA>=12

-- 9145a60e65165a517a3789bcb49b79c871f67641 by Nathan Luehr <nluehr@nvidia.com>:
Update CUDA stub libraries for CUDA 12

-- dee09e90a6a88caf94a1df7f6ca16bf5e5a1336f by Nathan Luehr <nluehr@nvidia.com>:
Migrate remaining calls to cusparseCsrmvEx to cusparseSpMV for CUDA 12

-- 6481f1a35a5b153222ba280d1d74ee5239e90180 by Nathan Luehr <nluehr@nvidia.com>:
Switch from CUSPARSE_CSR2CSC_ALG2 to CUSPARSE_CSR2CSC_ALG1 for CUDA >= 12

-- 4ff3164858c6b7741bfa494b1a40c65eae0fe171 by Nathan Luehr <nluehr@nvidia.com>:
Replace csrGemmV2 with calls to SpGEMM APIs when compiling for CUDA 12.

-- 4aa8b6fab9a1b19c9ca8936aee5ec54eaeed54b1 by Nathan Luehr <nluehr@nvidia.com>:
Update algorithm enum sparse mat_mul_op for CUDA 12
As of CUDA 12, CUSPARSE_MM_ALG_DEFAULT is replaced by CUSPARSE_SPMM_ALG_DEFAULT.

-- 25656bd776c759316db7ded528d50f6cb4c04266 by Nathan Luehr <nluehr@nvidia.com>:
Bump NCCL version to 2.16.2 to support CUDA 12 and NVIDIA Hopper GPUs.

-- c3b2dbbea466b54309c677b639387031c1e48604 by Nathan Luehr <nluehr@nvidia.com>:
Update cudaGraph APIs for CUDA 12.

-- 51cb95a2ce37988f0d6bb6f100ffb0cfdfaa8291 by Nathan Luehr <nluehr@nvidia.com>:
Reduce memory overheads in sparse-sparse matmul.
Memory reduction comes at the cost of an additional device-side copy to concat the gemm results across the batch.

-- ae70777421a2c7f603171484a530bfa2143eedec by Nathan Luehr <nluehr@nvidia.com>:
Guard cuda_blas_utils include to fix ROCM build.

-- 4a04c65383f333fc23d70dc72e8a76b605ccc465 by Nathan Luehr <nluehr@nvidia.com>:
Load cupti dso using correct version.
Fixes an issue where the minor version was incorrectly included in the DSO name with CUDA 12.

Merging this change closes #58867

PiperOrigin-RevId: 502803087
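The memory-reduction commit above trades memory for an extra device-side copy that concatenates the per-batch GEMM results. A host-side sketch of what such a concatenation involves for CSR outputs follows. It assumes a hypothetical batched layout in which each batch entry keeps its own zero-based row pointers and batch_ptr holds each entry's offset into the shared col_ind and val buffers; CsrPart, BatchedCsr and ConcatBatch are illustrative names, and this is not a description of TensorFlow's internal data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-batch CSR result with zero-based row pointers.
struct CsrPart {
  std::vector<int> row_ptr;  // size rows + 1, row_ptr[0] == 0
  std::vector<int> col_ind;
  std::vector<float> val;
};

// Hypothetical batched layout: all parts stored back to back.
struct BatchedCsr {
  std::vector<int> batch_ptr;  // size batch + 1; nnz offset of each part
  std::vector<int> row_ptr;    // per-part row pointers, concatenated as-is
  std::vector<int> col_ind;
  std::vector<float> val;
};

// Concatenate per-batch results. An exclusive prefix sum over each part's nnz
// gives its offset into the shared col_ind/val buffers; the data is then copied.
BatchedCsr ConcatBatch(const std::vector<CsrPart>& parts) {
  BatchedCsr out;
  out.batch_ptr.push_back(0);
  for (const CsrPart& p : parts) {
    assert(!p.row_ptr.empty() && p.row_ptr.front() == 0);
    assert(p.col_ind.size() == p.val.size());
    out.batch_ptr.push_back(out.batch_ptr.back() +
                            static_cast<int>(p.col_ind.size()));
  }
  const std::size_t total_nnz = static_cast<std::size_t>(out.batch_ptr.back());
  out.col_ind.reserve(total_nnz);
  out.val.reserve(total_nnz);
  for (const CsrPart& p : parts) {
    out.row_ptr.insert(out.row_ptr.end(), p.row_ptr.begin(), p.row_ptr.end());
    out.col_ind.insert(out.col_ind.end(), p.col_ind.begin(), p.col_ind.end());
    out.val.insert(out.val.end(), p.val.begin(), p.val.end());
  }
  return out;
}
```

Computing all offsets before copying means each entry's data lands at a known position independently of the others, which is the property a device implementation would rely on to copy entries in parallel.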


luckeyca · 2023-01-22T22:56:09Z

Does this cover CUDA 12 in WSL2? I ran into an issue with TensorFlow right after pip install: the verification command was looking for the CUDA 11 libraries instead of the CUDA 12 libraries installed on the WSL2 instance. #59413

nluehr · 2023-01-23T15:29:22Z

This PR enables building TF from source against CUDA 12. The nightly and release builds available from PyPI are, at present, still built against CUDA 11.8.

yangtj207 · 2023-01-23T19:24:17Z

Is there a plan to provide docker images that work with CUDA 12.0? Thank you.

aminalaee · 2023-03-09T13:06:07Z

Can anyone share the plan for when this will be released?

alanwilter · 2023-03-10T23:26:06Z

I tried pip install tensorflow==2.12.rc1 and it still does not work with CUDA 12. Which release will work with CUDA 12?

nluehr · 2023-03-14T15:45:10Z

@alanwilter Presently you need to build TensorFlow from source to use it with CUDA 12.x. Both the master and r2.12 release branches will build against CUDA 12.
Alternatively, you could use NVIDIA's NGC containers, which are prebuilt against CUDA 12.

ddelange · 2023-03-16T07:24:55Z

This PR enables building TF from source against CUDA 12. The nightly and release builds available from PyPI continue to be built at present against CUDA 11.8.

Is there a timeline for building the official wheels against CUDA 12.x?

Talador12 · 2023-04-19T21:20:03Z

This PR enables building TF from source against CUDA 12. The nightly and release builds available from PyPI continue to be built at present against CUDA 11.8.

These CUDA 12.x packages need to be available on PyPI. CUDA 11.x is deprecated on some systems.

nluehr · 2023-04-20T15:57:10Z

@Talador12 can you provide more information about where CUDA 11.8 is deprecated?

Talador12 · 2023-05-01T16:07:54Z

There are a few reasons, mainly FedRAMP compliance and using current Debian versions, namely Debian Bookworm.

I was surprised that TensorFlow did not have prebuilt CUDA 12 support on PyPI.

Talador12 · 2023-05-24T17:25:58Z

Is there an update on this issue? There is still a need for a CUDA 12 build of TensorFlow on PyPI.

reedwm · 2023-05-24T19:05:45Z

We unfortunately do not yet have official pip wheels with CUDA 12. It's possible that TensorFlow 2.14 will be built with CUDA 12, but this is not guaranteed.

Talador12 · 2023-06-19T14:54:14Z

Could we re-open this issue? This has not been resolved yet

reedwm · 2023-06-20T19:14:14Z

This is a PR that has been merged, not an issue, so it cannot be reopened.

We have not yet released pip packages with CUDA 12 support, but are working on this. Feel free to file a new GitHub issue requesting CUDA 12 pip packages (please CC me on the issue if you file it).

Talador12 · 2023-06-21T20:27:27Z

Apologies, I thought the Python package would be built using the merged code in this pull request. I created a separate issue for the Python package at #60943.

nluehr added 4 commits December 12, 2022 16:03

Use major version for cupti lib with CUDA>=12

be94c45

Update CUDA stub libraries for CUDA 12

9145a60

Migrate remaining calls to cusparseCsrmvEx to cusparseSpMV for CUDA 12

dee09e9

Switch from CUSPARSE_CSR2CSC_ALG2 to CUSPARSE_CSR2CSC_ALG1 for CUDA >= 12

6481f1a

nluehr requested a review from penpornk as a code owner December 13, 2022 00:48

google-ml-butler bot added the size:XL (CL Change Size: Extra Large) label Dec 13, 2022

google-ml-butler bot assigned gbaned Dec 13, 2022

google-ml-butler bot requested a review from r4nt December 13, 2022 00:48

google-ml-butler bot added the awaiting review (Pull request awaiting review) label Dec 13, 2022

gbaned added this to Assigned Reviewer in PR Queue via automation Dec 13, 2022

cheshire approved these changes Dec 13, 2022


google-ml-butler bot added the kokoro:force-run (Tests on submitted change) and ready to pull (PR ready for merge process) labels Dec 13, 2022

PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Dec 13, 2022

kokoro-team removed the kokoro:force-run (Tests on submitted change) label Dec 13, 2022

hawkinsp mentioned this pull request Dec 13, 2022

Add support for CUDA 12 google/jax#13637

Closed

cantonios reviewed Dec 14, 2022


tensorflow/core/kernels/sparse/sparse_mat_mul_op.cc Outdated

gbaned removed the ready to pull (PR ready for merge process) label Dec 15, 2022

nluehr added 4 commits December 15, 2022 09:51

Replace csrGemmV2 with calls to SpGEMM APIs when compiling for CUDA 12.

4ff3164

Update algorithm enum sparse mat_mul_op for CUDA 12

4aa8b6f

As of CUDA 12, CUSPARSE_MM_ALG_DEFAULT is replaced by CUSPARSE_SPMM_ALG_DEFAULT.

Bump NCCL version to 2.16.2 to support CUDA 12 and NVIDIA Hopper GPUs.

25656bd

Update cudaGraph APIs for CUDA 12.

c3b2dbb

nluehr force-pushed the cuda12 branch from 060d209 to c3b2dbb December 15, 2022 17:52

gbaned requested a review from cantonios December 16, 2022 08:08

gbaned added the comp:core (Issues related to core part of TensorFlow) label Dec 16, 2022

njzjz mentioned this pull request Dec 29, 2022

support CUDA 12.0 deepmodeling/deepmd-kit#2205

Merged

gbaned requested review from cantonios and removed request for cantonios December 30, 2022 09:38

google-ml-butler bot added the awaiting review (Pull request awaiting review) label and removed the ready to pull (PR ready for merge process) label Jan 9, 2023

Load cupti dso using correct version.

4a04c65

Fixes an issue where the minor version was incorrectly included in the DSO name with CUDA 12.

reedwm added the ready to pull (PR ready for merge process) label Jan 11, 2023

copybara-service bot merged commit 2b29314 into tensorflow:master Jan 18, 2023

PR Queue automation moved this from Reviewer Requested Changes to Merged Jan 18, 2023

nluehr deleted the cuda12 branch January 18, 2023 18:01

Talador12 mentioned this pull request Jun 21, 2023

Request: pip packages with CUDA 12 support #60943

Closed

jakirkham mentioned this pull request Jun 25, 2023

Rebuild for CUDA 12 conda-forge/tensorflow-feedstock#322

Closed
