[NVIDIA TF] Support building against CUDA 12.0 #58867

Merged
merged 11 commits into from
Jan 18, 2023

nluehr
Contributor
@nluehr nluehr commented Dec 13, 2022

This PR updates TensorFlow to build against CUDA 12.0. Most changes are minor, with the exception of replacing the csrGemmV2 APIs with SpGEMM, since the former were removed from cuSPARSE 12.0.

Attn: @hawkinsp

@google-ml-butler google-ml-butler bot added the size:XL CL Change Size:Extra Large label Dec 13, 2022
@google-ml-butler google-ml-butler bot requested a review from r4nt December 13, 2022 00:48
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Dec 13, 2022
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Dec 13, 2022
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Dec 13, 2022
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Dec 13, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Dec 13, 2022
@gbaned gbaned removed the ready to pull PR ready for merge process label Dec 15, 2022
@gbaned gbaned added the comp:core issues related to core part of tensorflow label Dec 16, 2022
@gbaned
Contributor
gbaned commented Dec 28, 2022

Hi @cantonios, can you please review this PR? Thank you!

wanghan-iapcm pushed a commit to deepmodeling/deepmd-kit that referenced this pull request Dec 30, 2022
Fix #2176.
See also tensorflow/tensorflow#58867.
Note that CUDA Toolkit 12.0 requires CUDA driver 525.60.13.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
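
The driver requirement quoted in that commit message can be checked numerically. The following is a minimal sketch, assuming the installed driver version is available as a dotted string; the helper names `parse_version` and `driver_is_new_enough` are hypothetical and not part of any NVIDIA or TensorFlow tooling. It compares versions as integer tuples rather than as strings, because a plain string comparison would rank "525.60.2" above "525.60.13".

```python
def parse_version(text):
    """Turn a dotted version string such as '525.60.13' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum driver for CUDA Toolkit 12.0, as stated in the commit message above.
MIN_DRIVER_FOR_CUDA_12_0 = parse_version("525.60.13")


def driver_is_new_enough(installed, minimum=MIN_DRIVER_FOR_CUDA_12_0):
    """Return True if the installed driver version is at least the minimum."""
    return parse_version(installed) >= minimum
```

On a machine with the NVIDIA driver installed, the version string could be obtained with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to `driver_is_new_enough`.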
@gbaned gbaned requested review from cantonios and removed request for cantonios December 30, 2022 09:38
@google-ml-butler google-ml-butler bot added awaiting review Pull request awaiting review and removed ready to pull PR ready for merge process labels Jan 9, 2023
Fixes issue where minor version was incorrectly included in dso
name with cuda 12.
@nluehr
Contributor Author
nluehr commented Jan 10, 2023

Added a fix to load the libcupti DSO using cupti_version (major version only for CUDA 12 and later) rather than cuda_version (major.minor version).
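
The naming rule described in this comment can be illustrated with a short sketch. This is not the code from the PR: the function names `cupti_dso_version` and `cupti_dso_name` are hypothetical, and the `libcupti.so.<version>` pattern assumes Linux-style shared library naming.

```python
def cupti_dso_version(cuda_version):
    """Pick the version suffix used to load libcupti.

    Assumption taken from the comment above: for CUDA 12 and later only the
    major version is used; earlier releases keep the major.minor version.
    """
    major = int(cuda_version.split(".")[0])
    if major >= 12:
        return str(major)
    return cuda_version


def cupti_dso_name(cuda_version):
    """Build a Linux-style DSO name (the naming pattern is an assumption)."""
    return "libcupti.so." + cupti_dso_version(cuda_version)
```

Under these assumptions, `cupti_dso_name("11.8")` keeps the minor version while `cupti_dso_name("12.0")` drops it, which is the behavior the fix is described as restoring.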

@reedwm reedwm added the ready to pull PR ready for merge process label Jan 11, 2023
@nluehr
Contributor Author
nluehr commented Jan 13, 2023

@reedwm is this blocked? Anything I can do on my side to help?

@reedwm
Member
reedwm commented Jan 13, 2023

It was blocked but now it simply needs to be approved internally. It probably will be merged Monday.

mingzhong15 pushed a commit to mingzhong15/deepmd-kit that referenced this pull request Jan 15, 2023
Fix deepmodeling#2176.
See also tensorflow/tensorflow#58867.
Note that CUDA Toolkit 12.0 requires CUDA driver 525.60.13.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jan 18, 2023
Imported from GitHub PR tensorflow/tensorflow#58867

This PR updates TensorFlow to build against CUDA 12.0. Most changes are minor, with the exception of replacing the csrGemmV2 APIs with SpGEMM, since the former were removed from cuSPARSE 12.0.

Attn: @hawkinsp
Copybara import of the project:

--
be94c459eaffd51e9cf1f96c13385f6fed9d6752 by Nathan Luehr <nluehr@nvidia.com>:

Use major version for cupti lib with CUDA>=12

--
9145a60e65165a517a3789bcb49b79c871f67641 by Nathan Luehr <nluehr@nvidia.com>:

Update CUDA stub libraries for CUDA 12

--
dee09e90a6a88caf94a1df7f6ca16bf5e5a1336f by Nathan Luehr <nluehr@nvidia.com>:

Migrate remaining calls to cusparseCsrmvEx to cusparseSpMV for CUDA 12

--
6481f1a35a5b153222ba280d1d74ee5239e90180 by Nathan Luehr <nluehr@nvidia.com>:

Switch from CUSPARSE_CSR2CSC_ALG2 to CUSPARSE_CSR2CSC_ALG1 for CUDA >= 12

--
4ff3164858c6b7741bfa494b1a40c65eae0fe171 by Nathan Luehr <nluehr@nvidia.com>:

Replace csrGemmV2 with calls to SpGEMM APIs when compiling for CUDA 12.

--
4aa8b6fab9a1b19c9ca8936aee5ec54eaeed54b1 by Nathan Luehr <nluehr@nvidia.com>:

Update algorithm enum sparse mat_mul_op for CUDA 12

As of CUDA 12, CUSPARSE_MM_ALG_DEFAULT is replaced by CUSPARSE_SPMM_ALG_DEFAULT.

--
25656bd776c759316db7ded528d50f6cb4c04266 by Nathan Luehr <nluehr@nvidia.com>:

Bump NCCL version to 2.16.2 to support CUDA 12 and NVIDIA Hopper GPUs.

--
c3b2dbbea466b54309c677b639387031c1e48604 by Nathan Luehr <nluehr@nvidia.com>:

Update cudaGraph APIs for CUDA 12.

--
51cb95a2ce37988f0d6bb6f100ffb0cfdfaa8291 by Nathan Luehr <nluehr@nvidia.com>:

Reduce memory overheads in sparse-sparse matmul.

Memory reduction comes at the cost of an additional device-side copy
to concat the gemm results across the batch.

--
ae70777421a2c7f603171484a530bfa2143eedec by Nathan Luehr <nluehr@nvidia.com>:

Guard cuda_blas_utils include to fix ROCM build.

--
4a04c65383f333fc23d70dc72e8a76b605ccc465 by Nathan Luehr <nluehr@nvidia.com>:

Load cupti dso using correct version.

Fixes issue where minor version was incorrectly included in dso
name with cuda 12.

Merging this change closes #58867

PiperOrigin-RevId: 502803087
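
For downstream code that faces the same migration, the cuSPARSE substitutions named in the commit log above can be gathered into a lookup table and used to flag old names in a source file. This is only a sketch: the table copies the names exactly as they are worded in the commit log (for example, "csrGemmV2" and "SpGEMM" are the log's shorthand rather than full function names), and `find_removed_symbols` is a hypothetical helper that matches whole identifiers only.

```python
import re

# Old name -> replacement, copied as worded in the commit log above.
CUSPARSE_MIGRATIONS = {
    "cusparseCsrmvEx": "cusparseSpMV",
    "CUSPARSE_CSR2CSC_ALG2": "CUSPARSE_CSR2CSC_ALG1",
    "csrGemmV2": "SpGEMM",
    "CUSPARSE_MM_ALG_DEFAULT": "CUSPARSE_SPMM_ALG_DEFAULT",
}

# Match any old name as a whole identifier.
_OLD_NAME = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in CUSPARSE_MIGRATIONS) + r")\b"
)


def find_removed_symbols(source_text):
    """Return (line_number, old_name, replacement) for each old name found."""
    hits = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for match in _OLD_NAME.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, CUSPARSE_MIGRATIONS[old]))
    return hits
```

Because matching is on whole identifiers, longer names that merely contain one of the old names as a prefix are not reported; a real audit would need the full list of affected functions from the cuSPARSE release notes.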
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Jan 18, 2023
Imported from GitHub PR tensorflow/tensorflow#58867
@copybara-service copybara-service bot merged commit 2b29314 into tensorflow:master Jan 18, 2023
PR Queue automation moved this from Reviewer Requested Changes to Merged Jan 18, 2023
@nluehr nluehr deleted the cuda12 branch January 18, 2023 18:01
@luckeyca

Does this cover CUDA 12 in WSL2? I ran into an issue with TensorFlow right after pip install: the verification command was looking for the CUDA 11 library instead of the CUDA 12 one installed on the WSL2 instance. See #59413.

@nluehr
Contributor Author
nluehr commented Jan 23, 2023

This PR enables building TF from source against CUDA 12. The nightly and release builds available from PyPI continue to be built at present against CUDA 11.8.
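
To see which CUDA version a given TensorFlow install was built against, the build metadata can be inspected. The sketch below assumes the dictionary returned by `tf.sysconfig.get_build_info()` carries `is_cuda_build` and `cuda_version` keys with a dotted version string; `built_against_cuda_major` and `installed_tensorflow_build_info` are hypothetical helper names. TensorFlow is imported lazily so the comparison logic can be exercised without it installed.

```python
def built_against_cuda_major(build_info, major):
    """Return True if build_info describes a CUDA build with this major version."""
    if not build_info.get("is_cuda_build", False):
        return False
    cuda_version = str(build_info.get("cuda_version", ""))
    return cuda_version.split(".")[0] == str(major)


def installed_tensorflow_build_info():
    """Read build metadata from the installed TensorFlow (imported lazily)."""
    import tensorflow as tf  # only needed when this function is called

    return dict(tf.sysconfig.get_build_info())
```

For example, `built_against_cuda_major(installed_tensorflow_build_info(), 12)` would be expected to return False for the PyPI wheels described in this comment and True for a source build against CUDA 12.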

@yangtj207

Is there a plan to provide docker images that work with CUDA 12.0? Thank you.

@aminalaee

Can anyone share the plan for when this will be released?

@alanwilter

I tried pip install tensorflow==2.12.rc1 and it is still not working with CUDA 12. Which release should see TF working with CUDA 12?

@nluehr
Contributor Author
nluehr commented Mar 14, 2023

@alanwilter Presently you need to build TensorFlow from source to use it with CUDA 12.x. Either the master or the r2.12 release branch will build against CUDA 12.
Alternatively, you could use NVIDIA's NGC containers, which are pre-built against CUDA 12.

@ddelange

> This PR enables building TF from source against CUDA 12. The nightly and release builds available from PyPI continue to be built at present against CUDA 11.8.

Is there a timeline for building the official wheels against CUDA 12.x?

@Talador12

> This PR enables building TF from source against CUDA 12. The nightly and release builds available from PyPI continue to be built at present against CUDA 11.8.

These 12.x packages need to be built on PyPI. CUDA 11.x is deprecated on some systems.

@nluehr
Contributor Author
nluehr commented Apr 20, 2023

@Talador12 can you provide more information about where CUDA 11.8 is deprecated?

@Talador12
Talador12 commented May 1, 2023

It could be for a few reasons, such as FedRAMP compliance and using current Debian versions, namely Debian Bookworm.

I was surprised that TensorFlow did not have prebuilt CUDA 12 support on PyPI.

@Talador12

Is there an update on this issue? There is still a need for a CUDA 12 build of TensorFlow on PyPI.

@reedwm
Member
reedwm commented May 24, 2023

We unfortunately do not yet have official pip wheels with CUDA 12. It's possible that TensorFlow 2.14 will be built with CUDA 12, but this is not guaranteed.

@Talador12

Could we re-open this issue? This has not been resolved yet.

@reedwm
Member
reedwm commented Jun 20, 2023

This is a PR that has been merged, not an issue, so it cannot be reopened.

We have not yet released pip packages with CUDA 12 support, but we are working on this. Feel free to file a new GitHub issue to request CUDA 12 pip packages (please CC me on the issue if you file it).

@Talador12

Apologies - I thought the Python package would be built using the merged code in this pull request. I created a separate issue for the Python package at #60943.
