

tf.device context manager does not restore cudaCurrentDevice under some conditions #61911

Open
isVoid opened this issue Sep 19, 2023 · 0 comments
Assignees: cantonios
Labels: comp:apis (High-level API related issues), comp:gpu (GPU related issues), stat:awaiting tensorflower, TF 2.13, type:bug

isVoid commented Sep 19, 2023

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

2.13

Custom code

No

OS platform and distribution

Linux Ubuntu 20.04.5 LTS

Mobile device

No response

Python version

3.10

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.0

GPU model and memory

No response

Current behavior?

When using the tf.device context manager, the CUDA runtime's current device remains "dirty" (left on the device selected inside the scope) even after exiting the context manager. This happens when: 1. TensorFlow initializes its GPU context on that tf.device line, and 2. no tensor is materialized on the GPU inside the scope.

For context, keeping the current-device state clean is important for keeping TensorFlow in sync with other GPU-based libraries such as cuDF. RMM memory allocators also depend on the assumption that the current device stays the same throughout the lifetime of an allocation. A hypothetical illustration of that hazard is sketched below.
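Hypothetical illustration of the interop hazard (not part of the report; assumes cuDF/RMM are installed in the same environment as TensorFlow and that a GPU with index 7 exists): if TensorFlow leaves the current device set to 7, a later RMM-backed allocation follows the runtime's current device rather than the device the rest of the process was set up on.

```python
import tensorflow as tf
import cudf  # hypothetical downstream consumer; any RMM-backed library is affected

with tf.device("/gpu:7"):
    pass  # the bug described above leaves the CUDA current device at 7

# This allocation follows the CUDA runtime's current device, so the buffer
# ends up on GPU 7 instead of GPU 0, out of sync with what RMM expects.
df = cudf.DataFrame({"x": [1, 2, 3]})
```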

Standalone code to reproduce the issue

https://gist.github.com/isVoid/9eded87fca35e86a2c2dc85f603383c2
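The gist above is the authoritative reproducer. A minimal sketch of the same pattern, assuming the cuda-python bindings (`pip install cuda-python`) are available to query the runtime's current device and that GPU index 7 exists (any non-zero index should do):

```python
import tensorflow as tf
from cuda import cudart  # cuda-python runtime bindings

print(cudart.cudaGetDevice())   # (<cudaError_t.cudaSuccess: 0>, 0)

# Entering the scope initializes TensorFlow's GPU context, but no tensor
# is materialized on the GPU inside it.
with tf.device("/gpu:7"):
    pass

# Expected 0 after exiting the context manager, but the CUDA runtime
# still reports device 7.
print(cudart.cudaGetDevice())
print(cudart.cudaGetDevice())
```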

Relevant log output

# Log output of the first cell. The second and third current-device readings should both be 0.

(<cudaError_t.cudaSuccess: 0>, 0)
(<cudaError_t.cudaSuccess: 0>, 7)
(<cudaError_t.cudaSuccess: 0>, 7)
(<cudaError_t.cudaSuccess: 0>, 0)
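Not part of the report, but until the context manager restores the device, code that mixes TensorFlow with other CUDA libraries can defensively reset the runtime's current device after leaving the tf.device scope (sketch, again assuming cuda-python):

```python
from cuda import cudart
import tensorflow as tf

with tf.device("/gpu:7"):
    pass

# Defensive reset: put the CUDA runtime back on the device the rest of the
# process (cuDF, RMM, etc.) expects.
cudart.cudaSetDevice(0)
```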