

tf.map_fn on RaggedTensors crash during gradient computation on a GPU #55475

Closed
foxik opened this issue Apr 3, 2022 · 5 comments
Labels
comp:ops OPs related issues TF 2.8 type:bug Bug

Comments

@foxik
Contributor
foxik commented Apr 3, 2022

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colab
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.8
  • Python version: 3.7

Describe the current behavior

When one of the losses tf.losses.SparseCategoricalCrossentropy, tf.losses.CategoricalCrossentropy, tf.losses.BinaryCrossentropy, or tf.losses.MeanSquaredError is used on ragged tensors, where it is computed via a tf.map_fn over a RaggedTensor, the gradient computation on a GPU crashes with

Node: 'Adam/gradients/zeros_like_2'
2 root error(s) found.
  (0) INTERNAL:  No unary variant unary_op function found for op ZEROS_LIKE Variant type_name: RaggedTensorVariant for device type: GPU
	 [[{{node Adam/gradients/zeros_like_2}}]]
	 [[binary_crossentropy/map/while/loop_body_control/_124/_67]]
  (1) INTERNAL:  No unary variant unary_op function found for op ZEROS_LIKE Variant type_name: RaggedTensorVariant for device type: GPU
	 [[{{node Adam/gradients/zeros_like_2}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_16690]

The computation does not crash on a CPU and it does not crash when tf.functions are executed eagerly.
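
For reference, the eager configuration mentioned above can be reproduced simply by turning on eager execution of tf.functions globally before compiling the model, roughly:

import tensorflow as tf

# Run every tf.function eagerly; in this mode the gradient computation above
# completes, at the cost of giving up graph execution (so this is only a
# diagnostic toggle, not a real workaround).
tf.config.run_functions_eagerly(True)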

Also, if the tf.map_fn is circumvented by passing the following loss argument to compile

  loss=lambda yt, yp: tf.losses.BinaryCrossentropy()(yt.values, yp.values)

it works on the GPU without a crash.
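
In full, that workaround looks roughly as follows (a sketch only -- model here stands for any Keras model producing RaggedTensor predictions):

import tensorflow as tf

# Sketch of the workaround: bypass the ragged tf.map_fn in the loss wrapper by
# computing the loss directly on the flat .values of the ragged tensors.
# `model` is a placeholder for a Keras model with ragged outputs.
model.compile(
    optimizer=tf.optimizers.Adam(),
    loss=lambda yt, yp: tf.losses.BinaryCrossentropy()(yt.values, yp.values))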

Describe the expected behavior

The code does not crash on a GPU.

  • Do you want to contribute a PR? (yes/no): no

Standalone code to reproduce the issue

A simple Colab reproducing the error is here: https://colab.research.google.com/drive/1OELAhvpQHhaz3sOYabf4SdBqKlQCjNjs?usp=sharing
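
In case the Colab is not accessible, the setup is roughly the following (a condensed, untested sketch with illustrative data and layer sizes, not the exact notebook):

import tensorflow as tf

# Illustrative ragged data: variable-length token sequences with a binary
# label per token (ragged_rank=1, inner dimension 1 to match the model output).
x = tf.ragged.constant([[1, 2, 3], [4, 5]], dtype=tf.int32)
y = tf.ragged.constant([[[0.0], [1.0], [0.0]], [[1.0], [0.0]]], ragged_rank=1)

# A tiny model mapping each ragged sequence to per-token sigmoid predictions.
inputs = tf.keras.Input(shape=[None], dtype=tf.int32, ragged=True)
hidden = tf.keras.layers.Embedding(10, 8)(inputs)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)
model = tf.keras.Model(inputs, outputs)

# Because y_true and y_pred are RaggedTensors, the loss goes through the ragged
# tf.map_fn wrapper in keras/losses.py; on a GPU the gradient pass then fails
# with the ZEROS_LIKE error above, while on a CPU (or eagerly) it trains fine.
model.compile(optimizer=tf.optimizers.Adam(), loss=tf.losses.BinaryCrossentropy())
model.fit(x, y, epochs=1)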

Other info / logs

The map_fn used is here: https://github.com/keras-team/keras/blob/2db5acf3e3c5904b014cb409d3c514bef44f9640/keras/losses.py#L1408
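
Paraphrased, the wrapper at that line does something like the following (not the exact Keras source, just the shape of it):

import tensorflow as tf

def ragged_tensor_apply_loss(loss_fn, y_true, y_pred):
    # Apply loss_fn row by row over the ragged inputs. tf.map_fn lowers to a
    # while loop whose iteration state carries RaggedTensorVariant values, and
    # presumably it is the gradient of this loop that needs a ZEROS_LIKE kernel
    # for RaggedTensorVariant, which is not registered on GPU.
    return tf.map_fn(
        lambda args: loss_fn(args[0], args[1]),
        elems=(y_true, y_pred),
        fn_output_signature=tf.TensorSpec([], y_pred.dtype))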

@foxik
Contributor Author
foxik commented Apr 3, 2022

Note that I also opened an issue in the Keras repository, keras-team/tf-keras#638, where we discuss whether the tf.map_fn on RaggedTensors should be avoided altogether -- it probably can be: the metrics take a different approach with ragged tensors and, instead of a ragged map, use flat_values, see https://github.com/keras-team/keras/blob/2db5acf3e3c5904b014cb409d3c514bef44f9640/keras/utils/metrics_utils.py#L800.
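
For comparison, the flat_values route amounts to something like this (again just a sketch of the idea, not the metrics_utils code):

import tensorflow as tf

def ragged_loss_via_flat_values(loss_fn, y_true, y_pred):
    # Flatten both ragged tensors to their dense flat_values and apply the loss
    # once; there is no per-row while loop, hence no zeros_like on
    # RaggedTensorVariant in the gradient.
    return loss_fn(y_true.flat_values, y_pred.flat_values)

# Toy usage with ragged labels/predictions:
y_true = tf.ragged.constant([[0.0, 1.0, 0.0], [1.0, 0.0]])
y_pred = tf.ragged.constant([[0.1, 0.9, 0.2], [0.8, 0.3]])
loss = ragged_loss_via_flat_values(tf.losses.BinaryCrossentropy(), y_true, y_pred)

(Note the weighting differs slightly: a single mean over all values rather than a mean of per-row losses.)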

@sushreebarsa
Contributor

@chunduriv I was able to reproduce the issue on Colab using TF v2.8.0 and tf-nightly on both GPU and CPU; please find the attached gists for reference. Thanks!

@foxik
Contributor Author
foxik commented Apr 7, 2022

Oh, it was just pointed out to me (by djoshea) that this is a duplicate of #46635, so closing.

@foxik
Contributor Author
foxik commented Apr 7, 2022

Closing as a duplicate of #46635 .

foxik closed this as completed Apr 7, 2022
@google-ml-butler

Are you satisfied with the resolution of your issue?
