
No registered 'Const' OpKernel for GPU devices with constant folding #52200

Open
albertz opened this issue Sep 30, 2021 · 1 comment
Labels: 2.6.0, comp:gpu (GPU related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), type:bug (Bug)

Comments

albertz (Contributor) commented Sep 30, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): pip binary
  • TensorFlow version (use command below): v2.6.0-rc2-32-g919f693420e 2.6.0
  • Python version: 3.8.10
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 11.4 / 8.2.4.15
  • GPU model and memory: NVIDIA GeForce RTX 2070

Describe the current behavior

The code below fails with an exception.
This is the full output:

TF: 2.6.0
2021-09-30 15:52:24.159169: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 15:52:24.162278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 15:52:24.162637: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 15:52:24.163155: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-30 15:52:24.163754: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 15:52:24.164103: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 15:52:24.164431: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 15:52:24.456691: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 15:52:24.457036: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 15:52:24.457342: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 15:52:24.457640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5732 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:09:00.0, compute capability: 7.5
2021-09-30 15:52:24.466132: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'Const' OpKernel for GPU devices compatible with node {{node ConstantFolding/Const_enter}}
         (OpKernel was found, but attributes didn't match) Requested Attributes: dtype=DT_STRING, value=Tensor<type: string shape: [] values: foo>, _device="/job:localhost/replica:0/task:0/device:GPU:0"
        .  Registered:  device='XLA_GPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
  device='XLA_CPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
  device='DEFAULT'; dtype in [DT_VARIANT]
  device='DEFAULT'; dtype in [DT_BOOL]
  device='DEFAULT'; dtype in [DT_QUINT16]
  device='DEFAULT'; dtype in [DT_QINT16]
  device='DEFAULT'; dtype in [DT_QINT32]
  device='DEFAULT'; dtype in [DT_QUINT8]
  device='DEFAULT'; dtype in [DT_QINT8]
  device='DEFAULT'; dtype in [DT_COMPLEX128]
  device='DEFAULT'; dtype in [DT_COMPLEX64]
  device='DEFAULT'; dtype in [DT_INT8]
  device='DEFAULT'; dtype in [DT_UINT8]
  device='DEFAULT'; dtype in [DT_INT16]
  device='DEFAULT'; dtype in [DT_UINT16]
  device='DEFAULT'; dtype in [DT_UINT32]
  device='DEFAULT'; dtype in [DT_INT64]
  device='DEFAULT'; dtype in [DT_UINT64]
  device='DEFAULT'; dtype in [DT_DOUBLE]
  device='DEFAULT'; dtype in [DT_FLOAT]
  device='DEFAULT'; dtype in [DT_BFLOAT16]
  device='DEFAULT'; dtype in [DT_HALF]
  device='DEFAULT'; dtype in [DT_INT32]
  device='CPU'
  device='TPU_SYSTEM'
  device='GPU'; dtype in [DT_VARIANT]
  device='GPU'; dtype in [DT_BOOL]
  device='GPU'; dtype in [DT_COMPLEX128]
  device='GPU'; dtype in [DT_COMPLEX64]
  device='GPU'; dtype in [DT_UINT64]
  device='GPU'; dtype in [DT_INT64]
  device='GPU'; dtype in [DT_QINT32]
  device='GPU'; dtype in [DT_UINT32]
  device='GPU'; dtype in [DT_QUINT16]
  device='GPU'; dtype in [DT_QINT16]
  device='GPU'; dtype in [DT_INT16]
  device='GPU'; dtype in [DT_UINT16]
  device='GPU'; dtype in [DT_QINT8]
  device='GPU'; dtype in [DT_INT8]
  device='GPU'; dtype in [DT_UINT8]
  device='GPU'; dtype in [DT_DOUBLE]
  device='GPU'; dtype in [DT_FLOAT]
  device='GPU'; dtype in [DT_BFLOAT16]
  device='GPU'; dtype in [DT_HALF]

Traceback (most recent call last):
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'Const' OpKernel for 'GPU' devices compatible with node {{node ConstantFolding/Const_enter}}
         (OpKernel was found, but attributes didn't match) Requested Attributes: _XlaHasReferenceVars=false, dtype=DT_STRING, value=Tensor<type: string shape: [] values: foo>, _device="/job:localhost/replica:0/task:0/device:GPU:0"
        .  Registered:  device='XLA_CPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_STRING]
  device='XLA_GPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_STRING]
  device='XLA_GPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
  device='XLA_CPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
  device='DEFAULT'; dtype in [DT_VARIANT]
  device='DEFAULT'; dtype in [DT_BOOL]
  device='DEFAULT'; dtype in [DT_QUINT16]
  device='DEFAULT'; dtype in [DT_QINT16]
  device='DEFAULT'; dtype in [DT_QINT32]
  device='DEFAULT'; dtype in [DT_QUINT8]
  device='DEFAULT'; dtype in [DT_QINT8]
  device='DEFAULT'; dtype in [DT_COMPLEX128]
  device='DEFAULT'; dtype in [DT_COMPLEX64]
  device='DEFAULT'; dtype in [DT_INT8]
  device='DEFAULT'; dtype in [DT_UINT8]
  device='DEFAULT'; dtype in [DT_INT16]
  device='DEFAULT'; dtype in [DT_UINT16]
  device='DEFAULT'; dtype in [DT_UINT32]
  device='DEFAULT'; dtype in [DT_INT64]
  device='DEFAULT'; dtype in [DT_UINT64]
  device='DEFAULT'; dtype in [DT_DOUBLE]
  device='DEFAULT'; dtype in [DT_FLOAT]
  device='DEFAULT'; dtype in [DT_BFLOAT16]
  device='DEFAULT'; dtype in [DT_HALF]
  device='DEFAULT'; dtype in [DT_INT32]
  device='CPU'
  device='TPU_SYSTEM'
  device='GPU'; dtype in [DT_VARIANT]
  device='GPU'; dtype in [DT_BOOL]
  device='GPU'; dtype in [DT_COMPLEX128]
  device='GPU'; dtype in [DT_COMPLEX64]
  device='GPU'; dtype in [DT_UINT64]
  device='GPU'; dtype in [DT_INT64]
  device='GPU'; dtype in [DT_QINT32]
  device='GPU'; dtype in [DT_UINT32]
  device='GPU'; dtype in [DT_QUINT16]
  device='GPU'; dtype in [DT_QINT16]
  device='GPU'; dtype in [DT_INT16]
  device='GPU'; dtype in [DT_UINT16]
  device='GPU'; dtype in [DT_QINT8]
  device='GPU'; dtype in [DT_INT8]
  device='GPU'; dtype in [DT_UINT8]
  device='GPU'; dtype in [DT_DOUBLE]
  device='GPU'; dtype in [DT_FLOAT]
  device='GPU'; dtype in [DT_BFLOAT16]
  device='GPU'; dtype in [DT_HALF]

         [[ConstantFolding/Const_enter]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tf-const-gpu.py", line 18, in <module>
    session.run(n)
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1368, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/home/az/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'Const' OpKernel for 'GPU' devices compatible with node {{node ConstantFolding/Const_enter}}
         (OpKernel was found, but attributes didn't match) Requested Attributes: _XlaHasReferenceVars=false, dtype=DT_STRING, value=Tensor<type: string shape: [] values: foo>, _device="/job:localhost/replica:0/task:0/device:GPU:0"
        .  Registered:  device='XLA_CPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_STRING]
  device='XLA_GPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_STRING]
  device='XLA_GPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
  device='XLA_CPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
  device='DEFAULT'; dtype in [DT_VARIANT]
  device='DEFAULT'; dtype in [DT_BOOL]
  device='DEFAULT'; dtype in [DT_QUINT16]
  device='DEFAULT'; dtype in [DT_QINT16]
  device='DEFAULT'; dtype in [DT_QINT32]
  device='DEFAULT'; dtype in [DT_QUINT8]
  device='DEFAULT'; dtype in [DT_QINT8]
  device='DEFAULT'; dtype in [DT_COMPLEX128]
  device='DEFAULT'; dtype in [DT_COMPLEX64]
  device='DEFAULT'; dtype in [DT_INT8]
  device='DEFAULT'; dtype in [DT_UINT8]
  device='DEFAULT'; dtype in [DT_INT16]
  device='DEFAULT'; dtype in [DT_UINT16]
  device='DEFAULT'; dtype in [DT_UINT32]
  device='DEFAULT'; dtype in [DT_INT64]
  device='DEFAULT'; dtype in [DT_UINT64]
  device='DEFAULT'; dtype in [DT_DOUBLE]
  device='DEFAULT'; dtype in [DT_FLOAT]
  device='DEFAULT'; dtype in [DT_BFLOAT16]
  device='DEFAULT'; dtype in [DT_HALF]
  device='DEFAULT'; dtype in [DT_INT32]
  device='CPU'
  device='TPU_SYSTEM'
  device='GPU'; dtype in [DT_VARIANT]
  device='GPU'; dtype in [DT_BOOL]
  device='GPU'; dtype in [DT_COMPLEX128]
  device='GPU'; dtype in [DT_COMPLEX64]
  device='GPU'; dtype in [DT_UINT64]
  device='GPU'; dtype in [DT_INT64]
  device='GPU'; dtype in [DT_QINT32]
  device='GPU'; dtype in [DT_UINT32]
  device='GPU'; dtype in [DT_QUINT16]
  device='GPU'; dtype in [DT_QINT16]
  device='GPU'; dtype in [DT_INT16]
  device='GPU'; dtype in [DT_UINT16]
  device='GPU'; dtype in [DT_QINT8]
  device='GPU'; dtype in [DT_INT8]
  device='GPU'; dtype in [DT_UINT8]
  device='GPU'; dtype in [DT_DOUBLE]
  device='GPU'; dtype in [DT_FLOAT]
  device='GPU'; dtype in [DT_BFLOAT16]
  device='GPU'; dtype in [DT_HALF]

         [[ConstantFolding/Const_enter]]

Describe the expected behavior

The code below should work without error on a GPU.

Standalone code to reproduce the issue

import tensorflow as tf


print("TF:", tf.__version__)
tf.compat.v1.disable_eager_execution()
tf.compat.v1.disable_control_flow_v2()


with tf.compat.v1.Session() as session:
  x = tf.constant("foo")

  def body(i):
    with tf.control_dependencies([tf.print(x)]):
      return i + 1

  n = tf.while_loop(cond=lambda i: tf.less(i, 1), body=body, loop_vars=[0])
  session.run(n)
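
A possible workaround, not suggested in the issue itself: the failing node is `ConstantFolding/Const_enter`, i.e. a constant re-materialized by Grappler's constant-folding pass and placed on the GPU, where no `Const` kernel exists for `DT_STRING`. Disabling that rewriter pass in the session config should keep the folded node from being created. A minimal sketch of the same repro with constant folding turned off (the `rewriter_config_pb2` proto is the standard way to reach Grappler options; whether this fully resolves the bug on GPU is an assumption):

```python
import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2

print("TF:", tf.__version__)
tf.compat.v1.disable_eager_execution()
tf.compat.v1.disable_control_flow_v2()

# Disable Grappler's constant-folding pass so the string constant is
# never re-materialized as a Const node pinned to the GPU device.
config = tf.compat.v1.ConfigProto()
config.graph_options.rewrite_options.constant_folding = (
    rewriter_config_pb2.RewriterConfig.OFF)

with tf.compat.v1.Session(config=config) as session:
  x = tf.constant("foo")

  def body(i):
    # tf.print returns an op in graph mode; the control dependency
    # forces it to run on every loop iteration.
    with tf.control_dependencies([tf.print(x)]):
      return i + 1

  n = tf.while_loop(cond=lambda i: tf.less(i, 1), body=body, loop_vars=[0])
  result = session.run(n)
```

Disabling constant folding globally has a graph-wide optimization cost, so it is a diagnostic workaround rather than a fix for the underlying placement bug.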
tilakrayal (Contributor) commented:
@sachinprasadhs,
I was able to reproduce the issue in TF v2.5, v2.6, and nightly. Please find the gist of it here.
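
Editor's note, not from the thread: another thing worth trying is pinning the string constant to CPU explicitly, since the registration list above shows `Const` on CPU accepts all dtypes while the GPU kernel excludes `DT_STRING`. Whether Grappler respects the pin when folding is an open question, so this is only a sketch of the idea:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()
tf.compat.v1.disable_control_flow_v2()

with tf.compat.v1.Session() as session:
  # Explicitly place the string constant on CPU; there is no GPU
  # 'Const' kernel for DT_STRING, so a CPU pin avoids requesting one.
  with tf.device("/cpu:0"):
    x = tf.constant("foo")

  def body(i):
    with tf.control_dependencies([tf.print(x)]):
      return i + 1

  n = tf.while_loop(cond=lambda i: tf.less(i, 1), body=body, loop_vars=[0])
  result = session.run(n)
```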

@tilakrayal tilakrayal added 2.6.0 comp:gpu GPU related issues labels Sep 30, 2021
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Oct 13, 2021