failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE #57359
Comments
@Co1lin |
@sushreebarsa I understand. Currently I use a dynamic model generation technique, and the code is really complex. I will try to manually build the same model (so the code will be simple) as the one leading to the crash and see whether it reproduces the same issue. |
@Co1lin Thank you for the response! |
I'm getting the same error on 2.9.0, but I reproduced it in 2.8.0 and 2.9.1 too. I'll see if I can create a small enough example to post. 2022-08-23 16:26:25.102620: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE |
Here's more of my trace:
Node: 'EfficientNet/predictions/MatMul' |
I'm sorry that I am currently not able to provide a minimal example for reproduction. I use a dynamic graph generation technique, and the code is not publicly available yet, though it will be made public later. I tried to build the same graph manually and statically to see whether it reproduces the same issue, but unfortunately it does not. However, I found a workaround that works for me. Below is how I found it; I hope it provides enough information for you to fix this issue.
First, let's focus on the most useful error lines among the outputs:
2022-08-21 23:09:42.546282: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:42.546307: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:42.546320: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
outputs= (shape=(2, 2, 2, 2) dtype=<dtype: 'float32'>)
From these lines we can locate the source code and the position reporting the error (the file path in the log is not exactly the same as in the repository). The code near that position shows that the failure happens when setting the cuBLAS math mode. Some posts say this math mode is used to accelerate "TF32 Tensor Core operations", which matches this line:
2022-08-21 23:09:42.546282: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
So this error is caused by TF32-related optimizations, and we can disable them by adding this line to the Python code:
tf.config.experimental.enable_tensor_float_32_execution(False)
From the cuBLAS-related source code above, we can also see another common error message. That one is discussed in #9489 and can be solved by enabling GPU memory growth; note that it is not the same issue as the one discussed here.
To sum up, we can add the following lines to avoid two common issues caused by cuBLAS, though I hope the internal issue can be fixed in the future:
tf.config.experimental.enable_tensor_float_32_execution(False)
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
@jhuus Maybe you can have a try. |
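To make the workaround above concrete, here is a minimal sketch (not the code from this issue) of how these lines might be placed at the top of a script, before any model is built or run; the tiny Dense example at the end is purely illustrative:

import tensorflow as tf

# Disable TensorFloat-32 so cuBLAS is not asked to switch math modes
# (works around "failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE").
tf.config.experimental.enable_tensor_float_32_execution(False)

# Allocate GPU memory on demand instead of grabbing it all up front
# (the separate cuBLAS issue referenced above, see #9489).
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# Illustrative forward pass through a Dense layer after the settings are applied.
layer = tf.keras.layers.Dense(3)
print(layer(tf.random.normal((2, 4))).shape)  # (2, 3)

Disabling TF32 trades a bit of matrix-multiply speed on Ampere-class GPUs for full float32 precision, which is why it is described later in the thread as only a small performance sacrifice.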
I have a new discovery. In my environment, the error does not occur if I remove the torch import. |
@Co1lin Thank you for the update! |
@sushreebarsa Hi! I am wondering whether it would be better to output a friendlier error message for this assertion failure. Only logging
Node: 'dense/MatMul'
Failed initializing math mode
[[{{node dense/MatMul}}]] [Op:__inference_call_156]
is quite confusing. If it's OK, I would like to add some extra information here, like:
|
In my case, I am trying to use torchaudio with tensorflow. If I use librosa it works, but if I use torchaudio I get the error. Since torchaudio uses the GPU it is much faster. If this can't be fixed I'll just rewrite the whole thing to use torch, I guess.
|
@jhuus Could you try tf.config.experimental.enable_tensor_float_32_execution(False)? I think it only sacrifices a little performance but enables you to use torch and tensorflow at the same time. |
Actually, just importing tensorflow before I import torchaudio fixed the
problem! It makes me a little worried about other possible compatibility
issues between torchaudio and tensorflow though.
|
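For reference, a minimal sketch of the import ordering described in the comment above, assuming both tensorflow and torchaudio are installed; only the order of the imports changes:

# Importing TensorFlow before torchaudio avoided the cuBLAS math mode error
# in that environment; no other code changes are needed.
import tensorflow as tf  # imported first
import torchaudio        # imported afterwards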
Updated 10 min later: the problem went away after restarting the kernel.
I'm facing the same errors, and they started to appear only after I ran $ pip install setfit. Now I can reproduce the problem with this simple snippet; the exact same code below was working fine before I installed setfit.
from sklearn.base import BaseEstimator, TransformerMixin
import tensorflow_hub as hub

class UseEmbedder(TransformerMixin, BaseEstimator):
    def __init__(self):
        self._embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

    def fit(self, X, y=None, sample_weight=None):
        return self

    def transform(self, X):
        return self._embed(X).numpy()

    def fit_transform(self, X, y=None, sample_weight=None):
        return self.transform(X)

embedding_transformer = UseEmbedder()
embedding_transformer.transform(['why did this just break']) |
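If it helps with diagnosis, the TF32 setting discussed earlier in the thread can be checked and toggled in the same session; this is only a suggestion for anyone hitting the error, not something the commenter above reported trying:

import tensorflow as tf

# Report whether TensorFloat-32 execution is currently enabled, then disable it.
print(tf.config.experimental.tensor_float_32_execution_enabled())
tf.config.experimental.enable_tensor_float_32_execution(False)
print(tf.config.experimental.tensor_float_32_execution_enabled())  # now False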
It worked for me, big thanks! Removing the torch import did not work for me, though. Since I haven't posted in this thread before: I was having the same issue (I think, anyway). Let me know if you want me to post my entire traceback. End of traceback:
|
I am running into this error with TF 2.11.0 and am wondering whether there is any concrete solution. |
There are several suggested above. |
I am having the exact same issue when running with TF 2.11, and the workaround works for me:
I would like to know the root cause and whether there is a plan to fix it. |
Issue Type
Bug
Source
binary
Tensorflow Version
v2.9.0-18-gd8ce9f9c301 2.9.1
Custom Code
No
OS Platform and Distribution
Linux Ubuntu 20.04.4 LTS
Mobile device
No response
Python version
No response
Bazel version
No response
GCC/Compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current Behaviour?
I have a dynamic keras.Model named symbol_net. When executing the forward computation (calling its call method), it sometimes crashes as follows if there is a Dense layer in the model. I have searched the Internet and tried many solutions, including combinations of them, but none of them work. I have a GPU with 12 GiB of memory; on the multi-user machine, about 12000 MiB were free while I was running the code, so memory should not be the problem. My model is quite small and won't take a lot of memory.
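For illustration only (this is hypothetical, and is neither the reporter's dynamically generated model nor the standalone reproduction requested below), a model of the shape described here is just a subclassed keras.Model whose call method goes through a Dense layer:

import tensorflow as tf

class SymbolNet(tf.keras.Model):  # hypothetical stand-in for the dynamic symbol_net
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(2)  # the Dense layer mentioned above

    def call(self, x):
        return self.dense(x)

net = SymbolNet()
out = net(tf.random.normal((2, 2)))  # the forward computation that crashes for the reporter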
Standalone code to reproduce the issue
Relevant log output