failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE #57359

Open
Co1lin opened this issue Aug 22, 2022 · 17 comments
Assignees
Labels
comp:apis (high-level API related issues), TF 2.9 (issues found in the TF 2.9 release or RCs), type:bug (bug)

Comments

@Co1lin
Co1lin commented Aug 22, 2022

Issue Type

Bug

Source

binary

Tensorflow Version

v2.9.0-18-gd8ce9f9c301 2.9.1

Custom Code

No

OS Platform and Distribution

Linux Ubuntu 20.04.4 LTS

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

I have a dynamic keras.Model named symbol_net. When executing the forward computation (calling its call method), it sometimes crashes as follows if there is a Dense layer in the model.

I have searched on the Internet and tried many solutions, including combinations of them, like:

import tensorflow as tf  # type: ignore
from tensorflow import keras
from keras import layers  # type: ignore
from keras import backend as K

# TF2-style workaround: let TensorFlow grow GPU memory on demand.
physical_devices = tf.config.list_physical_devices("GPU")
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

# TF1-compat workaround: allow growth and cap per-process GPU memory.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.333
session = tf.compat.v1.Session(config=config)
K.set_session(session)

But none of them works. I have a GPU with 12 GiB of memory. On the multi-user machine, about 12000 MiB were still free for me when I ran the code, so memory should be sufficient. My model is quite small, like this, and should not take much memory.

2022-08-21 23:09:42.546282: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:42.546307: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:42.546320: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
	outputs= (shape=(2, 2, 2, 2) dtype=<dtype: 'float32'>)
Traceback (most recent call last):
  File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
    ic(net(*input_list))
  File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/colin/miniconda3/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "symbol_net" (type SymbolNet).

Graph execution error:

Detected at node 'dense/Tensordot/MatMul' defined at (most recent call last):
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
      ic(net(*input_list))
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 547, in call
      for inst, inps, outs, op, node_id in self.instructions.data:
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 576, in call
      outputs = inst(*input_tensors)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/layers/core/dense.py", line 224, in call
      outputs = tf.tensordot(inputs, self.kernel, [[rank - 1], [0]])
Node: 'dense/Tensordot/MatMul'
Failed initializing math mode
	 [[{{node dense/Tensordot/MatMul}}]] [Op:__inference_call_146]

Call arguments received by layer "symbol_net" (type SymbolNet):
  • args=('tf.Tensor(shape=(2, 2, 2, 2), dtype=float32)', 'tf.Tensor(shape=(1, 1, 1, 1), dtype=float32)')
  • kwargs={'training': 'None'}

Standalone code to reproduce the issue

Currently my code is large. Sorry.

Relevant log output

2022-08-21 23:09:55.580410: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.601460: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.601638: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.602081: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-21 23:09:55.603250: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.603399: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.603554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.915740: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.915925: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.916011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:55.916113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-08-21 23:09:56.068318: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068541: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068654: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068796: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068904: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.068997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-08-21 23:09:56.183640: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.183809: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.183889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184001: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184083: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-21 23:09:56.184142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4013 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6

2022-08-21 23:09:57.669085: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:57.669107: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:57.669119: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
	outputs= (shape=(1, 1) dtype=<dtype: 'float32'>)
Traceback (most recent call last):
  File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
    ic(net(*input_list))
  File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/colin/miniconda3/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Exception encountered when calling layer "symbol_net" (type SymbolNet).

Graph execution error:

Detected at node 'dense/MatMul' defined at (most recent call last):
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 1899, in <module>
      ic(net(*input_list))
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 547, in call
      for inst, inps, outs, op, node_id in self.instructions.data:
    File "/home/colin/code/nnsmith/nnsmith/graph_gen_2.py", line 576, in call
      outputs = inst(*input_tensors)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "/home/colin/miniconda3/lib/python3.10/site-packages/keras/layers/core/dense.py", line 221, in call
      outputs = tf.matmul(a=inputs, b=self.kernel)
Node: 'dense/MatMul'
Failed initializing math mode
	 [[{{node dense/MatMul}}]] [Op:__inference_call_156]

Call arguments received by layer "symbol_net" (type SymbolNet):
  • args=('tf.Tensor(shape=(2, 2, 2, 1), dtype=float32)', 'tf.Tensor(shape=(1,), dtype=float32)')
  • kwargs={'training': 'None'}
@google-ml-butler google-ml-butler bot added the type:bug Bug label Aug 22, 2022
@sushreebarsa sushreebarsa added the TF 2.9 Issues found in the TF 2.9 release (or RCs) label Aug 23, 2022
@sushreebarsa
Contributor

@Co1lin
In order to expedite the trouble-shooting process, please provide a complete code snippet to reproduce the issue reported here.
Thank you!

@Co1lin
Author
Co1lin commented Aug 23, 2022

@sushreebarsa I understand. Currently I use a dynamic model generation technique, and the code is really complex. I will try to manually build the same model as the one leading to the crash (so the code will be simple) and see whether it can reproduce the same issue.

@sushreebarsa
Contributor

@Co1lin Thank you for the response!
Please keep us informed of any updates. Thank you!

@sushreebarsa sushreebarsa added the stat:awaiting response Status - Awaiting response from author label Aug 23, 2022
@jhuus
jhuus commented Aug 23, 2022

I'm getting the same error on 2.9.0, and I reproduced it on 2.8.0 and 2.9.1 too. I'll see if I can create a small enough example to post.

2022-08-23 16:26:25.102620: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-23 16:26:25.102641: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode

@jhuus
jhuus commented Aug 23, 2022

Here's more of my trace:

  predictions = self.model.predict(self.specs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
  return fn(*args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/training.py", line 2033, in predict
  tmp_batch_outputs = self.predict_function(iterator)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1845, in predict_function
  return step_function(self, iterator)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1834, in step_function
  outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1823, in run_step
  outputs = model.predict_step(data)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1791, in predict_step
  return self(x, training=False)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
  return fn(*args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/training.py", line 490, in __call__
  return super().__call__(*args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
  return fn(*args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
  outputs = call_fn(inputs, *args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
  return fn(*args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/functional.py", line 458, in call
  return self._run_internal_graph(
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/functional.py", line 596, in _run_internal_graph
  outputs = node.layer(*args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
  return fn(*args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1014, in __call__
  outputs = call_fn(inputs, *args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
  return fn(*args, **kwargs)
File "/home/jhuus/.local/lib/python3.10/site-packages/keras/layers/core/dense.py", line 221, in call
  outputs = tf.matmul(a=inputs, b=self.kernel)

Node: 'EfficientNet/predictions/MatMul'
Failed initializing math mode
[[{{node EfficientNet/predictions/MatMul}}]] [Op:__inference_predict_function_13833]

@Co1lin
Author
Co1lin commented Aug 25, 2022

I'm sorry that I am currently not able to provide a minimal example for reproduction. I use a dynamic graph generation technique, and the code is not publicly available yet, though it will be made public later. I tried to build the same graph manually and statically to see whether it reproduces the same issue, but unfortunately it does not.

However, I found a workaround that works for me. Below is how I found it; I hope this provides useful information for fixing the issue.

First, let's focus on the most useful error information among those outputs:

2022-08-21 23:09:42.546282: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2022-08-21 23:09:42.546307: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
2022-08-21 23:09:42.546320: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:438 : INTERNAL: Failed initializing math mode
	outputs= (shape=(2, 2, 2, 2) dtype=<dtype: 'float32'>)

Then we can find the source code and the location that reports the error here, though the file path is not exactly the same as the one in the log.

From the code near that location, we can see that the error is caused by a failing cublasSetMathMode call.

Some posts, like this one, say this function is used to enable "TF32 Tensor Core operations", which looks related to this line:

2022-08-21 23:09:42.546282: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

So this error is caused by TF32-related optimizations. Therefore, we can disable them as described here, by adding this line to the Python code:

tf.config.experimental.enable_tensor_float_32_execution(False)

From the cuBLAS-related source code shown above, we can also see another common error message here. That one is discussed in #9489 and can be solved by the methods listed here; note that it is not the same issue as the one discussed in this thread.

To sum up, we can add these lines to avoid two common issues related to cuBLAS. But I hope the underlying issue can be fixed in the future.

tf.config.experimental.enable_tensor_float_32_execution(False)
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
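
To make the ordering concrete, here is a minimal sketch of where I put these calls; the tiny Sequential model is hypothetical and only serves to exercise a Dense matmul on the GPU:

import tensorflow as tf
from tensorflow import keras

# Apply both workarounds before any GPU op runs.
tf.config.experimental.enable_tensor_float_32_execution(False)
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# Hypothetical tiny model, just to trigger a matmul like dense/MatMul.
model = keras.Sequential([keras.layers.Dense(4, input_shape=(8,))])
out = model(tf.random.normal((2, 8)))
print(out.shape)  # (2, 4)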

@jhuus Maybe you can give it a try.

@Co1lin
Author
Co1lin commented Aug 28, 2022

I have a new discovery: in my environment, if I remove import torch when using TensorFlow, this issue disappears.
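
For illustration, here is a minimal sketch of the pattern in question; the presence of the torch import is the key point, and whether this bare Dense call reproduces the error likely depends on the environment, since my failing model is more complex:

import torch  # removing this import made the issue disappear in my environment
import tensorflow as tf
from tensorflow import keras

layer = keras.layers.Dense(4)           # hypothetical layer, only to hit the cuBLAS matmul path
out = layer(tf.random.normal((2, 8)))   # dense/MatMul runs here
print(out.shape)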

@sushreebarsa
Contributor

@Co1lin Thank you for the update!
Could you please move this issue to closed status if it is resolved for you?
Thank you!

@sushreebarsa sushreebarsa added the stat:awaiting response Status - Awaiting response from author label Aug 28, 2022
@Co1lin
Author
Co1lin commented Aug 28, 2022

@sushreebarsa Hi! I am wondering whether it would be better to output a friendlier error message for this assertion error. Only logging

Node: 'dense/MatMul'
Failed initializing math mode
	 [[{{node dense/MatMul}}]] [Op:__inference_call_156]

is quite confusing. If it's OK, I would like to add some extra information here, like:

Please check whether there is a conflict, e.g. another deep learning framework (such as torch) is imported.
Or consider disabling TF32 optimization via `tf.config.experimental.enable_tensor_float_32_execution(False)`.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 28, 2022
@jhuus
jhuus commented Aug 28, 2022 via email

@Co1lin
Author
Co1lin commented Aug 28, 2022

@jhuus Could you try tf.config.experimental.enable_tensor_float_32_execution(False)? I think it only sacrifices a little performance but lets you use torch and TensorFlow at the same time, so for now you don't need to wait for this issue to be fixed.

@jhuus
jhuus commented Aug 28, 2022 via email

@chaturv3di
chaturv3di commented Oct 6, 2022

Updated 10min later: The problem went away after restarting the kernel.


I'm facing the same errors.

E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:442 : INTERNAL: Failed initializing math mode

And it started to appear only after I did

$ pip install setfit

Now I can reproduce this problem with this simple snippet. The exact same code below was working fine before I installed setfit.

from sklearn.base import BaseEstimator, TransformerMixin
import tensorflow_hub as hub

class UseEmbedder(TransformerMixin, BaseEstimator):
    def __init__(self):
        # Load the Universal Sentence Encoder from TF Hub.
        self._embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

    def fit(self, X, y=None, sample_weight=None):
        return self

    def transform(self, X):
        # Embedding the input runs the GPU matmul that fails for me.
        return self._embed(X).numpy()

    def fit_transform(self, X, y=None, sample_weight=None):
        return self.transform(X)


embedding_transformer = UseEmbedder()
embedding_transformer.transform(['why did this just break'])

@FrickTobias

@jhuus Could you try tf.config.experimental.enable_tensor_float_32_execution(False)? I think it only sacrifices a little performance but lets you use torch and TensorFlow at the same time, so for now you don't need to wait for this issue to be fixed.

It worked for me, big thanks!

Removing the torch import did not work for me.

Since I haven't posted in this thread before: I was having the same issue (I think anyway). Let me know if you want me to post my entire traceback.

End of traceback:

<...>
Node: 'model/dense/MatMul'
Failed initializing math mode
	 [[{{node model/dense/MatMul}}]] [Op:__inference_train_function_9024]
2023-01-12 04:31:12.404345: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
	 [[{{node PyFunc}}]]

@kzhai
kzhai commented Feb 11, 2023

I am running into this error with TF 2.11.0. Wondering if there is any concrete solution?

@FrickTobias

I am running into this error with TF 2.11.0. Wondering if there is any concrete solution?

There are several suggestions above.

@hebiao064

I am having the exact same issue when running with TF 2.11; the workaround works for me:

tf.config.experimental.enable_tensor_float_32_execution(False)

I would like to know the root cause and the plan to fix it, if possible.
