Aborted (core dumped) in `tf.raw_ops.BatchFunction` #69701

x0w3n · 2024-06-13T14:46:23Z

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

tf 2.16

Custom code

Yes

OS platform and distribution

Linux Ubuntu 22.04.3 LTS (x86_64)

Mobile device

No response

Python version

3.9.13

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

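When `num_batch_threads` is very large (93389718 in the reproduction below), `tf.raw_ops.BatchFunction` aborts the whole process with a failed check instead of raising a Python exception.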
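Until the op validates this attribute itself, a caller-side check can reject implausible values before they reach the op. The sketch below is only an illustration: `checked_num_batch_threads` and its `max_threads` cap are hypothetical names, not part of TensorFlow, and the default cap of four threads per CPU is an arbitrary assumption.

```python
import os


def checked_num_batch_threads(requested, max_threads=None):
    """Return `requested` if it looks like a sane thread count, else raise.

    Hypothetical helper, not a TensorFlow API. The default cap
    (4 x CPU count) is an arbitrary assumption for illustration.
    """
    if max_threads is None:
        max_threads = 4 * (os.cpu_count() or 1)
    if isinstance(requested, bool) or not isinstance(requested, int):
        raise TypeError("num_batch_threads must be a plain Python int")
    if requested < 1 or requested > max_threads:
        raise ValueError(
            f"num_batch_threads={requested} is outside [1, {max_threads}]"
        )
    return requested
```

The returned value could then be passed as `num_batch_threads=checked_num_batch_threads(n)`, so an oversized request fails with a catchable `ValueError` rather than a process abort.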

Standalone code to reproduce the issue

import tensorflow as tf

@tf.function
def simple_fn(x, y):
    return x + y

input_tensor = tf.constant([1.0, 2.0, 3.0], dtype=tf.float32)
captured_tensor = tf.constant([4.0], dtype=tf.float32)

def wrapped_fn(in_tensors, captured_tensors):
    return simple_fn(in_tensors[0], captured_tensors[0])

defun_func = tf.function(wrapped_fn).get_concrete_function([input_tensor], [captured_tensor])


result = tf.raw_ops.BatchFunction(
    in_tensors=[input_tensor],
    captured_tensors=[captured_tensor],
    f=defun_func,
    num_batch_threads=tf.constant(93389718, dtype=tf.int32),
    max_batch_size=10,
    batch_timeout_micros=1000,
    Tout=[tf.float32],
    max_enqueued_batches=10,
    allowed_batch_sizes=[5, 10],
    container='',
    shared_name='',
    batching_queue='',
    low_priority_max_batch_size=0,
    low_priority_batch_timeout_micros=0,
    low_priority_allowed_batch_sizes=[],
    low_priority_max_enqueued_batches=0,
    enable_large_batch_splitting=False,
    name=None
)

print(result)

Relevant log output

2024-06-13 14:42:13.314670: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-13 14:42:19.076124: F external/local_tsl/tsl/platform/default/env.cc:74] Check failed: ret == 0 (11 vs. 0)Thread batch_threads_ creation via pthread_create() failed.
Aborted (core dumped)
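The `11` in `Check failed: ret == 0 (11 vs. 0)` is the value returned by `pthread_create()`. Assuming the log comes from the Linux host listed above, that number corresponds to `EAGAIN`, which `pthread_create` returns when the system lacks the resources, or has hit a limit, for creating another thread. A small sketch for decoding the value on the local machine:

```python
import errno
import os
import sys

# Return code reported by the failed check in the log above.
ret = 11

# errno numbering is platform-specific; the comparison is only
# meaningful on the platform that produced the log (Linux here).
on_linux = sys.platform.startswith("linux")
matches_eagain = on_linux and ret == errno.EAGAIN

print(f"ret={ret}, EAGAIN={errno.EAGAIN}, matches={matches_eagain}")
print("local description:", os.strerror(ret))
```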

sushreebarsa · 2024-06-19T09:37:22Z

@x0w3n I was able to replicate the issue reported here. `tf.raw_ops.BatchFunction` offers low-level control, but a simpler approach for batching a function might be `tf.data.Dataset.map` with vectorization. Could that serve as a workaround?
Could you please let us know if it helps?
Thank you!
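For reference, a minimal sketch of what the suggested `tf.data` approach might look like for the function in the report. It assumes eager execution and reuses the tensors from the reproduction; the batch size of 10 and `num_parallel_calls=2` are arbitrary choices, and this is not a drop-in replacement for the batching that `BatchFunction` performs across concurrent calls.

```python
import tensorflow as tf


@tf.function
def simple_fn(x, y):
    return x + y


inputs = tf.constant([1.0, 2.0, 3.0], dtype=tf.float32)
captured = tf.constant([4.0], dtype=tf.float32)

# Batch first, then map: the function runs once per batch
# (vectorized) instead of once per element.
dataset = (
    tf.data.Dataset.from_tensor_slices(inputs)
    .batch(10)
    .map(lambda x: simple_fn(x, captured), num_parallel_calls=2)
)

results = [batch.numpy().tolist() for batch in dataset]
print(results)
```

Here the degree of parallelism is a small, explicit value, so no oversized thread pool is requested.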

github-actions · 2024-06-27T01:50:52Z

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler bot added the type:bug Bug label Jun 13, 2024

google-ml-butler bot assigned sushreebarsa Jun 13, 2024

sushreebarsa added comp:ops OPs related issues TF 2.16 labels Jun 19, 2024

sushreebarsa added the stat:awaiting response Status - Awaiting response from author label Jun 19, 2024

github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 27, 2024
