
tf.data.Dataset prefetch not fetching data asynchronously #61084

Open
zackwohl opened this issue Jun 26, 2023 · 4 comments

Labels: comp:data (tf.data related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), TF 2.11 (Issues related to TF 2.11), type:bug (Bug), type:performance (Performance Issue)

Comments

zackwohl commented Jun 26, 2023

Issue Type

Bug

Have you reproduced the bug with TF nightly?

No

Source

source

Tensorflow Version

2.11

Custom Code

Yes

OS Platform and Distribution

Debian/Linux 11

Mobile device

No response

Python version

3.7

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

After implementing a data pipeline that uses tf.data.Dataset to pull image data from Google Cloud Storage, the TensorBoard profiler shows the GPU compute and CPU prefetch running synchronously (one after the other) rather than overlapping. I used tf.data.AUTOTUNE to determine the appropriate prefetch buffer size. Monitoring GPU usage while the model is running confirms this: the GPU sits at 0% utilization versus actively computing at roughly a 2:1 ratio, which matches the profiler trace. CPU usage does not appear to max out while this happens.

I expected the prefetch to occur concurrently with GPU processing, as described in the tf.data.Dataset documentation and tutorials.

[attached screenshots: ch, cp, gp — TensorBoard profiler traces]
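For anyone trying to reproduce the symptom in isolation, here is a minimal sketch (not from the original report; the 50 ms delays and element count are arbitrary) that times a slow producer against a slow consumer. When prefetch overlaps correctly, the second run should take roughly max(producer, consumer) time rather than their sum; if both runs take about the same time, the pipeline is executing synchronously as described above.

import time
import tensorflow as tf

def slow_produce(x):
    # Simulate ~50 ms of CPU-side I/O/decoding per element.
    return tf.py_function(lambda v: (time.sleep(0.05), v)[1], [x], tf.int64)

def consume(ds):
    start = time.time()
    for _ in ds:
        time.sleep(0.05)  # stand-in for the per-step GPU work
    return time.time() - start

base = tf.data.Dataset.range(50).map(slow_produce)
print("no prefetch :", consume(base))              # ~ producer + consumer time
print("prefetch(2) :", consume(base.prefetch(2)))  # ~ max(producer, consumer) if overlapping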

Standalone code to reproduce the issue

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ['TF_GPU_ALLOCATOR'] = "cuda_malloc_async"
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

def get_label(file_path):
    parts = tf.strings.split(file_path, os.path.sep)
    one_hot = parts[-2] == class_names
    return tf.argmax(one_hot)

def decode_img(img):
    img = tf.io.decode_image(img, channels=3, expand_animations = False)
    img = tf.image.resize(img, [244, 244])
    img = tf.cast(img, tf.float32)
    return img

def process_path(file_path):
    label = get_label(file_path)
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return img, label

def configure_for_performance(ds):
    ds = ds.batch(128)
    ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
    return ds

files = tf.data.Dataset.list_files((data_dir + '/*/*.png'), shuffle=False)
files = files.shuffle(image_count, reshuffle_each_iteration=False)

val_size = int(image_count * 0.2)

train_files = files.skip(val_size)
val_files = files.take(val_size)

train_ds = train_files.interleave(lambda x: tf.data.Dataset.from_tensor_slices([x]), cycle_length=4, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)

val_ds = val_files.interleave(lambda x: tf.data.Dataset.from_tensor_slices([x]), cycle_length=4, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)

train_ds = configure_for_performance(train_ds)
val_ds = configure_for_performance(val_ds)
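For context, a hedged sketch of how this pipeline might be consumed while capturing a profiler trace; the model architecture and the logs directory are placeholders, not part of the original report:

# Hypothetical consumer model; any Keras model would do here.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(244, 244, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(len(class_names)),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Profile batches 2-8 so the trace viewer shows whether prefetch overlaps compute.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs", profile_batch=(2, 8))
model.fit(train_ds, validation_data=val_ds, epochs=1, callbacks=[tensorboard_cb])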

Relevant log output

No response

@google-ml-butler google-ml-butler bot added the type:bug Bug label Jun 26, 2023
@SuryanarayanaY SuryanarayanaY added TF 2.11 Issues related to TF 2.11 comp:data tf.data related issues type:performance Performance Issue labels Jun 27, 2023
SuryanarayanaY (Collaborator) commented

Hi @zackwohl ,

Thanks for reaching out. Could you submit a Colab gist replicating the reported behaviour with an image dataset?

Also, can you confirm the behaviour with buffer_size=1 or 2 instead of tf.data.AUTOTUNE, just to cross-check?

Thanks!
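For reference, the suggested check is just the reporter's configure_for_performance with a fixed buffer in place of AUTOTUNE:

def configure_for_performance(ds):
    ds = ds.batch(128)
    ds = ds.prefetch(buffer_size=2)  # fixed buffer instead of tf.data.AUTOTUNE
    return ds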

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Jun 27, 2023
zackwohl (Author) commented Jun 28, 2023

Hi @SuryanarayanaY, I tried running this with buffer_size=2, and it continued to run synchronously. I've attached images of the TensorBoard profiler trace viewer.

[attached screenshots: gpu_2, pf_2 — TensorBoard trace viewer captures]

How would I submit a Colab gist, and what would you need in terms of data? I currently have my code in a Jupyter notebook.

Thanks

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jun 28, 2023
zackwohl (Author) commented

Hi @SuryanarayanaY, just wanted to follow up on next steps here.

zackwohl (Author) commented Jul 5, 2023

Hi @SuryanarayanaY, I'm still awaiting a response.

@SuryanarayanaY SuryanarayanaY added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jul 31, 2023