
tf.keras.model fit() significantly slower when using weighted validation data in comparison to tf2.1.0 #39588

Closed
sirvincent opened this issue May 15, 2020 · 16 comments
Labels
comp:keras (Keras related issues) · regression issue (to spot regression issues in latest version) · stat:awaiting tensorflower (awaiting response from tensorflower) · TF 2.2 (issues related to TF 2.2) · type:performance (Performance Issue)

Comments

@sirvincent

Please make sure that this is an issue related to performance of TensorFlow. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:performance_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    ArchLinux & Ubuntu 18.04 LTS
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
    No
  • TensorFlow installed from (source or binary):
    binary
  • TensorFlow version (use command below):
    v2.2.0-rc4-8-g2b96f3662b 2.2.0
    (compared to: v2.1.0-rc2-17-ge5bf8de 2.1.0)
  • Python version:
    3.7.5

The ArchLinux machine runs on CPU
The Ubuntu machine runs on GPU with:

  • CUDA/cuDNN version:
    10.1.243
  • GPU model and memory:
    GeForce GTX 1080 with 7126 MB memory

Describe the current behavior
Training a simple tf.keras.Model multilayer perceptron with a call to .fit() whose validation_data includes sample weights results in a significantly slower fit() than in TensorFlow 2.1.0 with the exact same code.

Describe the expected behavior
Similar performance between TensorFlow 2.1.0 and 2.2.0 when training a tf.keras.Model with a weighted validation data set.

Standalone code to reproduce the issue
Package requirements for the code snippet, using Python 3.7.5:

numpy = "==1.18.2"
tensorflow = "==2.2.0"
tensorflow-datasets = "==3.1.0"

import typing

import numpy as np
from tensorflow import keras
import tensorflow_datasets as tfds


def build_neural_network(input_dimension: int, number_of_classes: int, compile_options: dict):
    model = keras.Sequential()
    model.add(keras.layers.Dense(112, activation='relu', input_dim=input_dimension))
    model.add(keras.layers.Dense(112, activation='relu'))
    model.add(keras.layers.Dense(number_of_classes, activation='softmax'))

    model.compile(**compile_options)

    model.summary()

    return model

def load_in_images_and_labels_and_reshape(dataset) -> typing.Tuple[np.ndarray, np.ndarray]:
    images = []
    labels = []
    for image, label in tfds.as_numpy(dataset):
        new_image_shape = image.shape[0] * image.shape[1]
        images.append(image.reshape(new_image_shape))
        labels.append(label)

    return np.array(images), np.array(labels)


def train_neural_network(is_random_weighing: bool):
    dataset_train      = tfds.load('emnist', split='train', as_supervised=True)
    dataset_validation = tfds.load('emnist', split='test', as_supervised=True)

    train_images, train_labels           = load_in_images_and_labels_and_reshape(dataset_train)
    validation_images, validation_labels = load_in_images_and_labels_and_reshape(dataset_validation)
    train_labels      = keras.utils.to_categorical(train_labels)
    validation_labels = keras.utils.to_categorical(validation_labels)

    print("load")
    compile_options =  {
        "loss": "categorical_crossentropy",
        "optimizer": "adam",
        "metrics": ["categorical_accuracy"],
        "weighted_metrics": ["categorical_accuracy"]
    }
    network = build_neural_network(train_images.shape[-1], len(train_labels[0]), compile_options)

    fit_options = {    
        "batch_size": 2048,
        "epochs": 10,
        "verbose": 1,
        "workers": 1
    }
    if is_random_weighing:
        # Per-sample weights for the validation set; this is what triggers the slowdown in TF 2.2.
        random_weights = np.random.rand(len(validation_images))
        validation_data_tuple = (validation_images, validation_labels, random_weights)
    else:
        validation_data_tuple = (validation_images, validation_labels)
    history = network.fit(train_images, train_labels, validation_data=validation_data_tuple, **fit_options)


if __name__ == "__main__":
    is_random_weighing = True
    train_neural_network(is_random_weighing)

Other info / logs
Running the above code snippet on the ArchLinux machine (on CPU) takes roughly 19 seconds per epoch. When the same code is run with TensorFlow 2.1.0 it takes roughly 5 seconds per epoch. When the weighting of the validation dataset is turned off in TensorFlow 2.2.0 (is_random_weighing = False), performance becomes similar to TensorFlow 2.1.0: roughly 5 seconds per epoch.
The slowdown is also seen on the Ubuntu machine, run on GPU, where (likely due to the different hardware) TF 2.2.0 is about 7 times as slow as TF 2.1.0.

The effect was not seen (but maybe it was not measurable) when using mnist in place of emnist.

The issue seems related to #39039, in which a comment by @romanovzky suggested that the slowdown might be due to the validation data or validation split, although that issue is in the context of comparing a TensorFlow estimator to Keras.

This issue also seems related to #39434, in which a significant performance drop from TF 2.1 to TF 2.2 is also reported.

It seems like another small puzzle piece in a larger puzzle (or I am doing something simple wrong on both machines).

@sirvincent sirvincent added the type:performance Performance Issue label May 15, 2020
@sirvincent sirvincent changed the title tf.keras.model fit() significantly slower when using using weighted validation data in comparison to tf2.1.0 tf.keras.model fit() significantly slower when using weighted validation data in comparison to tf2.1.0 May 15, 2020
@amahendrakar
Contributor

Was able to reproduce the issue. TF v2.2 and TF-nightly take more time for each epoch when compared to TF v2.1. Please find the attached gist. Thanks!

@amahendrakar amahendrakar added comp:keras Keras related issues TF 2.2 Issues related to TF 2.2 labels May 18, 2020
@jvishnuvardhan jvishnuvardhan added the regression issue To spot regression issues in latest version label May 19, 2020
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 19, 2020
@jarednielsen
Contributor

Any update on these performance problems in TF 2.2? This issue is one of many; see also #39665 and #38675 and #39574 and #39434.

What is the status?

@edwardyehuang
Contributor

TensorFlow 2.2 takes much more time than 2.1/2.0 to start training after keras.fit is called.

2020-06-01 10:16:44.991459: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-01 10:16:46.235945: W tensorflow/stream_executor/gpu/asm_compiler.cc:81] Running ptxas --version returned 256
2020-06-01 10:16:46.328871: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-06-01 10:16:48.148004: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-01 10:23:36.473814: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.

It gets stuck for about 7 minutes before training starts.
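
For reference, a minimal sketch of how this time-to-first-batch could be measured, assuming a standard Keras callback attached to the fit() call from the repro script above (the callback name is mine):

import time

from tensorflow import keras


class TimeToFirstBatch(keras.callbacks.Callback):
    """Reports how long fit() spends before the first training batch runs."""

    def on_train_begin(self, logs=None):
        self._start = time.perf_counter()
        self._reported = False

    def on_train_batch_end(self, batch, logs=None):
        if not self._reported:
            self._reported = True
            elapsed = time.perf_counter() - self._start
            print(f"fit() start to first training batch: {elapsed:.1f} s")


# e.g. network.fit(train_images, train_labels,
#                  validation_data=validation_data_tuple,
#                  callbacks=[TimeToFirstBatch()], **fit_options)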

@edwardyehuang
Contributor

Log from 2.1

INFO:tensorflow:batch_all_reduce: 436 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I0531 20:55:03.956965 139684401002304 cross_device_ops.py:760] batch_all_reduce: 436 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 436 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I0531 20:55:14.695299 139684401002304 cross_device_ops.py:760] batch_all_reduce: 436 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
2020-05-31 20:55:39.932592: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-31 20:55:41.811100: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-31 20:55:48.718710: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.

@jarednielsen
Contributor

Interesting that the random weighting causes the performance slowdown. In my case, turning on dropout layers (even with dropout_prob=0) causes a performance slowdown. Could it be something in the TensorFlow randomness modules?
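
For comparison, a minimal sketch of that variant, assuming the MLP from the repro script above with a zero-rate Dropout layer added (rate=0.0 is my reading of "dropout_prob=0"):

from tensorflow import keras


def build_neural_network_with_dropout(input_dimension: int, number_of_classes: int, compile_options: dict):
    # Same MLP as build_neural_network() in the repro script, with a
    # Dropout layer at rate 0.0 inserted (so no units are actually dropped).
    model = keras.Sequential([
        keras.layers.Dense(112, activation='relu', input_dim=input_dimension),
        keras.layers.Dropout(0.0),
        keras.layers.Dense(112, activation='relu'),
        keras.layers.Dense(number_of_classes, activation='softmax'),
    ])
    model.compile(**compile_options)
    return model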

@sirvincent
Author
sirvincent commented Jun 9, 2020

What is the status of this issue? Is someone actively looking into it? If not, is there any estimate of when someone might look at this issue?
It is understandable that handling issues might take a while, especially given the huge number of issues TensorFlow receives!

Is there something I can do to help?

Currently this issue prevents us from updating to TensorFlow 2.2 and thus from updating to Python 3.8. Luckily, neither Python 3.8 nor TensorFlow 2.2 is a requirement yet.

@goldiegadde
Contributor

@sirvincent thanks for reporting the issue. A fix was submitted in 1d2d05f and is available in the latest nightly.

@goldiegadde goldiegadde added this to In progress in TensorFlow 2.3.0 Jun 17, 2020
@sirvincent
Author

Thanks @goldiegadde, I have tested tf-nightly 2.3.0.dev20200619 and the issue seems to be fixed.
Thank you!

TensorFlow 2.3.0 automation moved this from In progress to Done Jun 19, 2020
@romanovzky
romanovzky commented Aug 2, 2020

This regression is not completely fixed in 2.3.0. It seems that, for whatever reason, the first epoch takes a long time to start, and the first validation step is also very slow. From the second epoch onward, the epoch times are comparable.
This can be reproduced in this colab. EDIT: Colab link removed, as it was pointing to another colab; I have lost (probably deleted) the original one.

@jvishnuvardhan
Contributor

@romanovzky Can you please open a new issue with the gist (you already have one above). Thanks!

@MarioTro

I had the same issue and was able to circumvent it by converting my weights numpy-array into a pandas series. Training now starts immediately and I do not have to wait anymore.
pd.Series(my_weights)
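
A minimal sketch of how that workaround would slot into the repro script above, assuming pandas is installed and reusing the variables from train_neural_network():

import numpy as np
import pandas as pd

# Wrap the per-sample validation weights in a pandas Series instead of
# passing the raw NumPy array; commenters report this avoids the long
# stall at the start of training on TF 2.2.
random_weights = np.random.rand(len(validation_images))
validation_data_tuple = (validation_images, validation_labels, pd.Series(random_weights))
history = network.fit(train_images, train_labels, validation_data=validation_data_tuple, **fit_options)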

@meni432
meni432 commented Feb 18, 2021

I had the same issue and was able to circumvent it by converting my weights numpy-array into a pandas series. Training now starts immediately and I do not have to wait anymore.
pd.Series(my_weights)

This works for me, but how does it actually work? Doesn't the API only accept NumPy arrays?

@romanovzky

The problem is also fixed if you use a generator (keras Sequence), which is what I have been using.
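
A minimal sketch of that approach, assuming a keras.utils.Sequence (class name is mine) that yields (inputs, targets, sample_weights) batches and is passed as validation_data in place of the tuple from the repro script:

import numpy as np
from tensorflow import keras


class WeightedValidationSequence(keras.utils.Sequence):
    """Yields (inputs, targets, sample_weights) batches for validation."""

    def __init__(self, images, labels, weights, batch_size=2048):
        self.images = images
        self.labels = labels
        self.weights = weights
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, index):
        start = index * self.batch_size
        stop = start + self.batch_size
        return (self.images[start:stop],
                self.labels[start:stop],
                self.weights[start:stop])


# e.g. validation_data = WeightedValidationSequence(
#          validation_images, validation_labels, random_weights)
#      history = network.fit(train_images, train_labels,
#                            validation_data=validation_data, **fit_options)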

@LucaCappelletti94

I had the same issue and was able to circumvent it by converting my weights numpy-array into a pandas series. Training now starts immediately and I do not have to wait anymore.
pd.Series(my_weights)

If this works, I call it sorcery. Thank you!

@Brentbin
Brentbin commented Nov 3, 2021

I had the same issue and was able to circumvent it by converting my weights numpy-array into a pandas series. Training now starts immediately and I do not have to wait anymore. pd.Series(my_weights)

It works for me.

@nershman
nershman commented Mar 5, 2022

New issue has been opened recently: #48965
