
Performance: Training is much slower in TF v2.0.0 VS v1.14.0 when using Tf.Keras and model.fit_generator #33024

Closed
Raukk opened this issue Oct 3, 2019 · 43 comments

Comments

@Raukk
Raukk commented Oct 3, 2019

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information
NOTE: I have provided Google Colab notebooks to reproduce the slowness.

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): sort of, but it is basically an MNIST example.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab and Windows
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
  • TensorFlow installed from (source or binary): pip install tensorflow-gpu
  • TensorFlow version (use command below): 2.0.0
  • Python version: 3
  • Bazel version (if compiling from source): NA
  • GCC/Compiler version (if compiling from source): NA
  • CUDA/cuDNN version: 10 or Google Colab
  • GPU model and memory: 1080 Ti, or Google Colab

You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:
  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
It happens on the standard Colab GPU instance

Describe the current behavior
Version 2.0.0 is SLOW compared to identical code running on v1.14.0.
The code I have used to demonstrate it is very simple and very similar to most existing Keras examples.
A larger NN on MNIST goes from ~10s per epoch to ~20s, which is a very major slowdown.

Describe the expected behavior
A new version should have similar or better performance than the previous version.
If user error or a new limitation/feature is causing the problem, it should be called out in the release notes/quick start. This code was perfectly normal in TF 1.x.

Code to reproduce the issue
See this (GPU) Colab Notebook example with MNIST Data:
https://colab.research.google.com/gist/Raukk/f0927a5e2a357f2d80c9aeef1202e6ee/example_slow_tf2.ipynb

See this (GPU) Colab Notebook example with numpy random for Data:
https://colab.research.google.com/gist/Raukk/518d3d21e08ad02089429529bd6c67d4/simplified_example_slow_tf2.ipynb

See this (GPU) Colab Notebook example using standard Conv2D (not DepthwiseConv2D):
https://colab.research.google.com/gist/Raukk/4f102e192f47a6dc144b890925b652f8/standardconv_example_slow_tf2.ipynb

Please notify me if you cannot access any of these notebooks, or if they do not run, or don't sufficiently reproduce the issue.
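
For readers who cannot open the notebooks, here is a minimal sketch of the kind of setup they contain; the exact model, data, and batch size below are illustrative assumptions, not the notebooks' code:

import numpy as np
import tensorflow as tf

# Illustrative stand-in for the notebooks' data: random arrays shaped like MNIST.
x = np.random.rand(60000, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(60000,))

def batch_generator(batch_size=128):
    while True:
        idx = np.random.randint(0, len(x), size=batch_size)
        yield x[idx], y[idx]

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# The slow path under 2.0.0; compare epoch times against a 1.14.0 run of the same script.
model.fit_generator(batch_generator(), steps_per_epoch=60000 // 128, epochs=3)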

Other info / logs
Each example above starts with a TL;DR that gives a very basic summary of the results.

Thank you!

@DanMinhNguyen
DanMinhNguyen commented Oct 3, 2019

Can confirm from my own experience. I had a similar issue with my own project when switching to TF2 (stable; I waited for the official release a few days ago), with a 2x to 3x increase in training time for the same data and code, as compared to TF1. After some Google searching and reading, I then implemented the code using tf.data.Dataset.from_generator() instead, which allows me to use model.fit().

Unfortunately there was zero performance benefit either way.

As for some pseudocode (posting here just in case someone can point out something fundamentally wrong with my setup), the fit_generator version of my code went something like this below. All my code uses the internal tf.keras instead of the external Keras package:

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        yield x,y

#some code here to create and compile model 

model.fit_generator(datagen(args), . . . )

For the pseudocode using tf.data.Dataset.from_generator():

from tensorflow.compat.v2.data import Dataset

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        yield x,y

#some code here to create and compile model 

train_data = Dataset.from_generator(generator=lambda: datagen(args), . . . )
model.fit(train_data , . . . )

@Szubie
Szubie commented Oct 3, 2019

Have also experienced a large (4x) increase in training times for Keras models when using fit_generator after upgrading to TensorFlow 2.0.

Execution times became comparable to TF 1.14 when disabling eager execution by running:
tf.compat.v1.disable_eager_execution().
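
One practical detail for anyone trying this workaround: the call has to run before any layers or models are created. A minimal sketch (the tiny model here is only a placeholder):

import tensorflow as tf

# Must be called before any Keras layers, models, or other ops are created.
tf.compat.v1.disable_eager_execution()

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")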

@Raukk
Author
Raukk commented Oct 3, 2019

@Szubie Thanks, that does seem to improve the performance back to standard levels, or at least close enough.

I wonder if there is a better fix for this issue?

tf.compat.v1.disable_eager_execution() seems like a workaround more than a fix, especially since I doubt V1 compatibility is going to be maintained forever.

@DanMinhNguyen

@Szubie Thanks for your input!

Unfortunately running tf.compat.v1.disable_eager_execution() does not seem to work any better for me, as compared to the TF 1.10 that I am using. Running 1 epoch of 100 update iterations takes, on average, 43 seconds with TF2, but only 20 seconds with TF1.

@Marmotte06

I have the same issue. I just installed Tensorflow-gpu 2.0 and modified my Keras code to use the "native" Keras module in Tensorflow.
My model that used to train in 12 sec/epoch with TF 1.14.0 now takes 83 sec/epoch.
And my RTX 2070 GPU is used at only 3% of its power!
I also used a fit_generator for the training.

I am on Windows 10 x64, CUDA 10.0
CPU Core i9 9900K
32 GB RAM for the CPU, 8GB RAM for the GPU
1TB NVME SSD

TF-GPU 2.0 as it is is just unusable, so I am rolling back to 1.14.
Hope this will be fixed soon.

@mihaimaruseac
Collaborator

Can you test the 1.15 release candidate and tell us if you still see the slowdown, as we're trying to identify the root cause?

@martinwicke
Member

@karmel, @robieta, this looks like a problem with plain numpy input and fit_generator, both CPU and GPU. Can you take a look?

@robieta
robieta commented Oct 7, 2019

First of all, thank you for the wonderful repro. I can't tell you how much easier it makes all of this.

It looks like fit_generator is incorrectly falling back to the eager path, which is why training is slower. I will look into why, but in the meantime can you try using model.fit? It actually also supports generators (we plan to deprecate the fit_generator endpoint at some point as it is now obsolete), and in my testing it is actually faster than the 1.14 baseline.

@DanMinhNguyen
DanMinhNguyen commented Oct 7, 2019

@robieta I had also attempted model.fit() without any improvement to performance. It may have to do with my current implementation, so I'm pasting some pseudocode below. I am hoping there is something fundamentally wrong with it (I'm thinking it's the usage of lambda):

For the pseudocode using tf.data.Dataset.from_generator():

from tensorflow.compat.v2.data import Dataset

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        yield x,y

#some code here to create and compile model 

#I'm thinking the performance issue here is in using lambda. However, without this I get a 
#"'generator' must be callable" error

train_data = Dataset.from_generator(generator=lambda: datagen(args), . . . )
model.fit(train_data , . . . )

@DanMinhNguyen
DanMinhNguyen commented Oct 7, 2019

@mihaimaruseac I just tested my code on TF version 1.15.0rc2. It seems to be about as slow as TF 2.0; I hope this helps with your debugging!

EDIT: TF 1.15.0rc2 seems to be faster by a few seconds (35-40s per epoch) compared to TF 2.0 (40-45s per epoch).

@robieta
robieta commented Oct 7, 2019

It sounds like @Raukk and @Szubie (and maybe @Marmotte06) are hitting the issue I described above with fit_generator running eagerly, while @DanMinhNguyen's issue is likely different. A simple way to check is to pass the generator function directly into Model.fit. It will also call Dataset.from_generator under the hood, but with best practice optimizations like prefetching. I would also suggest that you change your datagen to:

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        while True:  # Rule out that the generator itself is the bottleneck by repeating one batch.
            yield x,y

If that doesn't help, you'll need to create a colab which demonstrates the difference in Model.fit between 1.14 and 1.15 / 2.0.
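
For anyone who still wants to build the Dataset by hand, a rough sketch of the equivalent with explicit prefetching follows; it reuses the datagen, args, and model names from the pseudocode above, and the output types and shapes are placeholders, not values from this thread:

import tensorflow as tf

train_data = tf.data.Dataset.from_generator(
    lambda: datagen(args),  # the datagen defined in the earlier pseudocode
    output_types=(tf.float32, tf.float32),          # placeholder dtypes
    output_shapes=((None, 28, 28, 1), (None, 10)),  # placeholder shapes
).prefetch(tf.data.experimental.AUTOTUNE)

model.fit(train_data, steps_per_epoch=100, epochs=5)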

@DanMinhNguyen

@robieta Thanks for the comment! I'll try it out later this evening or tomorrow and get back to you. If I'm still having issues, I will make a colab to share and demonstrate it. Currently I am using some internal data which I cannot share, hence the pseudocode.

@DanMinhNguyen
DanMinhNguyen commented Oct 7, 2019

@robieta So I just ran my code by passing the generator function directly into model.fit(), and that seemed to fix the issue completely!

Basically my pseudo code now looks like this:

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        yield x,y

#some code here to create and compile model 

model.fit(datagen(args) , . . . )

So basically what I learned is:

  1. don't use model.fit_generator() anymore
  2. don't call Dataset.from_generator() separately
  3. just use model.fit() and pass the generator directly into it.

Thanks so much for everything!

(As a side note for anyone else reading: the validation_data argument in model.fit() can also take a generator directly as input)
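
A small illustration of that side note; train_args and val_args are placeholder arguments, not names from this thread, and the step counts are arbitrary:

model.fit(
    datagen(train_args),
    steps_per_epoch=500,
    epochs=10,
    validation_data=datagen(val_args),  # a plain generator works here as well
    validation_steps=50,                # required when validation_data is a generator
)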

@robieta
robieta commented Oct 7, 2019

Excellent! I'm planning on just aliasing (fit / evaluate / predict)_generator to (fit / evaluate / predict), as those methods are now strictly superior.

@Raukk
Author
Raukk commented Oct 8, 2019

I have run the Colabs I posted on version 1.15.0-rc2 and got exactly the same performance results as on version 1.14.0.

On version 2.0.0 I get comparable performance when I use model.fit( instead of model.fit_generator( OR if I use tf.compat.v1.disable_eager_execution().
Switching to model.fit( is the solution I will use for all my code.

I vote that .fit_generator( become an alias for .fit( because that would resolve the performance issues without breaking any existing examples (and would allow those examples to work on all versions of TF). I'm sure .fit_generator( will go away eventually, but since TF 2.0 just released, I'd want to keep it backwards compatible (especially for writing examples).

@max1mn
max1mn commented Oct 8, 2019

Hello everyone!

I have the same problem and was looking for a solution. I was also using fit_generator, but with a Sequence class. The proposed change to simply use the fit method gives me errors; is there some workaround for the Sequence case? I'm using TF 2.0.0.

class LSTMSequence(Sequence):

    def __init__(self, x, subnet_x, y, batch_size):
        self.batch_size = batch_size
        self.x, self.subnet_x, self.y = x, subnet_x, y

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = list()
        # x
        batch_x.append(np.array(pad_sequences(self.x[idx * self.batch_size:(idx + 1) * self.batch_size],
                                              padding='post', dtype='float32')))
        # subnet x
        for subnet_x_data in self.subnet_x:
            batch_x.append(np.array(pad_sequences(subnet_x_data[idx * self.batch_size:(idx + 1) * self.batch_size],
                                                  padding='post', dtype='int16')))
        # y
        batch_y = np.array(self.y[idx * self.batch_size:(idx + 1) * self.batch_size])
        return batch_x, batch_y

and the errors I get when using fit instead of fit_generator:

Traceback (most recent call last):
  File "train.py", line 111, in <module>
    model.fit(sequence, epochs=NUM_EPOCHS, verbose=2, validation_data=val_sequence,
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 224, in fit
    distribution_strategy=strategy)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 547, in _process_training_inputs
    use_multiprocessing=use_multiprocessing)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 606, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py", line 613, in __init__
    output_shapes=nested_shape)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py", line 540, in from_generator
    output_types, tensor_shape.as_shape, output_shapes)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\data\util\nest.py", line 471, in map_structure_up_to
    results = [func(*tensors) for tensors in zip(*all_flattened_up_to)]
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\data\util\nest.py", line 471, in <listcomp>
    results = [func(*tensors) for tensors in zip(*all_flattened_up_to)]
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 1216, in as_shape
    return TensorShape(shape)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 776, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 776, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 718, in as_dimension
    return Dimension(value)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 193, in __init__
    self._value = int(value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'

@robieta
robieta commented Oct 8, 2019

@max1mn If you replace return batch_x, batch_y with return tuple(batch_x), batch_y your code should work. This stems from a historic decision in tf.data about how to treat lists. I will make fit robust to this, but adding that tuple() will immediately unblock you. Sorry for the inconvenience.

@max1mn
max1mn commented Oct 8, 2019

@robieta Thank you very much, the code passes with that change. However, another problem appeared: it seems the model is running on CPU only now. The training time is 254s per epoch; with 2.0 fit_generator (and GPU) it was about 70s, and with 1.14 (GPU) it was 20s. There are warnings, but I don't know whether they are related. Anyway, simply aliasing fit_generator to fit can break GPU usage, as in my case.

Train for 678 steps, validate for 42 steps
Epoch 1/150

2019-10-08 11:05:39.314124: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference___backward_cudnn_lstm_with_fallback_6231_7688' and '__inference___backward_cudnn_lstm_with_fallback_6231_7688_specialized_for_StatefulPartitionedCall_at___inference_distributed_function_8683' both implement 'lstm_1489aaa8-07c9-4313-8db7-7c40df79c8a8' but their signatures do not match.
2019-10-08 11:09:42.401028: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference_cudnn_lstm_with_fallback_10665' and '__inference_cudnn_lstm_with_fallback_10665_specialized_for_model_concatenate_lstm2_StatefulPartitionedCall_at___inference_distributed_function_12157' both implement 'lstm_6f710079-b289-4f03-b16c-816fc6d27388' but their signatures do not match.

678/678 - 254s - loss: 4.6264 - val_loss: 0.6066

@robieta
robieta commented Oct 8, 2019

@max1mn I think that is a separate issue. Can you create a new issue with a minimal repro and cc @qlzh727? (The error you're seeing is in the part of the LSTM that tries to use CuDNN if applicable.)

@qlzh727
Member
qlzh727 commented Oct 8, 2019

The warning message above actually means the function has been optimized for the cuDNN backend. I suppressed this warning in a previous change, but that might not be in 2.0. You can ignore it for the moment.

@Seterplus

Excellent! I'm planning on just aliasing (fit / evaluate / predict)_generator to (fit / evaluate / predict), as those methods are now strictly superior.

I've also encountered this performance issue:

        fit    fit_generator
tf1     25s    13s
tf2     4s     28s

After using tf.compat.v1.disable_eager_execution(), the training time of fit_generator in tf2 reduces to 14s. It's comparable to tf1 but still 3x slower than fit in tf2.
model.fit(x=sequence, ...) also completes the training in 14s but it seems to load all data into memory and log "Filling up shuffle buffer (this may take a while)" if I set shuffle=True.
Any ideas?

@robieta
robieta commented Nov 7, 2019

I had a commit that aliased fit_generator to fit, but I had to roll it back as it broke some use cases. I'm rolling it forward today now that those issues are resolved. Part of that fix includes better handling of sequences (make sure you use the Sequence class in tf.keras, not keras-team/keras) which will not pull all of the data into memory when shuffling.
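
To make the distinction concrete, the two import paths look like this (a small illustration):

from tensorflow.keras.utils import Sequence   # tf.keras Sequence -- covered by the fix above
# from keras.utils import Sequence            # standalone keras-team/keras Sequence -- a different class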

@ychervonyi
ychervonyi commented Nov 9, 2019

Excellent! I'm planning on just aliasing (fit / evaluate / predict)_generator to (fit / evaluate / predict), as those methods are now strictly superior.

I've also encountered this performance issue:
        fit    fit_generator
tf1     25s    13s
tf2     4s     28s

After using tf.compat.v1.disable_eager_execution(), the training time of fit_generator in tf2 reduces to 14s. It's comparable to tf1 but still 3x slower than fit in tf2.
model.fit(x=sequence, ...) also completes the training in 14s but it seems to load all data into memory and log "Filling up shuffle buffer (this may take a while)" if I set shuffle=True.
Any ideas?

I see the same performance. My question: is there a way to set the buffer size and not load all the data when calling .fit() on a Sequence right now? @robieta Thank you!

@robieta
robieta commented Nov 9, 2019

@ychervonyi Are you using the latest tf-nightly? ac20030 is the relevant change, and should be in tf-nightly==2.1.0.dev20191109 Feel free to post a repro colab if you're seeing Sequence shuffling handled inefficiently.

@Seterplus
Seterplus commented Nov 9, 2019

@ychervonyi Are you using the latest tf-nightly? ac20030 is the relevant change, and should be in tf-nightly==2.1.0.dev20191109 Feel free to post a repro colab if you're seeing Sequence shuffling handled inefficiently.

I'm still using tensorflow-gpu, whose latest version is 2.0.0. When I use model.fit(x=generator, shuffle=False, workers=8, ...), it seems that there is still only one worker whether I set multiprocessing=True or not. Could you please verify this behavior?

@goldiegadde
Contributor

@Seterplus Could you please try with tensorflow-gpu==2.1.0rc0? This has the fix ac20030 that @robieta mentioned above.

@Dr-Gandalf
Dr-Gandalf commented Dec 11, 2019

First of all, thank you for the wonderful repro. I can't tell you how much easier it makes all of this.

It looks like fit_generator is incorrectly falling back to the eager path, which is why training is slower. I will look into why, but in the meantime can you try using model.fit? It actually also supports generators (we plan to deprecate the fit_generator endpoint at some point as it is now obsolete), and in my testing it is actually faster than the 1.14 baseline.

@max1mn If you replace return batch_x, batch_y with return tuple(batch_x), batch_y your code should work. This stems from a historic decision in tf.data about how to treat lists. I will make fit robust to this, but adding that tuple() will immediately unblock you. Sorry for the inconvenience.

I have a similar issue. I tried fit instead of fit_generator, and when I tried the tuple change the error disappeared, but I got a different issue: my dataset is made of 300,000 images, and it appears that Keras is now trying to load all the images into memory before starting training, which obviously does not work. Is there any workaround for this problem?

My data generator feeds a multi-input model that receives 2 images and a two-element numerical vector.

@robieta
robieta commented Dec 11, 2019

@Dr-Gandalf what is your exact version of TensorFlow? And can you provide a minimal repro of a case where keras is loading too much into memory?

@Dr-Gandalf
Dr-Gandalf commented Dec 11, 2019

@robieta I am using TF 2.0.0 inside a Docker container from Docker Hub, "tensorflow/tensorflow:latest-gpu-py3-jupyter".

These are my imports:

from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten, concatenate, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import Iterator
from tensorflow.keras.applications.densenet import DenseNet121, preprocess_input
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard, EarlyStopping, ReduceLROnPlateau
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from tensorflow.keras.utils import Sequence
import tensorflow as tf
from datetime import datetime
import io
from sklearn.metrics import roc_curve,roc_auc_score

#this is the return line of my generator:

return (X1i[0], X2i[0], X3i[0]), X1i[1]

#this is the training line I am using, which used to work fine with fit_generator:

T_history = classification_model.fit(trainGenerator, steps_per_epoch=steps_per_epoch,
                                              validation_data=validation_generator,
                                              validation_steps=validation_steps, 
                                              callbacks=callbacks_list,
                                              epochs=6,
                                              use_multiprocessing=True,
                                              workers=8,
                                              max_queue_size=50)

The console message I am getting is:

Filling up shuffle buffer (this may take a while): 10 of 5802

@robieta
robieta commented Dec 11, 2019

Ah, ok I see what is happening. fit takes a shuffle argument which defaults to True. (Since that is generally what is desired for arrays or Datasets.) However it doesn't really make sense for generators. In order to provide shuffling tf.keras currently drains the entire generator so it can shuffle the batches, whereas it should just drop the shuffle arg and use the elements as they are yielded. I will fix this and make sure it makes it into TF 2.1.
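
Until that fix ships, the observation earlier in this thread (the shuffle buffer only fills when shuffle is left at its default) suggests passing shuffle=False explicitly as an interim workaround. A sketch reusing the names from the fit call quoted above, not verified on every version:

T_history = classification_model.fit(
    trainGenerator,
    steps_per_epoch=steps_per_epoch,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    callbacks=callbacks_list,
    epochs=6,
    shuffle=False,  # generators yield in their own order; avoids draining them into a shuffle buffer
)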

@Dr-Gandalf

thanks for your time, you are very kind :)

@martinwicke
Member
martinwicke commented Dec 12, 2019 via email

@robieta
robieta commented Dec 12, 2019

@Dr-Gandalf No, no, thank you. This is an important performance detail and I'm very happy that it's now going to make it into 2.1. Thanks for reporting.

@robieta
robieta commented Dec 15, 2019

FYI this is fixed by 7d533c2, and will be cherry-picked into TF 2.1.

@robieta robieta self-assigned this Dec 15, 2019
@robieta
robieta commented Dec 15, 2019

I'm going to go ahead and close this since the appropriate fixes are now in tf.

@robieta robieta closed this as completed Dec 15, 2019
@bafonso
bafonso commented Dec 31, 2019

I also notice that when replacing fit_generator with fit, even when using use_multiprocessing and workers, I do not observe multi-threading.

@farhodfm
farhodfm commented Jan 7, 2020

Hello there!

I was using TF 1.14.0 along with Keras 2.3.1. But then, to use keract, I moved to TF 2.0.0.
Everything works fine so far, except for the training time. Training is extremely slow compared to when I was using TF 1.14.0.
Here I found a discussion of the fit() function, but what about train_on_batch (I stick with this function)?
@robieta, any considerations?

@robieta
robieta commented Jan 8, 2020

@farhodfm TF 2.0 has known issues with both fit_generator and train_on_batch. Can you try tf-nightly or tensorflow==2.1.0rc2?

@farhodfm
farhodfm commented Jan 8, 2020

@robieta thanks for the reply!

Let me try to upgrade to tensorflow==2.1.0rc2 and check the training time using train_on_batch.
I will inform you ASAP.

@phiwei
phiwei commented Mar 24, 2020

Ah, ok I see what is happening. fit takes a shuffle argument which defaults to True. (Since that is generally what is desired for arrays or Datasets.) However it doesn't really make sense for generators. In order to provide shuffling tf.keras currently drains the entire generator so it can shuffle the batches, whereas it should just drop the shuffle arg and use the elements as they are yielded. I will fix this and make sure it makes it into TF 2.1.

@robieta Is there a known workaround for this in TF 2.0? I am using it with keras-tuner, so I have limited control over the training code and I cannot easily upgrade to TF 2.1 (driver requirements collide with me not having admin rights on the machine).

@robieta
robieta commented Mar 24, 2020

@phiwei I would be skeptical that the change would be backported, since it is a non-trivial behavior change, and putting it into a point release could cause other models in 2.0 to silently train differently. (Which is why point releases are generally reserved for absolutely critical fixes.) Although I am no longer a member of the TensorFlow team, so that is pure speculation on my part.

@shtse8
shtse8 commented Aug 6, 2020

I have the same issue while using TF 2.0. I stick with model.fit and model.predict. The performance is very slow, and after disabling eager execution using tf.compat.v1.disable_eager_execution() it is 3x-4x faster. Any updates on this issue?

@libinruan
libinruan commented Aug 20, 2020

@shtse8 Thanks for the tips. I am using TF 2.0.0 on a virtual machine and I can confirm that disabling eager execution does resolve the sluggishness and gets rid of the annoying "Filling up shuffle buffer ..." message. Based on my experiments, the performance of TF 2.0.0 on a GCP virtual machine with a Tesla T4 and 4 vCPUs is on par with the performance on Google Colab, where TF is version 2.3.0 with a Tesla T4 and 2 CPUs.
