
Performance: Training is much slower in TF v2.0.0 VS v1.14.0 when using Tf.Keras and model.fit_generator #33024

Closed
Raukk opened this issue Oct 3, 2019 · 43 comments

Comments

@Raukk
Raukk commented Oct 3, 2019

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information
NOTE: I have provided Google Colab notebooks to reproduce the slowness.

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): sort of, but it is basically an MNIST example.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab and Windows
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
  • TensorFlow installed from (source or binary): pip install tensorflow-gpu
  • TensorFlow version (use command below): 2.0.0
  • Python version: 3
  • Bazel version (if compiling from source): NA
  • GCC/Compiler version (if compiling from source): NA
  • CUDA/cuDNN version: 10 or Google Colab
  • GPU model and memory: 1080 Ti, or Google Colab

You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:
  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
It happens on the standard Colab GPU instance

Describe the current behavior
Version 2.0.0 is SLOW compared to identical code running on v1.14.0.
The code I have used to demonstrate it is very simple and very similar to most existing Keras examples.
A larger NN on MNIST goes from ~10s per epoch to ~20s, which is a very major slowdown.

Describe the expected behavior
A new version should have similar or better performance than the previous version.
If user error or a new limitation/feature is causing the problem, it should be called out in the release notes/quick start. This code was perfectly normal in TF 1.x.

Code to reproduce the issue
See this (GPU) Colab Notebook example with MNIST Data:
https://colab.research.google.com/gist/Raukk/f0927a5e2a357f2d80c9aeef1202e6ee/example_slow_tf2.ipynb

See this (GPU) Colab Notebook example with numpy random for Data:
https://colab.research.google.com/gist/Raukk/518d3d21e08ad02089429529bd6c67d4/simplified_example_slow_tf2.ipynb

See this (GPU) Colab Notebook example using standard Conv2D (not DepthwiseConv2D):
https://colab.research.google.com/gist/Raukk/4f102e192f47a6dc144b890925b652f8/standardconv_example_slow_tf2.ipynb

Please notify me if you cannot access any of these notebooks, or if they do not run, or don't sufficiently reproduce the issue.
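
For readers who cannot open the notebooks, here is a minimal sketch of the kind of setup they contain; the exact model, data, and batch size below are illustrative assumptions, not the notebooks' code:

import numpy as np
import tensorflow as tf

# Illustrative stand-in for the notebooks' data: random arrays shaped like MNIST.
x = np.random.rand(60000, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(60000,))

def batch_generator(batch_size=128):
    while True:
        idx = np.random.randint(0, len(x), size=batch_size)
        yield x[idx], y[idx]

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# The slow path under 2.0.0; compare epoch times against a 1.14.0 run of the same script.
model.fit_generator(batch_generator(), steps_per_epoch=60000 // 128, epochs=3)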

Other info / logs
Each example above starts with a TL;DR that gives a very basic summary of the results.

Thank you!

@DanMinhNguyen
DanMinhNguyen commented Oct 3, 2019

Can confirm from my own experience. I had a similar issue with my own project when switching to TF2 (stable; I waited for the official release a few days ago), with a 2x to 3x increase in training time for the same data and code, as compared to TF1. After some Google searching and reading, I then implemented the code using tf.data.Dataset.from_generator() instead, which allows me to use model.fit().

Unfortunately there was zero performance benefit either way.

As for some pseudocode (posting here just in case someone can point out something fundamentally wrong with my setup), the fit_generator version of my code went something like this below. All my code uses the internal tf.keras instead of the external Keras package:

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        yield x,y

#some code here to create and compile model 

model.fit_generator(datagen(args), . . . )

For the pseudocode using tf.data.Dataset.from_generator():

from tensorflow.compat.v2.data import Dataset

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        yield x,y

#some code here to create and compile model 

train_data = Dataset.from_generator(generator=lambda: datagen(args), . . . )
model.fit(train_data , . . . )

@Szubie
Szubie commented Oct 3, 2019

Have also experienced a large (4x) increase in training times for Keras models when using fit_generator after upgrading to TensorFlow 2.0.

Execution times became comparable to TF 1.14 when disabling eager execution by running:
tf.compat.v1.disable_eager_execution().
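
One practical detail for anyone trying this workaround: the call has to run before any layers or models are created. A minimal sketch (the tiny model here is only a placeholder):

import tensorflow as tf

# Must be called before any Keras layers, models, or other ops are created.
tf.compat.v1.disable_eager_execution()

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")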

@Raukk
Author
Raukk commented Oct 3, 2019

@Szubie Thanks, that does seem to improve the performance back to standard levels, or at least close enough.

I wonder if there is a better fix for this issue?

tf.compat.v1.disable_eager_execution() seems like a workaround more than a fix, especially since I doubt V1 compatibility is going to be maintained forever.

@DanMinhNguyen

@Szubie Thanks for your input!

Unfortunately running tf.compat.v1.disable_eager_execution() does not seem to work any better for me, as compared to the TF 1.10 that I am using. Running 1 epoch of 100 update iterations takes, on average, 43 seconds with TF2, but only 20 seconds with TF1.

@Marmotte06

I have the same issue. I just installed Tensorflow-gpu 2.0 and modified my Keras code to use the "native" Keras module in Tensorflow.
My model that used to train in 12 sec/epoch with TF 1.14.0 now takes 83 sec/epoch.
And my RTX 2070 GPU is used at only 3% of its power!
I also used a fit_generator for the training.

I am on Windows 10 x64, CUDA 10.0
CPU Core i9 9900K
32 GB RAM for the CPU, 8GB RAM for the GPU
1TB NVME SSD

TF-GPU 2.0 as it is is just unusable, so I am rolling back to 1.14.
Hope this will be fixed soon.

@mihaimaruseac
Collaborator

Can you test the 1.15 release candidate and tell us if you still see the slowdown, as we're trying to identify the root cause?

@martinwicke
Member

@karmel, @robieta, this looks like a problem with plain numpy input and fit_generator, both CPU and GPU. Can you take a look?

@robieta
robieta commented Oct 7, 2019

First of all, thank you for the wonderful repro. I can't tell you how much easier it makes all of this.

It looks like fit_generator is incorrectly falling back to the eager path, which is why training is slower. I will look into why, but in the meantime can you try using model.fit? It actually also supports generators (we plan to deprecate the fit_generator endpoint at some point as it is now obsolete), and in my testing it is actually faster than the 1.14 baseline.

@DanMinhNguyen
DanMinhNguyen commented Oct 7, 2019

@robieta I had also attempted model.fit() without any improvement to performance. It may have to do with my current implementation, so I'm pasting some pseudocode below. I am hoping there is something fundamentally wrong with it (I'm thinking it's the usage of lambda):

For the pseudocode using tf.data.Dataset.from_generator():

from tensorflow.compat.v2.data import Dataset

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        yield x,y

#some code here to create and compile model 

#I'm thinking the performance issue here is in using lambda. However, without this I get a 
#"'generator' must be callable" error

train_data = Dataset.from_generator(generator=lambda: datagen(args), . . . )
model.fit(train_data , . . . )

@DanMinhNguyen
DanMinhNguyen commented Oct 7, 2019

@mihaimaruseac I just tested my code on TF version 1.15.0rc2. It seems to be about as slow as TF 2.0; I hope this helps with your debugging!

EDIT: TF 1.15.0rc2 seems to be faster by a few seconds (35-40s per epoch) compared to TF 2.0 (40-45s per epoch).

@robieta
robieta commented Oct 7, 2019

It sounds like @Raukk and @Szubie (and maybe @Marmotte06) are hitting the issue I described above with fit_generator running eagerly, while @DanMinhNguyen's issue is likely different. A simple way to check is to pass the generator function directly into Model.fit. It will also call Dataset.from_generator under the hood, but with best practice optimizations like prefetching. I would also suggest that you change your datagen to:

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        while True:  # Rule out that the generator itself is the bottleneck by repeating one batch.
            yield x,y

If that doesn't help, you'll need to create a colab which demonstrates the difference in Model.fit between 1.14 and 1.15 / 2.0.
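
For anyone who still wants to build the Dataset by hand, a rough sketch of the equivalent with explicit prefetching follows; it reuses the datagen, args, and model names from the pseudocode above, and the output types and shapes are placeholders, not values from this thread:

import tensorflow as tf

train_data = tf.data.Dataset.from_generator(
    lambda: datagen(args),  # the datagen defined in the earlier pseudocode
    output_types=(tf.float32, tf.float32),          # placeholder dtypes
    output_shapes=((None, 28, 28, 1), (None, 10)),  # placeholder shapes
).prefetch(tf.data.experimental.AUTOTUNE)

model.fit(train_data, steps_per_epoch=100, epochs=5)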

@DanMinhNguyen

@robieta Thanks for the comment! I'll try it out later this evening or tomorrow and get back to you. If I'm still having issues, I will make a colab to share and demonstrate it. Currently I am using some internal data which I cannot share, hence the pseudocode.

@DanMinhNguyen
DanMinhNguyen commented Oct 7, 2019

@robieta So I just ran my code by passing the generator function directly into model.fit(), and that seemed to fix the issue completely!

Basically my pseudo code now looks like this:

def datagen(args):
    while True:
        #some code here to load and manipulate data into x and y. Mostly numpy functions
        yield x,y

#some code here to create and compile model 

model.fit(datagen(args) , . . . )

So basically what I learned is:

  1. don't use model.fit_generator() anymore
  2. don't call Dataset.from_generator() separately
  3. just use model.fit() and pass the generator directly into it.

Thanks so much for everything!

(As a side note for anyone else reading: the validation_data argument in model.fit() can also take a generator directly as input)
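
A small illustration of that side note; train_args and val_args are placeholder arguments, not names from this thread, and the step counts are arbitrary:

model.fit(
    datagen(train_args),
    steps_per_epoch=500,
    epochs=10,
    validation_data=datagen(val_args),  # a plain generator works here as well
    validation_steps=50,                # required when validation_data is a generator
)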

@robieta
robieta commented Oct 7, 2019

Excellent! I'm planning on just aliasing (fit / evaluate / predict)_generator to (fit / evaluate / predict), as those methods are now strictly superior.

@Raukk
Author
Raukk commented Oct 8, 2019

I have run the Colabs I posted on version 1.15.0-rc2 and got exactly the same performance results as on version 1.14.0.

On version 2.0.0 I get comparable performance when I use model.fit( instead of model.fit_generator( OR if I use tf.compat.v1.disable_eager_execution().
Switching to model.fit( is the solution I will use for all my code.

I vote that .fit_generator( become an alias for .fit( because that would resolve the performance issues without breaking any existing examples (and would allow those examples to work on all versions of TF). I'm sure .fit_generator( will go away eventually, but since TF 2.0 just released, I'd want to keep it backwards compatible (especially for writing examples).

@max1mn
max1mn commented Oct 8, 2019

Hello everyone!

I have the same problem and was looking for a solution. I was also using fit_generator, but with a Sequence class. The proposed change to simply use the fit method gives me errors; is there some workaround for the Sequence case? I'm using TF 2.0.0.

class LSTMSequence(Sequence):

    def __init__(self, x, subnet_x, y, batch_size):
        self.batch_size = batch_size
        self.x, self.subnet_x, self.y = x, subnet_x, y

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = list()
        # x
        batch_x.append(np.array(pad_sequences(self.x[idx * self.batch_size:(idx + 1) * self.batch_size],
                                              padding='post', dtype='float32')))
        # subnet x
        for subnet_x_data in self.subnet_x:
            batch_x.append(np.array(pad_sequences(subnet_x_data[idx * self.batch_size:(idx + 1) * self.batch_size],
                                                  padding='post', dtype='int16')))
        # y
        batch_y = np.array(self.y[idx * self.batch_size:(idx + 1) * self.batch_size])
        return batch_x, batch_y

and the errors I get when using fit instead of fit_generator:

Traceback (most recent call last):
  File "train.py", line 111, in <module>
    model.fit(sequence, epochs=NUM_EPOCHS, verbose=2, validation_data=val_sequence,
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 224, in fit
    distribution_strategy=strategy)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 547, in _process_training_inputs
    use_multiprocessing=use_multiprocessing)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 606, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py", line 613, in __init__
    output_shapes=nested_shape)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\data\ops\dataset_ops.py", line 540, in from_generator
    output_types, tensor_shape.as_shape, output_shapes)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\data\util\nest.py", line 471, in map_structure_up_to
    results = [func(*tensors) for tensors in zip(*all_flattened_up_to)]
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\data\util\nest.py", line 471, in <listcomp>
    results = [func(*tensors) for tensors in zip(*all_flattened_up_to)]
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 1216, in as_shape
    return TensorShape(shape)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 776, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 776, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 718, in as_dimension
    return Dimension(value)
  File "C:\WPy64-3740\python-3.7.4.amd64\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 193, in __init__
    self._value = int(value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'

@robieta
robieta commented Oct 8, 2019

@max1mn If you replace return batch_x, batch_y with return tuple(batch_x), batch_y your code should work. This stems from a historic decision in tf.data about how to treat lists. I will make fit robust to this, but adding that tuple() will immediately unblock you. Sorry for the inconvenience.

@max1mn
max1mn commented Oct 8, 2019

@robieta Thank you very much, the code passes with that change. However, another problem appeared: it seems the model is running on CPU only now. The training time is 254s per epoch; with 2.0 fit_generator (and GPU) it was about 70s, and with 1.14 (GPU) it was 20s. There are warnings, but I don't know whether they are related. Anyway, simply aliasing fit_generator to fit can break GPU usage, as in my case.

Train for 678 steps, validate for 42 steps
Epoch 1/150

2019-10-08 11:05:39.314124: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference___backward_cudnn_lstm_with_fallback_6231_7688' and '__inference___backward_cudnn_lstm_with_fallback_6231_7688_specialized_for_StatefulPartitionedCall_at___inference_distributed_function_8683' both implement 'lstm_1489aaa8-07c9-4313-8db7-7c40df79c8a8' but their signatures do not match.
2019-10-08 11:09:42.401028: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference_cudnn_lstm_with_fallback_10665' and '__inference_cudnn_lstm_with_fallback_10665_specialized_for_model_concatenate_lstm2_StatefulPartitionedCall_at___inference_distributed_function_12157' both implement 'lstm_6f710079-b289-4f03-b16c-816fc6d27388' but their signatures do not match.

678/678 - 254s - loss: 4.6264 - val_loss: 0.6066

@robieta
robieta commented Oct 8, 2019

@max1mn I think that is a separate issue. Can you create a new issue with a minimal repro and cc @qlzh727? (The error you're seeing is in the part of the LSTM that tries to use CuDNN if applicable.)

@qlzh727
Member
qlzh727 commented Oct 8, 2019

The warning message above actually means the function has been optimized for the cuDNN backend. I suppressed this warning in a previous change, but that might not be in 2.0. You can ignore it for the moment.

@Seterplus

Excellent! I'm planning on just aliasing (fit / evaluate / predict)_generator to (fit / evaluate / predict), as those methods are now strictly superior.

I've also encountered this performance issue:

        fit    fit_generator
tf1     25s    13s
tf2     4s     28s

After using tf.compat.v1.disable_eager_execution(), the training time of fit_generator in tf2 reduces to 14s. It's comparable to tf1 but still 3x slower than fit in tf2.
model.fit(x=sequence, ...) also completes the training in 14s but it seems to load all data into memory and log "Filling up shuffle buffer (this may take a while)" if I set shuffle=True.
Any ideas?

@robieta
robieta commented Nov 7, 2019

I had a commit that aliased fit_generator to fit, but I had to roll it back as it broke some use cases. I'm rolling it forward today now that those issues are resolved. Part of that fix includes better handling of sequences (make sure you use the Sequence class in tf.keras, not keras-team/keras) which will not pull all of the data into memory when shuffling.
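
To make the distinction concrete, the two import paths look like this (a small illustration):

from tensorflow.keras.utils import Sequence   # tf.keras Sequence -- covered by the fix above
# from keras.utils import Sequence            # standalone keras-team/keras Sequence -- a different class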

@ychervonyi
ychervonyi commented Nov 9, 2019

Excellent! I'm planning on just aliasing (fit / evaluate / predict)_generator to (fit / evaluate / predict), as those methods are now strictly superior.

I've also encountered this performance issue:
        fit    fit_generator
tf1     25s    13s
tf2     4s     28s

After using tf.compat.v1.disable_eager_execution(), the training time of fit_generator in tf2 reduces to 14s. It's comparable to tf1 but still 3x slower than fit in tf2.
model.fit(x=sequence, ...) also completes the training in 14s but it seems to load all data into memory and log "Filling up shuffle buffer (this may take a while)" if I set shuffle=True.
Any ideas?

I see the same performance. My question: is there a way to set the buffer size and not load all the data when calling .fit() on a Sequence right now? @robieta Thank you!

@robieta
robieta commented Nov 9, 2019

@ychervonyi Are you using the latest tf-nightly? ac20030 is the relevant change, and should be in tf-nightly==2.1.0.dev20191109 Feel free to post a repro colab if you're seeing Sequence shuffling handled inefficiently.

@Seterplus
Seterplus commented Nov 9, 2019

@ychervonyi Are you using the latest tf-nightly? ac20030 is the relevant change, and should be in tf-nightly==2.1.0.dev20191109 Feel free to post a repro colab if you're seeing Sequence shuffling handled inefficiently.

I'm still using tensorflow-gpu, whose latest version is 2.0.0. When I use model.fit(x=generator, shuffle=False, workers=8, ...), it seems that there is still only one worker whether I set multiprocessing=True or not. Could you please verify this behavior?

@goldiegadde
Contributor

@Seterplus Could you please try with tensorflow-gpu==2.1.0rc0? This has the fix ac20030 that @robieta mentioned above.

@Dr-Gandalf
Dr-Gandalf commented Dec 11, 2019

First of all, thank you for the wonderful repro. I can't tell you how much easier it makes all of this.

It looks like fit_generator is incorrectly falling back to the eager path, which is why training is slower. I will look into why, but in the meantime can you try using model.fit? It actually also supports generators (we plan to deprecate the fit_generator endpoint at some point as it is now obsolete), and in my testing it is actually faster than the 1.14 baseline.

@max1mn If you replace return batch_x, batch_y with return tuple(batch_x), batch_y your code should work. This stems from a historic decision in tf.data about how to treat lists. I will make fit robust to this, but adding that tuple() will immediately unblock you. Sorry for the inconvenience.

I have a similar issue. I tried fit instead of fit_generator, and when I tried the tuple change the error disappeared, but I got a different issue: my dataset is made of 300,000 images, and it appears that Keras is now trying to load all the images into memory before starting training, which obviously does not work. Is there any workaround for this problem?

My data generator feeds a multi-input model that receives 2 images and a two-element numerical vector.

@robieta
robieta commented Dec 11, 2019

@Dr-Gandalf what is your exact version of TensorFlow? And can you provide a minimal repro of a case where keras is loading too much into memory?

@Dr-Gandalf
Dr-Gandalf commented Dec 11, 2019

@robieta I am using TF 2.0.0 inside a Docker container from Docker Hub, "tensorflow/tensorflow:latest-gpu-py3-jupyter".

These are my imports:

from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten, concatenate, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import Iterator
from tensorflow.keras.applications.densenet import DenseNet121, preprocess_input
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard, EarlyStopping, ReduceLROnPlateau
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from tensorflow.keras.utils import Sequence
import tensorflow as tf
from datetime import datetime
import io
from sklearn.metrics import roc_curve,roc_auc_score

#this is the return line of my generator:

return (X1i[0], X2i[0], X3i[0]), X1i[1]

#this is the training line I am using, which used to work fine with fit_generator:

T_history = classification_model.fit(trainGenerator, steps_per_epoch=steps_per_epoch,
                                              validation_data=validation_generator,
                                              validation_steps=validation_steps, 
                                              callbacks=callbacks_list,
                                              epochs=6,
                                              use_multiprocessing=True,
                                              workers=8,
                                              max_queue_size=50)

The console message I am getting is:

Filling up shuffle buffer (this may take a while): 10 of 5802

@robieta
robieta commented Dec 11, 2019

Ah, ok I see what is happening. fit takes a shuffle argument which defaults to True. (Since that is generally what is desired for arrays or Datasets.) However it doesn't really make sense for generators. In order to provide shuffling tf.keras currently drains the entire generator so it can shuffle the batches, whereas it should just drop the shuffle arg and use the elements as they are yielded. I will fix this and make sure it makes it into TF 2.1.
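
Until that fix ships, the observation earlier in this thread (the shuffle buffer only fills when shuffle is left at its default) suggests passing shuffle=False explicitly as an interim workaround. A sketch reusing the names from the fit call quoted above, not verified on every version:

T_history = classification_model.fit(
    trainGenerator,
    steps_per_epoch=steps_per_epoch,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    callbacks=callbacks_list,
    epochs=6,
    shuffle=False,  # generators yield in their own order; avoids draining them into a shuffle buffer
)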

@Dr-Gandalf

thanks for your time, you are very kind :)

@martinwicke
Member
martinwicke commented Dec 12, 2019 via email

@robieta
robieta commented Dec 12, 2019

@Dr-Gandalf No, no, thank you. This is an important performance detail and I'm very happy that it's now going to make it into 2.1. Thanks for reporting.

@robieta
robieta commented Dec 15, 2019

FYI this is fixed by 7d533c2, and will be cherry-picked into TF 2.1.

@robieta robieta self-assigned this Dec 15, 2019
@robieta
robieta commented Dec 15, 2019

I'm going to go ahead and close this since the appropriate fixes are now in tf.

@robieta robieta closed this as completed Dec 15, 2019
@bafonso
bafonso commented Dec 31, 2019

I also notice that when replacing fit_generator with fit, even when using use_multiprocessing and workers, I do not observe multi-threading.

@farhodfm
farhodfm commented Jan 7, 2020

Hello there!

I was using TF 1.14.0 along with Keras 2.3.1. But then, to use keract, I moved to TF 2.0.0.
Everything works fine so far, except for the training time. Training is extremely slow compared to when I was using TF 1.14.0.
Here I found a discussion of the fit() function, but what about train_on_batch (I stick with this function)?
@robieta, any considerations?

@robieta
robieta commented Jan 8, 2020

@farhodfm TF 2.0 has known issues with both fit_generator and train_on_batch. Can you try tf-nightly or tensorflow==2.1.0rc2?

@farhodfm
farhodfm commented Jan 8, 2020

@robieta thanks for the reply!

Let me try to upgrade to tensorflow==2.1.0rc2 and check the training time using train_on_batch.
I will inform you ASAP.

@phiwei
phiwei commented Mar 24, 2020

Ah, ok I see what is happening. fit takes a shuffle argument which defaults to True. (Since that is generally what is desired for arrays or Datasets.) However it doesn't really make sense for generators. In order to provide shuffling tf.keras currently drains the entire generator so it can shuffle the batches, whereas it should just drop the shuffle arg and use the elements as they are yielded. I will fix this and make sure it makes it into TF 2.1.

@robieta Is there a known workaround for this in TF 2.0? I am using it with keras-tuner, so I have limited control over the training code and I cannot easily upgrade to TF 2.1 (driver requirements collide with me not having admin rights on the machine).

@robieta
robieta commented Mar 24, 2020

@phiwei I would be skeptical that the change would be backported, since it is a non-trivial behavior change, and putting it into a point release could cause other models in 2.0 to silently train differently. (Which is why point releases are generally reserved for absolutely critical fixes.) Although I am no longer a member of the TensorFlow team, so that is pure speculation on my part.

@shtse8
shtse8 commented Aug 6, 2020

I have the same issue while using TF 2.0. I stick with model.fit and model.predict. The performance is very slow, and after disabling eager execution using tf.compat.v1.disable_eager_execution() it is 3x-4x faster. Any updates on this issue?

@libinruan
libinruan commented Aug 20, 2020

@shtse8 Thanks for the tips. I am using TF 2.0.0 on a virtual machine and I can confirm that disabling eager execution does resolve the sluggishness and gets rid of the annoying "Filling up shuffle buffer ..." message. Based on my experiments, the performance of TF 2.0.0 on a GCP virtual machine with a Tesla T4 and 4 vCPUs is on par with the performance on Google Colab, where TF is version 2.3.0 with a Tesla T4 and 2 CPUs.
