Performance: Training is much slower in TF v2.0.0 vs v1.14.0 when using tf.keras and model.fit_generator #33024
Comments
Can confirm from my own experience. I had a similar issue with my own project when switching to TF2 (the stable release from a few days ago), seeing a 2x to 3x increase in training time for the same data and code compared to TF1. After some Google searching and reading, I reimplemented the pipeline using tf.data.Dataset.from_generator() instead, which allows me to use model.fit(). Unfortunately there was zero performance benefit either way. In case someone can point out something fundamentally wrong with my setup, the fit_generator version of my code went something like the pseudocode below. All my code uses the internal tf.keras instead of the external Keras:
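(The pseudocode block was stripped from this comment; below is a minimal sketch of what a fit_generator setup like the one described typically looks like. The data, shapes, and model are hypothetical stand-ins, not the commenter's actual code.)

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the commenter's internal data.
NUM_SAMPLES, BATCH_SIZE = 256, 32
x_data = np.random.rand(NUM_SAMPLES, 16).astype(np.float32)
y_data = np.random.randint(0, 10, size=NUM_SAMPLES)

def batch_generator():
    # Yield (inputs, targets) batches forever, as fit_generator expects.
    while True:
        idx = np.random.randint(0, NUM_SAMPLES, size=BATCH_SIZE)
        yield x_data[idx], y_data[idx]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# fit_generator was later deprecated and removed; fall back to fit on newer TF.
fit_fn = getattr(model, "fit_generator", model.fit)
fit_fn(batch_generator(), steps_per_epoch=NUM_SAMPLES // BATCH_SIZE, epochs=1)
```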
For the pseudocode using tf.data.Dataset.from_generator():
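(The tf.data.Dataset.from_generator() pseudocode was also stripped; here is a hedged reconstruction with hypothetical shapes and model, not the original code.)

```python
import numpy as np
import tensorflow as tf

NUM_SAMPLES, BATCH_SIZE = 256, 32
x_data = np.random.rand(NUM_SAMPLES, 16).astype(np.float32)
y_data = np.random.randint(0, 10, size=NUM_SAMPLES)

def batch_generator():
    # Yield already-batched (inputs, targets) pairs forever.
    while True:
        idx = np.random.randint(0, NUM_SAMPLES, size=BATCH_SIZE)
        yield x_data[idx], y_data[idx]

# Wrap the generator in a tf.data.Dataset so model.fit() can consume it.
dataset = tf.data.Dataset.from_generator(
    batch_generator,
    output_types=(tf.float32, tf.int64),
    output_shapes=((BATCH_SIZE, 16), (BATCH_SIZE,)))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(dataset, steps_per_epoch=NUM_SAMPLES // BATCH_SIZE, epochs=1)
```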
Have also experienced a large (4x) increase in training times for Keras models when using fit_generator after upgrading to TensorFlow 2.0. Execution times became comparable to TF 1.14 when disabling eager execution by running tf.compat.v1.disable_eager_execution().
@Szubie Thanks, that does seem to improve the performance back to standard levels, or at least close enough. I wonder if there is a better fix for this issue?
@Szubie Thanks for your input! Unfortunately running tf.compat.v1.disable_eager_execution() does not seem to work any better for me, compared to the TF 1.10 that I am using. Running one epoch of 100 update iterations takes, on average, 43 seconds with TF2 but only 20 seconds with TF1.
I have the same issue. I just installed tensorflow-gpu 2.0 and modified my Keras code to use the "native" Keras module in TensorFlow. I am on Windows 10 x64 with CUDA 10.0. TF-GPU 2.0 as it is is just unusable, so I am rolling back to 1.14.
Can you test the 1.15 release candidate and tell us if you still see the slowdown? We're trying to identify the root cause.
First of all, thank you for the wonderful repro. I can't tell you how much easier it makes all of this. It looks like
@robieta I had also attempted model.fit() without any performance improvement. It may have to do with my current implementation, so I'm pasting some pseudocode below; I am hoping there is something fundamentally wrong with it (I suspect it's the usage of lambda). For the pseudocode using tf.data.Dataset.from_generator():
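(This pseudocode block was also stripped. Since the comment singles out the lambda, a minimal sketch of the pattern likely in use follows: from_generator takes a zero-argument callable, so a parameterized generator is commonly wrapped in a lambda. Shapes and the model are hypothetical.)

```python
import numpy as np
import tensorflow as tf

NUM_SAMPLES = 256
x_data = np.random.rand(NUM_SAMPLES, 16).astype(np.float32)
y_data = np.random.randint(0, 10, size=NUM_SAMPLES)

def batch_generator(batch_size):
    # Parameterized generator yielding (inputs, targets) batches forever.
    while True:
        idx = np.random.randint(0, NUM_SAMPLES, size=batch_size)
        yield x_data[idx], y_data[idx]

# The lambda closes over the batch size -- the suspected culprit here.
dataset = tf.data.Dataset.from_generator(
    lambda: batch_generator(32),
    output_types=(tf.float32, tf.int64),
    output_shapes=((32, 16), (32,)))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(dataset, steps_per_epoch=8, epochs=1)
```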
@mihaimaruseac I just tested my code on TF version 1.15.0rc2. It seems to be equally as slow as TF 2.0; I hope this helps with your debugging! EDIT: TF 1.15.0rc2 seems to be faster by a few seconds (35-40s per epoch) than TF 2.0 (40-45s per epoch).
It sounds like @Raukk and @Szubie (and maybe @Marmotte06) are hitting the issue I described above with
If that doesn't help you'll need to create a colab which demonstrates the difference in
@robieta Thanks for the comment! I'll try it out later this evening or tomorrow and get back to you. If I'm still having issues, I will make a colab to share and demonstrate it. I was using some internal data which I cannot share, hence the pseudocode.
@robieta So I just ran my code by passing the generator function directly into model.fit(), and that seemed to fix the issue completely! Basically my pseudo code now looks like this:
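(The updated pseudocode was stripped from this comment; a hedged sketch of the fix described, with hypothetical data and model, follows: the generator is passed straight into model.fit() with no tf.data wrapper.)

```python
import numpy as np
import tensorflow as tf

NUM_SAMPLES, BATCH_SIZE = 256, 32
x_data = np.random.rand(NUM_SAMPLES, 16).astype(np.float32)
y_data = np.random.randint(0, 10, size=NUM_SAMPLES)

def batch_generator():
    # Yield (inputs, targets) batches forever.
    while True:
        idx = np.random.randint(0, NUM_SAMPLES, size=BATCH_SIZE)
        yield x_data[idx], y_data[idx]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Pass the generator directly to fit(); validation_data can also be a generator.
model.fit(batch_generator(),
          steps_per_epoch=NUM_SAMPLES // BATCH_SIZE,
          validation_data=batch_generator(),
          validation_steps=2,
          epochs=1)
```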
So basically what I learned is:
Thanks so much for everything! (As a side note for anyone else reading: the validation_data argument in model.fit() can also take a generator directly as input.)
Excellent! I'm planning on just aliasing fit_generator to fit.
I have run the Colabs I posted on version On version I vote that
Hello everyone! I have the same problem and was looking for a solution. I was also using fit_generator, but with a Sequence class. The proposed change to simply use the fit method gives me errors; is there some workaround for the Sequence case? I'm using TF 2.0.0, and these are the errors I get when using fit instead of fit_generator:
@max1mn If you replace the lists returned by your Sequence with tuples,
@robieta Thank you very much, the code passes with that change. However, another problem appeared: it seems the model is now running on CPU only. Training time is 254s per epoch; with 2.0 fit_generator (and GPU) it was about 70s, and with 1.14 (GPU) it was 20s. There are warnings, though I don't know whether they are related. In any case, simply aliasing fit_generator to fit can break GPU usage, as in my case:
2019-10-08 11:05:39.314124: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference___backward_cudnn_lstm_with_fallback_6231_7688' and '__inference___backward_cudnn_lstm_with_fallback_6231_7688_specialized_for_StatefulPartitionedCall_at___inference_distributed_function_8683' both implement 'lstm_1489aaa8-07c9-4313-8db7-7c40df79c8a8' but their signatures do not match.
678/678 - 254s - loss: 4.6264 - val_loss: 0.6066
The warning message above actually means the function has been optimized for the cuDNN backend. I suppressed this warning in a previous change, but that might not be in 2.0. You can ignore it for the moment.
I've also encountered this performance issue:
After using
I had a commit that aliased fit_generator to fit, but I had to roll it back as it broke some use cases. I'm rolling it forward today now that those issues are resolved. Part of that fix includes better handling of sequences (make sure you use the Sequence class from tf.keras.utils).
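(For reference, a minimal tf.keras.utils.Sequence of the kind discussed here; names, shapes, and the model are illustrative, not from the thread.)

```python
import math
import numpy as np
import tensorflow as tf

class BatchSequence(tf.keras.utils.Sequence):
    """Minimal Sequence that serves (inputs, targets) tuples per batch."""

    def __init__(self, x, y, batch_size=32):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Number of batches per epoch.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        # Return a tuple, not a list, so fit() unpacks it cleanly.
        return self.x[s], self.y[s]

x = np.random.rand(100, 8).astype(np.float32)
y = np.random.randint(0, 2, size=100)
seq = BatchSequence(x, y, batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(seq, epochs=1)
```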
I see the same performance. My question: is there a way to set a buffer size and not load all the data when calling fit()?
@ychervonyi Are you using the latest tf-nightly? ac20030 is the relevant change, and should be in the current nightlies.
I'm still using tensorflow-gpu, whose latest version is 2.0.0. When I use
@Seterplus Could you please try with tensorflow-gpu==2.1.0rc0, which has the fix ac20030.
I have a similar issue. I tried fit instead of fit_generator, and when I tried the tuple change the error disappeared, but I ran into a different problem: my dataset is made of 300,000 images, and it appears that Keras now tries to load all the images into memory before starting training, which obviously does not work. Is there any workaround for this? My data generator feeds a multi-input model that receives two images and a two-element numerical vector.
@Dr-Gandalf what is your exact version of TensorFlow? And can you provide a minimal repro of a case where Keras is loading too much into memory?
@robieta I am using TF 2.0.0 inside a Docker container from Docker Hub ("tensorflow/tensorflow:latest-gpu-py3-jupyter"). These are my imports:

from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten, concatenate, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import Iterator
from tensorflow.keras.applications.densenet import DenseNet121, preprocess_input
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard, EarlyStopping, ReduceLROnPlateau
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from tensorflow.keras.utils import Sequence
import tensorflow as tf
from datetime import datetime
import io
from sklearn.metrics import roc_curve, roc_auc_score

# this is the return line of my generator:
return (X1i[0], X2i[0], X3i[0]), X1i[1]

# this is the training line I am using; it used to work fine with fit_generator:
T_history = classification_model.fit(trainGenerator, steps_per_epoch=steps_per_epoch,
                                     validation_data=validation_generator,
                                     validation_steps=validation_steps,
                                     callbacks=callbacks_list,
                                     epochs=6,
                                     use_multiprocessing=True,
                                     workers=8,
                                     max_queue_size=50)

The console message I am getting is:
Ah, ok I see what is happening. |
thanks for your time, you are very kind :) |
@robieta we have to make sure that this behavior (shuffle is true except for generators) makes it into the documentation as well.
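(To illustrate the behavior being discussed, a small sketch with hypothetical shapes: fit() shuffles array inputs each epoch by default, but for generator input the shuffle argument is ignored and ordering is the generator's responsibility.)

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(128, 4).astype(np.float32)
y = np.random.randint(0, 2, size=128)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Array inputs: shuffle defaults to True, so samples are re-shuffled per epoch.
model.fit(x, y, batch_size=32, epochs=1)

def batch_generator():
    # With generator input, fit()'s shuffle argument has no effect;
    # randomize sample order inside the generator instead.
    while True:
        idx = np.random.randint(0, len(x), size=32)
        yield x[idx], y[idx]

model.fit(batch_generator(), steps_per_epoch=4, epochs=1)
```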
@Dr-Gandalf No, no, thank you. This is an important performance detail and I'm very happy that it's now going to make it into 2.1. Thanks for reporting. |
FYI this is fixed by 7d533c2, and will be cherrypicked into TF 2.1 |
I'm going to go ahead and close this since the appropriate fixes are now in TF.
I also notice that after replacing fit_generator with fit, I do not observe multi-threading even when using use_multiprocessing and workers.
Hello there! I was using TF 1.14.0 along with Keras 2.3.1. But then, to use keract, I moved to TF 2.0.0. |
@farhodfm TF 2.0 has known issues with both |
@robieta thanks for the reply! Let me try to upgrade to |
@robieta Is there a known workaround for this in TF 2.0? I am using it with keras-tuner, so I have limited control over the training code and I cannot easily upgrade to TF 2.1 (driver requirements collide with me not having admin rights on the machine). |
@phiwei I would be skeptical that the change would be back ported since it is a non-trivial behavior change, and putting it into a point release could cause other models in 2.0 to silently train differently. (Which is why point releases are generally reserved for absolutely critical fixes.) Although I am no longer a member of the TensorFlow team, so that is pure speculation on my part. |
I have the same issue while using TF 2.0. I stuck with model.fit and model.predict; the performance was very slow. After disabling eager execution using tf.compat.v1.disable_eager_execution(), it runs much faster.
@shtse8 Thanks for the tips. I am using TF 2.0.0 on a virtual machine and I can confirm that disabling eager execution speeds up training considerably.
System information
NOTE: I have provided Google Colab' notebooks to reproduce the slowness.
You can also obtain the TensorFlow version with:
TF 1.x: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
TF 2.x: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
It happens on the standard Colab GPU instance.
Describe the current behavior
Version 2.0.0 is SLOW compared to the identical code running on v1.14.0.
The code I used to demonstrate it is very simple, and very similar to most existing Keras examples.
A larger NN on MNIST goes from ~10s per epoch to ~20s per epoch, which is a very major slowdown.
Describe the expected behavior
A new version should have similar or better performance than the previous version.
If user error or a new limitation/feature is causing the problem, it should be called out in the release notes or quick start guide. This code was perfectly normal in TF 1.x.
Code to reproduce the issue
See this (GPU) Colab Notebook example with MNIST Data:
https://colab.research.google.com/gist/Raukk/f0927a5e2a357f2d80c9aeef1202e6ee/example_slow_tf2.ipynb
See this (GPU) Colab Notebook example with numpy random for Data:
https://colab.research.google.com/gist/Raukk/518d3d21e08ad02089429529bd6c67d4/simplified_example_slow_tf2.ipynb
See this (GPU) Colab Notebook example using standard Conv2D (not DepthwiseConv2D):
https://colab.research.google.com/gist/Raukk/4f102e192f47a6dc144b890925b652f8/standardconv_example_slow_tf2.ipynb
Please notify me if you cannot access any of these notebooks, or if they do not run, or don't sufficiently reproduce the issue.
Other info / logs
Each example above starts with a TLDR; that gives a very basic summary of results.
Thank you!