tf.keras.model fit() significantly slower when using weighted validation data in comparison to tf2.1.0 #39588
Comments
Was able to reproduce the issue. TF v2.2 and TF-nightly take more time for each epoch when compared to TF v2.1. Please find the attached gist. Thanks!
TensorFlow 2.2 takes much more time than 2.1/2.0 to start training after keras.fit is called. It gets stuck for about 7 minutes before training starts.
Log from 2.1
Interesting that the random weighting causes the performance slowdown. In my case, turning on dropout layers (even with dropout_prob=0) causes the performance slowdown. Could it be something in the TensorFlow randomness modules?
What is the status of this issue? Is someone actively looking into it? If not, is there any estimate of when someone might look at it? Is there something I can do to help? Currently this issue prevents us from updating to TensorFlow 2.2 and thus updating to Python 3.8. Luckily, neither Python 3.8 nor TensorFlow 2.2 is a requirement yet.
@sirvincent thanks for reporting the issue. A fix was submitted in 1d2d05f and is available in the latest nightly.
Thanks @goldiegadde, I have tested tf-nightly 2.3.0.dev20200619 and the issue seems to be fixed.
This regression is not completely fixed in 2.3.0. It seems that, for whatever reason, the first epoch takes a long time to start, and the first validation step is also very slow. From the 2nd epoch onward, the epoch times are comparable.
@romanovzky Can you please open a new issue with the gist (you already have one above)? Thanks!
I had the same issue and was able to work around it by converting my weights NumPy array into a pandas Series. Training now starts immediately and I no longer have to wait.
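A minimal sketch of that workaround, assuming per-sample validation weights held in a NumPy array (the names and shapes here are illustrative, not from the thread):

```python
import numpy as np
import pandas as pd

# Hypothetical per-sample weights for the validation set.
val_weights = np.random.rand(10000).astype("float32")

# Wrapping the NumPy array in a pandas Series reportedly avoids the slow path.
val_weights = pd.Series(val_weights)

# The weights are then passed exactly as before, e.g.:
# model.fit(x_train, y_train,
#           validation_data=(x_val, y_val, val_weights), ...)
```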
This works for me, but how does that actually work? Doesn't the API accept only NumPy arrays?
The problem is also fixed if you use a generator (a Keras Sequence), which is what I have been using.
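For illustration, a sketch of such a generator via tf.keras.utils.Sequence (the class name, shapes, and batch size are assumptions, not from the thread):

```python
import numpy as np
import tensorflow as tf

class WeightedData(tf.keras.utils.Sequence):
    """Yields (x, y, sample_weight) batches for fit()/evaluate()."""

    def __init__(self, x, y, weights, batch_size=32):
        self.x, self.y, self.w = x, y, weights
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[s], self.y[s], self.w[s]

# e.g.: model.fit(train_seq,
#                 validation_data=WeightedData(x_val, y_val, val_weights))
```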
If this works, I call it sorcery. Thank you!
It works for me.
A new issue has been opened recently: #48965
System information
- Have I written custom code: No
- OS platform and distribution: Arch Linux & Ubuntu 18.04 LTS
- Mobile device: No
- TensorFlow installed from: binary
- TensorFlow version: v2.2.0-rc4-8-g2b96f3662b (2.2.0), compared to v2.1.0-rc2-17-ge5bf8de (2.1.0)
- Python version: 3.7.5
- The Arch Linux machine runs on CPU
- The Ubuntu machine runs on GPU with:
  - CUDA version: 10.1.243
  - GPU model and memory: GeForce GTX 1080 with 7126 MB memory
Describe the current behavior
Training a simple tf.keras.Model multilayer perceptron with a call to .fit() whose validation_data contains weights is significantly slower than in TensorFlow 2.1.0 with the exact same code.
Describe the expected behavior
Similar performance between TensorFlow 2.1.0 and 2.2.0 when training a tf.keras.Model with a weighted validation dataset.
Standalone code to reproduce the issue
Package requirements for the code snippet, using Python 3.7.5:
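(The original snippet and requirements list are not reproduced here. Below is a minimal sketch reconstructed from the description: the original used EMNIST, for which random stand-in data of similar shape is substituted so the sketch is self-contained; the model size, epoch count, and batch size are assumptions. The key detail is the per-sample weight array passed as the third element of validation_data, toggled by the is_random_weighing flag mentioned below.)

```python
import numpy as np
import tensorflow as tf

is_random_weighing = True  # set to False to recover TF 2.1-like epoch times

# Stand-in for flattened 28x28 EMNIST images (784 features, 10 classes).
x_train = np.random.rand(60000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(60000,))
x_val = np.random.rand(10000, 784).astype("float32")
y_val = np.random.randint(0, 10, size=(10000,))

# A simple multilayer perceptron, as in the issue description.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

if is_random_weighing:
    # Random per-sample validation weights trigger the slowdown in TF 2.2.0.
    val_weights = np.random.rand(len(x_val)).astype("float32")
    validation_data = (x_val, y_val, val_weights)
else:
    validation_data = (x_val, y_val)

model.fit(x_train, y_train, epochs=5, batch_size=32,
          validation_data=validation_data)
```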
Other info / logs
Running the above code snippet on the Arch Linux machine (on CPU) takes roughly 19 seconds per epoch. When the same code is run in TensorFlow 2.1.0 it takes roughly 5 seconds per epoch. When the weighting of the validation dataset is turned off in TensorFlow 2.2.0 (is_random_weighing = False), the performance becomes similar to TensorFlow 2.1.0: roughly 5 seconds per epoch.
The slowdown is also seen on the Ubuntu machine, run on GPU, where, likely due to different hardware, TF 2.2.0 is 7 times as slow as TF 2.1.0.
The effect was not seen (though perhaps it was simply not measurable) when using MNIST in place of EMNIST.
The issue seems related to #39039, in which a comment by @romanovzky suggested that the slowdown might be due to the validation data or validation split, although that was in the context of comparing a TensorFlow Estimator to Keras.
This issue also seems related to #39434, in which a significant performance drop is likewise seen going from TF 2.1 to TF 2.2.
It seems like another small piece in a larger puzzle (or I am doing something simple wrong on both machines).