Low performance when using persistent mode GradientTape with LSTM/GRU layers #51818
Labels: 2.6.0, comp:ops, stat:awaiting tensorflower, type:performance
System information
- TensorFlow version: 2.6.0
- The issue reproduces on both CPU and GPU (see below)
Describe the current behavior
In graph mode, performance drops sharply when using a persistent `tf.GradientTape`, or when creating multiple GradientTape objects in a single `with` block.
This only happens when the model includes an LSTM or GRU layer; see the reproducer below.
Standalone code to reproduce the issue
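The code and output blocks were lost from this copy of the issue, so here is a minimal sketch reconstructed from the description. The model size, input shapes, and synthetic data are illustrative assumptions, not the original issue's exact code; only the three function names (`train_step_0`, `train_step_1`, `train_step_2`) and their tape usage come from the report below.

```python
import time

import numpy as np
import tensorflow as tf

# Small model containing an LSTM layer (the trigger for the slowdown).
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(20, 8)),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()

# Synthetic data with fixed shapes so each tf.function traces only once.
x = np.random.rand(32, 20, 8).astype(np.float32)
y = np.random.rand(32, 1).astype(np.float32)


@tf.function
def train_step_0(x, y):
    # Variant 0: persistent tape -- slow with LSTM/GRU in graph mode.
    with tf.GradientTape(persistent=True) as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))


@tf.function
def train_step_1(x, y):
    # Variant 1: two tapes opened in one `with` block -- also slow.
    # tape_b is deliberately left unused; merely opening a second tape
    # matches the situation described in the report.
    with tf.GradientTape() as tape_a, tf.GradientTape() as tape_b:
        loss = loss_fn(y, model(x, training=True))
    grads = tape_a.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))


@tf.function
def train_step_2(x, y):
    # Variant 2: single non-persistent tape -- runs at normal speed.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))


for name, step in [("train_step_0", train_step_0),
                   ("train_step_1", train_step_1),
                   ("train_step_2", train_step_2)]:
    step(x, y)  # warm-up call so tracing is excluded from the timing
    start = time.perf_counter()
    for _ in range(100):
        step(x, y)
    print(name, time.perf_counter() - start, "s")
```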
Other info
Both train_step_0 and train_step_1 show the error, while train_step_2 does not. On my GPU, the first two approaches take around 17s for 100 training steps, while the third takes 4.3s.
Furthermore, the performance drop only reproduces when using GRU/LSTM layers in graph mode. That is, if we remove the tf.function decorator from the train_step functions (see the eager-mode check below), or if we replace the LSTM with a Dense layer, all three examples take the same time and none of them outputs any error.
Additionally, the problem occurs on both CPU and GPU.
This issue is an updated version of #35928, which reported a very similar problem.
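As a quick way to check the graph-mode dependence without editing the source, the decorator can be bypassed via the tf.function wrapper's `python_function` attribute, which returns the original undecorated Python callable. A small sketch, reusing the names from the reproducer above:

```python
import time

# Run the same step eagerly: if the slowdown disappears here, it is
# specific to graph mode (names come from the sketch above).
eager_step_0 = train_step_0.python_function

eager_step_0(x, y)  # warm-up
start = time.perf_counter()
for _ in range(100):
    eager_step_0(x, y)
print("eager train_step_0:", time.perf_counter() - start, "s")
```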