Gradient checkpointing for TF Keras models #38766
Labels: stat:awaiting tensorflower, type:feature
Comments
Does it make sense to port the above-mentioned PR as a PR to TF core? Also tagging @seanpmorgan from TF addons to keep in the loop. Thanks!
This functionality would be very helpful for training memory-intensive language models!
By the way, I am thinking of addressing this with separate PRs, as follows.
Does this approach make sense? Thanks!
@pidajay Did you make it?
System information
Describe the feature and the current behavior/state.
I have implemented a version of gradient checkpointing for TF Keras Sequential models (with future plans to extend it to the Keras functional API and custom models). The PR can be found here: tensorflow/addons#1600. I initially envisioned it as a package in the TF Addons repo, but reviewers felt that was not the right place and that it could potentially go in as a fix for the existing recompute_grad functionality in TF core -
tensorflow/tensorflow/python/ops/custom_gradient.py, line 458 at commit 64f4a59
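For context, here is a minimal sketch of how the existing tf.recompute_grad path is typically applied today. The model, layer sizes, and training step are illustrative only, and the exact behavior (especially around gradients with respect to variables) has varied across TF versions:

```python
import tensorflow as tf

# A block of layers whose activations we are willing to recompute
# in the backward pass instead of keeping them in memory.
block = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(1024, activation="relu"),
])

# tf.recompute_grad wraps a callable so that its intermediate
# activations are not stored; they are recomputed when gradients
# are taken, trading extra compute time for lower peak memory.
checkpointed_block = tf.recompute_grad(lambda x: block(x))

x = tf.random.normal([32, 1024])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(checkpointed_block(x))

# Gradients flow through the recomputed block as usual.
grads = tape.gradient(loss, block.trainable_variables)
```

Note that with this path the user has to decide how to split the model and wrap each segment manually, which is one of the pain points the PR aims to remove.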
Here are the issues with the existing implementation of recompute_grad in TF core and my solutions to them:
Solution - My PR provides a notebook tutorial demonstrating how to use the implemented functionality.
Solution - My PR provides links to results with the observed memory savings. Caveat - only CPU-profiled results are available so far; GPU and TPU profiling still needs to be done.
Solution - My PR expects no explicit partitioning of the model; the user just needs to add a single decorator to the model (see the sketch after this list). That is it.
Solution - My PR implements the checkpointing functionality so that the user can balance the trade-off between memory and compute time.
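As a rough illustration only, the single-decorator usage might look something like the following. The decorator name `recompute_sequential` and the `num_checkpoints` argument are hypothetical stand-ins, not the actual API from the PR (see tensorflow/addons#1600 for that):

```python
import tensorflow as tf

# A plain Keras Sequential model; no manual partitioning of layers.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(4096, activation="relu") for _ in range(8)]
    + [tf.keras.layers.Dense(10)]
)

# Hypothetical usage, for illustration only:
#
#   model = recompute_sequential(num_checkpoints=4)(model)
#
# where `num_checkpoints` would control the memory/compute trade-off:
# fewer stored checkpoints mean lower peak memory but more
# recomputation during the backward pass.
```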
Does it make sense to port the PR to TF core?
Will this change the current API? How?
The proposed implementation can potentially be shoehorned into the existing API for recompute_grad if desirable.
Who will benefit from this feature?
Anyone who wants to train models in resource constrained environments.
Any other info.