
How can I clear GPU memory in tensorflow 2? #36465

Open
HristoBuyukliev opened this issue Feb 4, 2020 · 120 comments
Labels
comp:gpu GPU related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower TF 2.7 Issues related to TF 2.7.0 type:bug Bug

Comments

@HristoBuyukliev

System information

  • Custom code; nothing exotic though
  • OS: Ubuntu 18.04
  • TensorFlow installed from source (with pip)
  • TensorFlow version: v2.1.0-rc2-17-ge5bf8de
  • Python version: 3.6
  • CUDA version: 10.1
  • GPU: Tesla V100, 32 GB memory

I created a model, nothing especially fancy in it. When I create the model, nvidia-smi shows that TensorFlow takes up nearly all of the GPU memory. When I fit the model with a small batch size, it runs successfully. When I fit with a larger batch size, it runs out of memory. Nothing unexpected so far.

However, the only way I can then release the GPU memory is to restart my computer. When I run nvidia-smi I can see the memory is still used, but there is no process using the GPU. Also, if I try to run another model, it fails much sooner.

Nothing in the first five pages of Google results works (and most solutions are for TF1).

Is there any way to release GPU memory in tensorflow 2?

@amahendrakar
Contributor

@HristoBuyukliev,
Could you please check this TensorFlow documentation and let us know if it helps? Thanks!

@amahendrakar amahendrakar added comp:gpu GPU related issues TF 2.1 for tracking issues in 2.1 release type:support Support issues stat:awaiting response Status - Awaiting response from author labels Feb 5, 2020
@HristoBuyukliev
Author

@amahendrakar Hi, this is not what I am looking for. Not using up all the memory at once sounds like a useful feature; however, I am looking to clear the memory TF has already taken.

I just tried it out, and it doesn't help. I am iteratively increasing the batch size, trying to find the biggest one I can use. Once the Jupyter kernel crashes, the memory stays taken up.

Additionally, even the advertised functionality does not work: I made a model with half as many parameters, and TensorFlow still took up 31 out of 32 gigabytes.

@HristoBuyukliev HristoBuyukliev changed the title How can I clear GPU memory in tensorflow 2.0? How can I clear GPU memory in tensorflow 2? Feb 5, 2020
@amahendrakar amahendrakar assigned ymodak and unassigned amahendrakar Feb 5, 2020
@taborda11

Hello @HristoBuyukliev, I had a similar problem when I was iterating over model.predict(). If you are iteratively increasing the batch size, try calling tf.keras.backend.clear_session() after training with each batch size.
That seems to be a case of a memory leak in each training run.
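
For concreteness, a minimal sketch of what I mean (the toy model and the batch sizes here are only placeholders):

import tensorflow as tf

for batch_size in (32, 64, 128):
    dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(batch_size)
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(loss='mse', optimizer='adam')
    model.fit(dataset, epochs=1, verbose=0)
    del model
    tf.keras.backend.clear_session()  # drop Keras' global state between runs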

@ymodak
Contributor
ymodak commented Feb 5, 2020

You may try enabling GPU memory growth in this case.
Put the following snippet at the top of your code:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Allocate GPU memory on demand instead of reserving almost all of it up front
    tf.config.experimental.set_memory_growth(gpus[0], True)
# your code
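
A related option from the same TF GPU guide, in case capping the allocation is enough for your use case (a sketch; the 4 GB limit is an arbitrary number of mine):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Pin TensorFlow to a fixed 4 GB slice of the first GPU instead of the whole card
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
# your code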

@EKami
EKami commented Feb 6, 2020

Hi @HristoBuyukliev, this is a very old issue that everyone is facing in TF 1.x as well as TF 2.x. It seems to be a design flaw, and the TF team doesn't seem to care about fixing it (I have been facing this issue for more than 2 years now).

What worked well for me was just to run my train/eval in a separate process and wait for it to finish. When the process finishes, the system kills it and releases the GPU resources automatically.
You can achieve this by doing something like:

import multiprocessing

process_eval = multiprocessing.Process(target=evaluate, args=(...))
process_eval.start()
process_eval.join()
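
To make this concrete end to end, here is a self-contained sketch (the train_once function and the toy model are mine, just for illustration). The key points are importing TF inside the child and using the 'spawn' start method, so the parent process never initializes CUDA and all GPU memory is released when the child exits:

import multiprocessing as mp

def train_once(batch_size, queue):
    # Import TF inside the child so only this short-lived process touches the GPU
    import tensorflow as tf
    dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(batch_size)
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(loss='mse', optimizer='adam')
    history = model.fit(dataset, epochs=1, verbose=0)
    queue.put(history.history['loss'][-1])

if __name__ == '__main__':
    ctx = mp.get_context('spawn')   # 'spawn' keeps the parent free of CUDA state
    queue = ctx.Queue()
    worker = ctx.Process(target=train_once, args=(10, queue))
    worker.start()
    final_loss = queue.get()        # read the result before joining
    worker.join()                   # the GPU memory goes away with the process
    print(final_loss)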

@HristoBuyukliev
Author

@ymodak
As I also said to @amahendrakar:

  1. This seems like a nice feature, but it is not relevant to my problem.
  2. I tried it anyway; it did not work.

@taborda11 Thank you for your suggestion; unfortunately it did not work.
@EKami Yes, I figured by now that there is no solution. Thank you for your suggestion, I will try it out.

@taborda11 @EKami a teammate of mine found a hacky solution that kind of works:

for i in $(sudo lsof /dev/nvidia2 | grep python | awk '{print $2}' | sort -u); do sudo kill -9 $i; done

This gets all the Python processes that are using GPU 2 (in my case) and kills them. It works, but it is very ugly, and I was hoping for a better way.

@ymodak ymodak added type:bug Bug and removed stat:awaiting response Status - Awaiting response from author type:support Support issues labels Feb 7, 2020
@sanjoy
Contributor
sanjoy commented Feb 7, 2020

However, the only way I can then release the GPU memory is to restart my computer.

How do you exit the TF processes?

This looks like an issue with nvidia-smi based on your last comment. If lsof /dev/nvidia2 can find the processes using the GPU then nvidia-smi should find them as well.

@ymodak ymodak removed their assignment Feb 8, 2020
@HristoBuyukliev
Author

@sanjoy I think that nvidia-smi does not list GPU processes when used within Docker (as in my case)

@sanjoy
Contributor
sanjoy commented Feb 16, 2020

@sanjoy I think that nvidia-smi does not list GPU processes when used within Docker (as in my case)

I see, thanks!

How do you exit the TF processes? Based on what you've said so far, it looks like the TF processes are not dying, and the workaround is to find them via lsof /dev/nvidia2 and kill -9 them manually. So there may be something wrong with how they are being stopped normally.

@mminervini

I have also been battling with the issue of releasing GPU memory for quite some time...

My use case is a machine in a production environment with a single Python process that has to serve different types of clients, and I need to switch models depending on the service to be provided. Thus, purging previous models from memory is mandatory in this case; otherwise resource-exhausted errors appear sooner rather than later.
With TF 1.x and Keras (when it was separate from TF) I managed to make this work with keras.backend.clear_session(). At some point I decided to finally move to TF 2.x with Keras integrated into TF (tf.keras), and then clearing GPU memory apparently became an impossible thing to do! I got the impression that something broke in TF memory management when Keras was integrated into TF.

I tried all combinations of tf.keras.backend.clear_session(), tf.compat.v1.reset_default_graph(), gc.collect(), closing the session, tf.compat.v1.disable_eager_execution(), and other solutions that I found online, but none of these really solved the issue.

As a last resort, I will try the solution proposed by @EKami to spawn a subprocess every time I need to switch models, and I will report on how it goes.
In any case, this introduces inter-process communication and complicates things unnecessarily, so I really hope the TF team will improve GPU memory management and offer a function that really clears the session!
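
To sketch what I have in mind (the model paths, queues, and sentinel protocol here are hypothetical, just to illustrate the shape of the workaround):

from multiprocessing import get_context

def serve_model(model_path, requests, responses):
    # The worker owns the GPU: it loads one model and answers requests until told to stop
    import tensorflow as tf
    model = tf.keras.models.load_model(model_path)
    while True:
        batch = requests.get()
        if batch is None:                      # sentinel: shut this worker down
            break
        responses.put(model.predict(batch).tolist())

if __name__ == '__main__':
    ctx = get_context('spawn')
    requests, responses = ctx.Queue(), ctx.Queue()
    worker = ctx.Process(target=serve_model, args=('model_a.h5', requests, responses))
    worker.start()
    # ... push inference requests, read predictions ...
    requests.put(None)                         # time to switch models:
    worker.join()                              # the exiting worker frees its GPU memory
    # start a new worker with the next model here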

@mminervini

Replying to my own comment...

I implemented the solution based on spawning a subprocess to run the TensorFlow code and, as expected, it actually works, because all resources (in particular GPU memory) are released once the subprocess is destroyed.

Of course, there are some drawbacks in terms of implementation complexity, since one has to deal with multiprocessing-related concerns that otherwise would not be needed, such as inter-process communication or logging from multiple processes.
Performance is also significantly affected, since every subprocess has to import TF and other modules and load the models onto the GPU again. So this is definitely not suited for time-critical operations.

@phiwei
phiwei commented May 20, 2020

@EKami @mminervini
I have been struggling with this issue for an amount of time that is way beyond reasonable at this point as well... PyTorch already had working solutions for this two years ago, but I am stuck with TF for now. If you can make the switch, I can warmly recommend it.

Anyhow, could you point me to a good example / tutorial for the subprocess approach if you know of any?

@EKami
EKami commented May 20, 2020

@phiwei Yep, I came to the exact same conclusion, but it's hard to move big projects that rely so heavily on TF/Keras to PyTorch. For my future projects I won't make the same mistake, though, and I can clearly see from the trends in papers that that's where everyone is heading. Even the argument that "TF is better suited for production" doesn't hold anymore; in fact, we are shooting ourselves in the foot with bugs like this one which, even after many years, are still not fixed.

The future is JAX/Pytorch, TF is doomed to be a relic of the past at this rate.

As for the subprocess tutorial, I don't have any to share but the small example I gave here: #36465 (comment)

The bad news is: it seems that this solution doesn't work with TF 2.2 on RTX cards (yet another problem). It works well with RTX cards on TF 1.15.x and with non-RTX cards on TF 2.2 (like the NVIDIA T4). It seems to be driver related, so maybe the issue will go away with the next driver release for RTX... no idea, we'll see, but at this point I don't expect much.

@phiwei
phiwei commented May 20, 2020

@EKami Thanks for the warning, I am in fact using TF 2.2 with an RTX card... I worked with PyTorch a lot two years ago and, in my opinion, it was already a very mature tool that actually behaves the way you would want Python code to behave. Something I found very neat was that they use dicts for batches by default; I loved that for customizing models and passing information through them.

Edit:
On that note though, Keras-Tuner works for me with RTX and TF 2.2, so there must be some way to accomplish this.

@nalane
nalane commented May 2, 2023

Has there been any movement on this?

@AmosDinh

Hi @HristoBuyukliev, this is a very old issue that everyone is facing in TF 1.x as well as TF 2.x. It seems to be a design flaw, and the TF team doesn't seem to care about fixing it (I have been facing this issue for more than 2 years now).
What worked well for me was just to run my train/eval in a separate process and wait for it to finish. When the process finishes, the system kills it and releases the GPU resources automatically. You can achieve this by doing something like:

import multiprocessing

process_eval = multiprocessing.Process(target=evaluate, args=(...))
process_eval.start()
process_eval.join()

Here's how to get the return value of the process:

from multiprocessing import Process, Queue
import random

def my_func(arg, q):
    # Compute a value in the child process and hand it back to the parent via the queue
    ret_val = random.random()
    print(ret_val, type(ret_val))
    q.put(ret_val)
    return 1  # the plain return value is discarded; only the queue is visible to the parent


if __name__ == '__main__':
    queue = Queue()  # created in the parent and shared with the child
    p1 = Process(target=my_func, args=('not used', queue))
    p1.start()
    p1.join()

    res = queue.get()  # read the child's result after it has exited
    print(res, type(res))

Does this work with gpu?

@nalane
nalane commented Jun 29, 2023

Does this work with gpu?

Yeah, it does

@exgphe
exgphe commented Jul 12, 2023

I can confirm that in CPU mode TensorFlow does not release the memory either. I created a small experiment that builds a small model and then executes del model, tf.keras.backend.clear_session(), and gc.collect() in a for loop. The memory usage grows steadily and ends up occupying all of your RAM. The only way to release the memory is to create the model in a separate process and then kill the process.
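
For reference, a rough reconstruction of the experiment (the layer sizes and iteration count are mine; watch the process RSS, e.g. in top, while it runs):

import gc
import tensorflow as tf

for i in range(200):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, input_shape=(32,), activation='relu'),
        tf.keras.layers.Dense(1),
    ])
    model.compile(loss='mse', optimizer='adam')
    del model
    tf.keras.backend.clear_session()   # the cleanup calls described above
    gc.collect()                       # ... and yet host memory keeps creeping up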

@nalane
nalane commented Aug 14, 2023

@reedwm @sachinprasadhs @mohantym I don't want to come across as rude, but can we get a confirmation from you guys that you at least recognize this as an issue? We have provided graphs and data showing that TensorFlow has a memory leak, and yet every time someone comes along with a "solution" that doesn't even fix the problem at hand, someone from Google tries to close this thread. From my perspective, it's as if Google is trying to say one of two things:

  1. "We don't even care that people are telling us there is a memory leak," or
  2. "We know it's a problem and are trying to sweep it under the rug."

@fingoldo
fingoldo commented Aug 23, 2023

It's just unbelievable to see this in the year 2023. And you guys (core TensorFlow devs) call it a production-ready machine learning framework? Learn how to clean up your crap (claimed GPU memory) first. It's disrespectful to your users. I am currently in a situation where I need to train many models (not only ANNs, but also boostings etc., some of them also using the GPU) on the same fold of data, then issue predictions with each model over a range of separate big files, proceed to the next fold, and so on. Spawning a separate process for every such training and every prediction task, importing TF, and communicating results back to the main process seems much sillier than just calling some helper function to clear the allocated GPU memory. If you are able to allocate memory, then you must be able to release it just as easily. Find the courage to listen to your users and do it.

@Sam-Seaberry

Not a fix, but I use a Docker container that I pull up each time I need to evaluate a model; once the evaluation is complete, I break from the main program so the Docker container is stopped. This releases all GPU resources.

Use this image to enable GPU processing in your container (or a similar image with the correct CUDA version for your GPU):
nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04

The container can be started with a subprocess call like:
subprocess.run(['docker', 'start', container_name], stderr=subprocess.PIPE)

However, to enable GPU processing, the container has to be created with the argument:
--gpus all
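
A minimal sketch of that pattern using a one-shot docker run instead of start/stop (the mount path and entry point are placeholders, and it assumes the image has Python and your dependencies installed, which in practice means building your own image on top of the CUDA one):

import subprocess

# One-shot evaluation: the container exits with the script and --rm removes it,
# so the driver reclaims all of the GPU memory the run used.
subprocess.run([
    'docker', 'run', '--rm', '--gpus', 'all',
    '-v', '/path/to/project:/workspace',                 # placeholder mount
    'nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04',
    'python3', '/workspace/evaluate.py',                 # placeholder entry point
], check=True)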

@nalane
nalane commented Aug 25, 2023

@reedwm @sachinprasadhs Do you hear how ridiculous the above is? THAT is what this memory leak is making us do.

@jurahul jurahul assigned rohan100jain and unassigned reedwm and sachinprasadhs Aug 25, 2023
@jurahul
Contributor
jurahul commented Aug 25, 2023

I've escalated this to the TF team.

@robotoil
robotoil commented Aug 25, 2023 via email

@nalane
nalane commented Aug 25, 2023

@robotoil Whenever someone tells me, “I’m going to start a new project in TensorFlow,” I point them to this exact issue and tell them to use PyTorch instead.

@robotoil
robotoil commented Aug 25, 2023 via email

@xucian
xucian commented Sep 17, 2023

What a funny thing, to have this persist for 3.5 years now. For these years-old issues (a lot of companies have them), the cause is usually something we're not seeing (usually something that would be detrimental to them if they fixed it).
Or maybe in this case they think that just starting Python in a subprocess is an acceptable solution (for me, it was at the time).

@EKami
EKami commented Sep 17, 2023

My first fix was to run in a subprocess. My second fix, which was more painful, was to migrate the models to ONNX and, from that point on, only use PyTorch for future models.

@nalane
nalane commented Sep 17, 2023

the cause is usually something we're not seeing (usually something that would be detrimental to them if they fixed it).

The fact that they haven’t even acknowledged it as an issue, despite the fact that we have supplied code and charts demonstrating that it is one, makes me think that this is the case.

@cjmcclellan

This issue seems to have been fixed by setting TF_GPU_ALLOCATOR=cuda_malloc_async as mentioned in #48545, which seems to work for me on TF 2.15 and CUDA 12.3. I'm not sure why this hasn't been mentioned here yet.
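
For anyone who wants to try it, the variable has to be in the environment before TensorFlow initializes the GPU; one way to do that from Python (exporting it in the shell works just as well):

import os

# Must be set before the GPU is initialized, i.e. before importing TensorFlow
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))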

@nalane
nalane commented Jan 17, 2024

@cjmcclellan From #36465 (comment) in this thread:

I ran this on TF 2.8.0 on both Windows and Linux, on 3 different GPUs (with different CUDA versions), and with both the BFC memory manager and the experimental async allocator (TF_GPU_ALLOCATOR=cuda_malloc_async); results are consistent across all variations I've tried.

@andreped

This issue seems to have been fixed by setting TF_GPU_ALLOCATOR=cuda_malloc_async as mentioned in #48545, which seems to work for me on TF 2.15 and CUDA 12.3. I'm not sure why this hasn't been mentioned here yet.

@cjmcclellan I think most of us have given up at this point :P

As @nalane mentions, if you think this fixes the issue, demonstrate it by running @FirefoxMetzger's benchmark from earlier in this thread with the proposed fix (#36465 (comment)).

@cjmcclellan

@andreped and @nalane, thanks, and sorry I missed that comment. The async allocator fixed my issue, but clearly it did not fix the memory leak entirely.

@ElrondL
ElrondL commented Apr 25, 2024

Yeah, TF team, please fix this. While the multiprocessing trick works for a single GPU, how does that work if you run distributed training? Wouldn't multiprocessing be more likely to mess up the distributed training? If you run distributed training plus hyperparameter optimization, you would be looking at your code eating through hundreds of GBs of RAM.

@takeyama0
takeyama0 commented May 31, 2024

In my environment, specifying a distribution strategy mitigates the problem.
For example, in the single-GPU case, the following did not completely eliminate the memory leak, but it reduced it significantly.

import tensorflow as tf

# set strategy
strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")

# fit and evaluate under the specified strategy
with strategy.scope():
    dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(10)
    model = tf.keras.Sequential([tf.keras.layers.Dense(1,)])
    model.compile(loss='mse', optimizer='adam')
    model.fit(dataset, epochs=10)
    model.evaluate(dataset)
