Release GPU Memory(VRAM) after tf.keras.backend.clear_session() #39535

Arktius · 2020-05-14T09:41:20Z

System information

Windows 10 Microsoft Windows [Version 10.0.18362.418]
TensorFlow 2.0, installed from Conda
Python version: 3.6.10
CUDA/cuDNN version: NVIDIA-SMI 445.75 Driver Version: 445.75 CUDA Version: 11.0
GPU model and memory: NVIDIA 2060S

Describe the current behavior
No command frees VRAM once TensorFlow has allocated it. Even deleting the model and the data had no effect on VRAM usage.

Describe the expected behavior
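Any of the cleanup calls at the end of the code below (del, tf.compat.v1.reset_default_graph(), tf.keras.backend.clear_session(), gc.collect()) should release the VRAM.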

Standalone code to reproduce the issue

import gc
import random

import numpy as np
import tensorflow as tf
from tensorflow import dtypes, math
from tensorflow import float32 as f32
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Seed every random number generator for reproducibility.
rseed = 10
np.random.seed(rseed)
random.seed(rseed)
tf.random.set_seed(rseed)


def MMSE(targets, preds, mask_value=0.0):
    """Masked MSE: entries where the target equals mask_value are ignored.

    Keras calls a loss as loss(y_true, y_pred), so targets come first.
    """
    tf.print('\npred', preds)
    tf.print('target', targets)
    mask = dtypes.cast(tf.not_equal(targets, mask_value), f32)
    num_rating = math.reduce_sum(mask)  # number of unmasked ratings
    loss = math.reduce_sum(math.square(mask * (preds - targets))) / num_rating
    return loss


input_dim = Input(shape=(3,))
model = Sequential()
model.add(Dense(3, input_dim=3))
model.add(Dense(3))
model.compile(optimizer=Adam(learning_rate=0.01), loss=[MMSE])

data = tf.math.round(tf.random.normal(shape=[5, 3]))
history = model.fit(data, data, epochs=1, batch_size=5, verbose=0, shuffle=False)

# Attempted cleanup: none of these calls lowers the VRAM usage reported by nvidia-smi.
del input_dim, model, data, history
tf.compat.v1.reset_default_graph()
tf.keras.backend.clear_session()
gc.collect()

I used nvidia-smi to check the memory usage.
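For readers who want to repeat that check from a script instead of reading the nvidia-smi table by eye, here is a minimal sketch. It assumes nvidia-smi is on the PATH; the helper names parse_memory_csv and query_gpu_memory are made up for this illustration. Note that the number reported is what the process has reserved from the driver, which for TensorFlow is typically its whole memory pool rather than what the model actually needs.

```python
import subprocess


def parse_memory_csv(output):
    """Parse lines of 'used, total' (in MiB) into a list of (used, total) tuples."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used/total memory per GPU (one tuple per GPU, in MiB)."""
    proc = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_memory_csv(proc.stdout)
```

Calling query_gpu_memory() before training, after training, and after the cleanup calls gives three readings that can be compared directly.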


byronyi · 2020-05-15T01:58:00Z

It is currently not possible without exiting the Python process due to the fact that many TF internal objects, e.g. GPU memory pool, device streams, do not support clean shutdown.
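Since the memory only comes back when the Python process exits, a common workaround is to run each training job in a short-lived child process. The following is a minimal sketch of that idea, not an official TensorFlow mechanism: run_isolated and TRAIN_SOURCE are hypothetical names, and the child script is a simplified stand-in for the model in the report (built-in "mse" loss instead of the custom one).

```python
import json
import subprocess
import sys


def run_isolated(source):
    """Run Python source in a fresh interpreter.

    The child is expected to print one JSON object as its last stdout line;
    that object is returned. When the child exits, the driver reclaims all
    GPU memory the child held.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    last_line = proc.stdout.strip().splitlines()[-1]
    return json.loads(last_line)


# Hypothetical training script; TensorFlow is imported only in the child,
# so the parent process never creates a GPU memory pool.
TRAIN_SOURCE = """
import json
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, input_shape=(3,)),
    tf.keras.layers.Dense(3),
])
model.compile(optimizer="adam", loss="mse")
data = tf.math.round(tf.random.normal(shape=[5, 3]))
history = model.fit(data, data, epochs=1, batch_size=5, verbose=0)
print(json.dumps({"loss": float(history.history["loss"][0])}))
"""
```

With TensorFlow installed, result = run_isolated(TRAIN_SOURCE) returns a dict with the final loss, and nvidia-smi should show the VRAM released once the call returns. Anything larger than a few numbers (weights, for example) would have to be saved to disk by the child and loaded by the parent.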

Arktius · 2020-05-15T06:56:43Z

What a pity. Can I at least estimate the VRAM usage from the number of trainable parameters and the batch size?

Arktius · 2020-05-15T07:14:55Z

One could check the model size or the number of trainable parameters, but that is not really satisfying.
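To make the idea of estimating from parameter count and batch size concrete, here is a rough back-of-the-envelope sketch for a stack of Dense layers. Everything in it is an assumption for illustration: float32 values, one gradient per weight, two Adam slot variables per weight, and one stored activation per unit per sample. It is a lower bound on what the model itself needs and says nothing about the CUDA context, library workspaces, or the size of the memory pool TensorFlow reserves, so it will not match the nvidia-smi figure.

```python
def dense_param_count(n_in, n_out):
    """Weights plus biases of one Dense layer."""
    return n_in * n_out + n_out


def estimate_training_bytes(layer_sizes, batch_size,
                            bytes_per_value=4, optimizer_slots=2):
    """Return (trainable parameter count, rough byte estimate).

    layer_sizes lists the input width followed by each Dense layer's units,
    e.g. [3, 3, 3] for the two-layer model in this issue.
    """
    params = sum(
        dense_param_count(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )
    # One copy each for weights and gradients, plus the optimizer slots.
    copies = 2 + optimizer_slots
    param_bytes = params * copies * bytes_per_value
    # Inputs plus every layer's outputs, kept for the backward pass.
    activation_bytes = batch_size * sum(layer_sizes) * bytes_per_value
    return params, param_bytes + activation_bytes
```

For the model in the report, estimate_training_bytes([3, 3, 3], batch_size=5) gives 24 parameters and 564 bytes, which shows how little of the reserved VRAM such a model accounts for.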

gowthamkpr · 2020-05-16T14:52:38Z

@Arktius, as this issue has been answered, could you please create a new issue and close this one? Thanks!

ELind77 · 2020-08-14T19:18:15Z

byronyi wrote: "It is currently not possible without exiting the Python process due to the fact that many TF internal objects, e.g. GPU memory pool, device streams, do not support clean shutdown."

This seems like a core deficiency in the library. Is there a followup ticket for a line of work to fix this?

Arktius added the type:bug label May 14, 2020

google-ml-butler bot assigned ravikyram May 14, 2020

ravikyram added the comp:gpu, TF 2.0, and type:performance labels and removed the type:bug label May 15, 2020

ravikyram assigned gowthamkpr and unassigned ravikyram May 15, 2020

gowthamkpr added the stat:awaiting response label May 16, 2020

Arktius closed this as completed May 16, 2020

GatGit12 mentioned this issue Apr 16, 2021, in "Introduce ability to clear GPU memory in Tensorflow 2" #48545 (open)
