
Keras two-head controllable gradient flow #62576

Open
ymuv opened this issue Dec 6, 2023 · 0 comments
Labels: comp:keras (Keras related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), TF 2.11 (Issues related to TF 2.11), type:support (Support issues)

ymuv commented Dec 6, 2023

Issue type

Support

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.11.0

Custom code

No

OS platform and distribution

Windows 10

Mobile device

No response

Python version

3.9.11

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

We use a two-head model. We want only one head to be involved in training at a time, while the other is excluded, and then vice versa: the gradient should flow into the body from one head but not from the other, after which the heads change roles (a sketch of the intended gating is shown below). The approach is similar to https://arxiv.org/abs/2210.05657 .
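
For clarity, the per-head effect we are after looks roughly like the following sketch (the gate variables and the gated() helper are illustrative, not part of our actual model):

import tensorflow as tf

# 1.0 = this head's loss propagates into the body; 0.0 = it does not.
gate1 = tf.Variable(1.0, trainable=False)
gate2 = tf.Variable(0.0, trainable=False)

def gated(x, gate):
    # Forward value is unchanged; the gradient flowing back into x is scaled by gate.
    return gate * x + (1.0 - gate) * tf.stop_gradient(x)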

We have tried several ways to achieve this:

  1. Add a dropout layer with rate = 1 or ~0 before each head, and change the rate from the generator class (for example, in __getitem__() or on_epoch_end(), get the model layer and change its rate: self.model.get_layer('dr1').rate = 0 or 1). Please see example 1.

  2. Use a custom layer before each head that multiplies its input by a constant, and switch this constant between 0 and 1 from the generator. Please see example 2.

  3. Recompile the model after applying the changes from points 1) and 2), but this causes an error:
    tmp_logs = self.train_function(iterator)
    TypeError: 'NoneType' object is not callable
    Please see example 3.

Unfortunately, none of the above approaches worked in our case: we can see that the changed values are stored on the model, but the output of the model doesn't change. The prediction output for the model with const = 0 and with const = 1 is the same.

I want to ask: is this normal behavior or a bug?
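
From what we can tell, this may be because model.fit() and model.predict() run traced tf.functions that capture a plain Python float as a constant at trace time, so later attribute assignments are invisible to them. A minimal standalone sketch of that difference (the names f, g, const, and var are illustrative):

import tensorflow as tf

const = 2.0             # plain Python float, captured by value at trace time
var = tf.Variable(2.0)  # TF variable, read at call time

@tf.function
def f(x):
    return x * const

@tf.function
def g(x):
    return x * var

x = tf.constant(1.0)
print(f(x), g(x))  # 2.0 2.0 -- both functions are traced here

const = 0.0
var.assign(0.0)
print(f(x))  # still 2.0 -- the captured constant does not update
print(g(x))  # 0.0 -- the variable's new value is picked up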

Standalone code to reproduce the issue

#example1.py
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, GRU, Input, Dropout
from tensorflow.keras.models import Model as KerasModel
from tensorflow.keras.utils import Sequence


maxValue = 1 - tf.keras.backend.epsilon()

dataSize = 100
BS = 10
dataShape = (10, 2)
outShape = 1

def getModel():
    input_layer = Input(shape=dataShape, name="input")
    x = input_layer
    x = GRU(units=10, unroll=True, name="gru")(x)
    # Keep the dropout layers active even outside of training mode.
    dropoutTraining = True
    x1 = Dropout(maxValue, name="head1")(x, training=dropoutTraining)
    x2 = Dropout(0, name="head2")(x, training=dropoutTraining)
    out1 = Dense(outShape, use_bias=False)(x1)
    out2 = Dense(outShape, use_bias=False)(x2)
    model = KerasModel(input_layer, [out1, out2])
    model.summary()

    model.compile(
        loss="binary_crossentropy",
        optimizer='Adam',
    )
    return model


class DataGenerator(Sequence):
    def __init__(self, model):
        self.model = model
        self.id = 0

    def __len__(self):
        return 10

    def __getitem__(self, index):
        X = np.random.rand(BS, *dataShape)
        out_vec1 = np.random.rand(BS, outShape)
        out_vec2 = np.random.rand(BS, outShape)
        return X, [out_vec1, out_vec2]

    def on_epoch_end(self):
        self.id += 1
        X = np.random.rand(1, *dataShape)

        y = self.model.predict(X)
        print("Before switch", y)

        # Swap the dropout rates so the active head alternates each epoch.
        head1 = self.model.get_layer('head1')
        head2 = self.model.get_layer('head2')
        if self.id % 2 == 0:
            head1.rate = maxValue
            head2.rate = 0
        else:
            head1.rate = 0
            head2.rate = maxValue

        y = self.model.predict(X)
        print("After switch", y)

model = getModel()

gen = DataGenerator(model)
steps_per_epoch = dataSize // BS


history = model.fit(
    gen,
    steps_per_epoch=steps_per_epoch,
    epochs=10,
    verbose=1)
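
A possibly relevant detail: Keras caches the traced step functions on the model (train_function, predict_function), so an attribute change like the one above is only seen after a re-trace. A hedged sketch of forcing one inside on_epoch_end(), assuming the TF 2.11 internals make_train_function()/make_predict_function() with their force argument:

        # Hypothetical workaround sketch: force Keras to re-trace its cached
        # step functions after mutating the layer attributes.
        self.model.train_function = self.model.make_train_function(force=True)
        self.model.predict_function = self.model.make_predict_function(force=True)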


#example2.py
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, GRU, Input, Layer
from tensorflow.keras.models import Model as KerasModel
from tensorflow.keras.utils import Sequence


maxValue = 1 - tf.keras.backend.epsilon()

dataSize = 100
BS = 10
dataShape = (10, 2)
outShape = 1


class ConstMul(Layer):
    """Multiplies its input by a plain Python constant stored on the layer."""
    def __init__(self, const_val, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.const = const_val

    def call(self, inputs, **kwargs):
        return inputs * self.const


def getModel():
    input_layer = Input(shape=dataShape, name="input")
    x = input_layer
    x = GRU(units=10, unroll=True, name="gru")(x)
    dropoutTraining = True
    x1 = ConstMul(1, name="head1")(x, training=dropoutTraining)
    x2 = ConstMul(0, name="head2")(x, training=dropoutTraining)
    out1 = Dense(outShape, use_bias=False)(x1)
    out2 = Dense(outShape, use_bias=False)(x2)
    model = KerasModel(input_layer, [out1, out2])
    model.summary()

    model.compile(
        loss="binary_crossentropy",
        optimizer='Adam')
    return model


class DataGenerator(Sequence):
    def __init__(self, model):
        self.model = model
        self.id = 0

    def __len__(self):
        return 10

    def __getitem__(self, index):
        X = np.random.rand(BS, *dataShape)
        out_vec1 = np.random.rand(BS, outShape)
        out_vec2 = np.random.rand(BS, outShape)
        return X, [out_vec1, out_vec2]

    def on_epoch_end(self):
        self.id += 1
        X = np.random.rand(1, *dataShape)

        y = self.model.predict(X)
        print("Before switch", y)

        # Swap the multiplier constants so the active head alternates each epoch.
        head1 = self.model.get_layer('head1')
        head2 = self.model.get_layer('head2')
        if self.id % 2 == 0:
            head1.const = maxValue
            head2.const = 0
        else:
            head1.const = 0
            head2.const = maxValue

        y = self.model.predict(X)
        print("After switch", y)


model = getModel()

gen = DataGenerator(model)
steps_per_epoch = dataSize // BS


history = model.fit(
    gen,
    steps_per_epoch=steps_per_epoch,
    epochs=10,
    verbose=1)
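
A variant of ConstMul, sketched here for comparison (not used in the examples above): backing the multiplier with a non-trainable tf.Variable, so that .assign() should be visible inside the traced step functions:

class VarMul(Layer):
    """Like ConstMul, but the multiplier is a non-trainable tf.Variable."""
    def __init__(self, const_val, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.const = tf.Variable(float(const_val), trainable=False)

    def call(self, inputs, **kwargs):
        return inputs * self.const

# Switching then becomes, e.g. in on_epoch_end():
#     self.model.get_layer('head1').const.assign(0.0)
#     self.model.get_layer('head2').const.assign(maxValue)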

#example3.py
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, GRU, Input, Layer
from tensorflow.keras.models import Model as KerasModel
from tensorflow.keras.utils import Sequence


maxValue = 1 - tf.keras.backend.epsilon()

dataSize = 100
BS = 10
dataShape = (10, 2)
outShape = 1


class ConstMul(Layer):
    """Multiplies its input by a plain Python constant stored on the layer."""
    def __init__(self, const_val, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.const = const_val

    def call(self, inputs, **kwargs):
        return inputs * self.const


def getModel():
    input_layer = Input(shape=dataShape, name="input")
    x = input_layer
    x = GRU(units=10, unroll=True, name="gru")(x)
    dropoutTraining = True
    x1 = ConstMul(1, name="head1")(x, training=dropoutTraining)
    x2 = ConstMul(0, name="head2")(x, training=dropoutTraining)
    out1 = Dense(outShape, use_bias=False)(x1)
    out2 = Dense(outShape, use_bias=False)(x2)
    model = KerasModel(input_layer, [out1, out2])
    model.summary()

    model.compile(
        loss="binary_crossentropy",
        optimizer='Adam')
    return model


class DataGenerator(Sequence):
    def __init__(self, model):
        self.model = model
        self.id = 0

    def __len__(self):
        return 10

    def __getitem__(self, index):
        X = np.random.rand(BS, *dataShape)
        out_vec1 = np.random.rand(BS, outShape)
        out_vec2 = np.random.rand(BS, outShape)
        return X, [out_vec1, out_vec2]

    def on_epoch_end(self):
        self.id += 1
        X = np.random.rand(1, *dataShape)

        y = self.model.predict(X)
        print("Before switch", y)

        # Swap the multiplier constants so the active head alternates each epoch.
        head1 = self.model.get_layer('head1')
        head2 = self.model.get_layer('head2')
        if self.id % 2 == 0:
            head1.const = maxValue
            head2.const = 0
        else:
            head1.const = 0
            head2.const = maxValue

        # Recompile in the hope that the new constants are picked up; this is
        # what triggers the 'NoneType' object is not callable error in fit().
        self.model.compile(
            loss="binary_crossentropy",
            optimizer='Adam')

        y = self.model.predict(X)
        print("After switch", y)


model = getModel()

gen = DataGenerator(model)
steps_per_epoch = dataSize // BS


history = model.fit(
    gen,
    steps_per_epoch=steps_per_epoch,
    epochs=10,
    verbose=1)
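
Regarding the error in example 3: compile() appears to reset the cached train_function to None, and fit() only builds that function once before its epoch loop, so the next training step calls None. A hedged sketch of rebuilding it right after the recompile, assuming the TF 2.11 internal make_train_function():

        self.model.compile(
            loss="binary_crossentropy",
            optimizer='Adam')
        # compile() discards the cached train_function; rebuild it so the
        # running fit() loop has a callable step function again.
        self.model.train_function = self.model.make_train_function()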

Relevant log output

No response
