
Keras two-head controllable gradient flow #62576

Open
ymuv opened this issue Dec 6, 2023 · 0 comments
Labels: comp:keras (Keras related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), TF 2.11 (Issues related to TF 2.11), type:support (Support issues)

ymuv commented Dec 6, 2023

Issue type

Support

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.11.0

Custom code

No

OS platform and distribution

Windows 10

Mobile device

No response

Python version

3.9.11

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

We use a two-head model. We want only one head to be involved in training at a time, while the other is excluded, and then vice versa: the gradient should flow into the body from one head but not from the other, after which the heads change roles (a sketch of the intended gating is shown below). The approach is similar to https://arxiv.org/abs/2210.05657 .
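
For clarity, the per-head effect we are after looks roughly like the following sketch (the gate variables and the gated() helper are illustrative, not part of our actual model):

import tensorflow as tf

# 1.0 = this head's loss propagates into the body; 0.0 = it does not.
gate1 = tf.Variable(1.0, trainable=False)
gate2 = tf.Variable(0.0, trainable=False)

def gated(x, gate):
    # Forward value is unchanged; the gradient flowing back into x is scaled by gate.
    return gate * x + (1.0 - gate) * tf.stop_gradient(x)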

We have tried several ways to achieve this:

  1. Add a dropout layer with rate = 1 or ~0 before each head, and change the rate from the generator class (for example, in __getitem__() or on_epoch_end(), get the model layer and change its rate: self.model.get_layer('dr1').rate = 0 or 1). Please see example 1.

  2. Use a custom layer before each head that multiplies its input by a constant, and switch this constant between 0 and 1 from the generator. Please see example 2.

  3. Recompile the model after applying the changes from points 1) and 2), but this causes an error:
    tmp_logs = self.train_function(iterator)
    TypeError: 'NoneType' object is not callable
    Please see example 3.

Unfortunately, none of the above approaches worked in our case: we can see that the changed values are stored on the model, but the output of the model doesn't change. The prediction output for the model with const = 0 and with const = 1 is the same.

I want to ask: is this normal behavior or a bug?
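
From what we can tell, this may be because model.fit() and model.predict() run traced tf.functions that capture a plain Python float as a constant at trace time, so later attribute assignments are invisible to them. A minimal standalone sketch of that difference (the names f, g, const, and var are illustrative):

import tensorflow as tf

const = 2.0             # plain Python float, captured by value at trace time
var = tf.Variable(2.0)  # TF variable, read at call time

@tf.function
def f(x):
    return x * const

@tf.function
def g(x):
    return x * var

x = tf.constant(1.0)
print(f(x), g(x))  # 2.0 2.0 -- both functions are traced here

const = 0.0
var.assign(0.0)
print(f(x))  # still 2.0 -- the captured constant does not update
print(g(x))  # 0.0 -- the variable's new value is picked up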

Standalone code to reproduce the issue

#example1.py
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, GRU, Input, Dropout
from tensorflow.keras.models import Model as KerasModel
from tensorflow.keras.utils import Sequence


maxValue = 1 - tf.keras.backend.epsilon()

dataSize = 100
BS = 10
dataShape = (10, 2)
outShape = 1

def getModel():
    input_layer = Input(shape=dataShape, name="input")
    x = input_layer
    x = GRU(units=10, unroll=True, name="gru")(x)
    # Keep the dropout layers active even outside of training mode.
    dropoutTraining = True
    x1 = Dropout(maxValue, name="head1")(x, training=dropoutTraining)
    x2 = Dropout(0, name="head2")(x, training=dropoutTraining)
    out1 = Dense(outShape, use_bias=False)(x1)
    out2 = Dense(outShape, use_bias=False)(x2)
    model = KerasModel(input_layer, [out1, out2])
    model.summary()

    model.compile(
        loss="binary_crossentropy",
        optimizer='Adam',
    )
    return model


class DataGenerator(Sequence):
    def __init__(self, model):
        self.model = model
        self.id = 0

    def __len__(self):
        return 10

    def __getitem__(self, index):
        X = np.random.rand(BS, *dataShape)
        out_vec1 = np.random.rand(BS, outShape)
        out_vec2 = np.random.rand(BS, outShape)
        return X, [out_vec1, out_vec2]

    def on_epoch_end(self):
        self.id += 1
        X = np.random.rand(1, *dataShape)

        y = self.model.predict(X)
        print("Before switch", y)

        # Swap the dropout rates so the active head alternates each epoch.
        head1 = self.model.get_layer('head1')
        head2 = self.model.get_layer('head2')
        if self.id % 2 == 0:
            head1.rate = maxValue
            head2.rate = 0
        else:
            head1.rate = 0
            head2.rate = maxValue

        y = self.model.predict(X)
        print("After switch", y)

model = getModel()

gen = DataGenerator(model)
steps_per_epoch = dataSize // BS


history = model.fit(
    gen,
    steps_per_epoch=steps_per_epoch,
    epochs=10,
    verbose=1)
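
A possibly relevant detail: Keras caches the traced step functions on the model (train_function, predict_function), so an attribute change like the one above is only seen after a re-trace. A hedged sketch of forcing one inside on_epoch_end(), assuming the TF 2.11 internals make_train_function()/make_predict_function() with their force argument:

        # Hypothetical workaround sketch: force Keras to re-trace its cached
        # step functions after mutating the layer attributes.
        self.model.train_function = self.model.make_train_function(force=True)
        self.model.predict_function = self.model.make_predict_function(force=True)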


#example2.py
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, GRU, Input, Layer
from tensorflow.keras.models import Model as KerasModel
from tensorflow.keras.utils import Sequence


maxValue = 1 - tf.keras.backend.epsilon()

dataSize = 100
BS = 10
dataShape = (10, 2)
outShape = 1


class ConstMul(Layer):
    """Multiplies its input by a plain Python constant stored on the layer."""
    def __init__(self, const_val, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.const = const_val

    def call(self, inputs, **kwargs):
        return inputs * self.const


def getModel():
    input_layer = Input(shape=dataShape, name="input")
    x = input_layer
    x = GRU(units=10, unroll=True, name="gru")(x)
    dropoutTraining = True
    x1 = ConstMul(1, name="head1")(x, training=dropoutTraining)
    x2 = ConstMul(0, name="head2")(x, training=dropoutTraining)
    out1 = Dense(outShape, use_bias=False)(x1)
    out2 = Dense(outShape, use_bias=False)(x2)
    model = KerasModel(input_layer, [out1, out2])
    model.summary()

    model.compile(
        loss="binary_crossentropy",
        optimizer='Adam')
    return model


class DataGenerator(Sequence):
    def __init__(self, model):
        self.model = model
        self.id = 0

    def __len__(self):
        return 10

    def __getitem__(self, index):
        X = np.random.rand(BS, *dataShape)
        out_vec1 = np.random.rand(BS, outShape)
        out_vec2 = np.random.rand(BS, outShape)
        return X, [out_vec1, out_vec2]

    def on_epoch_end(self):
        self.id += 1
        X = np.random.rand(1, *dataShape)

        y = self.model.predict(X)
        print("Before switch", y)

        # Swap the multiplier constants so the active head alternates each epoch.
        head1 = self.model.get_layer('head1')
        head2 = self.model.get_layer('head2')
        if self.id % 2 == 0:
            head1.const = maxValue
            head2.const = 0
        else:
            head1.const = 0
            head2.const = maxValue

        y = self.model.predict(X)
        print("After switch", y)


model = getModel()

gen = DataGenerator(model)
steps_per_epoch = dataSize // BS


history = model.fit(
    gen,
    steps_per_epoch=steps_per_epoch,
    epochs=10,
    verbose=1)
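
A variant of ConstMul, sketched here for comparison (not used in the examples above): backing the multiplier with a non-trainable tf.Variable, so that .assign() should be visible inside the traced step functions:

class VarMul(Layer):
    """Like ConstMul, but the multiplier is a non-trainable tf.Variable."""
    def __init__(self, const_val, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.const = tf.Variable(float(const_val), trainable=False)

    def call(self, inputs, **kwargs):
        return inputs * self.const

# Switching then becomes, e.g. in on_epoch_end():
#     self.model.get_layer('head1').const.assign(0.0)
#     self.model.get_layer('head2').const.assign(maxValue)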

#example3.py
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, GRU, Input, Layer
from tensorflow.keras.models import Model as KerasModel
from tensorflow.keras.utils import Sequence


maxValue = 1 - tf.keras.backend.epsilon()

dataSize = 100
BS = 10
dataShape = (10, 2)
outShape = 1


class ConstMul(Layer):
    """Multiplies its input by a plain Python constant stored on the layer."""
    def __init__(self, const_val, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.const = const_val

    def call(self, inputs, **kwargs):
        return inputs * self.const


def getModel():
    input_layer = Input(shape=dataShape, name="input")
    x = input_layer
    x = GRU(units=10, unroll=True, name="gru")(x)
    dropoutTraining = True
    x1 = ConstMul(1, name="head1")(x, training=dropoutTraining)
    x2 = ConstMul(0, name="head2")(x, training=dropoutTraining)
    out1 = Dense(outShape, use_bias=False)(x1)
    out2 = Dense(outShape, use_bias=False)(x2)
    model = KerasModel(input_layer, [out1, out2])
    model.summary()

    model.compile(
        loss="binary_crossentropy",
        optimizer='Adam')
    return model


class DataGenerator(Sequence):
    def __init__(self, model):
        self.model = model
        self.id = 0

    def __len__(self):
        return 10

    def __getitem__(self, index):
        X = np.random.rand(BS, *dataShape)
        out_vec1 = np.random.rand(BS, outShape)
        out_vec2 = np.random.rand(BS, outShape)
        return X, [out_vec1, out_vec2]

    def on_epoch_end(self):
        self.id += 1
        X = np.random.rand(1, *dataShape)

        y = self.model.predict(X)
        print("Before switch", y)

        # Swap the multiplier constants so the active head alternates each epoch.
        head1 = self.model.get_layer('head1')
        head2 = self.model.get_layer('head2')
        if self.id % 2 == 0:
            head1.const = maxValue
            head2.const = 0
        else:
            head1.const = 0
            head2.const = maxValue

        # Recompile in the hope that the new constants are picked up; this is
        # what triggers the 'NoneType' object is not callable error in fit().
        self.model.compile(
            loss="binary_crossentropy",
            optimizer='Adam')

        y = self.model.predict(X)
        print("After switch", y)


model = getModel()

gen = DataGenerator(model)
steps_per_epoch = dataSize // BS


history = model.fit(
    gen,
    steps_per_epoch=steps_per_epoch,
    epochs=10,
    verbose=1)
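
Regarding the error in example 3: compile() appears to reset the cached train_function to None, and fit() only builds that function once before its epoch loop, so the next training step calls None. A hedged sketch of rebuilding it right after the recompile, assuming the TF 2.11 internal make_train_function():

        self.model.compile(
            loss="binary_crossentropy",
            optimizer='Adam')
        # compile() discards the cached train_function; rebuild it so the
        # running fit() loop has a callable step function again.
        self.model.train_function = self.model.make_train_function()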

Relevant log output

No response
