Keras two-head controllable gradient flow #62576
Labels
comp:keras, stat:awaiting tensorflower, TF 2.11, type:support
Issue type
Support
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
2.11.0
Custom code
No
OS platform and distribution
Windows 10
Mobile device
No response
Python version
3.9.11
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
We use a two-head model.
We want only one head to be involved in training at a time: the gradient should propagate to the shared body from one head but not from the other, and after that the heads swap roles. The approach is similar to https://arxiv.org/abs/2210.05657 .
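For reference, a minimal sketch of the kind of architecture in question (layer names and sizes are illustrative, not taken from our actual model):

```python
from tensorflow import keras

# A shared body followed by two heads; the goal is to let the gradient
# reach the body from only one head during any given epoch.
inputs = keras.Input(shape=(32,))
body = keras.layers.Dense(64, activation="relu", name="body")(inputs)
head_a = keras.layers.Dense(1, name="head_a")(body)
head_b = keras.layers.Dense(1, name="head_b")(body)

model = keras.Model(inputs, [head_a, head_b])
model.compile(optimizer="adam", loss=["mse", "mse"])
```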
We have tried some ways to achieve this:
1. Before every head we add a Dropout layer with rate = 1 or ~0, and in the generator class we change the drop rate (for example, in __getitem__() or on_epoch_end() we fetch the layer and set self.model.get_layer('dr1').rate = 0 or 1). Please see example 1 and the reconstruction sketched after this list.
2. We use a custom layer before every head; this layer multiplies its input by a constant. We have also tried changing this constant to 0/1 from the data generator. Please see example 2 and the reconstruction after this list.
3. We have tried recompiling the model after applying the changes from approaches 1) and 2), but that causes an error (please see example 3):
tmp_logs = self.train_function(iterator)
TypeError: 'NoneType' object is not callable
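Since examples 1–3 are not inlined here, the following is a minimal reconstruction of the kind of code described in approaches 1) and 2); the layer names ('dr1', 'dr2') and the generator's data layout are illustrative assumptions:

```python
from tensorflow import keras

# Approach 1: Dropout in front of each head, with the rate flipped from
# the generator. Note that Dropout.rate is a plain Python float.
class AlternatingSequence(keras.utils.Sequence):
    def __init__(self, model, x, y_a, y_b, batch_size=32):
        self.model = model
        self.x, self.y_a, self.y_b = x, y_a, y_b
        self.batch_size = batch_size

    def __len__(self):
        return len(self.x) // self.batch_size

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[s], (self.y_a[s], self.y_b[s])

    def on_epoch_end(self):
        # Swap which head is "switched off" for the next epoch.
        dr1 = self.model.get_layer("dr1")
        dr2 = self.model.get_layer("dr2")
        dr1.rate, dr2.rate = dr2.rate, dr1.rate

# Approach 2: a custom layer that multiplies its input by a constant.
class ConstMul(keras.layers.Layer):
    def __init__(self, const=1.0, **kwargs):
        super().__init__(**kwargs)
        self.const = const  # plain Python float, captured when traced

    def call(self, inputs):
        return inputs * self.const
```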
Unfortunately, none of the above approaches worked in our case: we can see that the values have changed in the model, but the output of the model does not change. The prediction output with const = 0 is the same as with const = 1.
I want to ask: is this normal behavior or a bug?
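If the cause is that plain Python attributes (Dropout.rate, the const above) are baked into the graph when train_function/predict_function is first traced, and that Dropout is in any case inactive under predict() (which runs with training=False), then a gate backed by a tf.Variable might behave differently, since the variable lives inside the traced graph. A minimal sketch of that idea (all names illustrative):

```python
import tensorflow as tf
from tensorflow import keras

class GradGate(keras.layers.Layer):
    """Multiplies the input by a non-trainable 0/1 variable. Because the
    variable is part of the traced graph, assigning it between epochs
    changes both the forward pass and the gradient flow to the body."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.gate = tf.Variable(1.0, trainable=False, name="gate")

    def call(self, inputs):
        return inputs * self.gate

# Usage, e.g. in a callback or in the generator's on_epoch_end():
#   model.get_layer("gate_a").gate.assign(0.0)  # block head A's gradient
#   model.get_layer("gate_b").gate.assign(1.0)  # let head B train
```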
Standalone code to reproduce the issue
Relevant log output
No response