
InvalidArgumentError when using class_weight in Model.fit with labels having extra dimension for time step #48555

Closed
oO0oO0oO0o0o00 opened this issue Apr 16, 2021 · 5 comments
Assignees: ymodak
Labels: comp:keras, stat:awaiting response, stat:awaiting tensorflower, TF 2.16, type:bug

Comments

@oO0oO0oO0o0o00

We use the class_weight parameter of keras.Model.fit() to assign different weights to samples of different classes.
Our data (input) and labels (output) have an extra time-step dimension in addition to the batch dimension.
The labels are class indices (i.e. not one-hot encoded), used together with the sparse categorical cross-entropy loss.
We've written a minimal reproduction script (presented below) to simplify the situation.

It looks like class_weight is only designed for outputs of shape (batch_dim, n_classes).

Possible workarounds:

  1. convert class_weight to sample_weight (a sketch of this is given after the list)
  2. collapse the time-step axis into the batch axis using tf.reshape() in a Lambda layer, which would clutter the code
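A minimal sketch of workaround 1, assuming labels ys shaped (samples, time_steps) with integer class indices as in the repro script below; class_weight_to_sample_weight is a hypothetical helper name, not an existing API:

import numpy as np

def class_weight_to_sample_weight(ys, class_weight):
    # Build a (samples, time_steps) weight array by looking up each label's class weight.
    sample_weight = np.ones(ys.shape, dtype=np.float32)
    for cls, w in class_weight.items():
        sample_weight[ys == cls] = w
    return sample_weight

# sw = class_weight_to_sample_weight(ys, {0: 1., 1: 1.})
# Depending on the TF version, 2-D (temporal) sample weights may also require
# compiling with sample_weight_mode='temporal'.
# model.fit(xs, ys, batch_size=3, sample_weight=sw)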

I've noticed there are related issues, but they were closed without resolution because their authors did not reply in time.

System information

  • Have I written custom code: Yes
  • OS Platform and Distribution: Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): Anaconda, binary
  • TensorFlow version (use command below): 2.2.0
  • Python version: 3.8.5
  • CUDA/cuDNN version: cudnn 7.6.5 cuda10.1_0.conda (irrelevant)
  • GPU model and memory: Tesla K40c, 11441MiB (irrelevant)

Describe the current behavior
Crashed on model.fit with the following exception (traceback appended at the end as it is too long):

InvalidArgumentError: indices[0] = 3 is not in [0, 2)
[[{{node GatherV2}}]] [Op:IteratorGetNext]

The training progress bar did not appear.

Describe the expected behavior
Complete the training, even though the minimal reproduction script has no actual trainable parameters.

Standalone code to reproduce the issue

import numpy as np
import tensorflow as tf
from tensorflow import keras

def test_class_weight_error():
    # the model simply returns the input time_step*n_class=20x2 data as-is
    model = keras.Sequential([keras.layers.Reshape((20, 2), input_shape=(20, 2))])
    # run_eagerly improves the readability of the traceback a bit
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', run_eagerly=True)
    model.summary()
    # X (inputs, as well as y_pred): samples*time_step*n_classes=15x20x2
    xs = tf.reshape(tf.one_hot(tf.ones(300, dtype=tf.int32), 2), [-1, 20, 2]).numpy()
    # Y (labels, i.e. y_true): samples*time_step=15x20, class labels of 0 or 1
    ys = np.ones([15, 20], dtype=np.int32)
    # without the line below (i.e. with all labels being 1) there is no exception
    ys[:, :3] = 0
    # here's the crash
    model.fit(xs, ys, batch_size=3, class_weight={0: 1., 1: 1.})

test_class_weight_error()

Other info / logs

Traceback:
traceback.txt

Model Structure (in minimal repro script):

Model: "sequential_64"
Layer (type) Output Shape Param
reshape_64 (Reshape) (None, 20, 2) 0
Total params: 0
Trainable params: 0
Non-trainable params: 0


@UsharaniPagadala UsharaniPagadala added TF 2.2 Issues related to TF 2.2 comp:keras Keras related issues labels Apr 16, 2021
@Saduf2019
Contributor

I am able to replicate this issue on TF 2.2, TF 2.4 and nightly; please find the gist here.

@Saduf2019 Saduf2019 assigned ymodak and unassigned Saduf2019 Apr 25, 2021
@ymodak ymodak added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 6, 2021
@nealchau

I was also able to reproduce this on TF 2.4. From the reference page for fit it also seems that the class_weight argument is specifically intended to let the user weight by class:

class_weight: Optional named list mapping indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.

@chunduriv
Contributor

I was able to reproduce the issue on Colab using TF 2.6 and TF-nightly (2.8.0-dev20211003). Please find the gist here for reference. Thanks!

@chunduriv chunduriv added 2.6.0 and removed TF 2.2 Issues related to TF 2.2 labels Oct 4, 2021
@Venkat6871 Venkat6871 added TF 2.16 and removed 2.6.0 labels Apr 2, 2024
@tilakrayal
Contributor

Hi,

Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information it contains may no longer be relevant to the current state of the code base.

The TensorFlow team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings and all the debugging information that could help us investigate.

Please follow the release notes to stay up to date with the latest developments happening in the TensorFlow space.

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Sep 24, 2024