

Conversion failure: tfl.batch_matmul "expected 3 but got 2" (regression since 2.14, worked in 2.13) #65769

Closed
gustavla opened this issue Apr 15, 2024 · 7 comments
Assignees
Labels
comp:lite TF Lite related issues stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author TFLiteConverter For issues related to TFLite converter

Comments

@gustavla
Contributor

1. System information

  • macOS 14.1.2 / Python 3.10
  • pip install tensorflow-macos==2.16.1 tf-keras==2.16.0 keras==3.2.1

2. Code

import tensorflow as tf
import keras

input0_shape = [1, 5]
input1_shape = [1, 5, 7]
output_shape = [1, 1, 7]

tf_input0 = keras.Input(input0_shape[1:], batch_size=1)
tf_input1 = keras.Input(input1_shape[1:], batch_size=1)


class MyMatMul(keras.layers.Layer):
    def call(self, tf_input0, tf_input1):
        # -> [1, 1, 5]
        tf_input0_rank3 = tf.expand_dims(tf_input0, [1])

        # [1, 1, 5] x [1, 5, 7] -> [1, 1, 7]
        tf_output_rank3 = tf.linalg.matmul(tf_input0_rank3, tf_input1)

        # -> [1, 7]
        tf_output = tf.squeeze(tf_output_rank3, [1])

        return tf_output

tf_output = MyMatMul()(tf_input0, tf_input1)

model = keras.Model(inputs=[tf_input0, tf_input1], outputs=[tf_output])

# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model.
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

3. Failure after conversion

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1713222146.555150 18460241 tf_tfl_flatbuffer_helpers.cc:390] Ignored output_format.
W0000 00:00:1713222146.555506 18460241 tf_tfl_flatbuffer_helpers.cc:393] Ignored drop_control_dependency.
2024-04-15 16:02:26.556328: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /var/folders/d5/6vzc45z14_79pfg8d_mbpmsr049hy4/T/tmpw802bhj6
2024-04-15 16:02:26.556581: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-04-15 16:02:26.556589: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: /var/folders/d5/6vzc45z14_79pfg8d_mbpmsr049hy4/T/tmpw802bhj6
2024-04-15 16:02:26.558343: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-04-15 16:02:26.558600: I tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2024-04-15 16:02:26.582278: I tensorflow/cc/saved_model/loader.cc:218] Running initialization op on SavedModel bundle at path: /var/folders/d5/6vzc45z14_79pfg8d_mbpmsr049hy4/T/tmpw802bhj6
2024-04-15 16:02:26.583903: I tensorflow/cc/saved_model/loader.cc:317] SavedModel load for tags { serve }; Status: success: OK. Took 27577 microseconds.
2024-04-15 16:02:26.602113: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
loc(fused[callsite(callsite(fused["Squeeze:", "functional_1_1/my_mat_mul_1/Squeeze@__inference_serving_default_16"] at fused["PartitionedCall:", "PartitionedCall@__inference_signature_wrapper_serving_default_23"]) at fused["PartitionedCall:", "PartitionedCall"]), callsite(callsite(fused["ExpandDims:", "functional_1_1/my_mat_mul_1/ExpandDims@__inference_serving_default_16"] at fused["PartitionedCall:", "PartitionedCall@__inference_signature_wrapper_serving_default_23"]) at fused["PartitionedCall:", "PartitionedCall"]), callsite(callsite(fused["BatchMatMulV2:", "functional_1_1/my_mat_mul_1/MatMul@__inference_serving_default_16"] at fused["PartitionedCall:", "PartitionedCall@__inference_signature_wrapper_serving_default_23"]) at fused["PartitionedCall:", "PartitionedCall"])]): error: 'tfl.batch_matmul' op found invalid output rank, expected 3 but got 2

The key is the final line: error: 'tfl.batch_matmul' op found invalid output rank, expected 3 but got 2.

Note that both the expand and the squeeze are required to reproduce the failure.
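For reference, the shape arithmetic the repro relies on can be checked outside of TensorFlow with plain NumPy (NumPy here is only a stand-in for the TF ops in section 2; it is not part of the original report):

```python
import numpy as np

x0 = np.zeros((1, 5), dtype=np.float32)     # rank-2 input, like tf_input0
x1 = np.zeros((1, 5, 7), dtype=np.float32)  # rank-3 input, like tf_input1

x0_rank3 = np.expand_dims(x0, 1)      # (1, 5) -> (1, 1, 5)
out_rank3 = np.matmul(x0_rank3, x1)   # (1, 1, 5) @ (1, 5, 7) -> (1, 1, 7)
out = np.squeeze(out_rank3, 1)        # (1, 1, 7) -> (1, 7)

print(out.shape)  # (1, 7)
```

The matmul itself is rank-3 in and rank-3 out; only the final squeeze brings the result back to rank 2, which is where the converter's `tfl.batch_matmul` rank check appears to trip.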

4. Regression analysis

Replace the tensorflow version in the pip installation above:

  • 2.13: Works (no conversion error, model.tflite produced successfully)
  • 2.14: Failure
  • 2.15: Failure
  • 2.16: Failure

The above error happens across 2.14-2.16.
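The matrix above can be reproduced with a small environment-setup loop, one virtualenv per TensorFlow release. This is a sketch: `repro.py` is a hypothetical file holding the conversion script from section 2, and the exact patch versions are assumptions.

```shell
#!/bin/sh
# Bisect the regression across TensorFlow releases.
# Assumes repro.py contains the conversion script from section 2.
for v in 2.13.1 2.14.1 2.15.1 2.16.1; do
    python3 -m venv "env-$v"
    "env-$v/bin/pip" install -q "tensorflow==$v"
    if "env-$v/bin/python" repro.py; then
        echo "tensorflow==$v: conversion OK"
    else
        echo "tensorflow==$v: conversion FAILED"
    fi
done
```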

@gustavla gustavla added the TFLiteConverter For issues related to TFLite converter label Apr 15, 2024
@Venkat6871 Venkat6871 added the comp:lite TF Lite related issues label Apr 17, 2024
@sawantkumar

Hi @gustavla,

I have replicated the issue and got similar results. I am looking into it and will get back to you.

@sawantkumar sawantkumar assigned pkgoogle and unassigned sawantkumar May 6, 2024
@pkgoogle
pkgoogle commented May 6, 2024

Hi @gustavla, can you let us know what chip your Mac is using? M series? Intel? Thanks for your help.

@pkgoogle pkgoogle added stat:awaiting response Status - Awaiting response from author and removed WIP labels May 6, 2024
@gustavla
Contributor Author
gustavla commented May 8, 2024

@pkgoogle Apple silicon. Also repros on x86_64 Ubuntu.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label May 8, 2024
@pkgoogle
pkgoogle commented May 8, 2024

I was able to reproduce this on x86_64 Debian with tf-nightly as well, using the exact program above. One difference: I got this error/warning:

tensorflow.lite.python.convert_phase.ConverterError: Variable constant folding is failed. Please consider using enabling `experimental_enable_resource_variables` flag in the TFLite converter object. For example, converter.experimental_enable_resource_variables = True test.py:32:1: error: 'tfl.batch_matmul' op found invalid output rank, expected 3 but got 2

I attempted adding this line to see if it helps:

converter.experimental_enable_resource_variables = True

It did not help. @zichuan-wei, can you please take a look? Thanks.

@pkgoogle pkgoogle added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 8, 2024
@pkgoogle

Hi @gustavla, if you have access to a Linux system, you may be able to resolve your issue by using AI-Edge-Torch; you can find more information here: googleblog.

I have created a simple script for converting your model:

import torch
import torch.nn as nn
import ai_edge_torch


class MyMatMul(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x0, x1):
        x0_rank3 = torch.unsqueeze(x0, 1)
        out = x0_rank3 @ x1
        out = torch.squeeze(out, 1)
        return out


input0_shape = (1, 5)
input1_shape = (1, 5, 7)
model = MyMatMul()
sample_inputs = (torch.randn(*input0_shape), torch.randn(*input1_shape))

edge_model = ai_edge_torch.convert(model.eval(), sample_inputs)
edge_model.export("my_mat_mul.tflite")

If you want, you can also try visualizing the result in model-explorer.

Please try them out and let us know if this resolves your issue. If you still need further help, feel free to open a new issue in the respective repo.

@pkgoogle pkgoogle added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Jun 10, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 18, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.
