MultiOutput Keras Model Evaluation Issue #111

Closed
wlee192 opened this issue Feb 9, 2021 · 21 comments
@wlee192
wlee192 commented Feb 9, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow Model Analysis): NO
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab
  • TensorFlow Model Analysis installed from (source or binary): source
  • TensorFlow Model Analysis version (use command below): 0.27.0
  • Python version: 3.6.9
  • Jupyter Notebook version: Google Colab
  • Exact command to reproduce:

Describe the problem

Hi, I've been following the TFX Chicago Taxi example (https://www.tensorflow.org/tfx/tutorials/tfx/components_keras#evaluator) to factor my TensorFlow code into the TFX framework.

However, my use case is a multi-output Keras model: the model consumes a given input and produces 2 outputs (both multi-class).

If I run the Evaluator component with just 1 output (e.g. with the other output disabled in my model), it works fine and I can run tfma.run_model_analysis without an issue.

However, reverting to my multi-output model, running the Evaluator component throws an error.

Model: output_0 has 5 classes and output_1 has 8 classes to predict >>

signature_def['serving_raw']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['CREDIT'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1)
        name: serving_raw_CREDIT:0
    inputs['DEBIT'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1)
        name: serving_raw_DEBIT:0
    inputs['DESCRIPTION'] tensor_info:
        dtype: DT_STRING
        shape: (-1, -1)
        name: serving_raw_DESCRIPTION:0
    inputs['TRADEDATE'] tensor_info:
        dtype: DT_STRING
        shape: (-1, -1)
        name: serving_raw_TRADEDATE:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 5)
        name: StatefulPartitionedCall_2:0
    outputs['output_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 8)

Eval_Config >>

eval_config = tfma.EvalConfig(
    model_specs=[     
        tfma.ModelSpec(label_key='my_label_key')
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy', 
                                  threshold=tfma.MetricThreshold(
                                      value_threshold=tfma.GenericValueThreshold(lower_bound={'value': 0.5}),
                                      change_threshold=tfma.GenericChangeThreshold(
                                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                                          absolute={'value': -1e-10}))),
                tfma.MetricConfig(class_name = 'MultiClassConfusionMatrixPlot'),
                tfma.MetricConfig(class_name = "Precision"),
                tfma.MetricConfig(class_name = "Recall")
            ], 
            output_names =['output_0']
        ),
         tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy', 
                                  threshold=tfma.MetricThreshold(
                                      value_threshold=tfma.GenericValueThreshold(lower_bound={'value': 0.5}),
                                      change_threshold=tfma.GenericChangeThreshold(
                                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                                          absolute={'value': -1e-10}))),
                tfma.MetricConfig(class_name = 'MultiClassConfusionMatrixPlot'),
                tfma.MetricConfig(class_name = "Precision"),
                tfma.MetricConfig(class_name = "Recall")
            ], 
            output_names =['output_1']
         )
    ],
    slicing_specs=[
        tfma.SlicingSpec(),
    ])

Running tfma.run_model_analysis with the above eval_config:

keras_model_path = os.path.join(trainer.outputs['model'].get()[0].uri,'serving_model_dir') # gets the model from the trainer stage
keras_eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=keras_model_path,
    eval_config=eval_config)

keras_output_path = os.path.join(os.getcwd(), 'keras2')
tfrecord_file = '/tmp/tfx-interactive-2021-02-09T06_02_48.210135-95bh38cw/Transform/transformed_examples/5/train/transformed_examples-00000-of-00001.gz'
# Run TFMA
keras_eval_result = tfma.run_model_analysis(
    eval_shared_model=keras_eval_shared_model,
    eval_config=eval_config,
    data_location=tfrecord_file,
    output_path=keras_output_path)

I get the error message below >>


ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_model_analysis/model_util.py in process(self, element)
    667     try:
--> 668       result = self._batch_reducible_process(element)
    669       self._batch_size.update(batch_size)

118 frames
ValueError: could not broadcast input array from shape (5) into shape (1)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
ValueError: could not broadcast input array from shape (5) into shape (1)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

ValueError: could not broadcast input array from shape (5) into shape (1) [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/ExtractPredictions/Predict']

I've tried to find code examples of a multi-output eval_config but haven't come across one yet.

Following the documentation, I've arrived at what I think the eval_config should be for a multi-output model - however, is it set up correctly, given the error message?

@mdreves
Member
mdreves commented Feb 9, 2021

Can you provide more of the stack trace? The config won't work as-is because Precision and Recall are binary classification metrics, so they need to either have a top_k parameter set or be binarized using BinarizationOptions (both options are sketched below). However, it is failing at inference before it gets to evaluation, so we need to see what is causing that failure first.
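
The top_k value and class IDs in this sketch are illustrative, not from this issue:

import tensorflow_model_analysis as tfma

# Option 1: give the metric a top_k setting via its config (a JSON fragment).
precision_top_k = tfma.MetricConfig(class_name='Precision', config='"top_k": 3')

# Option 2: binarize the multi-class output so binary metrics apply per class.
binarized_spec = tfma.MetricsSpec(
    metrics=[
        tfma.MetricConfig(class_name='Precision'),
        tfma.MetricConfig(class_name='Recall'),
    ],
    binarize=tfma.BinarizationOptions(class_ids={'values': [0, 1, 2, 3, 4]}),
    output_names=['output_0'])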

@wlee192
Author
wlee192 commented Feb 10, 2021

Hi @mdreves ,

I've run it again; see the full stack trace below. Thanks

WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7fb77d53b898> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fb76ec0e630>).
WARNING:tensorflow:7 out of the last 7 calls to <function recreate_function.<locals>.restored_function_body at 0x7fb7fe0d1048> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:8 out of the last 8 calls to <function recreate_function.<locals>.restored_function_body at 0x7fb771a30c80> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:absl:Tensorflow version (2.4.1) found. Note that TFMA support for TF 2.0 is currently in beta
ERROR:absl:There are change thresholds, but the baseline is missing. This is allowed only when rubber stamping (first run).
Exception ignored in: <bound method CapturableResourceDeleter.__del__ of <tensorflow.python.training.tracking.tracking.CapturableResourceDeleter object at 0x7fb76f542898>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py", line 208, in __del__
    self._destroy_resource()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/function_deserialization.py", line 253, in restored_function_body
    return _call_concrete_function(function, inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/function_deserialization.py", line 75, in _call_concrete_function
    result = function._call_flat(tensor_inputs, function._captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 116, in _call_flat
    cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1932, in _call_flat
    flat_outputs = forward_function.call(ctx, args_with_tangents)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 589, in call
    executor_type=executor_type)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/functional_ops.py", line 1206, in partitioned_call
    f.add_to_graph(graph)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 505, in add_to_graph
    g._add_function(self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3396, in _add_function
    gradient)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 'func' argument to TF_GraphCopyFunction cannot be null
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7fb7dadf4828> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fb7da93c240>).
WARNING:tensorflow:9 out of the last 9 calls to <function recreate_function.<locals>.restored_function_body at 0x7fb77db652f0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:10 out of the last 10 calls to <function recreate_function.<locals>.restored_function_body at 0x7fb77b6f5488> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7fb77dfd7ef0> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fb77dd22ba8>).
WARNING:tensorflow:11 out of the last 11 calls to <function recreate_function.<locals>.restored_function_body at 0x7fb77b44d950> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:11 out of the last 11 calls to <function recreate_function.<locals>.restored_function_body at 0x7fb77b44de18> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:absl:Large batch_size 1 failed with error could not broadcast input array from shape (5) into shape (1). Attempting to run batch through serially. Note that this will significantly affect the performance.
WARNING:tensorflow:11 out of the last 11 calls to <function recreate_function.<locals>.restored_function_body at 0x7fb77cc1e378> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
Exception ignored in: <bound method CapturableResourceDeleter.__del__ of <tensorflow.python.training.tracking.tracking.CapturableResourceDeleter object at 0x7fb7da8f8fd0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py", line 208, in __del__
    self._destroy_resource()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/function_deserialization.py", line 253, in restored_function_body
    return _call_concrete_function(function, inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/function_deserialization.py", line 75, in _call_concrete_function
    result = function._call_flat(tensor_inputs, function._captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 116, in _call_flat
    cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1932, in _call_flat
    flat_outputs = forward_function.call(ctx, args_with_tangents)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 589, in call
    executor_type=executor_type)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/functional_ops.py", line 1206, in partitioned_call
    f.add_to_graph(graph)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 505, in add_to_graph
    g._add_function(self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3396, in _add_function
    gradient)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 'func' argument to TF_GraphCopyFunction cannot be null
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_model_analysis/model_util.py in process(self, element)
    667     try:
--> 668       result = self._batch_reducible_process(element)
    669       self._batch_size.update(batch_size)

118 frames
ValueError: could not broadcast input array from shape (5) into shape (1)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
ValueError: could not broadcast input array from shape (5) into shape (1)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

ValueError: could not broadcast input array from shape (5) into shape (1) [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/ExtractPredictions/Predict']

@wlee192
Author
wlee192 commented Feb 14, 2021

Hi @mdreves ,

Just checking if you've had the chance to look further into the above yet?

Thanks

@mdreves
Member
mdreves commented Feb 16, 2021

The stack trace still seems to be truncated (and mixed with other, possibly concerning, errors - although TF seems to ignore them). However, I did just notice that you provided a serving signature called 'serving_raw', but you didn't set the signature name in your config (the default is to call the model directly if no signature is provided). Try updating your config to the following:

eval_config = tfma.EvalConfig(
    model_specs=[     
        tfma.ModelSpec(signature_name='serving_raw', label_key='my_label_key')
    ],
    ...
)

If this doesn't work, what I would do is load the saved model directly and try to call inference on it and see if you get an error. For example:

model = tf.keras.models.load_model(<path-to-saved-model>)
model(<your-model-input-features>)
# or
model.signatures['serving_raw'](<your-serving-inputs>)
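
For instance, a direct call to the 'serving_raw' signature from the SignatureDef shown earlier might look like this (the path and feature values are made up; the argument names and 2-D shapes follow that SignatureDef):

import tensorflow as tf

model = tf.keras.models.load_model('/path/to/serving_model_dir')  # hypothetical path
outputs = model.signatures['serving_raw'](
    CREDIT=tf.constant([[100.0]]),
    DEBIT=tf.constant([[0.0]]),
    DESCRIPTION=tf.constant([['admin fees']]),
    TRADEDATE=tf.constant([['2021-02-01']]))
# Expect a dict with 'output_0' (shape [1, 5]) and 'output_1' (shape [1, 8]).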

@wlee192
Author
wlee192 commented Feb 17, 2021

Hi @mdreves ,

Thanks for the suggestion.
Setting "signature_name" didn't resolve the issue.

Just to give a bit more background, my model has 2 signatures:
one is 'serving_default', and the other is 'serving_raw', as shown above.

'serving_default' comes from following the TFX example here (I've included a snippet below) >> https://www.tensorflow.org/tfx/tutorials/tfx/components_keras#trainer

def _get_serve_tf_examples_fn(model, tf_transform_output):
  """Returns a function that parses a serialized tf.Example and applies TFT."""

  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function
  def serve_tf_examples_fn(serialized_tf_examples):
    """Returns the output to be used in the serving signature."""
    feature_spec = tf_transform_output.raw_feature_spec()
    # I have 2 labels here, so pop both in the loop (_LABEL_KEYS is a list).
    for i in _LABEL_KEYS:
        feature_spec.pop(i)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)
    return model(transformed_features)

  return serve_tf_examples_fn

However, the client that will be interacting with the model does not have the capability to encode the incoming request into the single serialized (binary) input expected by 'serving_default', which is why I've created another signature called 'serving_raw' that accepts raw input features for the model to predict on.

def _get_serve_raw(model, tf_transform_output):
  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function
  def serve_raw_fn(CREDIT, DEBIT, TRADEDATE, DESCRIPTION):
    # Convert dense inputs to the SparseTensors the TFT layer expects,
    # dropping empty-string / zero entries.
    def to_sparse_str(dense):
      idx = tf.where(tf.not_equal(dense, ""))
      return tf.SparseTensor(idx, tf.gather_nd(dense, idx), tf.shape(dense, out_type=tf.int64))
    def to_sparse_flt(dense):
      idx = tf.where(tf.not_equal(dense, 0))
      return tf.SparseTensor(idx, tf.gather_nd(dense, idx), tf.shape(dense, out_type=tf.int64))
    parsed_features = {'CREDIT': to_sparse_flt(CREDIT),
                       'DEBIT': to_sparse_flt(DEBIT),
                       'DESCRIPTION': to_sparse_str(DESCRIPTION),
                       'TRADEDATE': to_sparse_str(TRADEDATE)}
    # Apply the TFT preprocessing graph, then run the model on the transformed features.
    transformed_features = model.tft_layer(parsed_features)
    outputs = model(transformed_features)
    return outputs
  return serve_raw_fn

And the signatures for the model are:

signatures = {
    'serving_default': _get_serve_tf_examples_fn(model, tf_transform_output).get_concrete_function(
        tf.TensorSpec(shape=[None], dtype=tf.string, name='examples')),
    'serving_raw': _get_serve_raw(model, tf_transform_output).get_concrete_function(
        tf.TensorSpec(shape=[None, None], dtype=tf.float32, name='CREDIT'),
        tf.TensorSpec(shape=[None, None], dtype=tf.float32, name='DEBIT'),
        tf.TensorSpec(shape=[None, None], dtype=tf.string, name='DESCRIPTION'),
        tf.TensorSpec(shape=[None, None], dtype=tf.string, name='TRADEDATE')),
}
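
For context, these signatures get attached when the Trainer saves the model - in the TFX tutorial this is done roughly as follows (serving_model_dir being whatever path the Trainer writes to):

model.save(serving_model_dir, save_format='tf', signatures=signatures)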

I've checked by loading the saved model back in via model = tf.keras.models.load_model(<model path>).
The model output is:

{'output_0': TensorSpec(shape=(None, 5), dtype=tf.float32, name='output_0'),
 'output_1': TensorSpec(shape=(None, 8), dtype=tf.float32, name='output_1')}

And checking it with test data (the serving signatures include the transform layer to turn raw features into what the model expects, hence testing the model directly with 'transformed' example features):

[{'Day_RScaled': array([0.25], dtype=float32),
  'Quarter_RScaled': array([0.2], dtype=float32),
  'DESCRIPTION_xf': array([b'admin fees dec   iv'], dtype=object),
  'DEBIT_xf': array([-0.15819775], dtype=float32),
  'CREDIT_xf': array([-0.18726462], dtype=float32)}]

This successfully returns the predictions (probability distributions) for the 2 outputs when I call model.predict(testdata):

[array([[1.8274054e-07, 9.9437696e-01, 1.3195525e-03, 4.2717359e-03,
         3.1528725e-05]], dtype=float32),
 array([[7.0776665e-08, 9.9168360e-01, 2.2847725e-03, 5.2758754e-04,
         4.5948986e-05, 2.2446657e-04, 1.7628946e-03, 3.4705126e-03]],
       dtype=float32)]

I've increased the verbosity of the output and run the TFMA component again; below is what is printed with the TFMA error. Hopefully this gives a bit more info?

Also, I'm thinking it should be fine to leave tfma.ModelSpec(label_key='my_label_key') as-is without specifying the signature name, so that it calls the model directly by default, since the 'transformed feature' data in the tfrecord_file (the gzipped TFRecord file from the Transform stage) provided to TFMA already has the right inputs to pass to the model..?

Eval_Config (with Precision and Recall taken out for now) >>

eval_config = tfma.EvalConfig(
    model_specs=[     
        tfma.ModelSpec(label_key='my_label_key')
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy', 
                                  threshold=tfma.MetricThreshold(
                                      value_threshold=tfma.GenericValueThreshold(lower_bound={'value': 0.5}),
                                      change_threshold=tfma.GenericChangeThreshold(
                                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                                          absolute={'value': -1e-10}))),
                tfma.MetricConfig(class_name = 'MultiClassConfusionMatrixPlot')
            ], 
            output_names =['output_0']
        ),
         tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy', 
                                  threshold=tfma.MetricThreshold(
                                      value_threshold=tfma.GenericValueThreshold(lower_bound={'value': 0.5}),
                                      change_threshold=tfma.GenericChangeThreshold(
                                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                                          absolute={'value': -1e-10}))),
                tfma.MetricConfig(class_name = 'MultiClassConfusionMatrixPlot')
            ], 
            output_names =['output_1']
         )
    ],
    slicing_specs=[
        tfma.SlicingSpec(),
    ])

TFMA trace >>

WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f4773e024e0> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f4773d14358>).
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f4767506d30> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f476750f128>).
WARNING:tensorflow:5 out of the last 5 calls to <function recreate_function.<locals>.restored_function_body at 0x7f4772956d90> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:6 out of the last 6 calls to <function recreate_function.<locals>.restored_function_body at 0x7f4772961f28> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
Exception ignored in: <bound method CapturableResourceDeleter.__del__ of <tensorflow.python.training.tracking.tracking.CapturableResourceDeleter object at 0x7f47d08f82e8>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py", line 208, in __del__
    self._destroy_resource()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/function_deserialization.py", line 253, in restored_function_body
    return _call_concrete_function(function, inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/function_deserialization.py", line 75, in _call_concrete_function
    result = function._call_flat(tensor_inputs, function._captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 116, in _call_flat
    cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1932, in _call_flat
    flat_outputs = forward_function.call(ctx, args_with_tangents)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 589, in call
    executor_type=executor_type)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/functional_ops.py", line 1206, in partitioned_call
    f.add_to_graph(graph)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 505, in add_to_graph
    g._add_function(self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3396, in _add_function
    gradient)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 'func' argument to TF_GraphCopyFunction cannot be null
Exception ignored in: <bound method CapturableResourceDeleter.__del__ of <tensorflow.python.training.tracking.tracking.CapturableResourceDeleter object at 0x7f477295cfd0>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py", line 208, in __del__
    self._destroy_resource()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/function_deserialization.py", line 253, in restored_function_body
    return _call_concrete_function(function, inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/function_deserialization.py", line 75, in _call_concrete_function
    result = function._call_flat(tensor_inputs, function._captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 116, in _call_flat
    cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1932, in _call_flat
    flat_outputs = forward_function.call(ctx, args_with_tangents)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 589, in call
    executor_type=executor_type)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/functional_ops.py", line 1206, in partitioned_call
    f.add_to_graph(graph)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 505, in add_to_graph
    g._add_function(self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3396, in _add_function
    gradient)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 'func' argument to TF_GraphCopyFunction cannot be null
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f47726fd128> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f47726c5f28>).
WARNING:tensorflow:7 out of the last 7 calls to <function recreate_function.<locals>.restored_function_body at 0x7f47720b9a60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:8 out of the last 8 calls to <function recreate_function.<locals>.restored_function_body at 0x7f47720a8d08> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
WARNING:absl:Large batch_size 1 failed with error could not broadcast input array from shape (5) into shape (1). Attempting to run batch through serially. Note that this will significantly affect the performance.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_model_analysis/model_util.py in process(self=<tensorflow_model_analysis.model_util.ModelSignaturesDoFn object>, element={'arrow_record_batch': pyarrow.RecordBatch
CREDIT_xf: large_list<item: ...item: large_binary>
  child 0, item: large_binary, 'features': [{'CREDIT_xf': array([-0.18726462], dtype=float32), 'ClassificationA_xf': array([1]), 'ClassificationB_xf': array([1]), 'DEBIT_xf': array([-0.15819775], dtype=float32), 'DESCRIPTION_xf': array([b'admin fees dec  iv'], dtype=object), 'Day_RScaled': array([0.25], dtype=float32), 'Quarter_RScaled': array([0.2], dtype=float32)}], 'input': array([b'\n\xc7\x01\n\x15\n\tCREDIT_xf\x12\x08\x...x12\x06\n\x04\x00\x00\x80>'],
      dtype=object), 'labels': [None], 'transformed_features': [None]})
    667     try:
--> 668       result = self._batch_reducible_process(element)
        result = []
        self._batch_reducible_process = <bound method ModelSignaturesDoFn._batch_reducible_process of <tensorflow_model_analysis.model_util.ModelSignaturesDoFn object at 0x7f477268b4a8>>
        element = {'arrow_record_batch': pyarrow.RecordBatch
CREDIT_xf: large_list<item: float>
  child 0, item: float
ClassificationA_xf: large_list<item: int64>
  child 0, item: int64
ClassificationB_xf: large_list<item: int64>
  child 0, item: int64
DEBIT_xf: large_list<item: float>
  child 0, item: float
DESCRIPTION_xf: large_list<item: large_binary>
  child 0, item: large_binary
Day_RScaled: large_list<item: float>
  child 0, item: float
Quarter_RScaled: large_list<item: float>
  child 0, item: float
__raw_record__: large_list<item: large_binary>
  child 0, item: large_binary, 'features': [{'CREDIT_xf': array([-0.18726462], dtype=float32), 'ClassificationA_xf': array([1]), 'ClassificationB_xf': array([1]), 'DEBIT_xf': array([-0.15819775], dtype=float32), 'DESCRIPTION_xf': array([b'admin fees dec  iv'], dtype=object), 'Day_RScaled': array([0.25], dtype=float32), 'Quarter_RScaled': array([0.2], dtype=float32)}], 'input': array([b'\n\xc7\x01\n\x15\n\tCREDIT_xf\x12\x08\x12\x06\n\x04L\xc2?\xbe\n\x1b\n\x12ClassificationB_xf\x12\x05\x1a\x03\n\x01\x01\n\x1b\n\x0fQuarter_RScaled\x12\x08\x12\x06\n\x04\xcd\xccL>\n(\n\x0eDESCRIPTION_xf\x12\x16\n\x14\n\x12admin fees dec  iv\n\x1b\n\x12ClassificationA_xf\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08DEBIT_xf\x12\x08\x12\x06\n\x04\x97\xfe!\xbe\n\x17\n\x0bDay_RScaled\x12\x08\x12\x06\n\x04\x00\x00\x80>'],
      dtype=object), 'transformed_features': [None], 'labels': [None]}
    669       self._batch_size.update(batch_size)

125 frames
ValueError: could not broadcast input array from shape (5) into shape (1)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
ValueError: could not broadcast input array from shape (5) into shape (1)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a=[<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
...359e-03,
        3.1528725e-05]], dtype=float32)>, <tf.Tensor: shape=(1, 8), dtype=float32, numpy=
...28946e-03, 3.4705126e-03]],
      dtype=float32)>], dtype=None, order=None)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
        global array = <built-in function array>
        a = [<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
array([[1.8274054e-07, 9.9437696e-01, 1.3195525e-03, 4.2717359e-03,
        3.1528725e-05]], dtype=float32)>, <tf.Tensor: shape=(1, 8), dtype=float32, numpy=
array([[7.0776665e-08, 9.9168360e-01, 2.2847725e-03, 5.2758754e-04,
        4.5948986e-05, 2.2446657e-04, 1.7628946e-03, 3.4705126e-03]],
      dtype=float32)>]
        dtype = None
        global copy = undefined
        order = None
     84 
     85 

ValueError: could not broadcast input array from shape (5) into shape (1) [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/ExtractPredictions/Predict']

@wlee192
Author
wlee192 commented Feb 19, 2021

Hi @mdreves ,

Thought I'd share the updates on my end on the issue above.
I tried to isolate the potential issues, and what worked for me in the end was a small change.

When the error above occurred, the Evaluator component of my TFX pipeline was being passed the 'transformed' outputs from the Transform stage for evaluation of my 2 classification outputs - the reason being that, in the transform layer, I'm encoding the 2 labels from raw strings to their int representations.

model_analyzer = Evaluator(
      examples=transform.outputs['transformed_examples'],
      model=trainer.outputs['model'],
      example_splits=['eval', 'test'],
      baseline_model=model_resolver.outputs['model'], # gets the baseline model from the resolvernode (whichever is the latest blessed)
      eval_config=eval_config)

What worked: instead of passing in the labels as raw strings, I encoded them separately, so that at the very early ingestion stage using ExampleGen the labels that come with the raw data are already encoded. Then, instead of using the 'transformed' examples transform.outputs['transformed_examples'], if I just pass the raw example_gen.outputs['examples'] to the Evaluator component, it runs fine without issue:

model_analyzer = Evaluator(
      examples=example_gen.outputs['examples'],
      model=trainer.outputs['model'],
      example_splits=['eval', 'test'],
      baseline_model=model_resolver.outputs['model'], # gets the baseline model from the resolvernode (whichever is the latest blessed)
      eval_config=eval_config)

Q1:

I thought that by passing in the 'transformed' artifacts, the Evaluator component would compare the model predictions against the transformed labels (which is what I want, since it wouldn't work if I passed in the raw string labels to compare). But it looks like this may not be supported in TFMA yet..?

Ideally, I would like to include the label transformation as part of the pipeline (instead of pre-processing it separately beforehand) and still be able to go through the Evaluator component so it can evaluate based on the 'transformed' labels. Are you aware of a way to include the transform graph in the Evaluator component if I can't pass in the transformed dataset? E.g.:

model_analyzer = Evaluator(
      examples=example_gen.outputs['examples'],
      TRANSFORM_GRAPH_URI= <Transform graph uri here>,
      model=trainer.outputs['model'],
      example_splits=['eval', 'test'],
      baseline_model=model_resolver.outputs['model'], # gets the baseline model from the resolvernode (whichever is the latest blessed)
      eval_config=eval_config)

That way, the transform graph could operate on the raw example_gen.outputs['examples'] data, transforming the labels, and TFMA could then use this to evaluate the model?

Q2:

For the tfma.view.render_plot and tfma.view.render_slicing_metrics rendering APIs, is there a way to view/break down the visualization by custom splits? I've looked through the documentation but haven't found an example yet.
Instead of the default 'eval' split, I've split the data into ['eval', 'test'], and when the rendering API runs, it shows the combined result without any filter for split.

The reason this is applicable is to assess the evaluation of the splits individually - e.g. check the Precision on the test set vs the evaluation set.

Q3:

For the eval_config, I understand the need for output_names as part of the MetricsSpec; however, it appears that I need to provide tfma.ModelSpec(label_key='ClassificationA'), and the label_key HAS to match one of my outputs (my outputs are ['ClassificationA', 'ClassificationB']).

Anything else, and it errors out.

Just wondering: what role does label_key play if the output is already specified in output_names? In my case of a multi-output model, logically speaking I would be inclined to configure label_key = ['ClassificationA', 'ClassificationB'] since there are 2 labels..? But label_key only accepts a single string. FYI - I'm using a Keras model.

The eval_config that works for me is:

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='ClassificationA')
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy', 
                                  threshold=tfma.MetricThreshold(
                                      value_threshold=tfma.GenericValueThreshold(lower_bound={'value': 0.3}),
                                      change_threshold=tfma.GenericChangeThreshold(
                                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                                          absolute={'value': -1e-10}))),
                tfma.MetricConfig(class_name = 'MultiClassConfusionMatrixPlot'),
                tfma.MetricConfig(class_name = "Precision", config ='"top_k": 5'), 
                tfma.MetricConfig(class_name = "Recall", config ='"top_k": 5')
            ], 
            # for multi output specify the output names to go with the metrics to be evaluated.
            output_names =['ClassificationA']
        ),
         tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy', 
                                  threshold=tfma.MetricThreshold(
                                      value_threshold=tfma.GenericValueThreshold(lower_bound={'value': 0.3}),
                                      change_threshold=tfma.GenericChangeThreshold(
                                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                                          absolute={'value': -1e-10}))),
                tfma.MetricConfig(class_name = 'MultiClassConfusionMatrixPlot'),
                tfma.MetricConfig(class_name = "Precision", config ='"top_k": 8'), 
                tfma.MetricConfig(class_name = "Recall", config ='"top_k": 8')
            ], 
            output_names =['ClassificationB']
         )
    ],
    slicing_specs=[
        tfma.SlicingSpec(),
    ])

Would appreciate your input on the above. Thanks!

@mdreves
Member
mdreves commented Feb 23, 2021

Re Q1: To use TFT with TFMA you need to add the workaround described in [1] (sketched below). Passing TFT outputs directly should work if your model takes TFT inputs and the model's input names match the TFT output names. Given that the labels were causing the issue, I suspect the outputs from TFT were not in the form you expected for the labels (i.e. maybe the TFT output was dense while the metrics you used expected sparse).

[1] tensorflow/tfx#2920
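
As it surfaces later in this thread, the workaround boils down to naming the transform layer attached to the model via preprocessing_function_names. A minimal sketch, assuming the TransformFeaturesLayer was attached under the attribute name 'tft_layer':

model_specs=[
    tfma.ModelSpec(
        label_key='my_label_key',
        preprocessing_function_names=['tft_layer'])
]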

Re Q2: The only way I know to do this is to add the splits to your examples as features and then slice on them, along the lines of the sketch below.
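
A possible shape for that, assuming a string feature named 'split' were added to each example:

slicing_specs=[
    tfma.SlicingSpec(),                        # overall, unsliced
    tfma.SlicingSpec(feature_keys=['split']),  # one slice per split value, e.g. 'eval' vs 'test'
]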

Re Q3: The label_key does not need to match the output_name. The label_key identifies the name of the input feature to use for the label. If you have multiple outputs and a single label_key, then the assumption is that you want to use the same label for all outputs. I think what you want is to specify a different label for each output. This is done using the label_keys field (note the 's' on the end). See [2].

[2]

map<string, string> label_keys = 6; // oneof not allowed with maps

@wlee192
Author
wlee192 commented Feb 24, 2021

Hi @mdreves ,

Thanks for your input. Much appreciated.

  1. Yup, I've tried [1] and it works fine for me!
  2. Noted on 2.
  3. I see - according to the ModelSpec documentation:

label_keys | repeated LabelKeysEntry label_keys

and LabelKeysEntry expects a key-value pair, as mentioned here.

I'm pretty sure this is just an issue on my config end, but I've tried looking around and couldn't find an example with label_keys set.

I've set it to what I believe is required as per the documentation:

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_keys=[tfma.ModelSpec.LabelKeysEntry(key = _transformed_name(_CATEGORICAL_LABEL_KEYS[0]), value = _transformed_name(_CATEGORICAL_LABEL_KEYS[0])),
                                   tfma.ModelSpec.LabelKeysEntry(key = _transformed_name(_CATEGORICAL_LABEL_KEYS[1]), value = _transformed_name(_CATEGORICAL_LABEL_KEYS[1]))],
                       preprocessing_function_names =["tft_layer_eval"])
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy', 
                                  threshold=tfma.MetricThreshold(
                                      value_threshold=tfma.GenericValueThreshold(lower_bound={'value': SP_ACC_LB}),
                                      change_threshold=tfma.GenericChangeThreshold(
                                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                                          absolute={'value': SP_ABS_DELTA}))),   
                tfma.MetricConfig(class_name = 'MultiClassConfusionMatrixPlot'),
               ],
            output_names =[_transformed_name(_CATEGORICAL_LABEL_KEYS[0])]
        ),
         tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy', 
                                  threshold=tfma.MetricThreshold(
                                      value_threshold=tfma.GenericValueThreshold(lower_bound={'value': SP_ACC_LB}),
                                      change_threshold=tfma.GenericChangeThreshold(
                                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                                          absolute={'value': SP_ABS_DELTA}))),
                tfma.MetricConfig(class_name = 'MultiClassConfusionMatrixPlot'),
            ], 
            output_names =[_transformed_name(_CATEGORICAL_LABEL_KEYS[1])]
         )
    ],
    slicing_specs=[
        tfma.SlicingSpec(),
    ])

But I get the error below when I run the Evaluator component:

TypeError                                 Traceback (most recent call last)
<ipython-input-77-ce8902712e82> in <module>
     29         tfma.ModelSpec(label_keys=[tfma.ModelSpec.LabelKeysEntry(key = _transformed_name(_CATEGORICAL_LABEL_KEYS[0]), value = _transformed_name(_CATEGORICAL_LABEL_KEYS[0])),
     30                                    tfma.ModelSpec.LabelKeysEntry(key = _transformed_name(_CATEGORICAL_LABEL_KEYS[1]), value = _transformed_name(_CATEGORICAL_LABEL_KEYS[1]))],
---> 31                        preprocessing_function_names =["tft_layer_eval"])
     32     ],
     33     metrics_specs=[

/usr/lib/python3.7/_collections_abc.py in update(*args, **kwds)
    844                     self[key] = other[key]
    845             else:
--> 846                 for key, value in other:
    847                     self[key] = value
    848         for key, value in kwds.items():

TypeError: cannot unpack non-iterable LabelKeysEntry object

Most probably because label_keys has not been set up correctly - is what I have above the right way to set it?

FYI - _CATEGORICAL_LABEL_KEYS is just a list of my model output names, and _transformed_name is just a function that appends an '_xf' suffix to the output name.

Thanks!

@mdreves
Member
mdreves commented Feb 24, 2021

The label_keys are really just a dict mapping output names to label names. Proto is not always obvious and friendly to use, but you can use a dict in this case. Try the following:

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(
            label_keys={
                _transformed_name(_CATEGORICAL_LABEL_KEYS[0]):
                    _transformed_name(_CATEGORICAL_LABEL_KEYS[0]),
                _transformed_name(_CATEGORICAL_LABEL_KEYS[1]):
                    _transformed_name(_CATEGORICAL_LABEL_KEYS[1]),
            },
            preprocessing_function_names=['tft_layer'])
    ],
    ...

@wlee192
Author
wlee192 commented Feb 26, 2021

Hi @mdreves,

Gotcha! I've tried the above and it works.
Thanks for your help, will now close this ticket out.

Thanks

@wlee192 wlee192 closed this as completed Feb 26, 2021
@axeltidemann
axeltidemann commented Sep 17, 2021

Very helpful thread, thank you. I also have a two-headed output, with one target read from the input data and the other calculated based on the provided input target.

I have noticed something peculiar: if my input target is called label, the output target* must be named label_xf. If I change the output target name to something arbitrary, I get the following error: ValueError: unable to prepare labels and predictions because the labels and/or predictions are dicts with unrecognized keys.

What can be the reason for this?

*) This refers to the output features dictionary in the preprocessing function and the outputs dictionary to the Keras model (and by consequence, in the EvalConfig as discussed in this thread).

@mdreves
Member
mdreves commented Sep 17, 2021

I'm a bit confused by the use of the word output here, but let's say the model takes as raw inputs label1 and label2 and your model's outputs from inference are output1 and output2. Let's also assume your preprocessing function transforms your raw labels to transformed_label1 and transformed_label2. In this setup, the label_keys should be a map from output1 -> transformed_label1 and output2 -> transformed_label2.

The raw label names are determined by your feature inputs (e.g. tf.Example), the transformed label names are determined by the names output by your preprocessing function (e.g. TFT, etc), and the output names are determined by your model.
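
Putting that together, a minimal sketch (output1/output2 and transformed_label1/transformed_label2 are the hypothetical names from the paragraph above, and preprocessing_function_names points at whatever signature exposes your transformed features):

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(
            label_keys={
                'output1': 'transformed_label1',
                'output2': 'transformed_label2',
            },
            preprocessing_function_names=['tft_layer'])
    ],
    ...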

@axeltidemann
axeltidemann commented Sep 21, 2021

Thank you for your response @mdreves, and sorry for being a bit unclear. Yes, I totally follow your logic. Here is a more structured explanation of my issue, based on the Chicago taxi example.

I define my label names in features.py like this:

INPUT_TARGET = 'target_in_seconds' # The one read from input data
LINEAR_TARGET = 'target_in_seconds_xf' # Unchanged
CLASS_TARGET = 'target_in_categories_xf' # Processed

In preprocessing.py I do some transformations:

def preprocessing_fn(input_features):
    output_features = {}
    output_features[features.LINEAR_TARGET] = input_features[features.INPUT_TARGET]
    [...]
    output_features[features.CLASS_TARGET] = tft.apply_buckets(
        input_features[features.INPUT_TARGET], boundaries)

    return output_features

In model.py I build the Keras model:

def _build_keras_model(hparams: kerastuner.HyperParameters) -> tf.keras.Model:
    [...]
    outputs = {
        features.LINEAR_TARGET:
            keras.layers.Dense(1, activation='linear',
                               name=features.LINEAR_TARGET)(linear_tower),
        features.CLASS_TARGET:
            keras.layers.Dense(constants.N_TARGET_CLASSES, activation='softmax',
                               name=features.CLASS_TARGET)(softmax_tower)
    }

    model = keras.Model(inputs=inputs, outputs=outputs)
    [...]

To do evaluation in pipeline.py:

  eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(
      signature_name='serving_default',
      label_keys={features.LINEAR_TARGET: features.LINEAR_TARGET,
                  features.CLASS_TARGET: features.CLASS_TARGET},
      preprocessing_function_names=['tft_layer'])],

    slicing_specs=[tfma.SlicingSpec()],                                                                                                                                                                         
    metrics_specs=[
      tfma.MetricsSpec(
        output_names=[features.LINEAR_TARGET],                                                                                                                                                                                  
        per_slice_thresholds={
        'mean_absolute_percentage_error':                                                                                                                                                                                         
        tfma.config.PerSliceMetricThresholds(thresholds=[
          tfma.PerSliceMetricThreshold(
            slicing_specs=[tfma.SlicingSpec()],                                                                                                                                      
            threshold=tfma.MetricThreshold(
              value_threshold=tfma.GenericValueThreshold(
                lower_bound={'value': 0.0}),                                                                                                                                             
              change_threshold=tfma.GenericChangeThreshold(
                direction=tfma.MetricDirection.LOWER_IS_BETTER,                                                                                                                                                    
                absolute={'value': 0.01}))
          )])
      })])

This works, everything is fine. I can see that the output signature has the corresponding names:

   "outputs": {
    "target_in_categories_xf": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "10",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "StatefulPartitionedCall_13:0"
    },
    "target_in_seconds_xf": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "StatefulPartitionedCall_13:1"
    }
   },

What baffles me, is if I try to change the label (and output) names to something else in features.py, for instance:

LINEAR_TARGET = 'seconds'
CLASS_TARGET = 'categories'

I get the following error:

ValueError: unable to prepare labels and predictions because the labels and/or predictions are dicts with unrecognized keys. If a multi-output keras model (or estimator) was used check that an output_name was provided. If an estimator was used check that common prediction keys were provided (e.g. logistic, probabilities, etc): labels={'categories': None, 'seconds': None}, predictions=[0.10932574 0.09787992 0.10235582 0.07289839 0.08729261 0.09646159
 0.11951147 0.10658901 0.12390085 0.08378454], prediction_key= [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/CombineMetricsPerSlice/WindowIntoDiscarding']

I feel like I might be missing something very obvious here. Any input would be greatly appreciated.

@mdreves
Member
mdreves commented Sep 21, 2021

I don't see the signature definitions, but just to clarify: it seems you are using the same names for both the inputs to the model (i.e. the output of TFT) and the outputs of the model. I don't see an issue, but just want to confirm to avoid confusion.

It would be useful to get the signature for the tft_layer, or to get an example of its outputs by passing an input example to model.tft_layer(...).
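
For example, a quick sketch (serialized_example here is a hypothetical single serialized tf.Example from your data, and tf_transform_output is the tft.TFTransformOutput produced at training time):

# Parse one raw example and inspect the transformed feature names.
parsed = tf.io.parse_example([serialized_example],
                             tf_transform_output.raw_feature_spec())
print(list(model.tft_layer(parsed).keys()))

Alternatively, saved_model_cli show --dir <saved_model_dir> --all will dump all of the exported signatures.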

@axeltidemann
axeltidemann commented Sep 22, 2021

Yes, you are absolutely right on your first point. I figured out what caused the error: I had changed the names of my target variables during development of the pipeline, so when the Evaluator pulled in a previously blessed model, it was that model that did not have the corresponding output keys and caused the error. Thank you for your suggestions, which led me down this path and helped me figure it out.

(FYI: I wanted to specify the output names of the model so they would be more legible for downstream components, instead of output_0, output_1, etc.)

@gcarr1020

Hello @axeltidemann @wlee192 @mdreves , I am having a similar issue but with images.

My output names from my model are below.
['anatomy_xf', 'pathology_xf']

I also checked that they match the above inside the Evaluator by calling model.summary() in the source code used by tfx.components.Evaluator.

The signature for my tft_layer is defined below

def _get_serve_tf_examples_fn(model, tf_transform_output):
  model.tft_layer = tf_transform_output.transform_features_layer()


  @tf.function
  def serve_tf_examples_fn(serialized_tf_examples):
    feature_spec = tf_transform_output.raw_feature_spec()

    keys = list(feature_spec.keys())

    for key in keys:
        if key != Config.IMAGE_KEY:
            feature_spec.pop(key)

    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)

    return model(transformed_features)

  return serve_tf_examples_fn

Following the directions in this thread I have created the evaluation configuration below:

def get_eval_config():
    anatomy = _transformed_name(Config.ANATOMY_KEY)
    pathology = _transformed_name(Config.PATHOLOGY_KEY)

    config = tfma.EvalConfig(
        model_specs=[
            tfma.ModelSpec(
            label_keys={'output_0': anatomy, 'output_1': pathology}
            )
        ],
        metrics_specs=[
            tfma.MetricsSpec(
                metrics=[
                    tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy'),   
                ],
                output_names =[anatomy]
            ),
            tfma.MetricsSpec(
                metrics=[
                    tfma.MetricConfig(class_name = 'BinaryAccuracy'),
                ], 
                output_names =[pathology]
            )
        ],
        slicing_specs=[
            tfma.SlicingSpec(),
        ])
    return config

Then I use the evaluation configuration with the following call in my TFX pipeline:

    evaluator = Evaluator(
        examples=example_gen.outputs['examples'],
        model=trainer.outputs['model'],
        baseline_model=resolver.outputs['model'],
        eval_config=eval_config)

Resulting in this error:

  File "/home/gage/.local/share/virtualenvs/x-ray-classification-tfx-KhxklEAF/lib/python3.8/site-packages/tensorflow_model_analysis/evaluators/metrics_plots_and_validations_evaluator.py", line 358, in add_input
    result = c.add_input(a, get_combiner_input(element, i))
  File "/home/gage/.local/share/virtualenvs/x-ray-classification-tfx-KhxklEAF/lib/python3.8/site-packages/tensorflow_model_analysis/evaluators/keras_util.py", line 232, in add_input
    accumulator = self._add_input(accumulator, element)
  File "/home/gage/.local/share/virtualenvs/x-ray-classification-tfx-KhxklEAF/lib/python3.8/site-packages/tensorflow_model_analysis/evaluators/keras_util.py", line 325, in _add_input
    labels, predictions, example_weights = next(
  File "/home/gage/.local/share/virtualenvs/x-ray-classification-tfx-KhxklEAF/lib/python3.8/site-packages/tensorflow_model_analysis/metrics/metric_util.py", line 422, in to_label_prediction_example_weight
    prediction = util.get_by_keys(prediction, [output_name])
  File "/home/gage/.local/share/virtualenvs/x-ray-classification-tfx-KhxklEAF/lib/python3.8/site-packages/tensorflow_model_analysis/util.py", line 152, in get_by_keys
    raise ValueError('"%s" key not found (or value is empty dict): %s' %
ValueError: "anatomy_xf" key not found (or value is empty dict): {'output_0': array([0.06304925, 0.20808356, 0.21717651, 0.1627071 , 0.16071999,
       0.08475181, 0.10351174], dtype=float32), 'output_1': array([0.5306584], dtype=float32)} [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/CombineMetricsPerSlice/WindowIntoDiscarding']

So it looks like I am getting predictions, but the labels cannot be mapped to them. It was my understanding that label_keys would take care of this, but it does not seem to. I have tried numerous different things but I always end up with the same error.

Things I have tried:

  • Switching the order of the label_keys mapping
  • Deleted all pipeline artifacts, caches, and folders
  • Playing around with prediction_keys
  • Setting label_keys={output_0: output_0, output_1: output_1}
  • Setting label_keys={anatomy: anatomy, pathology: pathology}
  • Setting output_names to output_0 and output_1 respectively

My feeling is that there is something wrong with my signature creation but I am not sure how to check the output of model.tft_layer. Any ideas or pointers will be greatly appreciated!

@mdreves
Member
mdreves commented Sep 22, 2021

I'm not sure about this part of your code:

    for key in keys:
        if key != Config.IMAGE_KEY:
            feature_spec.pop(key)

This would remove all the features except the IMAGE_KEY; are you sure it's not the opposite of what you want?

@gcarr1020

@mdreves It was my impression that you remove all features/labels that are not input into the model. In my case my only model input is an image, aka IMAGE_KEY, and I remove my two labels from the feature_spec (ANATOMY_KEY and PATHOLOGY_KEY) because they are not input into the model. Is that mistaken?

In testing, removing IMAGE_KEY causes:

ValueError: Missing data for input "image_xf". You passed a data dictionary with keys ['anatomy_xf', 'pathology_xf']. Expected the following keys: ['image_xf']

Which makes sense because it expects an image to be passed to it. I also tried keeping all three, but it resulted in the same error as my previous post.

@mdreves
Member
mdreves commented Sep 22, 2021

If you remove the labels from the output then TFMA will not have access to them. If you add back just the labels in addition to the model inputs then you should be fine.

Typically the way you want to set this up is to have two different signatures: one for serving, and one for the transformed outputs that you want to use with TFMA (e.g. transformed labels and any transformed features for slicing). There is an example here:

https://github.com/tensorflow/tfx/blob/f4d804dbe05741a8df66fd251b87658450703001/tfx/examples/penguin/penguin_utils_base.py#L94

You can then set up your EvalConfig to specify the signature to use for inference (signature_name) and a separate one for getting access to the transformed labels/features (preprocessing_function_names).

eval_config = tfma.EvalConfig(
    model_specs=[     
        tfma.ModelSpec(
             signature_name='serving_default', 
             label_key='my_label_key',
             preprocessing_function_names=['my_transform_sig'])
    ],
    ...
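
On the signatures side, a minimal sketch of the companion transform signature, along the lines of the linked penguin example (model, tf_transform_output, serve_tf_examples_fn, and serving_model_dir are placeholders from the surrounding training code; 'my_transform_sig' matches the name used in the EvalConfig above):

@tf.function(input_signature=[
    tf.TensorSpec(shape=[None], dtype=tf.string, name='examples')
])
def transform_features_fn(serialized_tf_example):
  # Parse raw tf.Examples and run them through the TFT layer so TFMA
  # can read the transformed labels/features.
  raw_features = tf.io.parse_example(
      serialized_tf_example, tf_transform_output.raw_feature_spec())
  return model.tft_layer(raw_features)

model.save(serving_model_dir, save_format='tf', signatures={
    'serving_default': serve_tf_examples_fn,
    'my_transform_sig': transform_features_fn,
})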

@gcarr1020

@mdreves Thank you so much!

For anyone looking at this later, these are my updated signatures:

def make_serving_signatures(model, tf_transform_output: tft.TFTransformOutput):
  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function(input_signature=[
      tf.TensorSpec(shape=[None], dtype=tf.string, name='examples')
  ])
  def serve_tf_examples_fn(serialized_tf_example):
    """Returns the output to be used in the serving signature."""
    raw_feature_spec = tf_transform_output.raw_feature_spec()
    # Remove label feature since these will not be present at serving time.

    keys = list(raw_feature_spec.keys())

    for key in keys:
        if key != Config.IMAGE_KEY:
            raw_feature_spec.pop(key)

    raw_features = tf.io.parse_example(serialized_tf_example, raw_feature_spec)
    transformed_features = model.tft_layer(raw_features)

    outputs = model(transformed_features)
    anatomy = _transformed_name(Config.ANATOMY_KEY)
    pathology = _transformed_name(Config.PATHOLOGY_KEY)
    mapped = {anatomy: outputs[0], pathology: outputs[1]}
    return mapped

  @tf.function(input_signature=[
      tf.TensorSpec(shape=[None], dtype=tf.string, name='examples')
  ])
  def transform_features_fn(serialized_tf_example):
    """Returns the transformed_features to be fed as input to evaluator."""
    raw_feature_spec = tf_transform_output.raw_feature_spec()
    raw_features = tf.io.parse_example(serialized_tf_example, raw_feature_spec)
    transformed_features = model.tft_layer(raw_features)
    return transformed_features

  return {
      'serving_default': serve_tf_examples_fn,
      'transform_features': transform_features_fn
  }

And this is my updated evaluation configuration:

def get_eval_config():
    anatomy = _transformed_name(Config.ANATOMY_KEY)
    pathology = _transformed_name(Config.PATHOLOGY_KEY)

    config = tfma.EvalConfig(
        model_specs=[
            tfma.ModelSpec(
                signature_name='serving_default',
                label_keys={anatomy: anatomy, pathology: pathology},
                preprocessing_function_names=['transform_features']
            )
        ],
        metrics_specs=[
            tfma.MetricsSpec(
                metrics=[
                    tfma.MetricConfig(class_name = 'SparseCategoricalAccuracy'),   
                ],
                output_names =[anatomy]
            ),
            tfma.MetricsSpec(
                metrics=[
                    tfma.MetricConfig(class_name = 'BinaryAccuracy'),
                ], 
                output_names =[pathology]
            )
        ],
        slicing_specs=[
            tfma.SlicingSpec(),
        ])
    return config

@hanj99
hanj99 commented Oct 20, 2021

Thanks @gcarr1020. Your example clarified the multi-output case.
