Errors when applying tft.scale_to_z_score_per_key #220

cyc · 2021-02-10T10:41:09Z

(This is on TFT 0.23.0; I have not yet tried it with the latest version.)

I am running into an issue with applying tft.scale_to_z_score_per_key where both x and key are SparseTensors. Analysis seems to proceed fine, but during transformation I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[553] = -1 is not in [0, 417)
	 [[node transform/transform/scale_to_z_score_per_key_25/GatherV2

I traced this to a specific tf.gather op in tf_utils.map_per_key_reductions. I also noticed that if I supply a key_vocabulary_filename to tft.scale_to_z_score_per_key, it does not invoke tf_utils.map_per_key_reductions and instead calls tf_utils.apply_per_key_vocabulary. However, this also gives an error:

  File "tensorflow_transform/tf_utils.py", line 470, in apply_per_key_vocabulary
    sparse_result = tf.compat.v1.strings.split(table.lookup(key), sep=',')
  File "tensorflow/python/ops/ragged/ragged_string_ops.py", line 630, in strings_split_v1
    input, dtype=dtypes.string, name="input")
  File "tensorflow/python/ops/ragged/ragged_tensor.py", line 2433, in convert_to_tensor_or_ragged_tensor
    value=value, dtype=dtype, preferred_dtype=preferred_dtype, name=name)
  File "tensorflow/python/framework/ops.py", line 1341, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "tensorflow/python/framework/constant_op.py", line 321, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "tensorflow/python/framework/constant_op.py", line 262, in constant
    allow_broadcast=True)
  File "tensorflow/python/framework/constant_op.py", line 300, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "tensorflow/python/framework/tensor_util.py", line 451, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "tensorflow/python/framework/tensor_util.py", line 331, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected string, got <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x153d77cd0> of type 'SparseTensor' instead.

I don't have a minimal dataset that reproduces this error. It only seems to happen when I try to process a relatively large amount of data.


zoyahav · 2021-02-10T11:34:07Z

Note that tft.scale_to_z_score_per_key doesn't support out-of-vocabulary handling. Is it possible that the dataset passed to TransformDataset has keys that were not present in the analysis dataset?
(This applies specifically to the case where key_vocabulary_filename isn't provided.)
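For readers following along, here is a minimal plain-Python sketch of how an unseen key could produce an error like the one reported. It is not TFT's implementation: `lookup_indices` and `strict_gather` are hypothetical stand-ins for a vocabulary lookup that maps unknown keys to -1 and a gather that rejects out-of-range indices.

```python
# Illustrative only: hypothetical stand-ins, not tensorflow_transform code.

def lookup_indices(keys, vocabulary, oov_index=-1):
    """Map each key to its position in the vocabulary; unseen keys get oov_index."""
    index_of = {key: i for i, key in enumerate(vocabulary)}
    return [index_of.get(key, oov_index) for key in keys]


def strict_gather(params, indices):
    """Pick params[idx] for each index, rejecting anything outside [0, len(params))."""
    gathered = []
    for position, idx in enumerate(indices):
        if not 0 <= idx < len(params):
            raise IndexError(
                f"indices[{position}] = {idx} is not in [0, {len(params)})"
            )
        gathered.append(params[idx])
    return gathered


# Keys and per-key means as they might look after the analysis phase.
analysis_keys = ["a", "b", "c"]
per_key_mean = [1.0, 5.0, 9.0]

# The transform phase sees a key ("z") that analysis never saw.
transform_keys = ["a", "z", "c"]
indices = lookup_indices(transform_keys, analysis_keys)

try:
    strict_gather(per_key_mean, indices)
    gather_error = None
except IndexError as err:
    gather_error = str(err)

print(indices)
print(gather_error)
```

The unseen key becomes index -1, and the gather over the per-key statistics then fails in the same shape as the reported message.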

cyc · 2021-02-10T11:45:23Z

Ah, good point. That could explain the first error that I observed. Is there some way to make tft.scale_to_z_score_per_key robust to OOV keys? I wouldn't even mind it just failing silently and returning 0 if the key is OOV.

cyc · 2021-02-10T11:59:14Z

Updated the description with more of the stack trace for the key_vocabulary_filename case

zoyahav · 2021-02-10T12:27:24Z

The difficulty with supporting this has been coming up with a behaviour that isn't silently surprising.
Perhaps we could let users pass a parameter such as allow_missing_keys, which could have the same default as for empty analysis datasets.

cyc · 2021-02-10T17:01:11Z

To unblock myself: if I fork tft.scale_to_z_score_per_key and change tf_utils.apply_per_key_vocabulary to return some default mean and variance for OOV keys, that should be fine, correct?

zoyahav · 2021-02-11T13:21:16Z

Yes, that should work (make sure the default value is 0).
I'm also working on a change currently, will update here when it's available on a TFT nightly release.
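As a rough illustration of the workaround discussed above, the sketch below computes per-key z-scores in plain Python and falls back to a default mean and variance of 0 for keys missing from the analysis statistics. Everything here is an assumption for illustration: `fit_per_key`, `z_score_per_key`, and `DEFAULT_STATS` are hypothetical names, and the choice to return `x - mean` when the variance is 0 is this sketch's own guard against division by zero, not a statement about what TFT does.

```python
import math

# Assumed fallback (mean, variance) for keys that were not seen during analysis.
DEFAULT_STATS = (0.0, 0.0)


def fit_per_key(xs, keys):
    """Compute (mean, population variance) of the values observed for each key."""
    grouped = {}
    for x, key in zip(xs, keys):
        grouped.setdefault(key, []).append(x)
    stats = {}
    for key, values in grouped.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        stats[key] = (mean, variance)
    return stats


def z_score_per_key(xs, keys, stats):
    """Scale each value by its key's statistics, using DEFAULT_STATS for unseen keys."""
    scaled = []
    for x, key in zip(xs, keys):
        mean, variance = stats.get(key, DEFAULT_STATS)
        centered = x - mean
        # Guard chosen for this sketch: skip the division when variance is 0.
        scaled.append(centered / math.sqrt(variance) if variance > 0 else centered)
    return scaled


stats = fit_per_key([1.0, 3.0, 10.0, 20.0], ["a", "a", "b", "b"])
scaled = z_score_per_key([3.0, 7.0], ["a", "z"], stats)
print(stats)
print(scaled)
```

With these assumptions, a value under a known key is standardized as usual, while a value under an unseen key passes through unchanged instead of raising an error. A fork that prefers to emit 0 for OOV keys would need a different fallback branch.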

zoyahav · 2021-02-12T10:01:46Z

This issue should be fixed now at head with 42916c2.
And on nightly as well:
https://pypi-nightly.tensorflow.org/#/package/tensorflow-transform
Please let us know if you run into any more issues with it.

zoyahav self-assigned this Feb 10, 2021

arghyaganguly added type:bug stat:awaiting tensorflower type:feature and removed type:bug labels Feb 11, 2021

zoyahav closed this as completed Feb 12, 2021

cyc mentioned this issue Oct 26, 2021

scale_to_z_score_per_key should give caller control over OOV behavior #252

Open
