Errors when applying tft.scale_to_z_score_per_key #220

Closed
cyc opened this issue Feb 10, 2021 · 7 comments

cyc commented Feb 10, 2021

(This is on TFT 0.23.0; I have not yet tried it with the latest version.)

I am running into an issue with applying tft.scale_to_z_score_per_key where both x and key are SparseTensors. Analysis seems to proceed fine, but during transformation I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[553] = -1 is not in [0, 417)
	 [[node transform/transform/scale_to_z_score_per_key_25/GatherV2

I traced this to a specific tf.gather op in tf_utils.map_per_key_reductions and I noticed that if I supply a key_vocabulary_filename to tft.scale_to_z_score_per_key it does not invoke tf_utils.map_per_key_reductions. It instead calls tf_utils.apply_per_key_vocabulary. However, this also gives an error:

  File "tensorflow_transform/tf_utils.py", line 470, in apply_per_key_vocabulary
    sparse_result = tf.compat.v1.strings.split(table.lookup(key), sep=',')
  File "tensorflow/python/ops/ragged/ragged_string_ops.py", line 630, in strings_split_v1
    input, dtype=dtypes.string, name="input")
  File "tensorflow/python/ops/ragged/ragged_tensor.py", line 2433, in convert_to_tensor_or_ragged_tensor
    value=value, dtype=dtype, preferred_dtype=preferred_dtype, name=name)
  File "tensorflow/python/framework/ops.py", line 1341, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "tensorflow/python/framework/constant_op.py", line 321, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "tensorflow/python/framework/constant_op.py", line 262, in constant
    allow_broadcast=True)
  File "tensorflow/python/framework/constant_op.py", line 300, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "tensorflow/python/framework/tensor_util.py", line 451, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "tensorflow/python/framework/tensor_util.py", line 331, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected string, got <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x153d77cd0> of type 'SparseTensor' instead.

I don't have a minimal dataset that reproduces this error. It only seems to happen when I try to process a relatively large amount of data.

Member
zoyahav commented Feb 10, 2021

Note that tft.scale_to_z_score_per_key doesn't support out-of-vocabulary handling. Is it possible that the dataset passed to TransformDataset has keys that were not present in the analysis dataset?
(This applies specifically when key_vocabulary_filename isn't provided.)
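
To illustrate the failure mode suggested here, a minimal pure-Python sketch follows. It is not TFT code. It assumes, as the error message in the description suggests, that a key missing from the analysis vocabulary is looked up as index -1, and that the gather step rejects any index outside [0, N). The names `key_to_index`, `lookup`, and `gather` are hypothetical stand-ins for the real ops.

```python
# Hypothetical stand-ins for the vocabulary lookup and the gather step.
# Assumption: keys unseen during analysis map to -1, as in the error above.

def lookup(key_to_index, keys, default=-1):
    """Map each key to its vocabulary index, or `default` if unseen."""
    return [key_to_index.get(k, default) for k in keys]


def gather(params, indices):
    """Bounds-checked gather: rejects indices outside [0, len(params))."""
    for pos, idx in enumerate(indices):
        if not 0 <= idx < len(params):
            raise IndexError(
                f"indices[{pos}] = {idx} is not in [0, {len(params)})")
    return [params[idx] for idx in indices]


# Per-key means computed during analysis (illustrative values only).
key_to_index = {"a": 0, "b": 1}
per_key_mean = [10.0, 20.0]

# Keys seen during analysis resolve to valid indices.
seen = gather(per_key_mean, lookup(key_to_index, ["a", "b", "a"]))

# A key that only appears at transform time ("c") resolves to -1 and fails.
try:
    gather(per_key_mean, lookup(key_to_index, ["a", "c"]))
    oov_error = None
except IndexError as err:
    oov_error = str(err)
```

Under these assumptions, analysis succeeds because every key it sees is in the vocabulary by construction, and the failure only surfaces at transform time, which would match the behaviour reported in the description.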

Author
cyc commented Feb 10, 2021

Ah, good point. That could explain the first error that I observed. Is there some way to make tft.scale_to_z_score_per_key robust to OOV keys? I wouldn't even mind it just failing silently and returning 0 if the key is OOV.

Author
cyc commented Feb 10, 2021

Updated the description with more of the stack trace for the key_vocabulary_filename case

Member
zoyahav commented Feb 10, 2021

The difficulty with supporting this has been finding a behaviour that isn't silently surprising.
Perhaps we could let users pass a parameter such as allow_missing_keys, which could have the same default as empty analysis datasets.

@zoyahav zoyahav self-assigned this Feb 10, 2021
Author
cyc commented Feb 10, 2021

To unblock myself: if I fork tft.scale_to_z_score_per_key and change tf_utils.apply_per_key_vocabulary to return a default mean and variance for OOV keys, that should be fine, correct?

Member
zoyahav commented Feb 11, 2021

Yes, that should work (make sure the default value is 0).
I'm also working on a change currently, will update here when it's available on a TFT nightly release.
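
For readers following the fork workaround discussed above, here is a rough pure-Python sketch of the idea: fall back to a default (mean, variance) of (0, 0) when a key is out of vocabulary. It is not TFT code and does not describe the eventual fix. The function `z_score_per_key`, the `stats` dictionary, and the convention of skipping the division when the variance is zero are all assumptions made for illustration.

```python
import math


def z_score_per_key(values, keys, stats, default=(0.0, 0.0)):
    """Scale each value by its key's (mean, variance).

    Keys missing from `stats` (OOV keys) fall back to `default`.
    """
    out = []
    for x, k in zip(values, keys):
        mean, var = stats.get(k, default)
        centered = x - mean
        # Assumed convention: skip the division when the variance is zero,
        # so a (0, 0) default leaves OOV values unchanged.
        out.append(centered / math.sqrt(var) if var > 0 else centered)
    return out


# Illustrative per-key statistics: key "a" has mean 10 and variance 4.
stats = {"a": (10.0, 4.0)}

# "zzz" was never seen during analysis, so it uses the (0, 0) default.
scaled = z_score_per_key([12.0, 5.0], ["a", "zzz"], stats)
```

With these assumptions an OOV value passes through unscaled rather than raising an error. Returning 0 for OOV keys instead, as floated earlier in the thread, would need an explicit branch for missing keys.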

Member
zoyahav commented Feb 12, 2021

This issue should be fixed now at head with 42916c2.
And on nightly as well:
https://pypi-nightly.tensorflow.org/#/package/tensorflow-transform
Please let us know if you run into any more issues with it.
