I have not been able to reproduce this with a small local test case, and the case where I hit the error involves a proprietary dataset that I can't share. The issue is essentially this: I have two TFRecord shards of a dataset, one where a feature is always present and another where the same feature is always missing (that is, the feature name is not even populated in the tf.Example). If I run GenerateStatistics on these shards on Google Cloud Dataflow, the resulting stats file claims that there are 0 missing entries for that feature.
For instance, for the feature in question, the stats file in JSON format shows:
Whereas a fully present feature looks like this:
Please let me know if this is expected behavior.
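To make the expectation concrete, here is a minimal sketch in plain Python of the count one might expect across the two shards. It is not TFDV's implementation and does not call TFDV. The shard contents, the feature name `my_feature`, and the helper `count_missing` are hypothetical stand-ins for the proprietary data.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a tf.Example: feature name -> list of values.
Example = Dict[str, List[float]]


def count_missing(shards: Iterable[List[Example]], feature: str) -> Dict[str, int]:
    """Count, over all shards, how many examples lack `feature` entirely."""
    num_examples = 0
    num_non_missing = 0
    for shard in shards:
        for example in shard:
            num_examples += 1
            # "Missing" here means the feature name is absent from the example.
            if feature in example:
                num_non_missing += 1
    return {
        "num_examples": num_examples,
        "num_non_missing": num_non_missing,
        "num_missing": num_examples - num_non_missing,
    }


# Shard 1: the feature is always present (made-up values).
shard_with = [{"my_feature": [1.0]}, {"my_feature": [2.0]}, {"my_feature": [3.0]}]
# Shard 2: the feature name never appears at all.
shard_without = [{"other": [0.0]}, {"other": [1.0]}]

expected = count_missing([shard_with, shard_without], "my_feature")
print(expected)
```

Under this definition, every example in the second shard counts as missing, so a combined missing count of 0 is what looks surprising.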