wsuchy changed the title from "TransformDataset doesn't process the data in paralell (uses only single core)" to "TransformDataset doesn't process the data in paralell (uses only single worker)" on Nov 1, 2019
When using multiple input files and the FnApiRunner / SUBPROCESS_SDK runner:

`tft_beam.AnalyzeAndTransformDataset` really does use all workers and generates multiple output files, which makes processing quite fast (gist: `analyze_and_transform()`).

`tft_beam.TransformDataset`, however, uses only one worker and produces only one output file (gist: `transform_only()`). This makes it almost impossible to process the test and validation datasets within a reasonable amount of time.

Is there a problem with my code or is it a bug?
GIST: https://gist.github.com/wsuchy/0c89b27a72b457ae6c904d8786658d2e
The dataset comes from https://www.kaggle.com/generall/oneshotwikilinks and has been processed using the `prepare_dataset` function.