Exposes num_parallel_reads and num_parallel_calls #1232
Conversation
- Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
- Adds parameter constraints
- Fixes lint issues
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here.

What to do if you already signed the CLA:
- Individual signers
- Corporate signers

ℹ️ Googlers: Go here for more info.
Conflicts here
Fixed, please review.
def _require(condition: bool, err_msg: Optional[str] = None) -> None:
    """Checks if the specified condition is true else raises exception

    :param condition: The condition to test
Use consistent docstring style
Fixed, please review.
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are OK with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only:

@googlebot I consent

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the

ℹ️ Googlers: Go here for more info.
@@ -79,14 +78,26 @@ def make_avro_record_dataset(
      prefetch_buffer_size: (Optional.) An int specifying the number of
        feature batches to prefetch for performance improvement.
        Defaults to auto-tune. Set to 0 to disable prefetching.
<<<<<<< HEAD
<<<<<<< HEAD
@ashahab Can you resolve the merge conflict here?
@yongtang absolutely, sorry about that. @StanfordMCP
@yongtang Fixed the issue, please take a look.
        records from files. By default or if set to a value >1, the
        results will be interleaved.
      num_parallel_reads: (Optional.) Number of parallel
>>>>>>> f7032e3... Exposed num_parallel_reads as well as num_parallel_calls
Also here
Fixed
=======
      num_parallel_reads: (Optional.) Number of parallel
        records to parse in parallel. Defaults to None (no parallelization).
>>>>>>> d41d946... Added parameter constraints
And here.
Fixed
@@ -16,11 +16,30 @@

import tensorflow as tf
from tensorflow_io.core.python.ops import core_ops
from typing import Optional
Since we are not using any type checkers (like mypy) as of now, I feel this style is a bit out of place when compared with other modules in the codebase.
@kvignesh1420 Thanks for the comment. Updated, please review.
Thanks!
@StanfordMCP if you can add some test cases around this functionality, that would be great. Please check the existing tests in:
/io/tests/test_avro_eager.py
      We set this equal to `block_length`, so that each time n number of records are returned for each of the n
      files.
      num_parallel_calls: Number of threads spawned by the interleave call.
      deterministic: Sets whether the interleaved records are written in deterministic order. in tf.interleave thi sis default true
in tf.interleave thi sis default true
please check typo.
Done
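To make the `block_length` behavior discussed above concrete, here is a pure-Python model of tf.data-style interleaving (no TensorFlow required). It is a simplified sketch: real `tf.data.Dataset.interleave` also refills cycle slots from remaining inputs as sources are exhausted, which this version omits. With `block_length` equal to n, each pass yields n records from each of the n files in turn:

```python
from itertools import islice


def interleave(sources, cycle_length, block_length):
    """Round-robin interleave: take block_length items from each of
    cycle_length sources in turn, dropping a source once it is exhausted."""
    pending = [iter(s) for s in sources[:cycle_length]]
    while pending:
        next_round = []
        for it in pending:
            block = list(islice(it, block_length))
            yield from block
            if len(block) == block_length:  # source may still have items
                next_round.append(it)
        pending = next_round


files = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]]
# With cycle_length == block_length == 2, two records come from each
# file per round: a1 a2 b1 b2 a3 a4 b3 b4
result = list(interleave(files, cycle_length=2, block_length=2))
```

This shows why setting `num_parallel_reads` equal to `block_length` gives each of the n files a turn producing n records per cycle.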
- This update adds a test to check that a ValueError is raised when given an invalid input for num_parallel_calls
/cc @kvignesh1420 @terrytangyuan to take a look.
tests/test_parse_avro_eager.py
@@ -30,8 +30,8 @@
from avro.schema import Parse as parse
import tensorflow_io as tfio

if sys.platform == "darwin":
    pytest.skip("TODO: skip macOS", allow_module_level=True)
# if sys.platform == "darwin":
please uncomment this
Done
@StanfordMCP overall LGTM. Can you please make the change as per @burgerkingeater's comment: https://github.com/tensorflow/io/pull/1232/files#r552916186. The tests are currently unstable in the macOS environment.
LGTM. Thanks, @StanfordMCP.
Please run the following to fix lint issues:
`bazel run //tools/lint:lint`
- Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
- Adds parameter constraints
- Fixes lint issues
- Adds test method for _require() function
- Adds a test to check if ValueErrors are raised when given an invalid input for num_parallel_calls

Co-authored-by: Abin Shahab <ashahab@linkedin.com>
See comment here: tensorflow#1283 (comment) * Adds addtional comments in source code for understandability Co-authored-by: Abin Shahab <ashahab@linkedin.com> Co-authored-by: Yong Tang <yong.tang.github@outlook.com> Co-authored-by: Vo Van Nghia <vovannghia2409@gmail.com> Co-authored-by: Vignesh Kothapalli <vikoth18@in.ibm.com> Co-authored-by: Cheng Ren <chren@linkedin.com> Co-authored-by: Cheng Ren <1428327+chengren311@users.noreply.github.com> Co-authored-by: Dale Lane <dale.lane@gmail.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Mark Daoust <markdaoust@google.com>
- Exposes `num_parallel_reads` and `num_parallel_calls` in `AvroRecordDataset` and `make_avro_record_dataset`
- Adds parameter constraints
- Fixes lint issues