[DO NOT MERGE] Remove RTLD_GLOBAL when importing pywrap_tensorflow #11563
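For context, dropping `RTLD_GLOBAL` means the extension's symbols are no longer promoted into the process-wide namespace, which avoids clashes with other libraries loaded into the same interpreter. Below is a minimal sketch of toggling the dlopen flags around an extension import in Python; this is illustrative only, not TensorFlow's actual import code.

```python
import os
import sys

# Sketch (not pywrap_tensorflow's real loader): import an extension module
# with RTLD_NOW but WITHOUT RTLD_GLOBAL, so the shared library's symbols
# stay local to the extension instead of being injected globally (where
# they could collide with, e.g., another copy of protobuf).
old_flags = sys.getdlopenflags()
try:
    sys.setdlopenflags(os.RTLD_NOW)
    # `import pywrap_tensorflow` would happen here in the real code.
finally:
    # Always restore the interpreter's previous dlopen flags.
    sys.setdlopenflags(old_flags)

assert sys.getdlopenflags() == old_flags
```

Note that `sys.setdlopenflags` is only available on Unix-like platforms where CPython uses `dlopen` to load extension modules.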
Conversation
Can one of the admins verify this patch?
Jenkins, test this please.
Just needed to remove a reference to @protobuf//:protobuf.
Thanks Alex. Can I test this myself, or do I need someone with write access to say the magic words? Jenkins, test this please.
Jenkins, test this please.
Jenkins, test this please.
Marking this as do not merge since it's testing only. Feel free to change the title if you intend to merge.
Jenkins, test this please.
Hi Asim and Jonathan, I'm adding you two as initial reviewers. I'm off next week, but will address comments when I get back. Once you're satisfied, I will add a few more people.
As discussed, please move this to an internal change so there are fewer conflicts that the sync rotation has to resolve. Thanks!
…gather having multiple users

Imported from GitHub PR openxla/xla#11563

We have identified another optimization opportunity for GPT-3 using collective matmul: in the backward pass, the all-gather has multiple dot users, but the current SPMD partitioner duplicates the collective-matmul loop for each of them. We'd like this transformation:

before:
```
   input
   /   |
  /    |
 AG   windowed loop
 /
/
dot
```
after:
```
input
  |
  |
windowed loop
  |
  |
dot
```

This is advantageous since the chained dot can fully utilize all the resources on the GPU while communication is hidden by the first collective-matmul loop. We introduced an option to turn off collective-matmul loop duplication in SPMD and rewrite the graph to the desired pattern in the gpu_windowed_einsum_handler pass.

Copybara import of the project:

-- 986ac94ab44d31f6d11ec6f135f6cfb2e5636d80 by TJ <tjx@nvidia.com>: Moved most of the changes to the GPU pass
-- 44e81df91c235cac635f334c89d1d8a117ac6511 by TJ <tjx@nvidia.com>: Added e2e test for windowed einsum; minimized unit-test HLO
-- 8fc24a479de7515f532f36de8ffbcce49516c154 by TJ <tjx@nvidia.com>: Added explanations for SPMD tests and for dot_handler skipping multiple consumers
-- 142d84d54db2b6291484443e43913d86c44a485c by TJ <tjx@nvidia.com>: Moved windowed-einsum test to stateful_rng_spmd_partitioner_test
-- 8b9fc43746136b40a814d93bf8086a687490fd7f by TJ <tjx@nvidia.com>: Changed e2e test back to include reduce-scatter

Merging this change closes #11563

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11563 from Tixxx:tixxx/ag_multi_user 8b9fc43746136b40a814d93bf8086a687490fd7f

PiperOrigin-RevId: 633179304
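The rewrite can be illustrated with a toy graph deduplication. This is a hand-rolled sketch, not XLA's gpu_windowed_einsum_handler: the dict-based graph representation and all node/op names here are made up for illustration.

```python
# Toy sketch: two dot users each received their own copy of the windowed
# all-gather loop; merge the duplicate loops so both dots share one.
# This mimics the shape of the rewrite, not XLA's actual pass.

def dedup_windowed_loops(graph):
    """graph maps node name -> (op, tuple of operand names)."""
    seen = {}     # (op, operands) -> canonical node name
    renames = {}  # duplicate loop name -> canonical loop name
    for name, (op, operands) in list(graph.items()):
        key = (op, operands)
        if op == "windowed_ag_loop" and key in seen:
            renames[name] = seen[key]  # duplicate loop: drop it
            del graph[name]
        else:
            seen.setdefault(key, name)
    # Repoint users of removed duplicates at the canonical loop.
    for name, (op, operands) in graph.items():
        graph[name] = (op, tuple(renames.get(o, o) for o in operands))
    return graph

# Before: each dot has its own (identical) windowed loop over `input`.
g = {
    "input": ("param", ()),
    "loop0": ("windowed_ag_loop", ("input",)),
    "loop1": ("windowed_ag_loop", ("input",)),
    "rhs0": ("param", ()),
    "rhs1": ("param", ()),
    "dot0": ("dot", ("loop0", "rhs0")),
    "dot1": ("dot", ("loop1", "rhs1")),
}
g = dedup_windowed_loops(g)
# After: a single shared windowed loop feeds both dots.
assert "loop1" not in g
assert g["dot0"][1] == ("loop0", "rhs0")
assert g["dot1"][1] == ("loop0", "rhs1")
```

The key design point mirrors the PR's description: instead of each dot consumer pulling in its own copy of the communication loop, one loop's communication is overlapped with the first dot while the second dot reuses the already-gathered result.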
…gather having multiple users

(Same import description as the commit above, re-landed.)

Merging this change closes #11563

PiperOrigin-RevId: 633483864
Just making a pull request to run tests.