[DO NOT MERGE] Remove RTLD_GLOBAL when importing pywrap_tensorflow #11563
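For context, dropping `RTLD_GLOBAL` means the extension's symbols are no longer promoted into the process-wide namespace, which avoids clashes with other libraries loaded into the same interpreter. Below is a minimal sketch of toggling the dlopen flags around an extension import in Python; this is illustrative only, not TensorFlow's actual import code.

```python
import os
import sys

# Sketch (not pywrap_tensorflow's real loader): import an extension module
# with RTLD_NOW but WITHOUT RTLD_GLOBAL, so the shared library's symbols
# stay local to the extension instead of being injected globally (where
# they could collide with, e.g., another copy of protobuf).
old_flags = sys.getdlopenflags()
try:
    sys.setdlopenflags(os.RTLD_NOW)
    # `import pywrap_tensorflow` would happen here in the real code.
finally:
    # Always restore the interpreter's previous dlopen flags.
    sys.setdlopenflags(old_flags)

assert sys.getdlopenflags() == old_flags
```

Note that `sys.setdlopenflags` is only available on Unix-like platforms where CPython uses `dlopen` to load extension modules.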
Conversation
Can one of the admins verify this patch?
Jenkins, test this please.
Just needed to remove a reference to @protobuf//:protobuf.
Thanks Alex. Can I test this myself, or do I need someone with write access to say the magic words? Jenkins, test this please.
Jenkins, test this please.
Jenkins, test this please.
Marking this as do not merge since it's testing only. Feel free to change the title if you intend to merge.
Jenkins, test this please.
Hi Asim and Jonathan, I'm adding you two as initial reviewers. I'm off next week, but will address comments when I get back. Once you're satisfied, I will add a few more people.
As discussed, please move this to an internal change so there are fewer conflicts that the sync rotation has to resolve. Thanks!
…gather having multiple users

Imported from GitHub PR openxla/xla#11563

We have identified another optimization opportunity for GPT-3 using collective matmul: in the backward pass, the all-gather has multiple dot users, but the current SPMD partitioner duplicates the collective-matmul loop for each of them. We'd like this transformation:

before:
```
   input
   /   |
  /    |
 AG   windowed loop
 /
/
dot
```
after:
```
input
  |
  |
windowed loop
  |
  |
dot
```

This is advantageous since the chained dot can fully utilize all the resources on the GPU while communication is hidden by the first collective-matmul loop. We introduced an option to turn off collective-matmul loop duplication in SPMD and rewrite the graph to the desired pattern in the gpu_windowed_einsum_handler pass.

Copybara import of the project:

-- 986ac94ab44d31f6d11ec6f135f6cfb2e5636d80 by TJ <tjx@nvidia.com>: Moved most of the changes to the GPU pass
-- 44e81df91c235cac635f334c89d1d8a117ac6511 by TJ <tjx@nvidia.com>: Added e2e test for windowed einsum; minimized unit-test HLO
-- 8fc24a479de7515f532f36de8ffbcce49516c154 by TJ <tjx@nvidia.com>: Added explanations for SPMD tests and for dot_handler skipping multiple consumers
-- 142d84d54db2b6291484443e43913d86c44a485c by TJ <tjx@nvidia.com>: Moved windowed-einsum test to stateful_rng_spmd_partitioner_test
-- 8b9fc43746136b40a814d93bf8086a687490fd7f by TJ <tjx@nvidia.com>: Changed e2e test back to include reduce-scatter

Merging this change closes #11563

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#11563 from Tixxx:tixxx/ag_multi_user 8b9fc43746136b40a814d93bf8086a687490fd7f

PiperOrigin-RevId: 633179304
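The rewrite can be illustrated with a toy graph deduplication. This is a hand-rolled sketch, not XLA's gpu_windowed_einsum_handler: the dict-based graph representation and all node/op names here are made up for illustration.

```python
# Toy sketch: two dot users each received their own copy of the windowed
# all-gather loop; merge the duplicate loops so both dots share one.
# This mimics the shape of the rewrite, not XLA's actual pass.

def dedup_windowed_loops(graph):
    """graph maps node name -> (op, tuple of operand names)."""
    seen = {}     # (op, operands) -> canonical node name
    renames = {}  # duplicate loop name -> canonical loop name
    for name, (op, operands) in list(graph.items()):
        key = (op, operands)
        if op == "windowed_ag_loop" and key in seen:
            renames[name] = seen[key]  # duplicate loop: drop it
            del graph[name]
        else:
            seen.setdefault(key, name)
    # Repoint users of removed duplicates at the canonical loop.
    for name, (op, operands) in graph.items():
        graph[name] = (op, tuple(renames.get(o, o) for o in operands))
    return graph

# Before: each dot has its own (identical) windowed loop over `input`.
g = {
    "input": ("param", ()),
    "loop0": ("windowed_ag_loop", ("input",)),
    "loop1": ("windowed_ag_loop", ("input",)),
    "rhs0": ("param", ()),
    "rhs1": ("param", ()),
    "dot0": ("dot", ("loop0", "rhs0")),
    "dot1": ("dot", ("loop1", "rhs1")),
}
g = dedup_windowed_loops(g)
# After: a single shared windowed loop feeds both dots.
assert "loop1" not in g
assert g["dot0"][1] == ("loop0", "rhs0")
assert g["dot1"][1] == ("loop0", "rhs1")
```

The key design point mirrors the PR's description: instead of each dot consumer pulling in its own copy of the communication loop, one loop's communication is overlapped with the first dot while the second dot reuses the already-gathered result.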
…gather having multiple users

(Same import description as the commit above, re-landed.)

Merging this change closes #11563

PiperOrigin-RevId: 633483864
Just making a pull request to run tests.