Branch 171672655 #13606
Closed
Conversation
Ensures that variable shapes are TensorShapes when accessed in graph_callable functions. PiperOrigin-RevId: 171347097
(2) Adds the ability to clip (so we can get a soft version of relu6) PiperOrigin-RevId: 171347879
…for consistency with relu. This would result in memory savings when training conv->relu6->bn and conv->bn->relu6->conv models, as the inputs to bn and conv are already retained for backprop. PiperOrigin-RevId: 171348086
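For context, a minimal sketch (not the actual kernel change) of why retaining the relu6 output is enough for backprop: since relu6(x) = min(max(x, 0), 6), the gradient mask can be recovered from the output alone, so the input need not be kept in memory.

```python
import tensorflow as tf

def relu6_grad_from_output(grad_y, y):
    """Sketch: relu6's gradient computed from its output alone.

    Gradient flows exactly where 0 < y < 6, and y lies in (0, 6)
    iff the original input x did, so x is not needed.
    """
    mask = tf.cast(tf.logical_and(y > 0.0, y < 6.0), grad_y.dtype)
    return grad_y * mask
```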
- Adds explicit and exponential strategies for now. PiperOrigin-RevId: 171350246
PiperOrigin-RevId: 171351986
PiperOrigin-RevId: 171352662
PiperOrigin-RevId: 171352952
PiperOrigin-RevId: 171355854
PiperOrigin-RevId: 171363240
…romOperands. If you clone an instruction and then don't insert it into a computation, it's on you to call DetachFromOperands before destroying it. Otherwise the instruction will stay in its operands' use lists. PiperOrigin-RevId: 171367649
PiperOrigin-RevId: 171369892
Without this change, if TensorFlow is compiled with support for other devices (such as with XLA, which makes XLA_CPU and XLA_GPU devices available), then tfe.num_gpus() incorrectly overcounted the number of available GPUs. PiperOrigin-RevId: 171373389
- Fixed a recent regression where VirtualPlacer stopped placing onto non-default devices like "device:TPU"; added a test for this and verified that the test failed without the fix.
- Fixed several uppercase/lowercase mismatches in the VirtualPlacer code; previously, a slight difference between the VirtualCluster device and the node device ("/tpu:0" vs "/device:TPU:0") could cause fallback to the default device. The new code should be more resilient.
PiperOrigin-RevId: 171374421
PiperOrigin-RevId: 171374722
…arn.Experiment PiperOrigin-RevId: 171375351
PiperOrigin-RevId: 171375399
Also log more info when an asynchronous node is finished. This is useful for debugging deadlocks and issues where a kernel does not return. PiperOrigin-RevId: 171379066
PiperOrigin-RevId: 171382508
PiperOrigin-RevId: 171441141
PiperOrigin-RevId: 171441927
This is helpful to identify erroneous vocab file for the common case of training programs with multiple vocabs. PiperOrigin-RevId: 171476954
PiperOrigin-RevId: 171477807
PiperOrigin-RevId: 171489827
PiperOrigin-RevId: 171536686
Fixes tensorflow#13576 PiperOrigin-RevId: 171540987
`scan()` is similar to `Dataset.map()`, with the addition of a generic piece of state that is accumulated across the elements of the input, and that may be used in the computation of the output elements. This change also updates `rejection_resample()` to use `scan()` rather than a local `tf.ResourceVariable` for accumulating the number of times each class has been encountered. PiperOrigin-RevId: 171542274
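A minimal sketch of the scan-style transformation described above, computing a running sum over a dataset. The exact contrib entry point and signature are assumptions here (taken to be `tf.contrib.data.scan(initial_state, scan_func)`, where `scan_func` maps `(state, element)` to `(new_state, output_element)`).

```python
import tensorflow as tf

def scan_func(state, element):
    # Accumulate state across elements and emit the running total.
    new_state = state + element
    return new_state, new_state

dataset = tf.data.Dataset.range(5)  # 0, 1, 2, 3, 4
dataset = dataset.apply(
    tf.contrib.data.scan(tf.constant(0, dtype=tf.int64), scan_func))
# Yields the running sums 0, 1, 3, 6, 10.
```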
PiperOrigin-RevId: 171543801
momentum PiperOrigin-RevId: 171546603
PiperOrigin-RevId: 171549468
PiperOrigin-RevId: 171552443
PiperOrigin-RevId: 171608395
Two optimizations: 1. If dst_type == type(x), Bitcast(x, dst_type) => No-op 2. Bitcast(Bitcast(x, type1), type2) => Bitcast(x, type2) PiperOrigin-RevId: 171608976
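The two patterns, sketched at the `tf.bitcast` level (the actual rewrite runs inside the graph optimizer, not in user code):

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0], dtype=tf.float32)

# Rule 1: dst_type == type(x), so the bitcast is a no-op the
# optimizer can replace with x itself.
same = tf.bitcast(x, tf.float32)

# Rule 2: back-to-back bitcasts collapse into one bitcast to the
# final type; combined with Rule 1, this round-trip reduces to x.
round_trip = tf.bitcast(tf.bitcast(x, tf.int32), tf.float32)
```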
PiperOrigin-RevId: 171610904
…name scope. PiperOrigin-RevId: 171610946
…of data. PiperOrigin-RevId: 171611082
…torization estimator, to prevent a rare but possible race condition. PiperOrigin-RevId: 171612114
…onOp with type int8 on a GPU that doesn't support it. Old error message: "No algorithm worked!" New error message: "FusedConv2DBiasActivation is only supported on GPUs with compute capability 6.1 or later." PiperOrigin-RevId: 171614032
PiperOrigin-RevId: 171616821
…s tf.contrib.nn.scaled_softplus. PiperOrigin-RevId: 171618233
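For reference, scaled softplus written out with core ops, as a sketch of what the contrib op is understood to compute: alpha * softplus(x / alpha), which approaches relu(x) as alpha tends to 0.

```python
import tensorflow as tf

def scaled_softplus(x, alpha):
    # alpha * log(1 + exp(x / alpha)); approaches relu(x) as alpha -> 0.
    return alpha * tf.nn.softplus(x / alpha)

x = tf.linspace(-3.0, 3.0, 7)
y = scaled_softplus(x, alpha=0.5)
```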
…r size histogram is printed out optionally (use vmodule=analytical_cost_estimator=1 or 2). PiperOrigin-RevId: 171619454
PiperOrigin-RevId: 171620470
I would like to reclaim ops.h for a different purpose in a later patch. It doesn't make sense to shove it all in the same header because FusedIrEmitter uses (tuple_)ops.h, but my new functions will use FusedIrEmitter. PiperOrigin-RevId: 171622776
…egenerate BB. It's possible to create a graph such that an elementwise concat is emitted into an LLVM basic block which lacks a terminator. In this case it's an error to call splitBasicBlock(), so we need to handle this (as is done elsewhere in this file). PiperOrigin-RevId: 171624976
SPINN and probably other models commonly split large tensors into many equal parts (e.g. along the batch dimension). When we compute the gradient of such a split, we often don't have gradients coming from all parts and end up creating zero tensors. This change caches the last created zero tensor and reuses it. It reduces SPINN training time by over 13%. PiperOrigin-RevId: 171625608
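A hypothetical sketch of the caching idea (the real change lives in the gradient code; the helper below is illustrative only): reuse the most recently created zeros tensor when consecutive requests ask for the same shape and dtype.

```python
import tensorflow as tf

_last_zeros = None  # cached as (key, tensor), a single-entry cache

def cached_zeros(shape, dtype):
    """Reuse the last zeros tensor when shape and dtype match."""
    global _last_zeros
    key = (tuple(shape), dtype)
    if _last_zeros is None or _last_zeros[0] != key:
        _last_zeros = (key, tf.zeros(shape, dtype=dtype))
    return _last_zeros[1]
```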
LLVM does not deal well with huge arrays emitted inline into the IR. In JIT mode, this change teaches XLA to emit large constant tensors onto a side data structure, which are then symbolically linked to the generated executable. It is important to note that this works only in JIT mode, and my current understanding is that making this work reliably in AOT will be somewhat more difficult. PiperOrigin-RevId: 171626043
PiperOrigin-RevId: 171627028
An example of a bad ReshapeMover rewrite:

BEFORE:
%reshape.1 = f32[1,1,128] reshape(f32[1,128] %dot)
%constant = f32[128] constant({...})
%reshape.2 = f32[1,1,128] reshape(f32[128] %constant)
%add = f32[1,1,128] add(f32[1,1,128] %reshape.1, f32[1,1,128] %reshape.2)

AFTER:
%constant = f32[128] constant({...})
%add = f32[1,128] add(f32[1,128] %dot, f32[128] %constant)
%reshape = f32[1,1,128] reshape(f32[1,128] %add)

The problem in AFTER is that the add now contains an implicit broadcast. One way to fix this would be to reshape %constant to f32[1,128] before the %add; instead, the fix introduced in this CL simply prevents the ReshapeMover from moving the reshapes in this case. A comment in reshape_mover.cc describes the complexities that led to this choice. Also added HloVerifiedTestBase, which keeps track of a default HloModule and automatically runs HloVerifier at the end of every test. This is useful for many HLO tests; the tests of various passes can probably all use this. Three existing issues in reshape_mover_test.cc were found and fixed as a result. PiperOrigin-RevId: 171628656
PiperOrigin-RevId: 171672655
alextp approved these changes on Oct 10, 2017.
Closing in favor of #13613
copybara-service bot pushed a commit that referenced this pull request on Jun 12, 2024:
Imported from GitHub PR openxla/xla#13639. Resolves a use-after-free when matching and rewriting layer norm patterns. See #13606.

Copybara import of the project:
-- 91ebf7b4a2ac90ebadce27d1a73e88fb4513aed4 by Philipp Hack <phack@nvidia.com>: Resolves a use-after-free in the norm rewriter.

Merging this change closes #13639
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#13639 from philipphack:u_layer_uaf_xla 91ebf7b4a2ac90ebadce27d1a73e88fb4513aed4
PiperOrigin-RevId: 642447222
copybara-service bot pushed a commit that referenced this pull request on Jun 12, 2024:
Imported from GitHub PR openxla/xla#13639. Resolves a use-after-free when matching and rewriting layer norm patterns. See #13606.

Copybara import of the project:
-- 91ebf7b4a2ac90ebadce27d1a73e88fb4513aed4 by Philipp Hack <phack@nvidia.com>: Resolves a use-after-free in the norm rewriter.

Merging this change closes #13639
PiperOrigin-RevId: 642548037