
Branch 171672655 #13606

Closed

wants to merge 66 commits into from
Conversation

@caisq caisq (Contributor) commented Oct 10, 2017

No description provided.

akshayka and others added 30 commits October 6, 2017 14:52
Ensures that variable shapes are TensorShapes when accessed in
graph_callable functions.

PiperOrigin-RevId: 171347097
(2) Adds the ability to clip (so we can get a soft version of relu6)

PiperOrigin-RevId: 171347879
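A minimal sketch of the clipping idea, built from plain TF 1.x ops rather than the contrib API this commit modifies; the helper name and defaults are hypothetical:

import tensorflow as tf

def soft_relu6(x, alpha=1.0, clip=6.0):
  # alpha * softplus(x / alpha) is a smooth approximation of relu(x);
  # clipping the result at `clip` yields a smooth analogue of relu6.
  return tf.minimum(alpha * tf.nn.softplus(x / alpha), clip)

with tf.Session() as sess:
  x = tf.constant([-2.0, 0.0, 3.0, 10.0])
  print(sess.run(soft_relu6(x)))  # large inputs saturate near 6.0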
…for consistency with relu. This would result in memory savings when training conv->relu6->bn and conv->bn->relu6->conv models, as the inputs to bn and conv are already retained for backprop.

PiperOrigin-RevId: 171348086
- Adds explicit and exponential strategies for now.

PiperOrigin-RevId: 171350246
PiperOrigin-RevId: 171351986
PiperOrigin-RevId: 171355854
…romOperands.

If you clone an instruction and then don't insert it into a computation,
it's on you to call DetachFromOperands before destroying it.  Otherwise
the instruction will stay in its operands' use lists.

PiperOrigin-RevId: 171367649
PiperOrigin-RevId: 171369892
Without this change, if TensorFlow is compiled with support for other devices
(such as with XLA, which makes XLA_CPU and XLA_GPU devices available), then
tfe.num_gpus() incorrectly overcounts the number of available GPUs.

PiperOrigin-RevId: 171373389
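A minimal sketch of the intended behavior, assuming the standard device_lib helper: count only devices whose device_type is exactly "GPU", so XLA_CPU and XLA_GPU devices are excluded. This is illustrative, not the actual tfe.num_gpus() implementation.

from tensorflow.python.client import device_lib

def count_physical_gpus():
  # XLA registers devices such as XLA_GPU; only plain "GPU" entries count here.
  return sum(1 for d in device_lib.list_local_devices() if d.device_type == "GPU")

print(count_physical_gpus())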
- fixed a recent regression where VirtualPlacer stopped placing onto non-default devices like "device:TPU"; added a test for this and verified that the test failed without the fix;
- fixed a number of problems with uppercase/lowercase mismatches in the VirtualPlacer code; previously, a slight difference between the VirtualCluster device and the node device ("/tpu:0" vs "/device:TPU:0") could cause a fallback to the default device, and the new code should be more resilient.

PiperOrigin-RevId: 171374421
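A hypothetical illustration of the case-sensitivity problem described above: canonicalize device strings so "/tpu:0" and "/device:TPU:0" compare equal. The helper is a sketch, not VirtualPlacer's actual logic.

def canonical_device(name):
  # "/tpu:0" -> "/device:TPU:0"; "/device:TPU:0" is already in canonical form.
  parts = name.strip("/").split(":")
  if parts and parts[0].lower() == "device":
    parts = parts[1:]
  dev_type = parts[0].upper()
  dev_id = parts[1] if len(parts) > 1 else "0"
  return "/device:%s:%s" % (dev_type, dev_id)

assert canonical_device("/tpu:0") == canonical_device("/device:TPU:0")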
PiperOrigin-RevId: 171374722
Also log more info when an asynchronous node is finished.

This is useful for debugging deadlocks and issues where a kernel does not return.

PiperOrigin-RevId: 171379066
PiperOrigin-RevId: 171382508
PiperOrigin-RevId: 171441141
PiperOrigin-RevId: 171441927
This is helpful for identifying an erroneous vocab file in the common case of training programs with multiple vocabs.

PiperOrigin-RevId: 171476954
PiperOrigin-RevId: 171536686
Fixes tensorflow#13576

PiperOrigin-RevId: 171540987
`scan()` is similar to `Dataset.map()`, with the addition of a generic piece of
state that is accumulated across the elements of the input, and that may be
used in the computation of the output elements.

This change also updates `rejection_resample()` to use `scan()` rather than a
local `tf.ResourceVariable` for accumulating the number of times each class
has been encountered.

PiperOrigin-RevId: 171542274
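The scan semantics can be illustrated in plain Python (this is not the tf.contrib.data API itself): like map(), but a state value is threaded through the input elements and may feed into each output element.

def scan(elements, initial_state, scan_func):
  # scan_func maps (old_state, element) -> (new_state, output_element).
  state, outputs = initial_state, []
  for x in elements:
    state, y = scan_func(state, x)
    outputs.append(y)
  return outputs

# Running sum: each output sees the state accumulated so far.
print(scan([1, 2, 3, 4], 0, lambda s, x: (s + x, s + x)))  # [1, 3, 6, 10]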
PiperOrigin-RevId: 171543801
allenlavoie and others added 19 commits October 9, 2017 17:12
Two optimizations:
1. If dst_type == type(x), Bitcast(x, dst_type) => No-op
2. Bitcast(Bitcast(x, type1), type2) => Bitcast(x, type2)

PiperOrigin-RevId: 171608976
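A sketch of the two rewrite rules above over a toy node representation; the Node class is hypothetical and only illustrates the optimizer logic, not Grappler's implementation.

class Node(object):
  def __init__(self, op, dtype, inputs=()):
    self.op, self.dtype, self.inputs = op, dtype, list(inputs)

def simplify_bitcast(node):
  if node.op != "Bitcast":
    return node
  src = node.inputs[0]
  # Rule 2: Bitcast(Bitcast(x, type1), type2) => Bitcast(x, type2).
  if src.op == "Bitcast":
    node.inputs[0] = src.inputs[0]
    src = node.inputs[0]
  # Rule 1: Bitcast(x, dst_type) with dst_type == type(x) is a no-op.
  if src.dtype == node.dtype:
    return src
  return node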
…torization estimator, to prevent a rare but possible race condition.

PiperOrigin-RevId: 171612114
…onOp with type int8 on a GPU that doesn't support it.

Old error message: "No algorithm worked!"
New error message: "FusedConv2DBiasActivation is only supported on GPUs with compute capability 6.1 or later."

PiperOrigin-RevId: 171614032
…s tf.contrib.nn.scaled_softplus.

PiperOrigin-RevId: 171618233
…r size histogram is printed out optionally (use vmodule=analytical_cost_estimator=1 or 2).

PiperOrigin-RevId: 171619454
I would like to reclaim ops.h for a different purpose in a later patch.
It doesn't make sense to shove it all in the same header because
FusedIrEmitter uses (tuple_)ops.h, but my new functions will use
FusedIrEmitter.

PiperOrigin-RevId: 171622776
…egenerate BB.

It's possible to create a graph such that an elementwise concat is
emitted into an LLVM basic block which lacks a terminator.  In this case
it's an error to call splitBasicBlock(), so we need to handle this (as
is done elsewhere in this file).

PiperOrigin-RevId: 171624976
SPINN and probably other models commonly split large tensors into many
equal parts (e.g. along the batch dimension). When we compute the
gradient of such a split, we often don't have gradients coming from all
parts and end up creating zero tensors. This change caches the last
created zero tensor and reuses it. It reduces SPINN training time by
over 13%.

PiperOrigin-RevId: 171625608
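A minimal sketch of the caching idea, assuming a module-level cache keyed by shape and dtype; it reuses one zeros tensor per key instead of materializing a fresh one for every missing gradient slot. Illustrative, not the actual eager-gradients code.

import tensorflow as tf

_zeros_cache = {}

def cached_zeros(shape, dtype=tf.float32):
  key = (tuple(shape), dtype)
  if key not in _zeros_cache:
    _zeros_cache[key] = tf.zeros(shape, dtype)
  return _zeros_cache[key]

# Gradients are missing for two of the three split parts; both reuse one tensor.
grads = [None, tf.ones([2, 3]), None]
grads = [g if g is not None else cached_zeros([2, 3]) for g in grads]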
LLVM does not deal well with huge arrays emitted inline into the IR.  In JIT
mode, this change teaches XLA to emit large constant tensors onto a side data
structure, which are then symbolically linked to the generated executable.  It
is important to note that this works only in JIT mode, and my current
understanding is that making this work reliably in AOT will be somewhat more
difficult.

PiperOrigin-RevId: 171626043
An example of a bad ReshapeMover rewrite:

BEFORE
  %reshape.1 = f32[1,1,128] reshape(f32[1,128] %dot)
  %constant = f32[128] constant({...})
  %reshape.2 = f32[1,1,128] reshape(f32[128] %constant)
  %add = f32[1,1,128] add(f32[1,1,128] %reshape.1, f32[1,1,128] %reshape.2)

AFTER
  %constant = f32[128] constant({...})
  %add = f32[1,128] add(f32[1,128] %dot, f32[128] %constant)
  %reshape = f32[1,1,128] reshape(f32[1,128] %add)

The problem in AFTER is that the add now contains an implicit broadcast. One way to
fix this is to reshape the %constant to f32[1,128] before the %add.

Instead of that, the fix introduced in this CL is to simply prevent the
ReshapeMover from moving the reshapes in this case. A comment in
reshape_mover.cc describes the complexities that led to this choice.

Also added HloVerifiedTestBase, which keeps track of a default HloModule, and
automatically runs HloVerifier at the end of every test. This is useful for many
HLO tests; the tests of various passes can probably all use this. Three existing
issues in reshape_mover_test.cc were found and fixed as a result.

PiperOrigin-RevId: 171628656
@caisq caisq requested a review from alextp October 10, 2017 14:28
@caisq caisq self-assigned this Oct 10, 2017
@caisq caisq (Contributor, Author) commented Oct 10, 2017

Closing in favor of #13613

@caisq caisq closed this Oct 10, 2017
copybara-service bot pushed a commit that referenced this pull request Jun 12, 2024
Imported from GitHub PR openxla/xla#13639

Resolves a use-after-free when matching and rewriting layer norm patterns. See #13606.
Copybara import of the project:

--
91ebf7b4a2ac90ebadce27d1a73e88fb4513aed4 by Philipp Hack <phack@nvidia.com>:

Resolves a use-after-free in the norm rewriter.

Merging this change closes #13639

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#13639 from philipphack:u_layer_uaf_xla 91ebf7b4a2ac90ebadce27d1a73e88fb4513aed4
PiperOrigin-RevId: 642447222
copybara-service bot pushed a commit that referenced this pull request Jun 12, 2024
Imported from GitHub PR openxla/xla#13639

Resolves a use-after-free when matching and rewriting layer norm patterns. See #13606.
Copybara import of the project:

--
91ebf7b4a2ac90ebadce27d1a73e88fb4513aed4 by Philipp Hack <phack@nvidia.com>:

Resolves a use-after-free in the norm rewriter.

Merging this change closes #13639

PiperOrigin-RevId: 642548037