{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2020 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "qFdPvlXBOdUN" }, "source": [ "# Introduction to gradients and automatic differentiation" ] }, { "cell_type": "markdown", "metadata": { "id": "MfBg1C5NB3X0" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "r6P32iYYV27b" }, "source": [ "## Automatic Differentiation and Gradients\n", "\n", "[Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation)\n", "is useful for implementing machine learning algorithms such as\n", "[backpropagation](https://en.wikipedia.org/wiki/Backpropagation) for training\n", "neural networks.\n", "\n", "In this guide, you will explore ways to compute gradients with TensorFlow, especially in eager execution." ] }, { "cell_type": "markdown", "metadata": { "id": "MUXex9ctTuDB" }, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "IqR2PQG4ZaZ0" }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "import tensorflow as tf" ] }, { "cell_type": "markdown", "metadata": { "id": "xHxb-dlhMIzW" }, "source": [ "## Computing gradients\n", "\n", "To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the *forward* pass. Then, during the *backward pass*, TensorFlow traverses this list of operations in reverse order to compute gradients." ] }, { "cell_type": "markdown", "metadata": { "id": "1CLWJl0QliB0" }, "source": [ "## Gradient tapes\n", "\n", "TensorFlow provides the `tf.GradientTape` API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually `tf.Variable`s.\n", "TensorFlow \"records\" relevant operations executed inside the context of a `tf.GradientTape` onto a \"tape\". TensorFlow then uses that tape to compute the gradients of a \"recorded\" computation using [reverse mode differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation).\n", "\n", "Here is a simple example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Xq9GgTCP7a4A" }, "outputs": [], "source": [ "x = tf.Variable(3.0)\n", "\n", "with tf.GradientTape() as tape:\n", " y = x**2" ] }, { "cell_type": "markdown", "metadata": { "id": "CR9tFAP_7cra" }, "source": [ "Once you've recorded some operations, use `GradientTape.gradient(target, sources)` to calculate the gradient of some target (often a loss) relative to some source (often the model's variables):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LsvrwF6bHroC" }, "outputs": [], "source": [ "# dy = 2x * dx\n", "dy_dx = tape.gradient(y, x)\n", "dy_dx.numpy()" ] }, { "cell_type": "markdown", "metadata": { "id": "Q2_aqsO25Vx1" }, "source": [ "The above example uses scalars, but `tf.GradientTape` works as easily on any tensor:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "vacZ3-Ws5VdV" }, "outputs": [], "source": [ "w = tf.Variable(tf.random.normal((3, 2)), name='w')\n", "b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')\n", "x = [[1., 2., 3.]]\n", "\n", "with tf.GradientTape(persistent=True) as tape:\n", " y = x @ w + b\n", " loss = tf.reduce_mean(y**2)" ] }, { "cell_type": "markdown", "metadata": { "id": "i4eXOkrQ-9Pb" }, "source": [ "To get the gradient of `loss` with respect to both variables, you can pass both as sources to the `gradient` method. The tape is flexible about how sources are passed and will accept any nested combination of lists or dictionaries and return the gradient structured the same way (see `tf.nest`)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "luOtK1Da_BR0" }, "outputs": [], "source": [ "[dl_dw, dl_db] = tape.gradient(loss, [w, b])" ] }, { "cell_type": "markdown", "metadata": { "id": "Ei4iVXi6qgM7" }, "source": [ "The gradient with respect to each source has the shape of the source:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aYbWRFPZqk4U" }, "outputs": [], "source": [ "print(w.shape)\n", "print(dl_dw.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "dI_SzxHsvao1" }, "source": [ "Here is the gradient calculation again, this time passing a dictionary of variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d73cY6NOuaMd" }, "outputs": [], "source": [ "my_vars = {\n", " 'w': w,\n", " 'b': b\n", "}\n", "\n", "grad = tape.gradient(loss, my_vars)\n", "grad['b']" ] }, { "cell_type": "markdown", "metadata": { "id": "HZ2LvHifEMgO" }, "source": [ "## Gradients with respect to a model\n", "\n", "It's common to collect `tf.Variables` into a `tf.Module` or one of its subclasses (`layers.Layer`, `keras.Model`) for [checkpointing](checkpoint.ipynb) and [exporting](saved_model.ipynb).\n", "\n", "In most cases, you will want to calculate gradients with respect to a model's trainable variables. Since all subclasses of `tf.Module` aggregate their variables in the `Module.trainable_variables` property, you can calculate these gradients in a few lines of code: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JvesHtbQESc-" }, "outputs": [], "source": [ "layer = tf.keras.layers.Dense(2, activation='relu')\n", "x = tf.constant([[1., 2., 3.]])\n", "\n", "with tf.GradientTape() as tape:\n", " # Forward pass\n", " y = layer(x)\n", " loss = tf.reduce_mean(y**2)\n", "\n", "# Calculate gradients with respect to every trainable variable\n", "grad = tape.gradient(loss, layer.trainable_variables)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "PR_ezr6UFrpI" }, "outputs": [], "source": [ "for var, g in zip(layer.trainable_variables, grad):\n", " print(f'{var.name}, shape: {g.shape}')" ] }, { "cell_type": "markdown", "metadata": { "id": "f6Gx6LS714zR" }, "source": [ "\n", "\n", "## Controlling what the tape watches" ] }, { "cell_type": "markdown", "metadata": { "id": "N4VlqKFzzGaC" }, "source": [ "The default behavior is to record all operations after accessing a trainable `tf.Variable`. 
The reasons for this are:\n", "\n", "* The tape needs to know which operations to record in the forward pass to calculate the gradients in the backwards pass.\n", "* The tape holds references to intermediate outputs, so you don't want to record unnecessary operations.\n", "* The most common use case involves calculating the gradient of a loss with respect to all a model's trainable variables.\n", "\n", "For example, the following fails to calculate a gradient because the `tf.Tensor` is not \"watched\" by default, and the `tf.Variable` is not trainable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Kj9gPckdB37a" }, "outputs": [], "source": [ "# A trainable variable\n", "x0 = tf.Variable(3.0, name='x0')\n", "# Not trainable\n", "x1 = tf.Variable(3.0, name='x1', trainable=False)\n", "# Not a Variable: A variable + tensor returns a tensor.\n", "x2 = tf.Variable(2.0, name='x2') + 1.0\n", "# Not a variable\n", "x3 = tf.constant(3.0, name='x3')\n", "\n", "with tf.GradientTape() as tape:\n", " y = (x0**2) + (x1**2) + (x2**2)\n", "\n", "grad = tape.gradient(y, [x0, x1, x2, x3])\n", "\n", "for g in grad:\n", " print(g)" ] }, { "cell_type": "markdown", "metadata": { "id": "RkcpQnLgNxgi" }, "source": [ "You can list the variables being watched by the tape using the `GradientTape.watched_variables` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hwNwjW1eAkib" }, "outputs": [], "source": [ "[var.name for var in tape.watched_variables()]" ] }, { "cell_type": "markdown", "metadata": { "id": "NB9I1uFvB4tf" }, "source": [ "`tf.GradientTape` provides hooks that give the user control over what is or is not watched.\n", "\n", "To record gradients with respect to a `tf.Tensor`, you need to call `GradientTape.watch(x)`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tVN1QqFRDHBK" }, "outputs": [], "source": [ "x = tf.constant(3.0)\n", "with tf.GradientTape() as tape:\n", " tape.watch(x)\n", " y = x**2\n", "\n", "# dy = 2x * dx\n", "dy_dx = tape.gradient(y, x)\n", "print(dy_dx.numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "qxsiYnf2DN8K" }, "source": [ "Conversely, to disable the default behavior of watching all `tf.Variables`, set `watch_accessed_variables=False` when creating the gradient tape. This calculation uses two variables, but only connects the gradient for one of the variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7QPzwWvSEwIp" }, "outputs": [], "source": [ "x0 = tf.Variable(0.0)\n", "x1 = tf.Variable(10.0)\n", "\n", "with tf.GradientTape(watch_accessed_variables=False) as tape:\n", " tape.watch(x1)\n", " y0 = tf.math.sin(x0)\n", " y1 = tf.nn.softplus(x1)\n", " y = y0 + y1\n", " ys = tf.reduce_sum(y)" ] }, { "cell_type": "markdown", "metadata": { "id": "TRduLbE1H2IJ" }, "source": [ "Since `GradientTape.watch` was not called on `x0`, no gradient is computed with respect to it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e6GM-3evH1Sz" }, "outputs": [], "source": [ "# dys/dx1 = exp(x1) / (1 + exp(x1)) = sigmoid(x1)\n", "grad = tape.gradient(ys, {'x0': x0, 'x1': x1})\n", "\n", "print('dy/dx0:', grad['x0'])\n", "print('dy/dx1:', grad['x1'].numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "2g1nKB6P-OnA" }, "source": [ "## Intermediate results\n", "\n", "You can also request gradients of the output with respect to intermediate values computed inside the `tf.GradientTape` context." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7XaPRAwUyYms" }, "outputs": [], "source": [ "x = tf.constant(3.0)\n", "\n", "with tf.GradientTape() as tape:\n", " tape.watch(x)\n", " y = x * x\n", " z = y * y\n", "\n", "# Use the tape to compute the gradient of z with respect to the\n", "# intermediate value y.\n", "# dz_dy = 2 * y and y = x ** 2 = 9\n", "print(tape.gradient(z, y).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "ISkXuY7YzIcS" }, "source": [ "By default, the resources held by a `GradientTape` are released as soon as the `GradientTape.gradient` method is called. To compute multiple gradients over the same computation, create a gradient tape with `persistent=True`. This allows multiple calls to the `gradient` method as resources are released when the tape object is garbage collected. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zZaCm3-9zVCi" }, "outputs": [], "source": [ "x = tf.constant([1, 3.0])\n", "with tf.GradientTape(persistent=True) as tape:\n", " tape.watch(x)\n", " y = x * x\n", " z = y * y\n", "\n", "print(tape.gradient(z, x).numpy()) # [4.0, 108.0] (4 * x**3 at x = [1.0, 3.0])\n", "print(tape.gradient(y, x).numpy()) # [2.0, 6.0] (2 * x at x = [1.0, 3.0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "j8bv_jQFg6CN" }, "outputs": [], "source": [ "del tape # Drop the reference to the tape" ] }, { "cell_type": "markdown", "metadata": { "id": "O_ZY-9BUB7vX" }, "source": [ "## Notes on performance\n", "\n", "* There is a tiny overhead associated with doing operations inside a gradient tape context. For most eager execution this will not be a noticeable cost, but you should still use tape context around the areas only where it is required.\n", "\n", "* Gradient tapes use memory to store intermediate results, including inputs and outputs, for use during the backwards pass.\n", "\n", " For efficiency, some ops (like `ReLU`) don't need to keep their intermediate results and they are pruned during the forward pass. However, if you use `persistent=True` on your tape, *nothing is discarded* and your peak memory usage will be higher." ] }, { "cell_type": "markdown", "metadata": { "id": "9dLBpZsJebFq" }, "source": [ "## Gradients of non-scalar targets" ] }, { "cell_type": "markdown", "metadata": { "id": "7pldU9F5duP2" }, "source": [ "A gradient is fundamentally an operation on a scalar." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qI0sDV_WeXBb" }, "outputs": [], "source": [ "x = tf.Variable(2.0)\n", "with tf.GradientTape(persistent=True) as tape:\n", " y0 = x**2\n", " y1 = 1 / x\n", "\n", "print(tape.gradient(y0, x).numpy())\n", "print(tape.gradient(y1, x).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "COEyYp34fxj4" }, "source": [ "Thus, if you ask for the gradient of multiple targets, the result for each source is:\n", "\n", "* The gradient of the sum of the targets, or equivalently\n", "* The sum of the gradients of each target." 
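,
    "\n",
    "\n",
    "Concretely, with `x = 2` in the cell above, `dy0/dx = 2*x = 4.0` and `dy1/dx = -1/x**2 = -0.25`, so requesting both targets at once, as in the next cell, returns their sum: `3.75`."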
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "o4a6_YOcfWKS" }, "outputs": [], "source": [ "x = tf.Variable(2.0)\n", "with tf.GradientTape() as tape:\n", " y0 = x**2\n", " y1 = 1 / x\n", "\n", "print(tape.gradient({'y0': y0, 'y1': y1}, x).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "uvP-mkBMgbym" }, "source": [ "Similarly, if the target(s) are not scalar the gradient of the sum is calculated:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "DArPWqsSh5un" }, "outputs": [], "source": [ "x = tf.Variable(2.)\n", "\n", "with tf.GradientTape() as tape:\n", " y = x * [3., 4.]\n", "\n", "print(tape.gradient(y, x).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "flDbx68Zh5Lb" }, "source": [ "This makes it simple to take the gradient of the sum of a collection of losses, or the gradient of the sum of an element-wise loss calculation.\n", "\n", "If you need a separate gradient for each item, refer to [Jacobians](advanced_autodiff.ipynb#jacobians)." ] }, { "cell_type": "markdown", "metadata": { "id": "iwFswok8RAly" }, "source": [ "In some cases you can skip the Jacobian. For an element-wise calculation, the gradient of the sum gives the derivative of each element with respect to its input-element, since each element is independent:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JQvk_jnMmTDS" }, "outputs": [], "source": [ "x = tf.linspace(-10.0, 10.0, 200+1)\n", "\n", "with tf.GradientTape() as tape:\n", " tape.watch(x)\n", " y = tf.nn.sigmoid(x)\n", "\n", "dy_dx = tape.gradient(y, x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e_f2QgDPmcPE" }, "outputs": [], "source": [ "plt.plot(x, y, label='y')\n", "plt.plot(x, dy_dx, label='dy/dx')\n", "plt.legend()\n", "_ = plt.xlabel('x')" ] }, { "cell_type": "markdown", "metadata": { "id": "6kADybtQzYj4" }, "source": [ "## Control flow\n", "\n", "Because a gradient tape records operations as they are executed, Python control flow is naturally handled (for example, `if` and `while` statements).\n", "\n", "Here a different variable is used on each branch of an `if`. The gradient only connects to the variable that was used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ciFLizhrrjy7" }, "outputs": [], "source": [ "x = tf.constant(1.0)\n", "\n", "v0 = tf.Variable(2.0)\n", "v1 = tf.Variable(2.0)\n", "\n", "with tf.GradientTape(persistent=True) as tape:\n", " tape.watch(x)\n", " if x > 0.0:\n", " result = v0\n", " else:\n", " result = v1**2 \n", "\n", "dv0, dv1 = tape.gradient(result, [v0, v1])\n", "\n", "print(dv0)\n", "print(dv1)" ] }, { "cell_type": "markdown", "metadata": { "id": "HKnLaiapsjeP" }, "source": [ "Just remember that the control statements themselves are not differentiable, so they are invisible to gradient-based optimizers.\n", "\n", "Depending on the value of `x` in the above example, the tape either records `result = v0` or `result = v1**2`. The gradient with respect to `x` is always `None`." 
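,
    "\n",
    "\n",
    "That is because `x` only appears in the comparison `x > 0.0`, which selects a branch but produces a boolean rather than a differentiable value, so nothing on the tape connects `result` back to `x`:"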
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8k05WmuAwPm7" }, "outputs": [], "source": [ "dx = tape.gradient(result, x)\n", "\n", "print(dx)" ] }, { "cell_type": "markdown", "metadata": { "id": "egypBxISAHhx" }, "source": [ "## Cases where `gradient` returns `None`\n", "\n", "When a target is not connected to a source, `gradient` will return `None`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "CU185WDM81Ut" }, "outputs": [], "source": [ "x = tf.Variable(2.)\n", "y = tf.Variable(3.)\n", "\n", "with tf.GradientTape() as tape:\n", " z = y * y\n", "print(tape.gradient(z, x))" ] }, { "cell_type": "markdown", "metadata": { "id": "sZbKpHfBRJym" }, "source": [ "Here `z` is obviously not connected to `x`, but there are several less-obvious ways that a gradient can be disconnected." ] }, { "cell_type": "markdown", "metadata": { "id": "eHDzDOiQ8xmw" }, "source": [ "### 1. Replaced a variable with a tensor\n", "\n", "In the section on [\"controlling what the tape watches\"](#watches) you saw that the tape will automatically watch a `tf.Variable` but not a `tf.Tensor`.\n", "\n", "One common error is to inadvertently replace a `tf.Variable` with a `tf.Tensor`, instead of using `Variable.assign` to update the `tf.Variable`. Here is an example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "QPKY4Tn9zX7_" }, "outputs": [], "source": [ "x = tf.Variable(2.0)\n", "\n", "for epoch in range(2):\n", " with tf.GradientTape() as tape:\n", " y = x+1\n", "\n", " print(type(x).__name__, \":\", tape.gradient(y, x))\n", " x = x + 1 # This should be `x.assign_add(1)`" ] }, { "cell_type": "markdown", "metadata": { "id": "3gwZKxgA97an" }, "source": [ "### 2. Did calculations outside of TensorFlow\n", "\n", "The tape can't record the gradient path if the calculation exits TensorFlow.\n", "For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jmoLCDJb_yw1" }, "outputs": [], "source": [ "x = tf.Variable([[1.0, 2.0],\n", " [3.0, 4.0]], dtype=tf.float32)\n", "\n", "with tf.GradientTape() as tape:\n", " x2 = x**2\n", "\n", " # This step is calculated with NumPy\n", " y = np.mean(x2, axis=0)\n", "\n", " # Like most ops, reduce_mean will cast the NumPy array to a constant tensor\n", " # using `tf.convert_to_tensor`.\n", " y = tf.reduce_mean(y, axis=0)\n", "\n", "print(tape.gradient(y, x))" ] }, { "cell_type": "markdown", "metadata": { "id": "p3YVfP3R-tp7" }, "source": [ "### 3. Took gradients through an integer or string\n", "\n", "Integers and strings are not differentiable. If a calculation path uses these data types there will be no gradient.\n", "\n", "Nobody expects strings to be differentiable, but it's easy to accidentally create an `int` constant or variable if you don't specify the `dtype`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9jlHXHqfASU3" }, "outputs": [], "source": [ "x = tf.constant(10)\n", "\n", "with tf.GradientTape() as g:\n", " g.watch(x)\n", " y = x * x\n", "\n", "print(g.gradient(y, x))" ] }, { "cell_type": "markdown", "metadata": { "id": "RsdP_mTHX9L1" }, "source": [ "TensorFlow doesn't automatically cast between types, so, in practice, you'll often get a type error instead of a missing gradient." ] }, { "cell_type": "markdown", "metadata": { "id": "WyAZ7C8qCEs6" }, "source": [ "### 4. Took gradients through a stateful object\n", "\n", "State stops gradients. 
When you read from a stateful object, the tape can only observe the current state, not the history that lead to it.\n", "\n", "A `tf.Tensor` is immutable. You can't change a tensor once it's created. It has a _value_, but no _state_. All the operations discussed so far are also stateless: the output of a `tf.matmul` only depends on its inputs.\n", "\n", "A `tf.Variable` has internal state—its value. When you use the variable, the state is read. It's normal to calculate a gradient with respect to a variable, but the variable's state blocks gradient calculations from going farther back. For example:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "C1tLeeRFE479" }, "outputs": [], "source": [ "x0 = tf.Variable(3.0)\n", "x1 = tf.Variable(0.0)\n", "\n", "with tf.GradientTape() as tape:\n", " # Update x1 = x1 + x0.\n", " x1.assign_add(x0)\n", " # The tape starts recording from x1.\n", " y = x1**2 # y = (x1 + x0)**2\n", "\n", "# This doesn't work.\n", "print(tape.gradient(y, x0)) #dy/dx0 = 2*(x1 + x0)" ] }, { "cell_type": "markdown", "metadata": { "id": "xKA92-dqF2r-" }, "source": [ "Similarly, `tf.data.Dataset` iterators and `tf.queue`s are stateful, and will stop all gradients on tensors that pass through them." ] }, { "cell_type": "markdown", "metadata": { "id": "HHvcDGIbOj2I" }, "source": [ "## No gradient registered" ] }, { "cell_type": "markdown", "metadata": { "id": "aoc-A6AxVqry" }, "source": [ "Some `tf.Operation`s are **registered as being non-differentiable** and will return `None`. Others have **no gradient registered**.\n", "\n", "The `tf.raw_ops` page shows which low-level ops have gradients registered.\n", "\n", "If you attempt to take a gradient through a float op that has no gradient registered the tape will throw an error instead of silently returning `None`. This way you know something has gone wrong.\n", "\n", "For example, the `tf.image.adjust_contrast` function wraps `raw_ops.AdjustContrastv2`, which could have a gradient but the gradient is not implemented:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HSb20FXc_V0U" }, "outputs": [], "source": [ "image = tf.Variable([[[0.5, 0.0, 0.0]]])\n", "delta = tf.Variable(0.1)\n", "\n", "with tf.GradientTape() as tape:\n", " new_image = tf.image.adjust_contrast(image, delta)\n", "\n", "try:\n", " print(tape.gradient(new_image, [image, delta]))\n", " assert False # This should not happen.\n", "except LookupError as e:\n", " print(f'{type(e).__name__}: {e}')\n" ] }, { "cell_type": "markdown", "metadata": { "id": "pDoutjzATiEm" }, "source": [ "If you need to differentiate through this op, you'll either need to implement the gradient and register it (using `tf.RegisterGradient`) or re-implement the function using other ops." ] }, { "cell_type": "markdown", "metadata": { "id": "GCTwc_dQXp2W" }, "source": [ "## Zeros instead of None" ] }, { "cell_type": "markdown", "metadata": { "id": "TYDrVogA89eA" }, "source": [ "In some cases it would be convenient to get 0 instead of `None` for unconnected gradients. 
You can decide what to return when you have unconnected gradients using the `unconnected_gradients` argument:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "U6zxk1sf9Ixx" }, "outputs": [], "source": [ "x = tf.Variable([2., 2.])\n", "y = tf.Variable(3.)\n", "\n", "with tf.GradientTape() as tape:\n", " z = y**2\n", "print(tape.gradient(z, x, unconnected_gradients=tf.UnconnectedGradients.ZERO))" ] } ], "metadata": { "colab": { "collapsed_sections": [ "Tce3stUlHN0L" ], "name": "autodiff.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }