{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2020 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "qFdPvlXBOdUN" }, "source": [ "# Introduction to gradients and automatic differentiation" ] }, { "cell_type": "markdown", "metadata": { "id": "MfBg1C5NB3X0" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "r6P32iYYV27b" }, "source": [ "## Automatic Differentiation and Gradients\n", "\n", "[Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation)\n", "is useful for implementing machine learning algorithms such as\n", "[backpropagation](https://en.wikipedia.org/wiki/Backpropagation) for training\n", "neural networks.\n", "\n", "In this guide, you will explore ways to compute gradients with TensorFlow, especially in eager execution." ] }, { "cell_type": "markdown", "metadata": { "id": "MUXex9ctTuDB" }, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "IqR2PQG4ZaZ0" }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "import tensorflow as tf" ] }, { "cell_type": "markdown", "metadata": { "id": "xHxb-dlhMIzW" }, "source": [ "## Computing gradients\n", "\n", "To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the *forward* pass. Then, during the *backward pass*, TensorFlow traverses this list of operations in reverse order to compute gradients." ] }, { "cell_type": "markdown", "metadata": { "id": "1CLWJl0QliB0" }, "source": [ "## Gradient tapes\n", "\n", "TensorFlow provides the `tf.GradientTape` API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually `tf.Variable`s.\n", "TensorFlow \"records\" relevant operations executed inside the context of a `tf.GradientTape` onto a \"tape\". TensorFlow then uses that tape to compute the gradients of a \"recorded\" computation using [reverse mode differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation).\n", "\n", "Here is a simple example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Xq9GgTCP7a4A" }, "outputs": [], "source": [ "x = tf.Variable(3.0)\n", "\n", "with tf.GradientTape() as tape:\n", " y = x**2" ] }, { "cell_type": "markdown", "metadata": { "id": "CR9tFAP_7cra" }, "source": [ "Once you've recorded some operations, use `GradientTape.gradient(target, sources)` to calculate the gradient of some target (often a loss) relative to some source (often the model's variables):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LsvrwF6bHroC" }, "outputs": [], "source": [ "# dy = 2x * dx\n", "dy_dx = tape.gradient(y, x)\n", "dy_dx.numpy()" ] }, { "cell_type": "markdown", "metadata": { "id": "Q2_aqsO25Vx1" }, "source": [ "The above example uses scalars, but `tf.GradientTape` works as easily on any tensor:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "vacZ3-Ws5VdV" }, "outputs": [], "source": [ "w = tf.Variable(tf.random.normal((3, 2)), name='w')\n", "b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')\n", "x = [[1., 2., 3.]]\n", "\n", "with tf.GradientTape(persistent=True) as tape:\n", " y = x @ w + b\n", " loss = tf.reduce_mean(y**2)" ] }, { "cell_type": "markdown", "metadata": { "id": "i4eXOkrQ-9Pb" }, "source": [ "To get the gradient of `loss` with respect to both variables, you can pass both as sources to the `gradient` method. The tape is flexible about how sources are passed and will accept any nested combination of lists or dictionaries and return the gradient structured the same way (see `tf.nest`)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "luOtK1Da_BR0" }, "outputs": [], "source": [ "[dl_dw, dl_db] = tape.gradient(loss, [w, b])" ] }, { "cell_type": "markdown", "metadata": { "id": "Ei4iVXi6qgM7" }, "source": [ "The gradient with respect to each source has the shape of the source:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aYbWRFPZqk4U" }, "outputs": [], "source": [ "print(w.shape)\n", "print(dl_dw.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "dI_SzxHsvao1" }, "source": [ "Here is the gradient calculation again, this time passing a dictionary of variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d73cY6NOuaMd" }, "outputs": [], "source": [ "my_vars = {\n", " 'w': w,\n", " 'b': b\n", "}\n", "\n", "grad = tape.gradient(loss, my_vars)\n", "grad['b']" ] }, { "cell_type": "markdown", "metadata": { "id": "HZ2LvHifEMgO" }, "source": [ "## Gradients with respect to a model\n", "\n", "It's common to collect `tf.Variables` into a `tf.Module` or one of its subclasses (`layers.Layer`, `keras.Model`) for [checkpointing](checkpoint.ipynb) and [exporting](saved_model.ipynb).\n", "\n", "In most cases, you will want to calculate gradients with respect to a model's trainable variables. Since all subclasses of `tf.Module` aggregate their variables in the `Module.trainable_variables` property, you can calculate these gradients in a few lines of code: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JvesHtbQESc-" }, "outputs": [], "source": [ "layer = tf.keras.layers.Dense(2, activation='relu')\n", "x = tf.constant([[1., 2., 3.]])\n", "\n", "with tf.GradientTape() as tape:\n", " # Forward pass\n", " y = layer(x)\n", " loss = tf.reduce_mean(y**2)\n", "\n", "# Calculate gradients with respect to every trainable variable\n", "grad = tape.gradient(loss, layer.trainable_variables)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "PR_ezr6UFrpI" }, "outputs": [], "source": [ "for var, g in zip(layer.trainable_variables, grad):\n", " print(f'{var.name}, shape: {g.shape}')" ] }, { "cell_type": "markdown", "metadata": { "id": "f6Gx6LS714zR" }, "source": [ "\n", "\n", "## Controlling what the tape watches" ] }, { "cell_type": "markdown", "metadata": { "id": "N4VlqKFzzGaC" }, "source": [ "The default behavior is to record all operations after accessing a trainable `tf.Variable`. 
The reasons for this are:\n", "\n", "* The tape needs to know which operations to record in the forward pass to calculate the gradients in the backwards pass.\n", "* The tape holds references to intermediate outputs, so you don't want to record unnecessary operations.\n", "* The most common use case involves calculating the gradient of a loss with respect to all a model's trainable variables.\n", "\n", "For example, the following fails to calculate a gradient because the `tf.Tensor` is not \"watched\" by default, and the `tf.Variable` is not trainable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Kj9gPckdB37a" }, "outputs": [], "source": [ "# A trainable variable\n", "x0 = tf.Variable(3.0, name='x0')\n", "# Not trainable\n", "x1 = tf.Variable(3.0, name='x1', trainable=False)\n", "# Not a Variable: A variable + tensor returns a tensor.\n", "x2 = tf.Variable(2.0, name='x2') + 1.0\n", "# Not a variable\n", "x3 = tf.constant(3.0, name='x3')\n", "\n", "with tf.GradientTape() as tape:\n", " y = (x0**2) + (x1**2) + (x2**2)\n", "\n", "grad = tape.gradient(y, [x0, x1, x2, x3])\n", "\n", "for g in grad:\n", " print(g)" ] }, { "cell_type": "markdown", "metadata": { "id": "RkcpQnLgNxgi" }, "source": [ "You can list the variables being watched by the tape using the `GradientTape.watched_variables` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hwNwjW1eAkib" }, "outputs": [], "source": [ "[var.name for var in tape.watched_variables()]" ] }, { "cell_type": "markdown", "metadata": { "id": "NB9I1uFvB4tf" }, "source": [ "`tf.GradientTape` provides hooks that give the user control over what is or is not watched.\n", "\n", "To record gradients with respect to a `tf.Tensor`, you need to call `GradientTape.watch(x)`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tVN1QqFRDHBK" }, "outputs": [], "source": [ "x = tf.constant(3.0)\n", "with tf.GradientTape() as tape:\n", " tape.watch(x)\n", " y = x**2\n", "\n", "# dy = 2x * dx\n", "dy_dx = tape.gradient(y, x)\n", "print(dy_dx.numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "qxsiYnf2DN8K" }, "source": [ "Conversely, to disable the default behavior of watching all `tf.Variables`, set `watch_accessed_variables=False` when creating the gradient tape. This calculation uses two variables, but only connects the gradient for one of the variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7QPzwWvSEwIp" }, "outputs": [], "source": [ "x0 = tf.Variable(0.0)\n", "x1 = tf.Variable(10.0)\n", "\n", "with tf.GradientTape(watch_accessed_variables=False) as tape:\n", " tape.watch(x1)\n", " y0 = tf.math.sin(x0)\n", " y1 = tf.nn.softplus(x1)\n", " y = y0 + y1\n", " ys = tf.reduce_sum(y)" ] }, { "cell_type": "markdown", "metadata": { "id": "TRduLbE1H2IJ" }, "source": [ "Since `GradientTape.watch` was not called on `x0`, no gradient is computed with respect to it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e6GM-3evH1Sz" }, "outputs": [], "source": [ "# dys/dx1 = exp(x1) / (1 + exp(x1)) = sigmoid(x1)\n", "grad = tape.gradient(ys, {'x0': x0, 'x1': x1})\n", "\n", "print('dy/dx0:', grad['x0'])\n", "print('dy/dx1:', grad['x1'].numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "2g1nKB6P-OnA" }, "source": [ "## Intermediate results\n", "\n", "You can also request gradients of the output with respect to intermediate values computed inside the `tf.GradientTape` context." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7XaPRAwUyYms" }, "outputs": [], "source": [ "x = tf.constant(3.0)\n", "\n", "with tf.GradientTape() as tape:\n", " tape.watch(x)\n", " y = x * x\n", " z = y * y\n", "\n", "# Use the tape to compute the gradient of z with respect to the\n", "# intermediate value y.\n", "# dz_dy = 2 * y and y = x ** 2 = 9\n", "print(tape.gradient(z, y).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "ISkXuY7YzIcS" }, "source": [ "By default, the resources held by a `GradientTape` are released as soon as the `GradientTape.gradient` method is called. To compute multiple gradients over the same computation, create a gradient tape with `persistent=True`. This allows multiple calls to the `gradient` method as resources are released when the tape object is garbage collected. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zZaCm3-9zVCi" }, "outputs": [], "source": [ "x = tf.constant([1, 3.0])\n", "with tf.GradientTape(persistent=True) as tape:\n", " tape.watch(x)\n", " y = x * x\n", " z = y * y\n", "\n", "print(tape.gradient(z, x).numpy()) # [4.0, 108.0] (4 * x**3 at x = [1.0, 3.0])\n", "print(tape.gradient(y, x).numpy()) # [2.0, 6.0] (2 * x at x = [1.0, 3.0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "j8bv_jQFg6CN" }, "outputs": [], "source": [ "del tape # Drop the reference to the tape" ] }, { "cell_type": "markdown", "metadata": { "id": "O_ZY-9BUB7vX" }, "source": [ "## Notes on performance\n", "\n", "* There is a tiny overhead associated with doing operations inside a gradient tape context. For most eager execution this will not be a noticeable cost, but you should still use tape context around the areas only where it is required.\n", "\n", "* Gradient tapes use memory to store intermediate results, including inputs and outputs, for use during the backwards pass.\n", "\n", " For efficiency, some ops (like `ReLU`) don't need to keep their intermediate results and they are pruned during the forward pass. However, if you use `persistent=True` on your tape, *nothing is discarded* and your peak memory usage will be higher." ] }, { "cell_type": "markdown", "metadata": { "id": "9dLBpZsJebFq" }, "source": [ "## Gradients of non-scalar targets" ] }, { "cell_type": "markdown", "metadata": { "id": "7pldU9F5duP2" }, "source": [ "A gradient is fundamentally an operation on a scalar." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qI0sDV_WeXBb" }, "outputs": [], "source": [ "x = tf.Variable(2.0)\n", "with tf.GradientTape(persistent=True) as tape:\n", " y0 = x**2\n", " y1 = 1 / x\n", "\n", "print(tape.gradient(y0, x).numpy())\n", "print(tape.gradient(y1, x).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "COEyYp34fxj4" }, "source": [ "Thus, if you ask for the gradient of multiple targets, the result for each source is:\n", "\n", "* The gradient of the sum of the targets, or equivalently\n", "* The sum of the gradients of each target." 
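,
    "\n",
    "\n",
    "Concretely, with `x = 2` in the cell above, `dy0/dx = 2*x = 4.0` and `dy1/dx = -1/x**2 = -0.25`, so requesting both targets at once, as in the next cell, returns their sum: `3.75`."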
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "o4a6_YOcfWKS" }, "outputs": [], "source": [ "x = tf.Variable(2.0)\n", "with tf.GradientTape() as tape:\n", " y0 = x**2\n", " y1 = 1 / x\n", "\n", "print(tape.gradient({'y0': y0, 'y1': y1}, x).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "uvP-mkBMgbym" }, "source": [ "Similarly, if the target(s) are not scalar the gradient of the sum is calculated:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "DArPWqsSh5un" }, "outputs": [], "source": [ "x = tf.Variable(2.)\n", "\n", "with tf.GradientTape() as tape:\n", " y = x * [3., 4.]\n", "\n", "print(tape.gradient(y, x).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "flDbx68Zh5Lb" }, "source": [ "This makes it simple to take the gradient of the sum of a collection of losses, or the gradient of the sum of an element-wise loss calculation.\n", "\n", "If you need a separate gradient for each item, refer to [Jacobians](advanced_autodiff.ipynb#jacobians)." ] }, { "cell_type": "markdown", "metadata": { "id": "iwFswok8RAly" }, "source": [ "In some cases you can skip the Jacobian. For an element-wise calculation, the gradient of the sum gives the derivative of each element with respect to its input-element, since each element is independent:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JQvk_jnMmTDS" }, "outputs": [], "source": [ "x = tf.linspace(-10.0, 10.0, 200+1)\n", "\n", "with tf.GradientTape() as tape:\n", " tape.watch(x)\n", " y = tf.nn.sigmoid(x)\n", "\n", "dy_dx = tape.gradient(y, x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e_f2QgDPmcPE" }, "outputs": [], "source": [ "plt.plot(x, y, label='y')\n", "plt.plot(x, dy_dx, label='dy/dx')\n", "plt.legend()\n", "_ = plt.xlabel('x')" ] }, { "cell_type": "markdown", "metadata": { "id": "6kADybtQzYj4" }, "source": [ "## Control flow\n", "\n", "Because a gradient tape records operations as they are executed, Python control flow is naturally handled (for example, `if` and `while` statements).\n", "\n", "Here a different variable is used on each branch of an `if`. The gradient only connects to the variable that was used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ciFLizhrrjy7" }, "outputs": [], "source": [ "x = tf.constant(1.0)\n", "\n", "v0 = tf.Variable(2.0)\n", "v1 = tf.Variable(2.0)\n", "\n", "with tf.GradientTape(persistent=True) as tape:\n", " tape.watch(x)\n", " if x > 0.0:\n", " result = v0\n", " else:\n", " result = v1**2 \n", "\n", "dv0, dv1 = tape.gradient(result, [v0, v1])\n", "\n", "print(dv0)\n", "print(dv1)" ] }, { "cell_type": "markdown", "metadata": { "id": "HKnLaiapsjeP" }, "source": [ "Just remember that the control statements themselves are not differentiable, so they are invisible to gradient-based optimizers.\n", "\n", "Depending on the value of `x` in the above example, the tape either records `result = v0` or `result = v1**2`. The gradient with respect to `x` is always `None`." 
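,
    "\n",
    "\n",
    "That is because `x` only appears in the comparison `x > 0.0`, which selects a branch but produces a boolean rather than a differentiable value, so nothing on the tape connects `result` back to `x`:"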
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8k05WmuAwPm7" }, "outputs": [], "source": [ "dx = tape.gradient(result, x)\n", "\n", "print(dx)" ] }, { "cell_type": "markdown", "metadata": { "id": "egypBxISAHhx" }, "source": [ "## Cases where `gradient` returns `None`\n", "\n", "When a target is not connected to a source, `gradient` will return `None`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "CU185WDM81Ut" }, "outputs": [], "source": [ "x = tf.Variable(2.)\n", "y = tf.Variable(3.)\n", "\n", "with tf.GradientTape() as tape:\n", " z = y * y\n", "print(tape.gradient(z, x))" ] }, { "cell_type": "markdown", "metadata": { "id": "sZbKpHfBRJym" }, "source": [ "Here `z` is obviously not connected to `x`, but there are several less-obvious ways that a gradient can be disconnected." ] }, { "cell_type": "markdown", "metadata": { "id": "eHDzDOiQ8xmw" }, "source": [ "### 1. Replaced a variable with a tensor\n", "\n", "In the section on [\"controlling what the tape watches\"](#watches) you saw that the tape will automatically watch a `tf.Variable` but not a `tf.Tensor`.\n", "\n", "One common error is to inadvertently replace a `tf.Variable` with a `tf.Tensor`, instead of using `Variable.assign` to update the `tf.Variable`. Here is an example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "QPKY4Tn9zX7_" }, "outputs": [], "source": [ "x = tf.Variable(2.0)\n", "\n", "for epoch in range(2):\n", " with tf.GradientTape() as tape:\n", " y = x+1\n", "\n", " print(type(x).__name__, \":\", tape.gradient(y, x))\n", " x = x + 1 # This should be `x.assign_add(1)`" ] }, { "cell_type": "markdown", "metadata": { "id": "3gwZKxgA97an" }, "source": [ "### 2. Did calculations outside of TensorFlow\n", "\n", "The tape can't record the gradient path if the calculation exits TensorFlow.\n", "For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jmoLCDJb_yw1" }, "outputs": [], "source": [ "x = tf.Variable([[1.0, 2.0],\n", " [3.0, 4.0]], dtype=tf.float32)\n", "\n", "with tf.GradientTape() as tape:\n", " x2 = x**2\n", "\n", " # This step is calculated with NumPy\n", " y = np.mean(x2, axis=0)\n", "\n", " # Like most ops, reduce_mean will cast the NumPy array to a constant tensor\n", " # using `tf.convert_to_tensor`.\n", " y = tf.reduce_mean(y, axis=0)\n", "\n", "print(tape.gradient(y, x))" ] }, { "cell_type": "markdown", "metadata": { "id": "p3YVfP3R-tp7" }, "source": [ "### 3. Took gradients through an integer or string\n", "\n", "Integers and strings are not differentiable. If a calculation path uses these data types there will be no gradient.\n", "\n", "Nobody expects strings to be differentiable, but it's easy to accidentally create an `int` constant or variable if you don't specify the `dtype`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9jlHXHqfASU3" }, "outputs": [], "source": [ "x = tf.constant(10)\n", "\n", "with tf.GradientTape() as g:\n", " g.watch(x)\n", " y = x * x\n", "\n", "print(g.gradient(y, x))" ] }, { "cell_type": "markdown", "metadata": { "id": "RsdP_mTHX9L1" }, "source": [ "TensorFlow doesn't automatically cast between types, so, in practice, you'll often get a type error instead of a missing gradient." ] }, { "cell_type": "markdown", "metadata": { "id": "WyAZ7C8qCEs6" }, "source": [ "### 4. Took gradients through a stateful object\n", "\n", "State stops gradients. 
When you read from a stateful object, the tape can only observe the current state, not the history that lead to it.\n", "\n", "A `tf.Tensor` is immutable. You can't change a tensor once it's created. It has a _value_, but no _state_. All the operations discussed so far are also stateless: the output of a `tf.matmul` only depends on its inputs.\n", "\n", "A `tf.Variable` has internal state—its value. When you use the variable, the state is read. It's normal to calculate a gradient with respect to a variable, but the variable's state blocks gradient calculations from going farther back. For example:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "C1tLeeRFE479" }, "outputs": [], "source": [ "x0 = tf.Variable(3.0)\n", "x1 = tf.Variable(0.0)\n", "\n", "with tf.GradientTape() as tape:\n", " # Update x1 = x1 + x0.\n", " x1.assign_add(x0)\n", " # The tape starts recording from x1.\n", " y = x1**2 # y = (x1 + x0)**2\n", "\n", "# This doesn't work.\n", "print(tape.gradient(y, x0)) #dy/dx0 = 2*(x1 + x0)" ] }, { "cell_type": "markdown", "metadata": { "id": "xKA92-dqF2r-" }, "source": [ "Similarly, `tf.data.Dataset` iterators and `tf.queue`s are stateful, and will stop all gradients on tensors that pass through them." ] }, { "cell_type": "markdown", "metadata": { "id": "HHvcDGIbOj2I" }, "source": [ "## No gradient registered" ] }, { "cell_type": "markdown", "metadata": { "id": "aoc-A6AxVqry" }, "source": [ "Some `tf.Operation`s are **registered as being non-differentiable** and will return `None`. Others have **no gradient registered**.\n", "\n", "The `tf.raw_ops` page shows which low-level ops have gradients registered.\n", "\n", "If you attempt to take a gradient through a float op that has no gradient registered the tape will throw an error instead of silently returning `None`. This way you know something has gone wrong.\n", "\n", "For example, the `tf.image.adjust_contrast` function wraps `raw_ops.AdjustContrastv2`, which could have a gradient but the gradient is not implemented:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HSb20FXc_V0U" }, "outputs": [], "source": [ "image = tf.Variable([[[0.5, 0.0, 0.0]]])\n", "delta = tf.Variable(0.1)\n", "\n", "with tf.GradientTape() as tape:\n", " new_image = tf.image.adjust_contrast(image, delta)\n", "\n", "try:\n", " print(tape.gradient(new_image, [image, delta]))\n", " assert False # This should not happen.\n", "except LookupError as e:\n", " print(f'{type(e).__name__}: {e}')\n" ] }, { "cell_type": "markdown", "metadata": { "id": "pDoutjzATiEm" }, "source": [ "If you need to differentiate through this op, you'll either need to implement the gradient and register it (using `tf.RegisterGradient`) or re-implement the function using other ops." ] }, { "cell_type": "markdown", "metadata": { "id": "GCTwc_dQXp2W" }, "source": [ "## Zeros instead of None" ] }, { "cell_type": "markdown", "metadata": { "id": "TYDrVogA89eA" }, "source": [ "In some cases it would be convenient to get 0 instead of `None` for unconnected gradients. 
You can decide what to return when you have unconnected gradients using the `unconnected_gradients` argument:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "U6zxk1sf9Ixx" }, "outputs": [], "source": [ "x = tf.Variable([2., 2.])\n", "y = tf.Variable(3.)\n", "\n", "with tf.GradientTape() as tape:\n", " z = y**2\n", "print(tape.gradient(z, x, unconnected_gradients=tf.UnconnectedGradients.ZERO))" ] } ], "metadata": { "colab": { "collapsed_sections": [ "Tce3stUlHN0L" ], "name": "autodiff.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }