In this guide, you will explore ways to compute gradients with TensorFlow, especially in eager execution.

Automatic Differentiation and Gradients

Automatic differentiation is useful and powerful for implementing machine learning algorithms such as backpropagation for training neural networks.

Computing gradients

To differentiate automatically, TensorFlow needs to:

  1. remember what operations happen in what order during the forward pass;
  2. then traverse this list of operations in reverse order to compute gradients during the backward pass (a minimal sketch follows this list).
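
For example, here is a minimal sketch of these two steps (using the tf.GradientTape API described in the next section; the values are illustrative):

import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x**2                     # the forward pass is recorded on the tape

dy_dx = tape.gradient(y, x)      # the recorded ops are traversed in reverse: dy/dx = 2x = 6.0
print(dy_dx.numpy())             # 6.0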

Gradient tapes

TensorFlow provides the tf.GradientTape API for automatic differentiation; that is:

  0. A gradient is fundamentally an operation on a scalar target.

    The gradient with respect to each source has the same shape as that source.

    Similarly, if the target(s) are not scalar, the gradient of their sum is calculated.

  1. Automatic differentiation computes the gradient of a computation with respect to some inputs, usually tf.Variables.
  2. TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape".
  3. TensorFlow then uses that tape to compute the gradients of the "recorded" computation using reverse-mode differentiation.

    Once you've recorded some operations, use GradientTape.gradient(target, sources) to calculate the gradient of some target (often a loss) relative to some source (often the model's variables).
  4. To get the gradient with respect to two or more variables, you can pass a list of those variables as sources to the gradient method.

    The tape is flexible about how sources are passed and will accept any nested combination of lists or dictionaries and return the gradient structured the same way (see tf.nest).
  5. You can also pass a dictionary of variables as the source and index the result with the same keys: grad_w = tape.gradient(cost, {'w': w, 'b': b})['w'].
  6. By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient method is called,

    so a non-persistent GradientTape can only be used to compute one set of gradients (or Jacobians).

    To compute multiple gradients over the same computation, create a gradient tape with persistent=True.

    This allows multiple calls to the gradient method, since the resources are released only when the tape object is garbage collected (see the sketch after the examples below).
  7. Only call GradientTape.gradient inside the tape's context if you actually want to trace the gradient computation itself, for example to compute higher-order derivatives,

    since calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage). The sketch after the examples below also shows this pattern.
  8. Examples:

import numpy as np
import tensorflow as tf

def comp_gradient_dy_dx(x, persistent=False):
    k = np.arange(20, dtype=np.float32).reshape(int(x.shape[-1]), 20 // int(x.shape[-1]))
    with tf.GradientTape(persistent=persistent) as tape:  # recording ops
        y = (x**2) @ k
    # Since y = (x**2) @ k, dy/dx is 2x scaled column-wise by the row sums of k.
    dy_dx = tape.gradient(y, x)
    return y, dy_dx

print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0], [3.0, 4.0]])))
# dy_dx = [[ 90., 580.], [270., 1160.]]
print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0, 3, 4], [3.0, 4.0, 5, 6]])))
# dy_dx = [[20., 140., 360., 680.], [60., 280., 600., 1020.]]

def linear_model_gradients_DcDw_DcDb(w, x, b):
    with tf.GradientTape(persistent=True) as tape:
        y = x @ w + b
        cost = tf.reduce_mean(y**2)  # cost is reduced to a scalar value
    [dc_dw, dc_db] = tape.gradient(cost, [w, b])
    return [cost, dc_dw, dc_db]

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]

cost, dc_dw, dc_db = linear_model_gradients_DcDw_DcDb(w, x, b)
print("LMG:\n COST:%r\n DcDw:%r\n DcDb:%r\n" % (
    cost.numpy(), dc_dw.numpy(), dc_db.numpy()
))
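
The following is a minimal sketch of items 5 to 7 above (the variable names are illustrative): a persistent tape allows multiple gradient calls, sources can be passed as a dictionary, and calling gradient inside the context records the gradient ops themselves so they can be differentiated again.

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape(persistent=True) as tape:
    y = x @ w + b
    cost = tf.reduce_mean(y**2)

# Dictionary sources: the gradients come back in the same structure (see tf.nest).
grads = tape.gradient(cost, {'w': w, 'b': b})
print(grads['w'].shape, grads['b'].shape)   # (3, 2) (2,)

# A second call is allowed because the tape is persistent.
dc_dw = tape.gradient(cost, w)

del tape  # drop the tape explicitly to release its resources

# Higher-order derivative: calling gradient inside the context records the
# gradient ops on the (persistent) tape, so they can be differentiated again.
# This is less efficient, so only do it when you actually need it.
v = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    z = v * v * v
    dz_dv = tape.gradient(z, v)      # 3*v**2, itself recorded on the tape
d2z_dv2 = tape.gradient(dz_dv, v)    # 6*v = 18.0
del tape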

Gradients with respect to a model

It's common to collect tf.Variables into a tf.Module or one of its subclasses (layers.Layer, keras.Model) for checkpointing and exporting.

In most cases, you will want to calculate gradients with respect to a model's trainable variables.

Since all subclasses of tf.Module aggregate their variables in the Module.trainable_variables property, you can calculate these gradients in a few lines of code:

layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    # Forward pass
    y = layer(x)
    loss = tf.reduce_mean(y**2)

# Calculate gradients with respect to every trainable variable
grad = tape.gradient(loss, layer.trainable_variables)

for var, g in zip(layer.trainable_variables, grad):
    print(f'{var.name}, shape: {g.shape}')
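
The same pattern applies to any tf.Module subclass, since tf.Module aggregates the variables assigned to it in its trainable_variables property. A minimal sketch follows (the Linear class is illustrative, not part of the original guide):

class Linear(tf.Module):
    def __init__(self, in_features, out_features, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(tf.random.normal((in_features, out_features)), name='w')
        self.b = tf.Variable(tf.zeros(out_features), name='b')

    def __call__(self, x):
        return x @ self.w + self.b

model = Linear(3, 2)
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(model(x)**2)

# tf.Module tracks self.w and self.b automatically.
grads = tape.gradient(loss, model.trainable_variables)
for var, g in zip(model.trainable_variables, grads):
    print(f'{var.name}, shape: {g.shape}')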
