In this guide, you will explore ways to compute gradients with TensorFlow, especially in eager execution.

Automatic Differentiation and Gradients

Automatic differentiation is useful and powerful for implementing machine learning algorithms such as backpropagation for training neural networks.

Computing gradients

To differentiate automatically, TensorFlow needs to:

  1. remember what operations happen in what order during the forward pass;
  2. then traverse this list of operations in reverse order to compute gradients during the backward pass (a minimal sketch follows this list).
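
For example, here is a minimal sketch of these two steps (using the tf.GradientTape API described in the next section; the values are illustrative):

import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x**2                     # the forward pass is recorded on the tape

dy_dx = tape.gradient(y, x)      # the recorded ops are traversed in reverse: dy/dx = 2x = 6.0
print(dy_dx.numpy())             # 6.0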

Gradient tapes

TensorFlow provides the tf.GradientTape API for automatic differentiation; that is:

  0. A gradient is fundamentally an operation on a scalar target.

    The gradient with respect to each source has the same shape as that source.

    Similarly, if the target(s) are not scalar, the gradient of their sum is calculated.

  1. Automatic differentiation computes the gradient of a computation with respect to some inputs, usually tf.Variables.
  2. TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape".
  3. TensorFlow then uses that tape to compute the gradients of the "recorded" computation using reverse-mode differentiation.

    Once you've recorded some operations, use GradientTape.gradient(target, sources) to calculate the gradient of some target (often a loss) relative to some source (often the model's variables).
  4. To get the gradient with respect to two or more variables, you can pass a list of those variables as sources to the gradient method.

    The tape is flexible about how sources are passed and will accept any nested combination of lists or dictionaries and return the gradient structured the same way (see tf.nest).
  5. You can also pass a dictionary of variables as the source and index the result with the same keys: grad_w = tape.gradient(cost, {'w': w, 'b': b})['w'].
  6. By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient method is called,

    so a non-persistent GradientTape can only be used to compute one set of gradients (or Jacobians).

    To compute multiple gradients over the same computation, create a gradient tape with persistent=True.

    This allows multiple calls to the gradient method, since the resources are released only when the tape object is garbage collected (see the sketch after the examples below).
  7. Only call GradientTape.gradient inside the tape's context if you actually want to trace the gradient computation itself, for example to compute higher-order derivatives,

    since calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage). The sketch after the examples below also shows this pattern.
  8. Examples:

import numpy as np
import tensorflow as tf

def comp_gradient_dy_dx(x, persistent=False):
    k = np.arange(20, dtype=np.float32).reshape(int(x.shape[-1]), 20 // int(x.shape[-1]))
    with tf.GradientTape(persistent=persistent) as tape:  # recording ops
        y = (x**2) @ k
    # Since y = (x**2) @ k, dy/dx is 2x scaled column-wise by the row sums of k.
    dy_dx = tape.gradient(y, x)
    return y, dy_dx

print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0], [3.0, 4.0]])))
# dy_dx = [[ 90., 580.], [270., 1160.]]
print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0, 3, 4], [3.0, 4.0, 5, 6]])))
# dy_dx = [[20., 140., 360., 680.], [60., 280., 600., 1020.]]

def linear_model_gradients_DcDw_DcDb(w, x, b):
    with tf.GradientTape(persistent=True) as tape:
        y = x @ w + b
        cost = tf.reduce_mean(y**2)  # cost is reduced to a scalar value
    [dc_dw, dc_db] = tape.gradient(cost, [w, b])
    return [cost, dc_dw, dc_db]

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]

cost, dc_dw, dc_db = linear_model_gradients_DcDw_DcDb(w, x, b)
print("LMG:\n COST:%r\n DcDw:%r\n DcDb:%r\n" % (
    cost.numpy(), dc_dw.numpy(), dc_db.numpy()
))
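
The following is a minimal sketch of items 5 to 7 above (the variable names are illustrative): a persistent tape allows multiple gradient calls, sources can be passed as a dictionary, and calling gradient inside the context records the gradient ops themselves so they can be differentiated again.

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape(persistent=True) as tape:
    y = x @ w + b
    cost = tf.reduce_mean(y**2)

# Dictionary sources: the gradients come back in the same structure (see tf.nest).
grads = tape.gradient(cost, {'w': w, 'b': b})
print(grads['w'].shape, grads['b'].shape)   # (3, 2) (2,)

# A second call is allowed because the tape is persistent.
dc_dw = tape.gradient(cost, w)

del tape  # drop the tape explicitly to release its resources

# Higher-order derivative: calling gradient inside the context records the
# gradient ops on the (persistent) tape, so they can be differentiated again.
# This is less efficient, so only do it when you actually need it.
v = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    z = v * v * v
    dz_dv = tape.gradient(z, v)      # 3*v**2, itself recorded on the tape
d2z_dv2 = tape.gradient(dz_dv, v)    # 6*v = 18.0
del tape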

Gradients with respect to a model

It's common to collect tf.Variables into a tf.Module or one of its subclasses (layers.Layer, keras.Model) for checkpointing and exporting.

In most cases, you will want to calculate gradients with respect to a model's trainable variables.

Since all subclasses of tf.Module aggregate their variables in the Module.trainable_variables property, you can calculate these gradients in a few lines of code:

layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    # Forward pass
    y = layer(x)
    loss = tf.reduce_mean(y**2)

# Calculate gradients with respect to every trainable variable
grad = tape.gradient(loss, layer.trainable_variables)

for var, g in zip(layer.trainable_variables, grad):
    print(f'{var.name}, shape: {g.shape}')
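
The same pattern applies to any tf.Module subclass, since tf.Module aggregates the variables assigned to it in its trainable_variables property. A minimal sketch follows (the Linear class is illustrative, not part of the original guide):

class Linear(tf.Module):
    def __init__(self, in_features, out_features, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(tf.random.normal((in_features, out_features)), name='w')
        self.b = tf.Variable(tf.zeros(out_features), name='b')

    def __call__(self, x):
        return x @ self.w + self.b

model = Linear(3, 2)
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(model(x)**2)

# tf.Module tracks self.w and self.b automatically.
grads = tape.gradient(loss, model.trainable_variables)
for var, g in zip(model.trainable_variables, grads):
    print(f'{var.name}, shape: {g.shape}')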
