SciTech-BigDataAIML-Tensorflow-Introduction to Gradients and Automatic Differentiation
In this guide, you will explore ways to compute gradients with TensorFlow, especially in eager execution.
Automatic Differentiation and Gradients
Automatic differentiation is useful and powerful for implementing machine learning algorithms such as backpropagation for training neural networks.
Computing gradients
To differentiate automatically, TensorFlow needs to:
- remember what operations happen in what order during the forward pass;
- then traverse this list of operations in reverse order to compute gradients during the backward pass.
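The two passes above can be sketched minimally with tf.GradientTape (the value 3.0 is just an illustration):

```python
import tensorflow as tf

# Forward pass: the tape records each operation in order.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2          # recorded: square

# Backward pass: the tape replays the recorded ops in reverse
# to compute dy/dx = 2x.
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())    # 6.0
```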
Gradient tapes
TensorFlow provides the tf.GradientTape API for automatic differentiation; that is:
- computing the gradient of a computation with respect to some inputs, usually tf.Variables.
- TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape".
- TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse-mode differentiation.

Once you've recorded some operations, use GradientTape.gradient(target, sources) to calculate the gradient of some target (often a loss) relative to some sources (often the model's variables):
- A gradient is fundamentally an operation on a scalar: if the target(s) are not scalar, the gradient of their sum is calculated. The gradient with respect to each source has the shape of that source.
- To get the gradient with respect to two or more variables, pass a list of those variables as sources to the gradient method.
- The tape is flexible about how sources are passed: it accepts any nested combination of lists or dictionaries and returns the gradient structured the same way (see tf.nest). For example, you can pass a dictionary of variables as sources: grad_w = tape.gradient(cost, {'w': w, 'b': b})['w']
- By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient method is called, so a non-persistent GradientTape can only be used to compute one set of gradients (or Jacobians).
- To compute multiple gradients over the same computation, create a gradient tape with persistent=True. This allows multiple calls to the gradient method; resources are released when the tape object is garbage collected.
- Only call GradientTape.gradient inside the tape's context if you actually want to trace the gradient computation itself, e.g. to compute higher-order derivatives. Calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context, because it causes the gradient ops to be recorded on the tape, increasing CPU and memory usage.

Examples:
import numpy as np
import tensorflow as tf

def comp_gradient_dy_dx(x):
    # A constant matrix k of shape (n, 20/n), where n = x.shape[-1].
    k = np.arange(20, dtype=np.float32).reshape(int(x.shape[-1]), int(20 / x.shape[-1]))
    with tf.GradientTape() as tape:  # recording ops
        y = (x**2) @ k
    # d(sum y)/dx[i, j] = 2 * x[i, j] * sum(k[j, :])
    dy_dx = tape.gradient(y, x)
    return y, dy_dx

print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0], [3.0, 4.0]])))
# dydx: [ [90.0, 580.0], [270.0, 1160.0] ]
print("y=%r\ndydx=%r\n" % comp_gradient_dy_dx(tf.Variable([[1.0, 2.0, 3, 4], [3.0, 4.0, 5, 6]])))
# dydx: [ [20.0, 140.0, 360.0, 680.0], [60.0, 280.0, 600.0, 1020.0] ]
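As a sanity check, the tape's answer for the first call can be compared against the hand-derived formula d(sum y)/dx[i, j] = 2 * x[i, j] * sum(k[j, :]) — a small standalone sketch, not part of the original example:

```python
import numpy as np
import tensorflow as tf

x = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
k = np.arange(20, dtype=np.float32).reshape(2, 10)
with tf.GradientTape() as tape:
    y = (x ** 2) @ k

# Hand-derived: 2 * x, scaled per column by the row sums of k ([45, 145]).
expected = 2.0 * x.numpy() * k.sum(axis=1)
computed = tape.gradient(y, x).numpy()
print(computed)   # [[ 90. 580.] [ 270. 1160.]]
```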
def linear_model_gradients_DcDw_DcDb(w, x, b):
    with tf.GradientTape(persistent=True) as tape:
        y = x @ w + b
        cost = tf.reduce_mean(y**2)  # cost is reduced to a scalar value
    [dc_dw, dc_db] = tape.gradient(cost, [w, b])
    return [cost, dc_dw, dc_db]

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]
cost, dc_dw, dc_db = linear_model_gradients_DcDw_DcDb(w, x, b)
print("LMG:\n COST:%r\n DcDw:%r\n DcDb:%r\n" % (
    cost.numpy(), dc_dw.numpy(), dc_db.numpy()
))
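Building on the same shapes, a small self-contained sketch shows the dictionary-structured sources mentioned earlier, plus a second gradient call that persistent=True makes possible:

```python
import tensorflow as tf

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = [[1., 2., 3.]]

with tf.GradientTape(persistent=True) as tape:
    y = x @ w + b
    cost = tf.reduce_mean(y ** 2)

# Dictionary sources: the result comes back with the same structure.
grads = tape.gradient(cost, {'w': w, 'b': b})
print(grads['w'].shape, grads['b'].shape)   # (3, 2) (2,)

# persistent=True allows a second call on the same tape;
# y is not scalar, so the gradient of its sum is computed.
dy_dw = tape.gradient(y, w)
del tape   # drop the tape to release its resources
```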
Gradients with respect to a model
It's common to collect tf.Variables into a tf.Module or one of its subclasses (layers.Layer, keras.Model) for checkpointing and exporting.
In most cases, you will want to calculate gradients with respect to a model's trainable variables.
Since all subclasses of tf.Module aggregate their variables in the Module.trainable_variables property, you can calculate these gradients in a few lines of code:
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])

with tf.GradientTape() as tape:
    # Forward pass
    y = layer(x)
    loss = tf.reduce_mean(y**2)

# Calculate gradients with respect to every trainable variable
grad = tape.gradient(loss, layer.trainable_variables)
for var, g in zip(layer.trainable_variables, grad):
    print(f'{var.name}, shape: {g.shape}')
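The same pattern works for a hand-rolled tf.Module subclass; this is a minimal sketch, with the class name and sizes made up for illustration:

```python
import tensorflow as tf

class TinyLinear(tf.Module):
    # Hypothetical module: the name and shapes are illustrative only.
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal((3, 2)), name='w')
        self.b = tf.Variable(tf.zeros(2), name='b')

    def __call__(self, x):
        return x @ self.w + self.b

model = TinyLinear()
x = tf.constant([[1., 2., 3.]])
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(model(x) ** 2)

# tf.Module collects self.w and self.b into trainable_variables,
# so one gradient call covers every parameter of the model.
grads = tape.gradient(loss, model.trainable_variables)
for var, g in zip(model.trainable_variables, grads):
    print(var.name, g.shape)
```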