Debugging TensorFlow models
The symbolic nature of TensorFlow makes TensorFlow code harder to debug than regular Python code. Here we introduce a number of tools included with TensorFlow that make debugging much easier.
Probably the most common error when using TensorFlow is passing tensors of the wrong shape to ops. Many TensorFlow ops can operate on tensors of different ranks and shapes. This is convenient when using the API, but it can cause extra headaches when things go wrong, because a shape mistake may produce a valid result of an unintended shape instead of an error.
For example, consider the tf.matmul op, which can multiply two matrices:
a = tf.random_uniform([2, 3])
b = tf.random_uniform([3, 4])
c = tf.matmul(a, b) # c is a tensor of shape [2, 4]
But the same function also does batch matrix multiplication:
a = tf.random_uniform([10, 2, 3])
b = tf.random_uniform([10, 3, 4])
c = tf.matmul(a, b) # c is a tensor of shape [10, 2, 4]
Another example, discussed earlier in the broadcasting section, is the add operation, which supports broadcasting:
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = a + b # c is a tensor of shape [2, 2]
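The shape rules in the three examples above can be checked without building a graph. The following is a minimal sketch that uses NumPy as a stand-in. This is an assumption made for illustration: NumPy does not appear in the examples above, but np.matmul and NumPy addition follow the same batch and broadcasting rules for these shapes.

```python
import numpy as np

# NumPy stand-ins for the TensorFlow tensors used above (illustrative only).
a = np.random.uniform(size=(2, 3))
b = np.random.uniform(size=(3, 4))
matrix_shape = np.matmul(a, b).shape   # plain matrix product

a = np.random.uniform(size=(10, 2, 3))
b = np.random.uniform(size=(10, 3, 4))
batch_shape = np.matmul(a, b).shape    # one product per leading index

a = np.array([[1.], [2.]])             # shape (2, 1)
b = np.array([1., 2.])                 # shape (2,)
sum_shape = (a + b).shape              # b is broadcast against a

print(matrix_shape, batch_shape, sum_shape)
```

The last case is the one to watch: the (2, 1) and (2,) operands are silently expanded to (2, 2) rather than rejected, which is exactly the kind of mistake that goes unnoticed.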
Validating your tensors with tf.assert* ops
One way to reduce the chance of unwanted behavior is to explicitly verify the rank or shape of intermediate tensors with tf.assert* ops.
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
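# check_a fails because a has rank 2. TensorFlow reports this as an
# InvalidArgumentError at run time, or as a ValueError while building the
# graph when the rank is already known statically.
check_a = tf.assert_rank(a, 1)
check_b = tf.assert_rank(b, 1)
with tf.control_dependencies([check_a, check_b]):
    c = a + b # c is a tensor of shape [2, 2]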
Remember that assertion nodes, like other operations, are part of the graph, and they are pruned during Session.run() if nothing being evaluated depends on them. So make sure to create explicit dependencies on assertion ops to force TensorFlow to execute them.
You can also use assertions to validate the value of tensors at runtime:
check_pos = tf.assert_positive(a)
See the official docs for a full list of assertion ops.
Logging tensor values with tf.Print
Another useful built-in function for debugging is tf.Print, which logs the given tensors to standard error:
input_copy = tf.Print(input, tensors_to_print_list)
Note that tf.Print returns a copy of its first argument as output. One way to force tf.Print to run is to pass its output to another op that gets executed. For example, if we want to print the values of tensors a and b before adding them, we could do something like this:
a = ...
b = ...
a = tf.Print(a, [a, b])
c = a + b
Alternatively, we could manually define a control dependency so that the print op runs before c is computed.
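A minimal sketch of the control-dependency variant follows. The constant values are made up for illustration, and the import assumes a TensorFlow build that ships the tensorflow.compat.v1 module. On an older TensorFlow 1.x install, a plain import tensorflow as tf without the disable_v2_behavior() call should behave the same way.

```python
import tensorflow.compat.v1 as tf  # assumption: compat.v1 is available

tf.disable_v2_behavior()  # graph mode, as in the rest of this article

a = tf.constant([1., 2.])  # illustrative values, not from the original
b = tf.constant([3., 4.])

# The print op is not an input of c, so tie it to c explicitly.
print_op = tf.Print(a, [a, b])
with tf.control_dependencies([print_op]):
    c = a + b  # ops created in this block run only after print_op

with tf.Session() as sess:
    result = sess.run(c)  # evaluating c also triggers print_op

print(result)
```

Unlike the previous example, a itself is left untouched here; only the execution order is constrained.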
Check your gradients with tf.test.compute_gradient_error
Not all operations in TensorFlow come with gradients, and it's easy to unintentionally build graphs for which TensorFlow cannot compute the gradients.
Let's look at an example:
import tensorflow as tf

def non_differentiable_entropy(logits):
    probs = tf.nn.softmax(logits)
    return tf.nn.softmax_cross_entropy_with_logits(labels=probs, logits=logits)

w = tf.get_variable('w', shape=[5])
y = -non_differentiable_entropy(w)  # minimizing y maximizes the entropy

opt = tf.train.AdamOptimizer()
train_op = opt.minimize(y)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
    sess.run(train_op)

print(sess.run(tf.nn.softmax(w)))
We are using tf.nn.softmax_cross_entropy_with_logits to define entropy over a categorical distribution. We then use the Adam optimizer to find the weights with maximum entropy. If you have taken a course on information theory, you know that the uniform distribution has maximum entropy, so you would expect the result to be [0.2, 0.2, 0.2, 0.2, 0.2]. But if you run this, you may get unexpected results like this:
[ 0.34081486 0.24287023 0.23465775 0.08935683 0.09230034]
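As a quick sanity check of the maximum-entropy claim, the sketch below computes the entropy of the uniform distribution and of the probabilities printed above. The helper name entropy_of is made up for this illustration.

```python
import math

def entropy_of(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = [0.2] * 5
observed = [0.34081486, 0.24287023, 0.23465775, 0.08935683, 0.09230034]

h_uniform = entropy_of(uniform)    # equals log(5) for five equal outcomes
h_observed = entropy_of(observed)  # smaller than log(5)

print(h_uniform, h_observed)
```

The observed distribution has lower entropy than the uniform one, so the optimizer did not reach the maximum.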
It turns out tf.nn.softmax_cross_entropy_with_logits has undefined gradients with respect to labels! But how could we spot this if we didn't already know?
Fortunately for us, TensorFlow comes with a numerical differentiator that can be used to find symbolic gradient errors. Let's see how we can use it:
with tf.Session():
    diff = tf.test.compute_gradient_error(w, [5], y, [])
    print(diff)
If you run this, you will see that the difference between the numerical and symbolic gradients is fairly high (0.06 to 0.1 in my runs).
Now let's fix our function with a differentiable version of the entropy and check again:
import tensorflow as tf
import numpy as np

def entropy(logits, dim=-1):
    probs = tf.nn.softmax(logits, dim)
    # log(probs) = logits - logsumexp(logits), so each term below is -p * log(p).
    nplogp = probs * (tf.reduce_logsumexp(logits, dim, keep_dims=True) - logits)
    return tf.reduce_sum(nplogp, dim)

w = tf.get_variable('w', shape=[5])
y = -entropy(w)

print(w.get_shape())
print(y.get_shape())

with tf.Session() as sess:
    diff = tf.test.compute_gradient_error(w, [5], y, [])
    print(diff)
The difference should be about 0.0001, which looks much better.
Now if you run the optimizer again with the correct version, the printed softmax of the final weights is:
[ 0.2 0.2 0.2 0.2 0.2]
which is exactly what we wanted.
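To make the idea behind the numerical check concrete, here is a minimal NumPy sketch of the same comparison. It is not how tf.test.compute_gradient_error is implemented; the function names, the step size eps, and the fixed test point are assumptions chosen for illustration. It also assumes that the non-differentiable version behaves as if the labels were constants, in which case its symbolic gradient with respect to the logits is softmax(w) minus the labels, which is zero.

```python
import numpy as np

def softmax(w):
    e = np.exp(w - np.max(w))  # shift for numerical stability
    return e / e.sum()

def neg_entropy(w):
    """y = -H(softmax(w)), the quantity minimized in the text."""
    p = softmax(w)
    return float(np.sum(p * np.log(p)))

def numerical_grad(f, w, eps=1e-5):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

def analytic_grad(w):
    """Exact gradient of neg_entropy: p_j * (log p_j + H)."""
    p = softmax(w)
    h = -np.sum(p * np.log(p))
    return p * (np.log(p) + h)

w = np.array([0.5, -1.0, 0.3, 2.0, -0.7])  # arbitrary test point
numeric = numerical_grad(neg_entropy, w)

# Assumed gradient when the labels are treated as constants:
# softmax(w) - labels, with labels equal to softmax(w), i.e. all zeros.
broken = softmax(w) - softmax(w)

err_broken = np.max(np.abs(numeric - broken))
err_correct = np.max(np.abs(numeric - analytic_grad(w)))
print(err_broken, err_correct)
```

The first error is large because the zero gradient ignores how the labels depend on w, while the second is close to the precision of the finite-difference estimate.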
TensorFlow summaries and tfdbg (TensorFlow Debugger) are other tools that can be used for debugging. Please refer to the official docs to learn more.