tensorflow nan

https://github.com/tensorflow/tensorflow/issues/3212

NaNs usually indicate something wrong with your training. Perhaps your learning rate is too high, perhaps you have invalid data. Maybe you have an invalid operation like a divide by zero. Tensorflow refusing to write any NaNs is giving you a warning that something has gone wrong with your training.

If you still suspect there is an underlying bug, you need to provide us a reproducible test case (as small as possible), plus information about what environment (please see the issue submission template).

https://stackoverflow.com/questions/33712178/tensorflow-nan-bug?newreg=c7e31a867765444280ba3ca50b657a07

Actually, it turned out to be something stupid. I'm posting this in case anyone else would run into a similar error.

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))

is actually a horrible way of computing the cross-entropy. In some samples, certain classes could be excluded with certainty after a while, resulting in y_conv=0 for that sample. That's normally not a problem since you're not interested in those, but in the way cross_entropy is written there, it yields 0*log(0) for that particular sample/class. Hence the NaN.

Replacing it with

cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))

https://stackoverflow.com/questions/33922937/why-does-tensorflow-return-nan-nan-instead-of-probabilities-from-a-csv-file

Try throwing in a few of these. Instead of this line:

tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)

Try:

tf_bias = tf.Print(tf_bias, [tf_bias], "Bias: ")

tf_weight = tf.Print(tf_weight, [tf_weight], "Weight: ")

tf_in = tf.Print(tf_in, [tf_in], "TF_in: ")

matmul_result = tf.matmul(tf_in, tf_weight)

matmul_result = tf.Print(matmul_result, [matmul_result], "Matmul: ")

tf_softmax = tf.nn.softmax(matmul_result + tf_bias)

to see what Tensorflow thinks the intermediate values are. If the NaNs are showing up earlier in the pipeline, it should give you a better idea of where the problem lies. Good luck! If you get some data out of this, feel free to follow up and we'll see if we can get you further.

Updated to add: Here's a stripped-down debugging version to try, where I got rid of the input functions and just generated some random data:

https://stackoverflow.com/questions/38810424/how-does-one-debug-nan-values-in-tensorflow

There are a couple of reasons WHY you can get a NaN-result, often it is because of too high a learning rate but plenty other reasons are possible like for example corrupt data in your input-queue or a log of 0 calculation.

Anyhow, debugging with a print as you describe cannot be done by a simple print (as this would result only in the printing of the tensor-information inside the graph and not print any actual values).

However, if you use tf.print as an op in bulding the graph (tf.print) then when the graph gets executed you will get the actual values printed (and it IS a good exercise to watch these values to debug and understand the behavior of your net).

However, you are using the print-statement not entirely in the correct manner. This is an op, so you need to pass it a tensor and request a result-tensor that you need to work with later on in the executing graph. Otherwise the op is not going to be executed and no printing occurs. Try this:

Z = tf.sqrt(Delta_tilde)

Z = tf.Print(Z,[Z], message="my Z-values:") # <-------- TF PRINT STATMENT

Z = Transform(Z) # potentially some transform, currently I have it to return Z for debugging (the identity)

Z = tf.pow(Z, 2.0)

tensorflow nan的更多相关文章

常用深度学习框——Caffe/ TensorFlow / Keras/ PyTorch/MXNet
常用深度学习框--Caffe/ TensorFlow / Keras/ PyTorch/MXNet 一．概述近几年来,深度学习的研究和应用的热潮持续高涨,各种开源深度学习框架层出不穷,包括Tenso ...
解决tensorflow在训练的时候权重是nan问题
搭建普通的卷积CNN网络. nan表示的是无穷或者是非数值,比如说你在tensorflow中使用一个数除以0,那么得到的结果就是nan. 在一个matrix中,如果其中的值都为nan很有可能是因为采用 ...
TensorFlow | ReluGrad input is not finite. Tensor had NaN values
问题的出现 Question 这个问题是我基于TensorFlow使用CNN训练MNIST数据集的时候遇到的.关键的相关代码是以下这部分: cross_entropy = -tf.reduce_sum ...
tensorflow训练中出现nan
问题暂记: 之后看 https://blog.csdn.net/qq_23142123/article/details/80526931 https://www.zhihu.com/question/ ...
tensorflow 训练网络loss突然出现nan的情况
1.问题描述:开始训练一切都是那么的平静,很正常! 突然loss变为nan,瞬间懵逼! 2.在网上看了一些解答,可能是梯度爆炸,可能是有关于0的计算.然后我觉得可能是关于0的吧,然后进行了验证. 3. ...
tensorflow 训练的时候loss=nan
出现loss为nan 可能是使用了relu激活函数,导致的.因为在负半轴上输出都是0
深度学习中损失值(loss值)为nan（以tensorflow为例）
我做的是一个识别验证码的深度学习模型,识别的图片如下验证码图片识别4个数字,数字间是有顺序的,设立标签时设计了四个onehot向量链接起来,成了一个长度为40的向量,然后模型的输入也是40维向量用s ...
tensorflow学习
tensorflow安装时遇到gcc: error trying to exec 'as': execvp: No such file or directory. 截止到2016年11月13号,源码编 ...
Tensorflow 实现稠密输入数据的逻辑回归二分类
首先实现一个尽可能少调用tf.nn模块儿的,自己手写相关的function import tensorflow as tf import numpy as np import melt_da ...

随机推荐

LeetCode——300. Longest Increasing Subsequence
一.题目链接:https://leetcode.com/problems/longest-increasing-subsequence/ 二.题目大意: 给定一个没有排序的数组,要求从该数组中找到一个 ...
【剑指offer】反转链表
输入一个链表,反转链表后,输出新链表的表头. *与之前的问题不同,这里需要修改链表的指向(之前的问题,不需要修改结点的指针,只需使用栈保存每个结点的值) *注意非空处理以及最后一个结点指针的修改 /* ...
python之路——12
王二学习python的笔记以及记录,如有雷同,那也没事,欢迎交流,wx:wyb199594 复习 1.装饰器开发原则:开放封闭原则作用:不改变原函数的调用方式,为函数前后扩展功能本质:闭包函数 ...
SAS 删除数据和对缺失值处理代码程序
%INCLUDE '00@HEADER.SAS'; %LET dir=..\04@Model;LIBNAME cc "&dir"; %MACRO ModelVariable ...
解决strcmp的错误以及VS的快捷键
主要是C++数组作业中发现的一些问题. 第一点是关于strcat函数我用VS2018调用strcat的时候报错,错误信息提示strcat不安全(?)要用strcat_s.修改后,可成功运行. 但这两 ...
Linux权限管理之ACL权限
注:转载自:https://www.cnblogs.com/ysocean/p/7801329.html 目录 1.什么是 ACL 权限? 2.查看分区 ACL 权限是否开启:dump2fs ①.查看 ...
Linux输入子系统详解
input输入子系统框架 linux输入子系统(linux input subsystem)从上到下由三层实现,分别为:输入子系统事件处理层(EventHandler).输入子系统核心层(Input ...
python 爬虫启航2.0
文章解析: 1.正则表达式解析 2.beautifulsoup,BeautifulSoup是一个复杂的树形结构,她的每一个节点都是一个python对象,获取网页的内容就是一个提取对象内容的过程,它的提 ...
Python代码教你批量将PDF转为Word
很多时候在学习时发现许多文档都是PDF格式,PDF格式却不利于学习使用,因此需要将PDF转换为Word文件,但或许你从网上下载了很多软件,但只能转换前五页(如WPS等),要不就是需要收费,那有没有免费 ...
【Noip模拟 20160929】花坛迷宫
题目描述圣玛格丽特学园的一角有一个巨大.如迷宫般的花坛.大约有一个人这么高的大型花坛,做成迷宫的形状,深受中世纪贵族的喜爱.维多利加的小屋就坐落在这迷宫花坛的深处.某一天早晨,久城同学要穿过这巨大的 ...

tensorflow nan

tensorflow nan的更多相关文章

随机推荐

热门专题