Introduction to debugging neural networks
The following advice is targeted at beginners to neural networks, and is based
on my experience giving advice to neural net newcomers in industry and at
Stanford. Neural nets are fundamentally harder to debug than most programs,
because most neural net bugs don't result in type errors or runtime errors.
They just cause poor convergence. Especially when you're new, this can be very
frustrating! But an experienced neural net trainer will be able to
systematically overcome the difficulty in spite of the ubiquitous and
seemingly ambiguous error message:

Performance Error: your neural net did not train well.

To the uninitiated, the message is daunting. But to the experienced, this is a
great error. It means the boilerplate coding is out of the way, and it's time
to dig in!
How to deal with NaNs
By far the most common first question I get from students is, "Why am I
getting NaNs?" Occasionally, this has a complicated answer. But most often,
the NaNs come in the first 100 iterations, and the answer is simple: your
learning rate is too high. When the learning rate is very high, you will get
NaNs in the first 100 iterations of training. Try reducing the learning rate
by a factor of 3 until you no longer get NaNs in the first 100 iterations. As
soon as this works, you'll have a pretty good learning rate to get started
with. In my experience, the best heavily validated learning rates are 1-10x
below the range where you get NaNs.
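
A rough sketch of that search, using PyTorch with a toy linear model and a
random batch standing in for your real model and data (the 100-iteration
window and the factor of 3 come straight from the heuristic above):

    import torch

    def nans_in_first_100(lr, data, targets):
        """Train a fresh model for 100 iterations; report whether any loss was NaN."""
        model = torch.nn.Linear(data.shape[1], 1)        # stand-in for your real model
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(100):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(data), targets)
            if torch.isnan(loss):
                return True
            loss.backward()
            opt.step()
        return False

    data, targets = torch.randn(32, 10), torch.randn(32, 1)  # toy batch
    lr = 10.0
    while nans_in_first_100(lr, data, targets):
        lr /= 3.0                                             # back off by a factor of 3
    print("usable starting learning rate:", lr)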

If you are getting NaNs beyond the first 100 iterations, there are 2 further
common causes. 1) If you are using RNNs, make sure that you are using "gradient
clipping", which caps the global L2 norm of the gradients. Early in training,
RNNs tend to produce learning spikes in 10% or fewer of the batches, where the
gradient magnitude becomes very high. Without clipping, these spikes can cause
NaNs. 2) If you have written any custom layers yourself, there is a good chance
one of them is causing the problem, for example through a division by zero.
Another notoriously NaN-producing layer is the softmax layer. The softmax
computation involves an exp(x) term in both the numerator and denominator,
which can overflow to Inf, divide Inf by Inf, and produce NaNs. Make
sure you are using a stabilized softmax implementation.
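
A rough PyTorch sketch of both fixes is below; the max_norm of 5.0 is just an
illustrative value, and framework softmax implementations such as
torch.nn.functional.softmax are already stabilized, so the second function
only shows what to check if you rolled your own:

    import torch

    # Gradient clipping: cap the global L2 norm of all gradients.
    # Call this in place of the usual backward/step pair.
    def clipped_step(model, optimizer, loss, max_norm=5.0):
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()

    # Stabilized softmax: subtracting the row-wise max before exponentiating
    # keeps exp() from overflowing to Inf, so the ratio never becomes Inf/Inf.
    def stable_softmax(logits):
        shifted = logits - logits.max(dim=-1, keepdim=True).values
        exps = shifted.exp()
        return exps / exps.sum(dim=-1, keepdim=True)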
What to do when your neural net isn't learning anything
Once you stop getting NaNs, you are often rewarded with a neural net that runs
smoothly for many thousands of iterations, but never reduces the training loss
after the initial fidgeting of the first few hundred iterations. When you're
first constructing your code base, waiting for more than 2000 iterations is
rarely the answer. This is not because all networks can start learning in
under 2000 iterations. Rather, the chance you've introduced a bug when coding
up a network from scratch is so high that you'll want to go into a special
early debugging mode before waiting on high iteration counts. The name of the
game here is to reduce the scope of the problem over and over again until you
have a network that trains in less than 2000 iterations. Fortunately, there
are always 2 good dimensions along which to reduce complexity.

1) Reduce the size of the training set to 10 instances. Working neural nets
can usually overfit to 10 instances within just a few hundred iterations. Many
coding bugs will prevent this from happening. If your network is not able to
overfit to 10 instances of the training set, make sure your data and labels
are hooked up correctly. Try reducing the batch size to 1 to check for batch
computation errors. Add print statements throughout the code to make sure
things look like you expect. Usually, you'll be able to find these bugs
through sheer brute force. Once you can train on 10 instances, try training on
100. If this works okay, but not great, you're ready for the next step. (A
minimal version of this overfitting sanity check is sketched after these two
steps.)

2) Solve the simplest version of the problem that you're interested in. If
you're translating sentences, try to build a language model for the target
language first. Once that works, try to predict the first word of the
translation given only the first 3 words of the source. If you're trying to
detect objects in images, try classifying the number of objects in each image
before training a regression network. There is a trade-off between getting
a good sub-problem you're sure the network can solve, and spending the
least amount of time plumbing the code to hook up the appropriate data.
Creativity will help here.
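
A minimal sketch of the overfitting sanity check from step 1, again with a toy
model and 10 random instances standing in for your real data:

    import torch

    # 10 fixed instances; a working net should drive the loss on these
    # close to zero within a few hundred iterations.
    data, labels = torch.randn(10, 20), torch.randint(0, 2, (10,))
    model = torch.nn.Linear(20, 2)                   # stand-in for your real model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(500):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(data), labels)
        loss.backward()
        opt.step()
        if step % 100 == 0:
            print(step, loss.item())

    # If this final loss is not near zero, suspect the data/label plumbing
    # (or a batching bug) before suspecting the architecture.
    print("final loss on 10 instances:", loss.item())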

The trick to scaling up a neural net for a new idea is to slowly relax the
simplifications made in the above two steps. This is a form of coordinate
ascent, and it works great. First, you show that the neural net can at least
memorize a few examples. Then you show that it's able to really generalize to
the validation set on a dumbed down version of the problem. You slowly up the
difficulty while making steady progress. It's not as fun as hotshotting it
the first time Karpathy style, but at least it works. At some point, you'll
find the problem is difficult enough that it can no longer be learned in 2000
iterations. That's great! But it should rarely take more than 10 times the
iterations of the previous complexity level of the problem. If you're finding
that to be the case, try to search for an intermediate level of complexity.
Tuning hyperparameters
Now that your network is learning things, you're probably in pretty good
shape. But you may find that your network is just not capable of solving the
most difficult versions of your problem. Hyperparameter tuning will be key
here. Some people who just downloaded a CNN package and ran it on their dataset
will tell you hyperparameter tuning didn't make a difference. Realize that
they're solving an existing problem with an existing architecture. If you're
solving a new problem that demands a new architecture, hyperparameter tuning
to get within the ballpark of a good setting is a must. Your best bet is
to read a hyperparameter tutorial for your specific problem, but I'll list
a few basic ideas here for completeness.
* Visualization is key. Don't be afraid to take the time to write yourself
  nice visualization tools throughout training. If your method of
  visualization is watching the loss bump around from the terminal, consider
  an upgrade.

* Weight initializations are important. Generally, larger magnitude initial
  weights are a good idea, but too large will get you NaNs. Thus, weight
  initialization will need to be simultaneously tuned with the learning rate.

* Make sure the weights look "healthy". To learn what this means, I recommend
  opening weights from existing networks in an ipython notebook. Take some
  time to get used to what weight histograms should look like for your
  components in mature nets trained on standard datasets like ImageNet or the
  Penn Tree Bank. (A quick way to plot such histograms is sketched after this
  list.)

* Neural nets are not scale invariant w.r.t. inputs, especially when trained
  with SGD rather than second order methods, as SGD is not a scale-invariant
  method. Take the time to scale your input data and output labels in the
  same way that others before you have scaled them.

* Decreasing your learning rate towards the end of training will almost
  always give you a boost. The best decay schedules usually take the form:
  after k epochs, divide the learning rate by 1.5 every n epochs, where
  k > n. (A sketch of such a schedule appears after this list.)

* Use hyperparameter config files, although it's okay to put hyperparameters
  in the code until you start trying out different values. I use json files
  that I load in with a command line argument as in
  https://github.com/Russell91/tensorbox, but the exact format is not
  important. Avoid the urge to refactor your code as it becomes a
  hyperparameter loading mess! Refactors introduce bugs that cost you
  training cycles, and can be avoided until after you have a network you
  like.

* Randomize your hyperparameter search if you can afford it. Random search
  generates hyperparameter combinations you wouldn't have thought of, and
  removes a great deal of effort once your intuition is already trained on
  how to think about the impact of a given hyperparameter. (A minimal sketch
  of config loading plus random search appears after this list.)
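
For the visualization and "healthy weights" points above, a quick way to
eyeball weight histograms; this assumes a PyTorch model, and the toy linear
layer is only a placeholder for whatever network you load:

    import torch
    import matplotlib.pyplot as plt

    model = torch.nn.Linear(100, 10)   # placeholder; load your real network here

    # One histogram per parameter tensor: look for a reasonable spread,
    # no huge outliers, and weights that are not collapsed to zero.
    for name, param in model.named_parameters():
        plt.figure()
        plt.hist(param.detach().cpu().numpy().ravel(), bins=50)
        plt.title(name)
    plt.show()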
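
A sketch of the decay schedule described above; k = 30, n = 10, and the
factor of 1.5 are illustrative values, not recommendations:

    def decayed_lr(base_lr, epoch, k=30, n=10, factor=1.5):
        """Hold the learning rate for k epochs, then divide by `factor` every n epochs."""
        if epoch < k:
            return base_lr
        return base_lr / factor ** (1 + (epoch - k) // n)

    # For example, inside a PyTorch training loop:
    # for group in optimizer.param_groups:
    #     group["lr"] = decayed_lr(0.1, epoch)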
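
And a minimal sketch of config-file loading plus random search; the json
keys, the sampling ranges, and the train_and_evaluate entry point are
hypothetical placeholders for your own setup:

    import argparse, json, random

    # Load hyperparameters from a json file given on the command line,
    # e.g.  python train.py --config hypes.json
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    args = parser.parse_args()
    with open(args.config) as f:
        hypes = json.load(f)

    # Random search: sample combinations you might not have tried by hand.
    for trial in range(20):
        hypes["learning_rate"] = 10 ** random.uniform(-4, -1)  # log-uniform sample
        hypes["hidden_size"] = random.choice([128, 256, 512])
        print("trial", trial, hypes)
        # train_and_evaluate(hypes)  # hypothetical entry point into your training code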
Conclusion
Debugging neural nets can be more laborious than debugging traditional
programs because almost all errors get projected onto the single dimension of
overall network performance. Nonetheless, binary search is still your friend. By alternately
1) changing the difficulty of your problem, and 2) using a small number of
training examples, you can quickly work through the initial bugs.
Hyperparameter tuning and long periods of diligent waiting will get you the
rest of the way.
