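Deep learning has quite a few tricks, and they are often surprisingly effective, so it is worth knowing some of them. The normalization and initialization covered in the previous post are tricks of this kind; below I summarize a few more that I picked up from the Deep Learning Summer School, Montreal 2016. Corrections from experts passing by are welcome.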

Early stopping

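Early stopping is easy to understand: training is halted at the point where the validation error is about to start rising, even if the training error is still falling. On that note, splitting the dataset into train, validation and test sets can itself be counted as a trick. The original post illustrated this with a figure.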
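A minimal sketch of the idea, assuming a patience-based rule (stop once the validation error has not improved for `patience` consecutive epochs). The function name, the `patience` parameter and the error values are illustrative assumptions, not taken from the summer school material.

```python
def train_with_early_stopping(val_errors, patience=3):
    """Return (best_epoch, best_error) for a stream of per-epoch validation errors.

    `val_errors` stands in for "train one epoch, then evaluate on the
    validation set"; in real code each value would come from the model.
    """
    best_error = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            # New best: this is where one would also save a checkpoint.
            best_error, best_epoch = err, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error has stopped improving
    return best_epoch, best_error


# Hypothetical validation curve: falls, then rises as the model overfits.
curve = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
best_epoch, best_error = train_with_early_stopping(curve, patience=3)
```

Here training stops after epoch 6 and the weights from epoch 3 would be the ones to keep. The test set plays no part in this decision, which is why the three-way split matters.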

L1 and L2 regularization

I first met regularization when studying the regression models in PRML and did not really understand it at the time; now I can explain the principle in some detail.

To begin, let us take a closer look at L1 and L2 regularization. Assume the loss function of a linear regression model is the sum-of-squares error $E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{t_n - y(x_n, \mathbf{w})\}^2$. L1 and L2 regularization can be seen as placing a prior distribution on the parameters $\mathbf{w}$: L1 regularization corresponds to a Laplace prior, and L2 regularization to a Gaussian prior. Both are used to reduce the effective complexity of a model.

L2 regularization

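As before, assume that the target variable $t$ is given by a deterministic function $y(x, \mathbf{w})$ plus additive Gaussian noise $\epsilon$, so that

$$t = y(x, \mathbf{w}) + \epsilon,$$

where $\epsilon$ is a zero-mean Gaussian random variable with precision $\beta$. Thus $p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(x, \mathbf{w}), \beta^{-1})$, and for $N$ independent observations the log-likelihood can be written as

$$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \frac{\beta}{2}\sum_{n=1}^{N}\{t_n - y(x_n, \mathbf{w})\}^2.$$

Maximizing this with respect to $\mathbf{w}$ is equivalent to minimizing the sum-of-squares error $\frac{1}{2}\sum_{n=1}^{N}\{t_n - y(x_n, \mathbf{w})\}^2$. If a zero-mean Gaussian prior with variance $\alpha^{-1}$ is introduced for $\mathbf{w}$, that is $p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$, the posterior distribution follows from Bayes' theorem (posterior $\propto$ likelihood $\times$ prior):

$$p(\mathbf{w} \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha).$$

Taking the logarithm gives

$$\ln p(\mathbf{w} \mid \mathbf{t}) = -\frac{\beta}{2}\sum_{n=1}^{N}\{t_n - y(x_n, \mathbf{w})\}^2 - \frac{\alpha}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w} + \text{const}.$$

Maximizing the posterior is therefore the same as minimizing the sum-of-squares error plus a penalty $\frac{\lambda}{2}\mathbf{w}^{\mathsf{T}}\mathbf{w}$ with $\lambda = \alpha/\beta$. This derivation is why L2 regularization can be interpreted as a Gaussian prior.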
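The link between the Gaussian prior and the L2 penalty can be checked numerically. The sketch below assumes the simplest possible model, $y(x, w) = wx$ with a single scalar weight, noise precision `beta` and prior precision `alpha` (these names follow PRML and are assumptions here); the data are made up for illustration.

```python
def ridge_weight(xs, ts, lam):
    """Closed-form minimiser of 0.5*sum((t - w*x)^2) + 0.5*lam*w^2 for scalar w."""
    sxt = sum(x * t for x, t in zip(xs, ts))
    sxx = sum(x * x for x in xs)
    return sxt / (sxx + lam)


def neg_log_posterior(w, xs, ts, alpha, beta):
    """-ln p(w | t) up to a constant: Gaussian noise (precision beta),
    zero-mean Gaussian prior on w (precision alpha)."""
    sse = sum((t - w * x) ** 2 for x, t in zip(xs, ts))
    return 0.5 * beta * sse + 0.5 * alpha * w * w


# Made-up data roughly following t = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ts = [2.1, 3.9, 6.2, 7.8]
alpha, beta = 2.0, 4.0

w_ml = ridge_weight(xs, ts, lam=0.0)            # maximum likelihood (no penalty)
w_map = ridge_weight(xs, ts, lam=alpha / beta)  # L2-regularised solution
```

The L2-regularised weight with `lam = alpha / beta` is also the minimiser of the negative log posterior, and it is pulled slightly toward zero compared with the maximum-likelihood weight.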

L1 regularization

Least-squares regression with L1 regularization is known as the LASSO. It drives some coefficients exactly to zero and thus produces a sparse model.

  

In the two-dimensional case (the images in the original post), the loss contours tend to touch the L1 constraint region at one of its corners, which lie on the axes, and this forces some of the variables to be exactly zero. The L2 constraint region (the right-hand image) has no corners, so L2 regularization generally does not produce a sparse model.
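The contrast can also be seen in one dimension. The sketch below makes the simplifying assumption that each coefficient can be penalised independently (as is the case with orthonormal features), so the regularised solution is obtained by shrinking each unregularised coefficient `z` on its own. The coefficient values and `lam` are made up.

```python
def l1_shrink(z, lam):
    """argmin_w 0.5*(w - z)**2 + lam*abs(w), i.e. soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero


def l2_shrink(z, lam):
    """argmin_w 0.5*(w - z)**2 + 0.5*lam*w**2, i.e. uniform scaling."""
    return z / (1.0 + lam)


unregularised = [3.0, -0.4, 0.2, -2.5, 0.05]
lam = 0.5
w_l1 = [l1_shrink(z, lam) for z in unregularised]
w_l2 = [l2_shrink(z, lam) for z in unregularised]
```

With these values the L1 solution keeps only the two large coefficients, while the L2 solution shrinks every coefficient but leaves all of them non-zero.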

Fine tuning

Fine tuning is easy to understand: initialize the weights from somewhere sensible, then carry on with supervised learning. The parameter values can be transferred from a well-trained network, for example one trained on ImageNet. Alternatively, they can come from an autoencoder, an unsupervised method that seems less popular now. An autoencoder is trained to reproduce its input at its output, so its hidden layers learn the latent structure of the data.

The supervised stage of fine tuning is just the regular training procedure of a feed-forward network, starting from these weights.

[Figure: an autoencoder. The original post also linked to a website with more detail.]
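As a toy illustration of fine tuning, the sketch below uses a two-parameter network `y = v * tanh(w * x)`. The value `w_pretrained` is a hypothetical stand-in for weights copied from another network, and `v` is a newly added output weight. Using a smaller learning rate for the copied weight than for the new one is a common practice that is assumed here, not something the summer school notes prescribe.

```python
import math


def forward(w, v, x):
    h = math.tanh(w * x)  # "pre-trained" feature layer
    return h, v * h       # new task-specific output layer


def fine_tune(w, v, data, lr_feature=0.01, lr_head=0.1, epochs=200):
    """Plain SGD on squared error, starting from the given weights."""
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(w, v, x)
            err = y - t                           # d(0.5*err^2)/dy
            grad_v = err * h
            grad_w = err * v * (1.0 - h * h) * x  # backprop through tanh
            v -= lr_head * grad_v
            w -= lr_feature * grad_w              # small step: keep w near its start
    return w, v


def mse(w, v, data):
    return sum((forward(w, v, x)[1] - t) ** 2 for x, t in data) / len(data)


w_pretrained = 0.8  # hypothetical value standing in for transferred weights
v_new = 0.0         # freshly initialised output weight
data = [(x / 4.0, 2.0 * math.tanh(x / 4.0)) for x in range(-8, 9)]

before = mse(w_pretrained, v_new, data)
w_ft, v_ft = fine_tune(w_pretrained, v_new, data)
after = mse(w_ft, v_ft, data)
```

The only difference from training from scratch is the starting point: the loop itself is ordinary supervised learning.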

Data augmentation

If the training set is not large enough, it is necessary to expand it by flipping, translating or rotating the images in order to avoid overfitting. However, according to the paper "Visualizing and Understanding Convolutional Networks", a CNN's output is fairly stable under translation and scaling but generally not under rotation, except for objects with rotational symmetry.
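The three transformations can be sketched on a tiny image stored as a list of rows. The function names are illustrative, and a real pipeline would use an image library and random parameters; this only shows what each operation does to the pixels.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def translate_right(img, shift, fill=0):
    """Shift pixels `shift` columns to the right, padding with `fill`."""
    width = len(img[0])
    shift = min(shift, width)
    return [[fill] * shift + row[:width - shift] for row in img]


def rotate_90_clockwise(img):
    """Rotate by 90 degrees clockwise (height and width swap)."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus three transformed copies."""
    return [img,
            flip_horizontal(img),
            translate_right(img, 1),
            rotate_90_clockwise(img)]


image = [[1, 2, 3],
         [4, 5, 6]]
augmented = augment(image)
```

Each transformed copy keeps the label of the original image, so one labelled example becomes four.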

Pooling

I explained why pooling works well in an earlier article on this blog; if you are interested, please look through the catalogue. In the paper "Visualizing and Understanding Convolutional Networks" you will see that pooling is sometimes a good way to control invariance. I plan to write up my notes on this classic paper in the next article.
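A small example of the invariance that pooling provides, assuming non-overlapping 2x2 max pooling on an image with even height and width. The two input arrays are made up; they differ only by moving one activation a single pixel inside the same pooling window.

```python
def max_pool_2x2(img):
    """Non-overlapping 2x2 max pooling; assumes even height and width."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]


a = [[0, 0, 0, 0],
     [0, 9, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 7]]

# Same pattern with the "9" moved one pixel to the left,
# still inside the same 2x2 window.
b = [[0, 0, 0, 0],
     [9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 7]]

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)
```

Both inputs pool to the same 2x2 output. The invariance is only local: a shift that crosses a window boundary does change the result.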

Dropout and Drop layer

Dropout, proposed by Hinton et al., has become a prevalent technique for preventing overfitting. During training it deactivates each node with a probability p that is fixed beforehand, which makes the network thinner. But why does it work? There may be two main reasons: (1) for a batch of input data, the nodes of a thinner network can be trained better, because each is trained with more data on average; (2) dropout injects noise into the network, which helps regularize it.
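A minimal sketch of dropout applied to one layer's activations. It uses the "inverted" variant, in which surviving units are rescaled by 1/(1-p) during training so that nothing needs to change at test time; this variant is an assumption here, since the text above only describes dropping nodes with probability p.

```python
import random


def dropout(activations, p, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p and
    rescale the survivors by 1/(1-p). A no-op when not training."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
hidden = [1.0] * 10000   # made-up activations
dropped = dropout(hidden, p=0.5, rng=rng)

fraction_zeroed = dropped.count(0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
```

About half of the units are zeroed on each call, and the rescaling keeps the mean activation close to its original value. A fresh random mask is drawn on every call, so each training step sees a different thinned network.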

Inspired by dropout and ResNet, Huang et al. proposed drop layer (published as stochastic depth), which makes use of the shortcut connections in ResNet. With some probability, just as in dropout, the normal path of a residual block is blocked so that information flows directly through the shortcut connection. This makes the network shorter during training and can be used to train much deeper networks, even beyond 1000 layers.
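The mechanism can be sketched with scalar "activations". Each block adds `transform(x)` to its shortcut input; during training the transform branch is skipped at random, and at test time every block is kept with its branch scaled by the survival probability, which is my understanding of the scheme in Huang et al. The constant `survival_prob` and the toy `transforms` are simplifying assumptions (the paper lets the probability vary with depth).

```python
import random


def residual_block(x, transform, survival_prob, training, rng):
    """Residual block whose transform branch is randomly dropped in training."""
    if training:
        if rng.random() < survival_prob:
            return x + transform(x)  # block is active
        return x                     # block skipped: shortcut only
    # Test time: keep every block, scale the branch by its survival probability.
    return x + survival_prob * transform(x)


def forward(x, transforms, survival_prob, training, rng):
    for transform in transforms:
        x = residual_block(x, transform, survival_prob, training, rng)
    return x


# Twenty toy blocks, each adding 10% of its input.
transforms = [lambda v: 0.1 * v for _ in range(20)]
rng = random.Random(0)

train_out = forward(1.0, transforms, survival_prob=0.5, training=True, rng=rng)
test_out = forward(1.0, transforms, survival_prob=0.5, training=False, rng=rng)
all_dropped = forward(1.0, transforms, survival_prob=0.0, training=True, rng=rng)
```

With `survival_prob=0.0` every block is skipped and the input passes through unchanged, which shows how the shortcut keeps information flowing even when layers are dropped.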

Depth is important

The chart in the original post illustrates why depth is important: as the number of layers increases, classification accuracy rises. When the features extracted by each layer are visualized with a deconvnet ("Visualizing and Understanding Convolutional Networks"), the deeper the layer, the more abstract the features it extracts. Abstract features in turn strengthen robustness, i.e. invariance to transformations.
