Geoff Hinton's Advice on Deep Neural Networks


Note: This covers suggestions from Geoff Hinton’s talk given at UBC, recorded on May 30, 2013. It does not cover bleeding-edge techniques.

The advice breaks down into the following points:


  • Have a Deep Network.

    A network with only 1-2 hidden layers is considered a shallow network. As the number of hidden layers grows, training runs into problems such as local optima and insufficient data.

    The biggest difference between a deep neural network and a shallow one is the deep network's greater representational power, which increases as layers are added.

PS: In theory, a neural network with a single hidden layer containing very many units (large breadth rather than depth) has representational power similar to a deeper network's, but no method is currently known for training such a network effectively.


  • Pretrain if you do not have a lot of labelled training data. If you do, skip it.

    Pre-training is also called greedy layer-wise training. If you do not have enough labelled samples, perform greedy layer-wise pretraining; if you have plenty, just train the full network stack normally.

    Pre-training lets the parameters start from good initial values; once you have enough labelled samples, that head start no longer matters.

Side Note: An interesting paper shows that unsupervised pretraining encourages sparseness in DNN. Link is here.
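As a rough illustration, here is a minimal sketch of greedy layer-wise pretraining, assuming each layer is pretrained as a tied-weight autoencoder with plain gradient descent; the function names, layer sizes, and hyperparameters are all illustrative, not from the talk:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(data, n_hidden, lr=0.1, epochs=50):
    """Train one layer as a tied-weight autoencoder and return
    its encoder parameters plus the encoded data."""
    n_visible = data.shape[1]
    rng = np.random.default_rng(0)
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # encoder weights
    b = np.zeros(n_hidden)                            # hidden bias
    c = np.zeros(n_visible)                           # visible bias
    for _ in range(epochs):
        h = sigmoid(data @ W + b)        # encode
        recon = sigmoid(h @ W.T + c)     # decode with tied weights
        err = recon - data               # reconstruction error
        d_recon = err * recon * (1 - recon)
        d_h = (d_recon @ W) * h * (1 - h)
        W -= lr * (data.T @ d_h + (h.T @ d_recon).T) / len(data)
        b -= lr * d_h.mean(axis=0)
        c -= lr * d_recon.mean(axis=0)
    return W, b, sigmoid(data @ W + b)

# Pretrain a stack greedily; the resulting weights would then
# initialize the full network before supervised fine-tuning.
X = np.random.rand(256, 64)              # toy unlabelled data
stack, inputs = [], X
for width in (32, 16):
    W, b, inputs = pretrain_layer(inputs, width)
    stack.append((W, b))
```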


  • Initialize the weights to sensible values.

    Set the weights to small random numbers. The distribution of these small random weights depends on the nonlinearity used in the network; with rectified linear units, small positive values work well.

The ReLU also makes calculating the gradient during backpropagation trivial: the derivative is 0 if x < 0 and 1 elsewhere. This speeds up training of the network.



ReLU units are more biologically plausible than the other activation functions, since they model a biological neuron's response in its normal operating range, whereas the sigmoid and tanh activation functions are biologically implausible: a sigmoid has a steady state of around 1/2, so after initializing with small weights the units fire at half their saturation potential.
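To make the initialization advice concrete, here is a small sketch; the scale factors below follow common practice (He/Xavier-style) rather than anything specified in the talk, and the small positive bias for ReLU units is one common reading of "small positive numbers":

```python
import numpy as np

rng = np.random.default_rng(42)

def init_weights(n_in, n_out, nonlinearity="relu"):
    """Small random initial weights; the scale depends on the
    nonlinearity used in the network."""
    if nonlinearity == "relu":
        # He-style scale; a small positive bias keeps ReLU
        # units initially active rather than dead.
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out))
        b = np.full(n_out, 0.01)
    else:
        # Xavier-style smaller symmetric scale for sigmoid/tanh.
        W = rng.normal(0.0, np.sqrt(1.0 / n_in), (n_in, n_out))
        b = np.zeros(n_out)
    return W, b

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # The gradient is trivial: 0 where x < 0, 1 elsewhere.
    return (x > 0).astype(x.dtype)
```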


  • Have many more parameters than training examples.

    Make sure the total number of parameters (a single weight in your network counts as one parameter) far exceeds the number of training examples. Always let the neural network overfit, then regularize it strongly; for example, with 1,000 training examples you should have on the order of 1 million parameters.

The rationale is to mimic the brain: the number of synapses far exceeds the number of experiences, yet during any single activity most of them are inactive.
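As a quick sanity check of this rule, a fully connected network's parameters are easy to count; the layer sizes below are purely illustrative:

```python
def count_parameters(layer_sizes):
    """Each weight and each bias counts as one parameter."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# e.g. a 784-1000-1000-10 fully connected network:
print(count_parameters([784, 1000, 1000, 10]))
# -> 1796010, i.e. ~1.8M parameters, comfortably more than
#    a training set of 1,000 examples.
```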


  • Use dropout to regularize it instead of L1 and L2 regularization.

    Dropout is a technique that drops (omits) some hidden units in a hidden layer each time a training example is fed to the network, randomly subsampling the hidden layer. Each subsample is a different architecture, and all of them share weights.

This is a form of model averaging, or an approximation of it, and a very strong regularization method. Unlike the commonly used L1 or L2 regularization, which pulls parameters toward 0, subsampling with shared weights pulls parameters toward sensible values. Quite neat.
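A minimal sketch of dropout on one hidden layer is shown below, using the "inverted" variant so no rescaling is needed at test time; the keep probability of 0.5 is the commonly used value for hidden units, not something fixed by the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, keep_prob=0.5, training=True):
    """Randomly drop hidden units: each training example sees a
    different subsampled architecture, all sharing the same weights."""
    if not training:
        return h  # at test time the full network is used
    mask = rng.random(h.shape) < keep_prob
    # Inverted scaling keeps the expected activation unchanged.
    return h * mask / keep_prob

h = np.random.rand(4, 8)          # activations of one hidden layer
h_train = dropout(h)              # training: a random subnetwork
h_test = dropout(h, training=False)
```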


  • Convolutional Frontend (optional)

    If the data contains any spatial structure, such as voice, images, or video, use a convolutional frontend.

    For details, see my earlier post 《卷积神经网络(CNN)》.

Convolution can be viewed as a filter or operator: it can extract features such as edges from raw pixels, or measure similarity to the convolution kernel. Using convolution encodes the spatial information.
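A bare-bones 2D convolution (strictly, cross-correlation, as in most deep learning libraries) makes the filter view concrete; the Sobel-style kernel here is an illustrative vertical-edge detector:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value is the dot product of a patch with
            # the kernel, i.e. its similarity to the filter.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge filter responds strongly where intensity
# changes from left to right.
edge_kernel = np.array([[1.0, 0.0, -1.0],
                        [2.0, 0.0, -2.0],
                        [1.0, 0.0, -1.0]])
image = np.zeros((8, 8))
image[:, 4:] = 1.0                 # right half bright
response = conv2d(image, edge_kernel)
```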

References:

http://343hz.com/general-guidelines-for-deep-neural-networks/


2015-9-11 艺少
