Lecture 11 — Hopfield Nets

Lecture 12 — Boltzmann machine learning

Ref: 能量模型(EBM)、限制波尔兹曼机(RBM)

高大上的模型和理论。


Hopfield Nets

看了能量函数,发现:

These look very much like the weights and biases of a neural network.

【点到为止】

Boltzmann machine learning

From: A Beginner’s Tutorial for Restricted Boltzmann Machines

Frankly, 这玩意晦涩难懂,且在卷积神经网络的大趋势下,工业界并没有什么优势,那又何必华这么大的力气在此呢?

Each visible node takes a low-level feature from an item in the dataset to be learned. For example, from a dataset of grayscale images, each visible node would receive one pixel-value for each pixel in one image. (MNIST images have 784 pixels, so neural nets processing them must have 784 input nodes on the visible layer.)

Now let’s follow that single pixel value, x, through the two-layer net. At node 1 of the hidden layer, x is multiplied by a weight and added to a so-called bias. The result of those two operations is fed into an activation function, which produces the node’s output, or the strength of the signal passing through it, given input x.

activation f((weight w * input x) + bias b ) = output a

Next, let’s look at how several inputs would combine at one hidden node. Each x is multiplied by a separate weight, the products are summed, added to a bias, and again the result is passed through an activation function to produce the node’s output.

Because inputs from all visible nodes are being passed to all hidden nodes, an RBM can be defined as a symmetrical bipartite graph. 【目前为止,与全连接相比没有太大新意】

Symmetrical means that each visible node is connected with each hidden node (see below). Bipartite means it has two parts, or layers, and the graph is a mathematical term for a web of nodes.

At each hidden node, each input x is multiplied by its respective weight w. That is, a single input x would have three weights here, making 12 weights altogether (4 input nodes x 3 hidden nodes). The weights between two layers will always form a matrix where the rows are equal to the input nodes, and the columns are equal to the output nodes.

Each hidden node receives the four inputs multiplied by their respective weights. The sum of those products is again added to a bias (which forces at least some activations to happen), and the result is passed through the activation algorithm producing one output for each hidden node.

If these two layers were part of a deeper neural network, the outputs of hidden layer no. 1 would be passed as inputs to hidden layer no. 2, and from there through as many hidden layers as you like until they reach a final classifying layer. (For simple feed-forward movements, the RBM nodes function as an autoencoder and nothing more.)

【到此为止,暂无新事】

Reconstructions

But in this introduction to restricted Boltzmann machines, we’ll focus on how they learn to reconstruct data by themselves in an unsupervised fashion (unsupervised means without ground-truth labels in a test set), making several forward and backward passes between the visible layer and hidden layer no. 1 without involving a deeper network.

In the reconstruction phase, the activations of hidden layer no. 1 become the input in a backward pass. They are multiplied by the same weights, one per internode edge, just as x was weight-adjusted on the forward pass. The sum of those products is added to a visible-layer bias at each visible node, and the output of those operations is a reconstruction; i.e. an approximation of the original input. This can be represented by the following diagram:

Because the weights of the RBM are randomly initialized, the difference between the reconstructions and the original input is often large. You can think of reconstruction error as the difference between the values of r and the input values, and that error is then backpropagated against the RBM’s weights, again and again, in an iterative learning process until an error minimum is reached.

A more thorough explanation of backpropagation is here.

As you can see, on its forward pass, an RBM uses inputs to make predictions about node activations, or the probability of output given a weighted xp(a|x; w).

But on its backward pass, when activations are fed in and reconstructions, or guesses about the original data, are spit out, an RBM is attempting to estimate the probability of inputs x given activations a, which are weighted with the same coefficients as those used on the forward pass. This second phase can be expressed as p(x|a; w).

Together, those two estimates will lead you to the joint probability distribution of inputs x and activations a, or p(x, a).

Reconstruction does something different from regression, which estimates a continous value based on many inputs, and different from classification, which makes guesses about which discrete label to apply to a given input example.

Reconstruction is making guesses about the probability distribution of the original input; i.e. the values of many varied points at once. This is known as generative learning, which must be distinguished from the so-called discriminative learning performed by classification, which maps inputs to labels, effectively drawing lines between groups of data points.

Let’s imagine that both the input data and the reconstructions are normal curves of different shapes, which only partially overlap.

To measure the distance between its estimated probability distribution and the ground-truth distribution of the input, RBMs use Kullback Leibler Divergence. A thorough explanation of the math can be found on Wikipedia.

KL-Divergence measures the non-overlapping, or diverging, areas under the two curves, and an RBM’s optimization algorithm attempts to minimize those areas so that the shared weights, when multiplied by activations of hidden layer one, produce a close approximation of the original input. On the left is the probability distibution of a set of original input, p, juxtaposed with the reconstructed distribution q; on the right, the integration of their differences.

【如何理解Boltzmann machine是生成式模型】

3. 深度信念网络(DBN)

深度信念网络(Deep Belief Network,DBN)是早期深度生成式模型的典型代表,它由多层神经元构成,这些神经元又分为可见神经元和隐性神经元,可见单元用于接受输入,隐单元用于提取特征。网络最顶上的两层间的连接是无向的,组成联合内存 (associative memory),较低的其他层之间有连接上下的有向连接。最底层代表了数据向量 (data vectors),每一个神经元代表数据向量的一维。

DBN的组成元件是受限玻尔兹曼机(Restricted Boltzmann Machines ,RBM)。单个RBM由两层网络组成:

  • 一层叫做可见层 (visible layer),由可见单元 (visible units) 组成,用于输入训练数据;
  • 另一层叫做隐层 (Hidden layer),由隐单元 (hidden units) 组成,用作特征检测器 (feature detectors)。

RBM既是一个生成模型,也是一个无监督模型,因为它使用隐变量来描述输入数据的分布,而且这个过程没有涉及数据的标签信息。单层RBM网络的学习目标是无监督地训练网络,使得可见层节点v的分布p(v)最大可能地拟合输入样本所在样本空间的真实分布q(v)。通过计算可见向量p(v)的对数似然log p(v)的梯度来更新RBM的权值【看样子既是局部输入也是局部输出】,这个计算过程涉及到了求解RBM模型所确定分布上的期望。

对于生成式模型概率推断过程中遇到的计算某分布下函数的期望、计算边缘概率分布等复杂问题,可以采用蒙特卡洛思想近似求解。

DBN采用对比散度(Contrastive Divergence, CD-k)算法,利用Gibbs采样的方法来估计RBM的对数似然梯度。

多个RBM堆叠组成一个DBN,将隐单元的激活概率(activation probabilities)作为下一层RBM的可见层输入数据,从底向上逐层预训练。DBN是一种生成模型,通过训练其神经元间的权重,我们可以让整个神经网络按照最大概率来生成训练数据。

生成样本时,使用训练好的随机隐单元状态值,首先在网络最顶两层进行多次Gibbs采样,生成该分布下的采样,然后向下传播,得到每层的状态和最终的样本。 

训练

貌似有更好的改进,如下:

[Hinton] Neural Networks for Machine Learning - Hopfield Nets and Boltzmann Machine的更多相关文章

  1. [C3] Andrew Ng - Neural Networks and Deep Learning

    About this Course If you want to break into cutting-edge AI, this course will help you do so. Deep l ...

  2. Neural Networks and Deep Learning

    Neural Networks and Deep Learning This is the first course of the deep learning specialization at Co ...

  3. 第四节,Neural Networks and Deep Learning 一书小节(上)

    最近花了半个多月把Mchiael Nielsen所写的Neural Networks and Deep Learning这本书看了一遍,受益匪浅. 该书英文原版地址地址:http://neuralne ...

  4. 【DeepLearning学习笔记】Coursera课程《Neural Networks and Deep Learning》——Week2 Neural Networks Basics课堂笔记

    Coursera课程<Neural Networks and Deep Learning> deeplearning.ai Week2 Neural Networks Basics 2.1 ...

  5. 【DeepLearning学习笔记】Coursera课程《Neural Networks and Deep Learning》——Week1 Introduction to deep learning课堂笔记

    Coursera课程<Neural Networks and Deep Learning> deeplearning.ai Week1 Introduction to deep learn ...

  6. Neural Networks and Deep Learning学习笔记ch1 - 神经网络

    近期開始看一些深度学习的资料.想学习一下深度学习的基础知识.找到了一个比較好的tutorial,Neural Networks and Deep Learning,认真看完了之后觉得收获还是非常多的. ...

  7. [Hinton] Neural Networks for Machine Learning - Basic

    Link: Neural Networks for Machine Learning - 多伦多大学 Link: Hinton的CSC321课程笔记1 Link: Hinton的CSC321课程笔记2 ...

  8. [Hinton] Neural Networks for Machine Learning - RNN

    Link: Neural Networks for Machine Learning - 多伦多大学 Link: Hinton的CSC321课程笔记 补充: 参见cs231n 2017版本,ppt写得 ...

  9. [Hinton] Neural Networks for Machine Learning - Converage

    Link: Neural Networks for Machine Learning - 多伦多大学 Link: Hinton的CSC321课程笔记 Ref: 神经网络训练中的Tricks之高效BP ...

随机推荐

  1. git配置SSH Key,上传本地代码至github

    git配置全局的name和email git config --global user.name "name" git config --global user.email &qu ...

  2. Chrome上网问题解决记录

    浏览器打开网站缓慢,甚至等待很久后直接烂页面 注意观察浏览器左下角会显示: 正在建立安全连接... 环境: 操作系统:Win10 Chrome版本: 69.x 解决办法:http://bbs.360. ...

  3. 构造函数与getter和setter的区别

    构造函数是用于初始化类的属性,且只有在创建对象时才会调用构造函数,用于给对象分配地址 无参的构造函数,创建对象时默认调用,当程序没有明确写出有参的构造函数,系统会默认的创建一个. 有参的构造函数,创建 ...

  4. Hessian学习总结(二)——使用hessian上传文件

    hessian较早版本通过 byte[] 进行文件传输:4.0之后支持 InputStream 作为参数或返回值进行传输. 注意:hessian会读取整个文件,如果文件过大,会导致JVM内存溢出.可以 ...

  5. [Android Pro] 关于Android 7.0无法进行https抓包的问题

    cp from  : https://www.cnblogs.com/wytings/p/6954293.html 在App进行数据请求的时候,如果每次都打印log去判断是一件很不“人性化”的操作行为 ...

  6. iOS:NSFileHandle和NSFileManger的使用

    一.介绍 利用NSFilehandle类提供的方法,允许更有效地使用文件. 一般而言,处理文件时都要经历以下三个步骤: 1.打开文件,并获取一个NSFileHandle对象,以便在后面的I/O操作中引 ...

  7. PAM 認 證 模 組

    作者:陳柏菁 E-mail 作用:限制哪些用户或者组可以从哪里登陆,或者可以建立/etc/nologin立即阻止一般用户登陆,限制 user可以使用的资源.例如cpu,文件,登陆数量,某些服务的登陆时 ...

  8. Spring导出可以运行的jar包

    最近需要解决Maven项目导入可执行的jar包的问题,如果项目不包含Spring,那么使用mvn assembly:assembly即可,详情可以参考:http://www.cnblogs.com/l ...

  9. angularjs drag and drop

    angular-dragula Drag and drop so simple it hurts 480 live demo angular-drag-and-drop-lists Angular d ...

  10. IntelliJ IDEA 2017.3/2018.1激活与汉化

    本文以IntelliJ IDEA 2017.3以及2018.1为例进行讲解.(持续更新:2018年5月28日) 适用版本(其它版本未测试): IntelliJ IDEA 2017.2.6.2017.3 ...