Lecture 11 — Hopfield Nets

Lecture 12 — Boltzmann machine learning

Ref: Energy-Based Models (EBM), Restricted Boltzmann Machines (RBM)

Grand-sounding models and theory.


Hopfield Nets

Looking at the energy function, one finds:

These look very much like the weights and biases of a neural network.

[We will leave it at that.]

Boltzmann machine learning

From: A Beginner’s Tutorial for Restricted Boltzmann Machines

Frankly, this material is obscure and hard to follow, and with convolutional neural networks dominating the field it offers little practical advantage in industry, so why spend so much effort on it?

Each visible node takes a low-level feature from an item in the dataset to be learned. For example, from a dataset of grayscale images, each visible node would receive one pixel-value for each pixel in one image. (MNIST images have 784 pixels, so neural nets processing them must have 784 input nodes on the visible layer.)

Now let’s follow that single pixel value, x, through the two-layer net. At node 1 of the hidden layer, x is multiplied by a weight and added to a so-called bias. The result of those two operations is fed into an activation function, which produces the node’s output, or the strength of the signal passing through it, given input x.

activation f((weight w * input x) + bias b) = output a

Next, let’s look at how several inputs would combine at one hidden node. Each x is multiplied by a separate weight, the products are summed, added to a bias, and again the result is passed through an activation function to produce the node’s output.
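As a rough illustration of that weighted-sum-plus-bias computation (the variable names and values below are purely illustrative, not taken from the tutorial), a single hidden node could be sketched in NumPy as:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, the usual choice for RBM units
    return 1.0 / (1.0 + np.exp(-z))

# Several visible inputs feeding one hidden node
x = np.array([0.2, 0.8, 0.0, 1.0])   # example input values (e.g. pixel intensities)
w = np.array([0.5, -0.3, 0.9, 0.1])  # one weight per visible-to-hidden edge
b = 0.1                              # the hidden node's bias

a = sigmoid(np.dot(w, x) + b)        # output a = f(w*x + b)
print(a)
```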

Because inputs from all visible nodes are being passed to all hidden nodes, an RBM can be defined as a symmetrical bipartite graph. [So far, not much new compared with an ordinary fully connected layer.]

Symmetrical means that each visible node is connected with each hidden node (see below). Bipartite means it has two parts, or layers, and graph is a mathematical term for a web of nodes.

At each hidden node, each input x is multiplied by its respective weight w. That is, a single input x would have three weights here, making 12 weights altogether (4 input nodes x 3 hidden nodes). The weights between two layers will always form a matrix where the rows are equal to the input nodes, and the columns are equal to the output nodes.

Each hidden node receives the four inputs multiplied by their respective weights. The sum of those products is again added to a bias (which forces at least some activations to happen), and the result is passed through the activation algorithm producing one output for each hidden node.
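A minimal sketch of the same forward pass in matrix form, assuming the 4-visible-by-3-hidden layout described above (the shapes, seed, and values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

n_visible, n_hidden = 4, 3
# rows = visible (input) nodes, columns = hidden (output) nodes -> 12 weights
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_hidden = np.zeros(n_hidden)        # one bias per hidden node

v = np.array([0.2, 0.8, 0.0, 1.0])   # visible-layer input

# Each hidden node sums its four weighted inputs, adds its bias,
# and passes the result through the activation function.
h = sigmoid(v @ W + b_hidden)
print(h)                             # three outputs, one per hidden node
```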

If these two layers were part of a deeper neural network, the outputs of hidden layer no. 1 would be passed as inputs to hidden layer no. 2, and from there through as many hidden layers as you like until they reach a final classifying layer. (For simple feed-forward movements, the RBM nodes function as an autoencoder and nothing more.)

[Up to this point, nothing new yet.]

Reconstructions

But in this introduction to restricted Boltzmann machines, we’ll focus on how they learn to reconstruct data by themselves in an unsupervised fashion (unsupervised means without ground-truth labels in a test set), making several forward and backward passes between the visible layer and hidden layer no. 1 without involving a deeper network.

In the reconstruction phase, the activations of hidden layer no. 1 become the input in a backward pass. They are multiplied by the same weights, one per internode edge, just as x was weight-adjusted on the forward pass. The sum of those products is added to a visible-layer bias at each visible node, and the output of those operations is a reconstruction; i.e. an approximation of the original input.
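A self-contained sketch of the forward pass followed by this reconstruction pass, reusing the same weight matrix (transposed) plus a separate visible-layer bias; all names and values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_hidden, b_visible = np.zeros(n_hidden), np.zeros(n_visible)

v = np.array([0.2, 0.8, 0.0, 1.0])   # original input
h = sigmoid(v @ W + b_hidden)        # forward pass, as before

# Backward pass: hidden activations flow back through the SAME weights
# (transposed), plus the visible-layer biases, yielding a reconstruction r,
# i.e. an approximation of the original input v.
r = sigmoid(h @ W.T + b_visible)
print(np.sum((v - r) ** 2))          # a simple reconstruction error
```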

Because the weights of the RBM are randomly initialized, the difference between the reconstructions and the original input is often large. You can think of reconstruction error as the difference between the values of r and the input values, and that error is then backpropagated against the RBM’s weights, again and again, in an iterative learning process until an error minimum is reached.


As you can see, on its forward pass, an RBM uses inputs to make predictions about node activations, or the probability of output a given a weighted input x: p(a|x; w).

But on its backward pass, when activations are fed in and reconstructions, or guesses about the original data, are spit out, an RBM is attempting to estimate the probability of inputs x given activations a, which are weighted with the same coefficients as those used on the forward pass. This second phase can be expressed as p(x|a; w).

Together, those two estimates will lead you to the joint probability distribution of inputs x and activations a, or p(x, a).

Reconstruction does something different from regression, which estimates a continuous value based on many inputs, and different from classification, which makes guesses about which discrete label to apply to a given input example.

Reconstruction is making guesses about the probability distribution of the original input; i.e. the values of many varied points at once. This is known as generative learning, which must be distinguished from the so-called discriminative learning performed by classification, which maps inputs to labels, effectively drawing lines between groups of data points.

Let’s imagine that both the input data and the reconstructions are normal curves of different shapes, which only partially overlap.

To measure the distance between its estimated probability distribution and the ground-truth distribution of the input, RBMs use Kullback-Leibler divergence. A thorough explanation of the math can be found on Wikipedia.

KL-Divergence measures the non-overlapping, or diverging, areas under the two curves, and an RBM’s optimization algorithm attempts to minimize those areas so that the shared weights, when multiplied by activations of hidden layer one, produce a close approximation of the original input. On the left is the probability distribution of a set of original input, p, juxtaposed with the reconstructed distribution q; on the right, the integration of their differences.
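As a toy illustration only (in practice an RBM does not compute this loss exactly; it approximates the gradient, e.g. via contrastive divergence), the KL divergence between two discrete distributions p and q can be computed as:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against log(0)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

p = np.array([0.1, 0.4, 0.5])   # "original input" distribution
q = np.array([0.2, 0.3, 0.5])   # "reconstructed" distribution
print(kl_divergence(p, q))      # 0 only when the two distributions coincide
```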

[How to understand that a Boltzmann machine is a generative model]

3. Deep Belief Networks (DBN)

The Deep Belief Network (DBN) is a typical representative of early deep generative models. It is built from several layers of neurons, which divide into visible neurons and hidden neurons: the visible units receive the input, while the hidden units extract features. The connections between the top two layers of the network are undirected and form an associative memory; the remaining lower layers are linked by directed, top-down connections. The bottom layer represents the data vectors, with each neuron representing one dimension of the data vector.

The building block of a DBN is the Restricted Boltzmann Machine (RBM). A single RBM consists of a two-layer network:

  • one layer is the visible layer, made up of visible units, which takes the training data as input;
  • the other is the hidden layer, made up of hidden units, which acts as a set of feature detectors.

An RBM is both a generative model and an unsupervised model, because it uses latent variables to describe the distribution of the input data, and this process involves no label information. The learning objective of a single-layer RBM is to train the network without supervision so that the distribution p(v) of the visible-layer nodes v fits, as closely as possible, the true distribution q(v) of the sample space the inputs come from. The RBM's weights are updated by computing the gradient of the log-likelihood log p(v) of the visible vector v [so the visible layer apparently serves as both input and output], and this computation involves expectations taken under the distribution defined by the RBM model.

For the complex computations that arise in probabilistic inference with generative models, such as computing the expectation of a function under some distribution or computing marginal probability distributions, Monte Carlo methods can be used to obtain approximate solutions.

DBNs use the Contrastive Divergence (CD-k) algorithm, which relies on Gibbs sampling to estimate the gradient of the RBM's log-likelihood.
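A minimal sketch of one CD-1 update for a binary RBM, under the usual sigmoid/Bernoulli assumptions; a real implementation would add mini-batches, momentum, weight decay, and possibly persistent chains:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b_v, b_h, lr=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Positive phase: hidden probabilities and a binary sample given the data v0
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # One Gibbs step: reconstruct the visible layer, then re-infer hidden probs
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)

    # CD-1 gradient estimate: <v h>_data - <v h>_reconstruction
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)
    return W, b_v, b_h

# Example usage on a single 4-dimensional binary input
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))
b_v, b_h = np.zeros(4), np.zeros(3)
v0 = np.array([1.0, 0.0, 1.0, 1.0])
W, b_v, b_h = cd1_step(v0, W, b_v, b_h, rng=rng)
```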

Multiple RBMs stacked together form a DBN: the activation probabilities of the hidden units are used as the visible-layer input data of the next RBM, and the layers are pretrained one by one from the bottom up. A DBN is a generative model; by training the weights between its neurons, we can make the whole network generate the training data with maximum probability.
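A rough sketch of that greedy, bottom-up layer-wise pretraining; train_rbm is an assumed helper (e.g. a loop of CD-k steps like the one above), and all names here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dbn(data, hidden_sizes, train_rbm):
    """Greedy layer-wise pretraining: train one RBM per layer, then pass the
    hidden activation probabilities upward as the next RBM's visible data."""
    rbms, layer_input = [], data
    for n_hidden in hidden_sizes:
        W, b_v, b_h = train_rbm(layer_input, n_hidden)   # e.g. repeated CD-k updates
        rbms.append((W, b_v, b_h))
        layer_input = sigmoid(layer_input @ W + b_h)     # activation probs -> next layer
    return rbms
```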

To generate samples, starting from the trained stochastic hidden-unit states, Gibbs sampling is first run for several iterations in the top two layers of the network to draw a sample from that distribution; the result is then propagated downward to obtain the state of each layer and, finally, the generated sample.

Training

There seem to be better, improved training procedures as well.
