Machine Learning Trick of the Day (2): Gaussian Integral Trick

Today's trick, the Gaussian integral trick, is one that allows us to re-express a (potentially troublesome) function in an alternative form, in particular, as an integral of a Gaussian against another function — integrals against a Gaussian turn out not to be too troublesome and can provide many statistical and computational benefits. One popular setting where we can exploit such an alternative representation is for inference in discrete undirected graphical models (think Boltzmann machines or discrete Markov random fields). In such cases, this trick lets us transform our discrete problem into one that has an underlying continuous (Gaussian) representation, which we can then solve using our other machine learning tricks. But this is part of a more general strategy that is used throughout machine learning, whether in Bayesian posterior analysis, deep learning or kernel machines. This trick has many facets, and this post explores the Gaussian integral trick and its more general form, auxiliary variable augmentation.

Gaussian integral trick state expansion.

Gaussian Integral Trick

The Gaussian integral trick is of the statistical flavour: it lets us turn a function that is an exponential in $x^2$ into an exponential that is linear in $x$. We do this by augmenting a linear function with auxiliary variables and then integrating over these auxiliary variables, which makes it a form of auxiliary variable augmentation. The simplest form of this trick is to apply the following identity, valid for $a > 0$:

$$\int \exp\left(-a y^2 + x y\right) dy = \sqrt{\frac{\pi}{a}} \exp\left(\frac{x^2}{4a}\right)$$

We can prove this to ourselves by exploiting our knowledge of Gaussian distributions (which this looks strikingly similar to) and our ability to complete the square when we see such quadratic forms. Separating out the scaling factor a we get:

$$\int \exp\left(-a y^2 + x y\right) dy = \int \exp\left\{-a\left(y^2 - \frac{x}{a} y\right)\right\} dy$$

Which by completing the square becomes:

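$$\begin{aligned}
&\int \exp\left\{-a\left(y - \frac{x}{2a}\right)^2 + \frac{x^2}{4a}\right\} dy \\
&= \exp\left(\frac{x^2}{4a}\right) \int \exp\left\{-a\left(y - \frac{x}{2a}\right)^2\right\} dy \\
&= \exp\left(\frac{x^2}{4a}\right) \sqrt{\frac{\pi}{a}}
\end{aligned}$$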

where the last integral is solved by matching it to a Gaussian with mean $\mu = \frac{x}{2a}$ and variance $\sigma^2 = \frac{1}{2a}$, which we know has a normalisation of $\sqrt{2\pi\sigma^2} = \sqrt{\pi/a}$. This last step shows how this trick got its name.
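The identity is easy to check numerically. The following is a minimal sketch using only the Python standard library; the function names, the midpoint rule, and the 12-standard-deviation integration window are illustrative choices rather than part of the original derivation.

```python
import math


def gaussian_integral_numeric(a, x, n_steps=20000, n_sigmas=12.0):
    """Midpoint-rule estimate of the integral of exp(-a*y^2 + x*y) over y.

    The integrand is a scaled Gaussian with mean x/(2a) and variance 1/(2a),
    so we integrate over a window of n_sigmas standard deviations either
    side of the mean (an assumption of this sketch; requires a > 0).
    """
    sigma = math.sqrt(1.0 / (2.0 * a))
    centre = x / (2.0 * a)
    lo = centre - n_sigmas * sigma
    step = 2.0 * n_sigmas * sigma / n_steps
    total = 0.0
    for i in range(n_steps):
        y = lo + (i + 0.5) * step
        total += math.exp(-a * y * y + x * y)
    return total * step


def gaussian_integral_closed_form(a, x):
    """Right-hand side of the identity: sqrt(pi/a) * exp(x^2 / (4a))."""
    return math.sqrt(math.pi / a) * math.exp(x * x / (4.0 * a))


if __name__ == "__main__":
    for a, x in [(1.0, 0.0), (0.5, 1.3), (2.0, -3.0)]:
        print(a, x, gaussian_integral_numeric(a, x), gaussian_integral_closed_form(a, x))
```

The two columns printed for each pair of `a` and `x` should agree to many decimal places, since the midpoint rule converges very quickly for smooth, rapidly decaying integrands like this one.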

The 'Gaussian integral trick' was coined and initially described by Hertz et al. [Ch10, pg 253] [1], and is closely related to the Hubbard-Stratonovich transform (which provides the augmentation for $\exp(-x^2)$).

Transforming Binary MRFs

This trick is also valid in the multivariate case, which is what we will most often be interested in. One good place to see this trick in action is when applied to binary MRFs or Boltzmann machines. Binary MRFs have a joint probability, for binary random variables x:

$$p(x) = \frac{1}{Z} \exp\left(\theta^\top x + x^\top W x\right)$$

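where $Z$ is the normalising constant. The (multivariate) Gaussian integral trick can be applied to the quadratic term in this energy function, allowing for an insightful analysis and an interesting reparameterisation that lets alternative inference methods be used. For example, the resulting continuous relaxation makes it possible to run Hamiltonian Monte Carlo on a discrete model [2], and a related auxiliary-variable construction gives exact Hamiltonian Monte Carlo samplers for binary distributions [3].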
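As a sketch of how the augmentation works in this setting, assume that $W$ is symmetric positive definite and that $x \in \{0,1\}^n$ (both are assumptions made here for illustration; for $\{0,1\}$-valued variables $x_i^2 = x_i$, so a multiple of the identity can be added to $W$ and compensated in $\theta$ if $W$ is not positive definite to begin with). The multivariate form of the identity, for a symmetric positive-definite matrix $A$, is

$$\int \exp\left(-\tfrac{1}{2} y^\top A y + x^\top y\right) dy = \sqrt{\frac{(2\pi)^n}{|A|}} \exp\left(\tfrac{1}{2} x^\top A^{-1} x\right)$$

Choosing $A^{-1} = 2W$ and reading the identity from right to left gives

$$\exp\left(x^\top W x\right) \propto \int \exp\left(-\tfrac{1}{4} y^\top W^{-1} y + x^\top y\right) dy$$

so the binary MRF is the marginal of the augmented joint

$$p(x, y) \propto \exp\left(-\tfrac{1}{4} y^\top W^{-1} y + (\theta + y)^\top x\right)$$

The exponent is now linear in $x$, so given $y$ the binary variables are independent, with $p(x_i = 1 \mid y) = \sigma(\theta_i + y_i)$, where $\sigma$ is the logistic sigmoid. Summing out $x$ instead leaves a purely continuous density, $p(y) \propto \exp\left(-\tfrac{1}{4} y^\top W^{-1} y\right) \prod_i \left(1 + \exp(\theta_i + y_i)\right)$, which is the kind of object that gradient-based samplers can work with.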

Variable Augmentation

Graphical model for a general augmentation.

This trick is a special case of a more general strategy called variable (or data) augmentation. I prefer the term variable augmentation to data augmentation [4], since it will not be confused with the preprocessing and manipulation of observed data. In this setting, the introduction of auxiliary variables has most often been used to develop better-mixing Markov chain Monte Carlo samplers. This is because, after augmentation, the conditional distributions of the model often have highly convenient and easy-to-sample-from forms.

One recent example of variable augmentation, and one that parallels our initial trick, is the Polya-Gamma variable augmentation. In this case, we can express the sigmoid function, which appears when computing the mean of the Bernoulli distribution, as:

$$\sigma(x) = \frac{1}{1 + \exp(-x)} = \frac{1}{2} \exp\left(\frac{x}{2}\right) \int_0^\infty \exp\left(-\frac{1}{2} y x^2\right) p(y)\, dy$$

where $p(y)$ is a Polya-Gamma distribution, here PG(1, 0) [5]. This nicely transforms the sigmoid into a scale mixture of Gaussians in $x$, with the Polya-Gamma variable playing the role of the precision, and gives us a different type of Gaussian integral trick. In fact, similar Gaussian integral tricks abound, and are typically described under the heading of Gaussian scale-mixture distributions.
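A rough Monte Carlo check of this representation is sketched below, with the prefactor taken as 1/2 so that both sides equal 1/2 at $x = 0$. It relies on the infinite-sum representation of a PG(1, 0) variable given by Polson et al. [5], $y = \frac{1}{2\pi^2} \sum_{k \ge 1} g_k / (k - \tfrac{1}{2})^2$ with $g_k$ independent Exponential(1) draws, truncated here to a finite number of terms. The function names, the truncation level, and the sample size are illustrative assumptions; this is not an efficient or exact Polya-Gamma sampler.

```python
import math
import random


def sample_polya_gamma_1_0(rng, n_terms=100):
    """Approximate draw from PG(1, 0) using a truncated sum of exponentials.

    Truncating the infinite sum slightly underestimates the variable; the
    omitted tail has mean of roughly 1 / (2 * pi^2 * n_terms).
    """
    total = 0.0
    for k in range(1, n_terms + 1):
        total += rng.expovariate(1.0) / (k - 0.5) ** 2
    return total / (2.0 * math.pi ** 2)


def sigmoid(x):
    """Logistic sigmoid, the left-hand side of the identity."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_via_augmentation(x, n_samples=5000, seed=0):
    """Monte Carlo estimate of 0.5 * exp(x/2) * E[exp(-0.5 * y * x^2)]."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        y = sample_polya_gamma_1_0(rng)
        acc += math.exp(-0.5 * y * x * x)
    return 0.5 * math.exp(0.5 * x) * acc / n_samples


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.5):
        print(x, sigmoid(x), sigmoid_via_augmentation(x))
```

The two estimates should agree up to Monte Carlo and truncation error. Conditional on a draw of `y`, the integrand is Gaussian in `x`, which is what makes Gibbs-style samplers for logistic models convenient after augmentation.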

There are many examples of variable augmentation to be found, especially for binary and categorical distributions. Much guidance is available, and some papers that demonstrate this are:

Summary

The Gaussian integral trick is just one from a large class of variable augmentation strategies that are widely used in statistics and machine learning. They work by introducing auxiliary variables into our problems that induce an alternative representation, and that then give us additional statistical and computational benefits. Such methods lie at the heart of efficient inference algorithms, whether these be Monte Carlo or deterministic approximate inference schemes, making variable augmentation a favourite in our box of machine learning tricks.


Some References
[1] John Hertz, Anders Krogh, Richard G. Palmer, Introduction to the Theory of Neural Computation, 1991
[2] Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton, Continuous relaxations for discrete Hamiltonian Monte Carlo, Advances in Neural Information Processing Systems, 2012
[3] Ari Pakman, Liam Paninski, Auxiliary-variable exact Hamiltonian Monte Carlo samplers for binary distributions, Advances in Neural Information Processing Systems, 2013
[4] James H. Albert, Siddhartha Chib, Bayesian analysis of binary and polychotomous response data, Journal of the American Statistical Association, 1993
[5] Nicholas G. Polson, James G. Scott, Jesse Windle, Bayesian inference for logistic models using Pólya-Gamma latent variables, Journal of the American Statistical Association, 2013
