Machine Learning Trick of the Day (1): Replica Trick
'Tricks' of all sorts are used throughout machine learning, in both research and production settings. These tricks allow us to address many different types of data analysis problems, and are roughly of an analytical, statistical, algorithmic, or numerical flavour. Today's trick is in the analytical class and comes to us from statistical physics: the popular replica trick.
The replica trick [1][2][3] is used for the analytical computation of log-normalising constants (or log-partition functions). More formally, the replica trick provides one of the tools needed for a replica analysis of a probabilistic model — a theoretical analysis of the properties and expected behaviour of a model. Replica analysis has been used to provide insight into almost all model classes popular in machine learning today, including linear (GLM) regression, latent variable models, multi-layer neural networks, and Gaussian processes, amongst others.
We are often interested in making statements about the generalisation ability of our models; whereas approaches such as PAC learning provide a worst-case analysis, replica analysis allows for statements in the average case that can be more useful, especially in verifying our numerical implementations. Replica analysis can also be used to provide insight into transitions that might occur during learning, to show how non-linearities within our models affect learning, and to study the effect of noise in the learning dynamics [4]. This post aims to provide a brief review of the replica trick, the steps typically involved in a replica analysis, and links to the many ways it has been used to provide insight into popular machine learning approaches.
Replica Trick
Consider a probabilistic model whose normalising constant is Z(x), for data x. The replica trick says that:

$$\langle \ln Z(x) \rangle = \lim_{n \to 0} \frac{1}{n} \ln \langle Z(x)^n \rangle$$

The left-hand side is often called the quenched free energy (the free energy averaged over multiple data sets). The replica trick transforms the expectation of the log into the log of an expectation, with the result involving the nth power of the normalising constant Z, i.e. the expectation is computed by replicating the normalising constant n times.
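As a quick numerical sanity check of this identity (a toy sketch, not part of the original analysis), take Z = exp(g) with g drawn from a standard Gaussian, so that ⟨ln Z⟩ = 0 exactly while (1/n) ln⟨Z^n⟩ = n/2, which tends to 0 as n → 0. The choice of distribution here is purely illustrative:

```python
import math
import random

random.seed(0)

# Toy "partition function": Z = exp(g) with g ~ N(0, 1).
# Then <ln Z> = <g> = 0 exactly, and <Z^n> = exp(n^2 / 2),
# so (1/n) ln <Z^n> = n/2, which tends to <ln Z> = 0 as n -> 0.
samples = [math.exp(random.gauss(0.0, 1.0)) for _ in range(200_000)]

avg_log_Z = sum(math.log(z) for z in samples) / len(samples)

def replica_estimate(n):
    """Monte Carlo estimate of (1/n) * ln <Z^n>, the replicated quantity."""
    moment = sum(z ** n for z in samples) / len(samples)
    return math.log(moment) / n

for n in (1.0, 0.5, 0.1, 0.01):
    print(f"n={n}: (1/n) ln <Z^n> ~ {replica_estimate(n):.4f}  (theory: n/2 = {n / 2})")
print(f"direct <ln Z> ~ {avg_log_Z:.4f}  (theory: 0)")
```

As n shrinks, the replicated estimate approaches the direct quenched average, which is exactly what the trick exploits.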
To see how we obtained this expression, we will exploit two useful identities.
1. Exponential identity: $x^n = \exp(n \ln x) = \lim_{n \to 0} 1 + n \ln x$

2. Logarithm identity: $\ln x = \lim_{n \to 0} \frac{x^n - 1}{n}$

The application of the second identity is often referred to as the replica trick, since it is what allows us to rewrite the initial expectation in its replicated form.
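A quick numerical check of this small-n behaviour (the value of x is an arbitrary, illustrative choice):

```python
import math

x = 7.3  # arbitrary positive value, chosen only for illustration
ln_x = math.log(x)

for n in (0.1, 0.01, 0.001):
    power = x ** n                  # x^n = exp(n ln x), exactly
    linearised = 1 + n * ln_x       # small-n expansion of exp(n ln x)
    log_estimate = (power - 1) / n  # logarithm identity: recovers ln x as n -> 0
    print(f"n={n}: x^n={power:.6f}, 1 + n ln x={linearised:.6f}, "
          f"(x^n - 1)/n={log_estimate:.6f}  (ln x={ln_x:.6f})")
```

Both identities become exact in the limit: x^n approaches 1 + n ln x, and (x^n − 1)/n approaches ln x.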
We can use these two identities to show that:

$$\langle \ln Z(x) \rangle = \left\langle \lim_{n \to 0} \frac{Z(x)^n - 1}{n} \right\rangle = \lim_{n \to 0} \frac{\langle Z(x)^n \rangle - 1}{n} = \lim_{n \to 0} \frac{1}{n} \ln \langle Z(x)^n \rangle,$$

which is the identity we sought. But this alone does not help us much, since the replicated partition function is not any easier to compute. Part of the trick lies in performing the analysis assuming that n is an integer, and computing the integral n times, i.e. using n replicas. To compute the limit, we then perform an analytic continuation of the result to the real line (and hope that it remains valid). This final step is hard to justify and is one of the main critiques of replica analysis.
Replica Analysis
Given the replica trick, we can now conduct a replica analysis, also known as a Gardner analysis, of our model [1][5]. This typically involves the following steps:
- Apply the replica trick to the integral problem (the computation of the average free energy).
- Solve the unknown integral, typically by the saddle-point integration scheme (sometimes known as the method of steepest descent). This will not always be easy to do, but certain assumptions can help.
- Perform an analytic continuation to determine the limit n → 0. This step (and the previous one) involves an assumption (or ansatz) about the structure of the solution, with the typical ansatz known as replica symmetry. Replica symmetry assumes that the replicas are symmetric under permutation of their labels. This is reasonable if we work with models with i.i.d. data assumptions, where the specific allocation of data to any of the replicas does not matter. More advanced analyses make use of other assumptions.
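The saddle-point step above can be sketched numerically on a toy integral where the exact answer is known. Here the integrand f(x) = ln x − x is purely an illustrative choice: its maximum sits at x* = 1 with f(1) = −1 and f''(1) = −1, and the exact value of the integral is Γ(N+1)/N^(N+1), so we can measure how quickly the saddle-point approximation becomes exact as N grows:

```python
import math

# Laplace (saddle-point) approximation of I(N) = integral_0^inf exp(N f(x)) dx
# for f(x) = ln x - x: maximum at x* = 1, f(1) = -1, f''(1) = -1.
# Exactly, I(N) = Gamma(N+1) / N^(N+1); the saddle-point estimate is
# I(N) ~ exp(N f(x*)) * sqrt(2*pi / (N * |f''(x*)|)).
# We work in log space (lgamma) to avoid overflow at large N.

def log_exact(N):
    return math.lgamma(N + 1) - (N + 1) * math.log(N)

def log_laplace(N):
    return -N + 0.5 * math.log(2 * math.pi / N)

for N in (10, 100, 1000):
    gap = log_laplace(N) - log_exact(N)
    print(f"N={N}: log-error of saddle-point approximation = {gap:.2e}")
```

The log-error shrinks like 1/N: as N grows, the integral is dominated by the neighbourhood of the saddle point, which is precisely why the method works for the large-system limits taken in replica analysis.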
Following these steps, some of the popular models for which you can see replica analysis in action are:
- Binary Perceptron learning
- Linear (GLM) regression/perceptrons
- Multilayer (Bayesian) Neural networks
- Sparse Bayesian Classification
- Gaussian processes
- Factor analysis
- Sparse factor analysis
- Compressed sensing
Summary
One trick we have available in machine learning for the theoretical analysis of our models is the replica trick. The replica trick, when combined with an analytic continuation that allows us to compute limits, and the saddle-point method for integration, allows us to perform a replica analysis. This analysis lets us examine the generalisation ability of our models, study transitions that might occur during learning, understand how non-linearities within our models affect learning, and study the effect of noise in the learning dynamics. Such analysis provides us with a deeper insight into our models and how to train them, and provides needed stepping stones on our path towards building ever more powerful machine learning systems.
Some References
| [1] | Andreas Engel, Christian Van den Broeck, Statistical Mechanics of Learning, 2001 |
| [2] | Kevin John Sharp, Effective Bayesian inference for sparse factor analysis models, 2011 |
| [3] | Manfred Opper, Statistical mechanics of learning: Generalization, The Handbook of Brain Theory and Neural Networks, 1995 |
| [4] | H. S. Seung, Haim Sompolinsky, N. Tishby, Statistical mechanics of learning from examples, Physical Review A, 1992 |
| [5] | Tommaso Castellani, Andrea Cavagna, Spin-glass theory for pedestrians, Journal of Statistical Mechanics: Theory and Experiment, 2005 |