Machine Learning Trick of the Day (1): Replica Trick

'Tricks' of all sorts are used throughout machine learning, in both research and production settings. These tricks allow us to address many different types of data analysis problems, and are roughly of an analytical, statistical, algorithmic, or numerical flavour. Today's trick is in the analytical class and comes to us from statistical physics: the popular replica trick.

The replica trick [1][2][3] is used for the analytical computation of log-normalising constants (or log-partition functions). More formally, the replica trick provides one of the tools needed for a replica analysis of a probabilistic model — a theoretical analysis of the properties and expected behaviour of a model. Replica analysis has been used to provide insight into almost all model classes popular in machine learning today, including linear (GLM) regression, latent variable models, multi-layer neural networks, and Gaussian processes, amongst others.

We are often interested in making statements about the generalisation ability of our models; whereas approaches such as PAC learning provide a worst-case analysis, replica analysis allows for statements in the average case, which can be more useful, especially in verifying our numerical implementations. Replica analysis can also be used to provide insight into transitions that might occur during learning, to show how non-linearities within our models affect learning, and to study the effect of noise on the learning dynamics [4]. This post aims to provide a brief review of the replica trick, the steps typically involved in a replica analysis, and links to the many ways it has been used to provide insight into popular machine learning approaches.

Replica Trick

Consider a probabilistic model whose normalising constant is Z(x), for data x. The replica trick says that:

$$\mathbb{E}[\ln Z] = \lim_{n \to 0} \frac{1}{n} \ln \mathbb{E}[Z^n]$$

The left-hand side is often called the quenched free energy (the free energy averaged over multiple data sets). The replica trick transforms the expectation of the log into the log of an expectation, with the result involving the $n$th power of the normalising constant $Z$, i.e. the expectation is computed by replicating the normalising constant $n$ times.

To see how we obtained this expression, we will exploit two useful identities.

1. Exponential identity:

$$x^n = \exp(n \ln x) \approx 1 + n \ln x \quad \text{as } n \to 0$$

$$\Rightarrow \ln x = \lim_{n \to 0} \frac{x^n - 1}{n} \quad \left(= \lim_{n \to 0} \frac{d}{dn} x^n\right)$$

The application of this identity is often referred to as the replica trick, since it is what allows us to rewrite the initial expectation in its replicated form.
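Identity (1) can be verified numerically: for small $n$, $(x^n - 1)/n$ approaches $\ln x$. A minimal sketch (the function name here is our own, for illustration):

```python
import math

def ln_via_replicas(x, n):
    """Approximate ln(x) via identity (1): (x^n - 1) / n for small n."""
    return (x ** n - 1.0) / n

x = 5.0
for n in [1.0, 0.1, 0.01, 1e-6]:
    print(f"n={n:g}: (x^n - 1)/n = {ln_via_replicas(x, n):.6f}  "
          f"(ln x = {math.log(x):.6f})")
```

As $n$ shrinks, the approximation converges to $\ln 5 \approx 1.6094$.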

2. Logarithmic identity:

$$\ln(1 + nx) \approx nx, \quad \text{if } nx \ll 1$$
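Identity (2) is just the first-order Taylor expansion of $\ln(1+u)$ around $u=0$. A quick numerical check (our own sketch, not part of the original derivation):

```python
import math

x = 3.0
for n in [0.1, 0.01, 0.001]:
    exact = math.log(1.0 + n * x)   # ln(1 + nx)
    approx = n * x                  # first-order approximation
    print(f"n={n:g}: ln(1+nx)={exact:.6f}, nx={approx:.6f}, "
          f"error={abs(exact - approx):.2e}")
```

The error shrinks quadratically in $nx$, which is why the approximation becomes exact in the $n \to 0$ limit.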

We can use these two identities to show that:

$$n\,\mathbb{E}[\ln Z] = \lim_{n \to 0} \ln\left(1 + n\,\mathbb{E}[\ln Z]\right) \quad \text{using (2)}$$

$$\mathbb{E}[\ln Z] = \lim_{n \to 0} \frac{1}{n} \ln\left(1 + n\,\mathbb{E}[\ln Z]\right)$$

$$= \lim_{n \to 0} \frac{1}{n} \ln\left(1 + n\,\frac{\mathbb{E}[Z^n] - 1}{n}\right) \quad \text{using (1)}$$

$$\therefore \mathbb{E}[\ln Z] = \lim_{n \to 0} \frac{1}{n} \ln \mathbb{E}[Z^n]$$

which is the identity we sought. But this alone does not help us much, since the partition function is not any easier to compute. Part of the trick lies in performing the analysis assuming that $n$ is an integer, and computing the integral $n$ times, i.e. using $n$ replicas. To compute the limit, we then perform a continuation to the real line (and hope that the result remains valid). This final step is hard to justify and is one of the main critiques of replica analysis.
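To make the identity concrete, here is a small numerical sketch (our own toy example, not from the original literature) using a log-normal $Z$, i.e. $\ln Z \sim \mathcal{N}(\mu, \sigma^2)$. In this case both sides are known exactly: $\mathbb{E}[\ln Z] = \mu$, while $\frac{1}{n}\ln\mathbb{E}[Z^n] = \mu + n\sigma^2/2$, which converges to $\mu$ as $n \to 0$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "partition function": ln Z ~ N(mu, sigma^2), so E[ln Z] = mu
# exactly, while (1/n) ln E[Z^n] = mu + n*sigma^2/2.
mu, sigma = 2.0, 1.0
log_Z = rng.normal(mu, sigma, size=1_000_000)

quenched = log_Z.mean()  # Monte Carlo estimate of E[ln Z]
for n in [1.0, 0.5, 0.1, 0.01]:
    # Monte Carlo estimate of (1/n) ln E[Z^n]
    annealed_n = np.log(np.mean(np.exp(n * log_Z))) / n
    print(f"n={n:5.2f}  (1/n) ln E[Z^n] = {annealed_n:.4f}")
print(f"E[ln Z] = {quenched:.4f}  (exact: {mu})")
```

As $n$ decreases, the replicated quantity approaches the quenched average $\mu = 2$, illustrating why the trick works — and why the $n \to 0$ limit is essential.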

Replica Analysis

Given the replica trick, we can now conduct a replica analysis, also known as a Gardner analysis, of our model [1][5]. This typically involves the following steps:

  • Apply the replica trick to the integral problem (the computation of the average free energy).
  • Solve the resulting integral, typically by saddle-point integration (sometimes known as the method of steepest descent). This will not always be easy, but certain assumptions can help.
  • Perform an analytic continuation to determine the limit n → 0. This step (and the previous one) involves an assumption (or ansatz) about the structure of the solution, the typical ansatz being replica symmetry: the replicas are assumed to be symmetric under permutation of their labels. This is reasonable for models with i.i.d. data assumptions, where the specific allocation of data to any of the replicas does not matter. More advanced analyses make use of other assumptions.
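The saddle-point step above can be illustrated in isolation. Below is a minimal sketch (our own toy integrand, chosen for illustration) of Laplace's method: for $I(N) = \int_0^\pi e^{N f(x)}\,dx$ with $f(x) = \sin x$, the approximation $e^{N f(x^*)}\sqrt{2\pi / (N |f''(x^*)|)}$ around the maximum $x^* = \pi/2$ matches the integral increasingly well as $N$ grows:

```python
import numpy as np

# Saddle-point (Laplace) approximation of I(N) = integral of exp(N f(x))
# over [0, pi] for f(x) = sin(x): the maximum is at x* = pi/2, where
# f(x*) = 1 and f''(x*) = -1.
def saddle_point(N):
    f_star, f2_star = 1.0, -1.0
    return np.exp(N * f_star) * np.sqrt(2.0 * np.pi / (N * abs(f2_star)))

def numeric(N):
    # Trapezoidal rule on a fine grid.
    x = np.linspace(0.0, np.pi, 200_001)
    y = np.exp(N * np.sin(x))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

for N in [10, 50, 200]:
    print(f"N={N:4d}  saddle/numeric = {saddle_point(N) / numeric(N):.4f}")
```

In a replica analysis, $N$ plays the role of the system size (e.g. the number of data points), so the saddle-point approximation becomes exact in the thermodynamic limit.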

Following these steps, replica analysis has been carried out for many of the model classes mentioned above, including GLM regression, latent variable models, multi-layer neural networks, and Gaussian processes.

Summary

One trick we have available in machine learning for the theoretical analysis of our models is the replica trick. Combined with an analytic continuation that allows us to compute the $n \to 0$ limit, and the saddle-point method for integration, it allows us to perform a replica analysis. Such an analysis lets us examine the generalisation ability of our models, study transitions that might occur during learning, understand how non-linearities within our models affect learning, and study the effect of noise on the learning dynamics. This provides us with a deeper insight into our models and how to train them, and provides needed stepping stones on our path towards building ever more powerful machine learning systems.


Some References
[1] Andreas Engel, Christian Van den Broeck, Statistical Mechanics of Learning, Cambridge University Press, 2001
[2] Kevin John Sharp, Effective Bayesian inference for sparse factor analysis models, PhD thesis, 2011
[3] Manfred Opper, Statistical mechanics of learning: Generalization, The Handbook of Brain Theory and Neural Networks, 1995
[4] HS Seung, Haim Sompolinsky, N Tishby, Statistical mechanics of learning from examples, Physical Review A, 1992
[5] Tommaso Castellani, Andrea Cavagna, Spin-glass theory for pedestrians, Journal of Statistical Mechanics: Theory and Experiment, 2005
