Machine Learning Trick of the Day (1): Replica Trick

'Tricks' of all sorts are used throughout machine learning, in both research and in production settings. These tricks allow us to address many different types of data analysis problems, being roughly of either an analytical, statistical, algorithmic, or numerical flavour. Today's trick is in the analytical class and comes to us from statistical physics: the popular Replica trick.

The replica trick [1][2][3] is used for analytical computation of log-normalising constants (or log-partition functions). More formally, the replica trick provides one of the tools needed for a replica analysis of a probabilistic model — a theoretical analysis of the the properties and expected behaviour of a model. Replica analysis has been used to provide an insight into almost all model classes popular in machine learning today, including linear (GLM) regressionlatent variable modelsmulti-layer neural networks, and Gaussian processes, amongst others.

We are often interested in making statements about the generalisation ability of our models; whereas approaches such as PAC learning provide for a worse-case analysis, replica analysis allows for statements in the average case that can be more useful, especially in verifying our numerical implementations. Replica analysis can also be used to provide insight into transitions that might occur during learning, to show how non-linearities within our models affect learning, and to study the effect of noise in the learning dynamics [4]. This post aims to provide a brief review of the replica trick, the steps typically involved in a replica analysis, and links to the many ways it has been used to provide insight into popular machine learning approaches.

Replica Trick

Consider a probabilistic model whose normalising constant is Z(x), for data x. The replica trick says that:

E[lnZ]=limn→01nlnE[Zn]

The left-hand side is often called the quenched free energy (free energy averaged over multiple data sets). The replica trick transforms the expectation of the log into the log of an expectation, with the result involving the nth-power of the normalising constant Z, i.e. the expectation is computed by replicating the normalising constant n times.

To see how we obtained this expression, we will exploit two useful identities.

1. Exponential identity: xn=exp(nlnx)=limn→01+nlnx

⇒lnx=limn→0xn−1n(=limn→0ddnxn)

The application of this identity is often referred to as the replica trick, since it is what allows us to rewrite the initial expectation in its replicated form.

2. Logarithmic identity:

ln(1+nx)≈nx, if nx≪1

We can use these two identities to show that:

nE[lnZ]=limn→0ln(1+nE[lnZ]) using (1)
E[lnZ]=limn→01nln(1+nE[lnZ])
=limn→01nln(1+nE[Zn]−1n) using (2) 
∴E[lnZ]=limn→01nlnE[Zn]

which is the identity we sought out. But this alone does not help us much since the partition function it is not any easier to compute. Part of the trick lies in performing the analysis assuming that n is an integer, and computing the integral n times, i.e. using nreplicas. To compute the limit we do a continuation to the real line (and hope that the result will be valid). This final step is hard to justify and is one of the critiques of replica analysis.

Replica Analysis

Given the replica trick, we can now conduct a replica analysis, also known as a Gardener analysis, of our model[1][5]. This typically involves the following steps:

  • Apply the replica trick to the integral problem (computation of average free energy).
  • Solving the unknown integral, typically by the saddle-point integration scheme (sometimes known as the method of steepest descent). This will not always be easy to do, but certain assumptions can help.
  • Perform an analytic continuation to determine the limit n→0. This step (and the previous one) involves the assumption (or ansatz) of the structure of the solution, with the typical ansatz known as replica symmetry. Replica symmetry assumes that the replicas are symmetric under permutation of their labels. This is reasonable if we work with models with i.i.d. data assumptions where the specific allocation of data to any of the replicas will not matter. More advanced analyses make use other assumptions.

Following these steps, some of the popular models for which you can see replica analysis in action are:

Summary

One trick we have available in machine learning for the theoretical analysis of our models is the replica trick. The replica trick, when combined with an analytic continuation that allows us to compute limits, and the saddle-point method for integration, allows us to perform a replica analysis. This analysis allows us to examine the generalisation ability of our models, study transitions that might occur during learning, understand how non-linearities within our models affect learning, and to study the effect of noise in the learning dynamics. Such analysis provides us with a deeper insight into our models and how to train them, and provides needed stepping stones on our path towards building ever more powerful machine learning systems.


Some References
[1] Andreas Engel, Christian Van den Broeck, Statistical mechanics of learning, , 2001
[2] Kevin John Sharp, Effective Bayesian inference for sparse factor analysis models, , 2011
[3] Manfred Opper, Statistical mechanics of learning: Generalization, The Handbook of Brain Theory and Neural Networks,, 1995
[4] HS Seung, Haim Sompolinsky, N Tishby, Statistical mechanics of learning from examples, Physical Review A, 1992
[5] Tommaso Castellani, Andrea Cavagna, Spin-glass theory for pedestrians, Journal of Statistical Mechanics: Theory and Experiment, 2005

Machine Learning Trick of the Day (1): Replica Trick的更多相关文章

  1. Machine Learning Trick of the Day (2): Gaussian Integral Trick

    Machine Learning Trick of the Day (2): Gaussian Integral Trick Today's trick, the Gaussian integral ...

  2. Kernel Functions for Machine Learning Applications

    In recent years, Kernel methods have received major attention, particularly due to the increased pop ...

  3. Advice for applying Machine Learning

    https://jmetzen.github.io/2015-01-29/ml_advice.html Advice for applying Machine Learning This post i ...

  4. Machine Learning for Developers

    Machine Learning for Developers Most developers these days have heard of machine learning, but when ...

  5. GoodReads: Machine Learning (Part 3)

    In the first installment of this series, we scraped reviews from Goodreads. In thesecond one, we per ...

  6. 学习笔记之Machine Learning Crash Course | Google Developers

    Machine Learning Crash Course  |  Google Developers https://developers.google.com/machine-learning/c ...

  7. How do I learn mathematics for machine learning?

    https://www.quora.com/How-do-I-learn-mathematics-for-machine-learning   How do I learn mathematics f ...

  8. 5 Techniques To Understand Machine Learning Algorithms Without the Background in Mathematics

    5 Techniques To Understand Machine Learning Algorithms Without the Background in Mathematics Where d ...

  9. Portal:Machine learning机器学习:门户

    Machine learning Machine learning is a scientific discipline that explores the construction and stud ...

随机推荐

  1. 灵悟礼品网上专卖店——第三阶段Sprint

    一.小组成员: 洪雪意(产品负责人) 陈淑筠(Master) 二.组内人员任务情况 已完成的任务: 陈淑筠:主页面的设计 洪雪意:导航条的改进和页面中插入页面的功能 正在进行的任务: 陈淑筠:主页面的 ...

  2. 【Coursera】经验风险最小化

    一.经验风险最小化 1.有限假设类情形 对于Chernoff bound 不等式,最直观的解释就是利用高斯分布的图象.而且这个结论和中心极限定律没有关系,当m为任意值时Chernoff bound均成 ...

  3. pktgen-dpdk 运行 run.py 报错 Config file 'default' not found 解决方法

    pktgen 操作手册:http://pktgen-dpdk.readthedocs.io/en/latest/getting_started.html 执行到这一步时: $ cd <Pktge ...

  4. 个人作业4——alpha阶段个人总结(201521123003 董美凤)

    一.个人总结 在alpha 结束之后, 每位同学写一篇个人博客, 总结自己的alpha 过程: 请用自我评价表:http://www.cnblogs.com/xinz/p/3852177.html 有 ...

  5. Beta阶段冲刺前准备

    第 1 篇 Scrum 冲刺博客 1.新成员 暂无新成员,等一个有缘人 团队成员: 刘阳航(captain) 陈文俊 林庭亦 郑子熙 2.讨论是否需要更换团队的PM 经过团队讨论,我们决定不更换团队P ...

  6. 【Web Shell】- 技术剖析中国菜刀 - Part II

    在第一部分,简单描述了中国菜刀的基本功能.本文我将剖析中国菜刀的平台多功能性.传输机制.交互模式和检测.我希望通过我的讲解,您能够根据您的环境检测出并清除它. 平台 那么中国菜刀可以在哪些平台上运行? ...

  7. Memcache CAS协议介绍及使用

    1.什么是CAS 所谓CAS,check and set,在写操作时,先检查是否被别的线程修改过. 基本原理非常简单,一言以蔽之,就是"版本号".每个存储的数据对象,多有一个版本号 ...

  8. JAVA相关概念(一)

    依赖注入和控制反转 首先,这两个词是同一个概念的不同角度的说法,依赖注入感觉是对描述了如何实现,而控制反转则像是描述了一种思想. 依赖注入的流行可以说是由spring的流行带动的,只要是做过sprin ...

  9. Spring Boot 学习笔记1---初体验之3分钟启动你的Web应用

    前言 早在去年就简单的使用了一下Spring Boot,当时就被其便捷的功能所震惊.但是那是也没有深入的研究,随着其在业界被应用的越来越广泛,因此决定好好地深入学习一下,将自己的学习心得在此记录,本文 ...

  10. Zju1610 Count the Colors

    题面: 画一些颜色段在一行上,一些较早的颜色就会被后来的颜色覆盖了. 你的任务就是要数出你随后能看到的不同颜色的段的数目. Input: 每组测试数据第一行只有一个整数n, 1 <= n < ...