Machine Learning Trick of the Day (1): Replica Trick

'Tricks' of all sorts are used throughout machine learning, in both research and production settings. These tricks allow us to address many different types of data analysis problems, and they come in roughly analytical, statistical, algorithmic, or numerical flavours. Today's trick is in the analytical class and comes to us from statistical physics: the popular replica trick.

The replica trick [1][2][3] is used for the analytical computation of log-normalising constants (or log-partition functions). More formally, the replica trick provides one of the tools needed for a replica analysis of a probabilistic model — a theoretical analysis of the properties and expected behaviour of a model. Replica analysis has been used to provide insight into almost all model classes popular in machine learning today, including linear (GLM) regression, latent variable models, multi-layer neural networks, and Gaussian processes, amongst others.

We are often interested in making statements about the generalisation ability of our models; whereas approaches such as PAC learning provide a worst-case analysis, replica analysis allows for statements in the average case that can be more useful, especially in verifying our numerical implementations. Replica analysis can also be used to provide insight into transitions that might occur during learning, to show how non-linearities within our models affect learning, and to study the effect of noise on the learning dynamics [4]. This post aims to provide a brief review of the replica trick, the steps typically involved in a replica analysis, and links to the many ways it has been used to provide insight into popular machine learning approaches.

Replica Trick

Consider a probabilistic model whose normalising constant is Z(x), for data x. The replica trick says that:

$$\mathbb{E}[\ln Z] = \lim_{n \to 0} \frac{1}{n} \ln \mathbb{E}[Z^n]$$

The left-hand side is often called the quenched free energy (free energy averaged over multiple data sets). The replica trick transforms the expectation of the log into the log of an expectation, with the result involving the nth-power of the normalising constant Z, i.e. the expectation is computed by replicating the normalising constant n times.
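
We can convince ourselves of this numerically. The following minimal sketch treats a lognormal random variable as a toy partition function, Z = exp(g) with g drawn from a Gaussian, so that the quenched average E[ln Z] is known exactly; the toy model, constants, and sample size are illustrative choices, not anything prescribed by the trick itself.

```python
# Minimal Monte Carlo sanity check of E[ln Z] = lim_{n->0} (1/n) ln E[Z^n].
# Toy partition function: Z = exp(g) with g ~ N(mu, sigma^2), so that
# E[ln Z] = mu exactly. All constants here are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7
g = rng.normal(mu, sigma, size=1_000_000)
Z = np.exp(g)                                 # Monte Carlo samples of Z

quenched = np.log(Z).mean()                   # E[ln Z], the quenched average
for n in [1.0, 0.1, 0.01, 0.001]:
    replicated = np.log((Z ** n).mean()) / n  # (1/n) ln E[Z^n]
    print(f"n = {n:6.3f}   (1/n) ln E[Z^n] = {replicated:.4f}")
print(f"target  E[ln Z] = {quenched:.4f}")
```

As n shrinks, (1/n) ln E[Z^n] approaches the quenched average (for this lognormal toy model the gap is exactly n sigma^2 / 2), which is precisely what the identity promises.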

To see how we obtained this expression, we will exploit two useful identities.

1. Exponential identity:

$$x^n = \exp(n \ln x) \approx 1 + n \ln x \quad \text{as } n \to 0$$

$$\Rightarrow\; \ln x = \lim_{n \to 0} \frac{x^n - 1}{n} \quad \left(= \lim_{n \to 0} \frac{\mathrm{d}}{\mathrm{d}n}\, x^n\right)$$

The application of this identity is often referred to as the replica trick, since it is what allows us to rewrite the initial expectation in its replicated form.

2. Logarithmic identity:

$$\ln(1 + nx) \approx nx, \quad \text{if } nx \ll 1$$
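
Both identities are elementary, and it is reassuring to check them numerically before relying on them; the values of x and n below are arbitrary.

```python
# Quick numerical check of the two identities; x and n are arbitrary choices.
import math

x, n = 3.0, 1e-6
print(math.log(x), (x ** n - 1) / n)   # identity (1): ln x  vs  (x^n - 1)/n
print(math.log(1 + n * x), n * x)      # identity (2): ln(1 + nx)  vs  nx
```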

We can use these two identities to show that:

$$\begin{aligned}
n\,\mathbb{E}[\ln Z] &= \lim_{n \to 0} \ln\big(1 + n\,\mathbb{E}[\ln Z]\big) && \text{using (2)} \\
\mathbb{E}[\ln Z] &= \lim_{n \to 0} \frac{1}{n} \ln\big(1 + n\,\mathbb{E}[\ln Z]\big) \\
&= \lim_{n \to 0} \frac{1}{n} \ln\left(1 + n\,\frac{\mathbb{E}[Z^n] - 1}{n}\right) && \text{using (1)} \\
\therefore\; \mathbb{E}[\ln Z] &= \lim_{n \to 0} \frac{1}{n} \ln \mathbb{E}[Z^n]
\end{aligned}$$
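
Since E[Z^0] = 1 and ln 1 = 0, the right-hand side is a 0/0 limit, so l'Hôpital's rule (equivalently, the derivative form of identity (1)) gives an alternative statement of the trick that is sometimes easier to work with:

$$\mathbb{E}[\ln Z] = \lim_{n \to 0} \frac{\partial}{\partial n} \ln \mathbb{E}[Z^n]$$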

which is the identity we sought. But this alone does not help us much, since the partition function is not any easier to compute. Part of the trick lies in performing the analysis assuming that n is an integer, so that the expectation involves n replicated copies of the system, i.e. n replicas. To compute the limit we then perform an analytic continuation to the real line (and hope that the result will be valid). This final step is hard to justify and is one of the critiques of replica analysis.
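
To make the replication explicit, here is a generic sketch (the energy function H, the inverse temperature β, and the parameters w are placeholders rather than quantities from any particular model). If the partition function is an integral over parameters, then for integer n the replicated average is an integral over n copies of those parameters, coupled only through the shared average over the data:

$$Z(x) = \int \mathrm{d}w \; e^{-\beta H(w;\,x)}
\quad\Rightarrow\quad
\mathbb{E}_x\!\left[Z(x)^n\right]
= \mathbb{E}_x\!\left[\int \prod_{a=1}^{n} \mathrm{d}w_a \, \exp\!\Big(-\beta \sum_{a=1}^{n} H(w_a;\,x)\Big)\right]$$

Averaging over x couples the replicas to one another, and it is this coupled system that the next section analyses.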

Replica Analysis

Given the replica trick, we can now conduct a replica analysis, also known as a Gardner analysis, of our model [1][5]. This typically involves the following steps:

  • Apply the replica trick to the integral problem (the computation of the average free energy).
  • Solve the unknown integral, typically by saddle-point integration (sometimes known as the method of steepest descent). This will not always be easy to do, but certain assumptions can help.
  • Perform an analytic continuation to determine the limit n → 0. This step (and the previous one) involves an assumption (or ansatz) about the structure of the solution, the typical ansatz being replica symmetry. Replica symmetry assumes that the replicas are symmetric under permutation of their labels. This is reasonable if we work with models with i.i.d. data assumptions, where the specific allocation of data to any of the replicas will not matter. More advanced analyses make use of other assumptions. A generic sketch of these last two steps follows this list.
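
To sketch the last two steps in generic notation (the function g_n, the size parameter N, and the order parameters below are placeholders rather than quantities from any specific model): after the data average, E[Z^n] can usually be rewritten as an integral over a small set of order parameters, for example the matrix of overlaps q_ab between replicas, with an exponent that scales with the system size, so the saddle point dominates; the replica-symmetric ansatz then fixes the form of that matrix before the limit n → 0 is taken.

$$\mathbb{E}[Z^n] \propto \int \mathrm{d}q \; e^{N g_n(q)} \approx e^{N g_n(q^\ast)}, \qquad q^\ast = \arg\max_q \, g_n(q)$$

$$\text{replica symmetry:}\quad q_{ab} = \begin{cases} Q, & a = b \\ q, & a \neq b \end{cases}$$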

Following these steps, replica analysis can be seen in action for many of the popular models mentioned in the introduction, such as GLM regression, latent variable models, multi-layer neural networks, and Gaussian processes; the references below provide good entry points.

Summary

One trick we have available in machine learning for the theoretical analysis of our models is the replica trick. The replica trick, combined with an analytic continuation that allows us to compute the limit and the saddle-point method for integration, allows us to perform a replica analysis. Such an analysis lets us examine the generalisation ability of our models, study transitions that might occur during learning, understand how non-linearities within our models affect learning, and study the effect of noise on the learning dynamics. It provides us with a deeper insight into our models and how to train them, and gives us needed stepping stones on our path towards building ever more powerful machine learning systems.


Some References
[1] Andreas Engel and Christian Van den Broeck, Statistical Mechanics of Learning, Cambridge University Press, 2001.
[2] Kevin John Sharp, Effective Bayesian Inference for Sparse Factor Analysis Models, 2011.
[3] Manfred Opper, Statistical Mechanics of Learning: Generalization, in The Handbook of Brain Theory and Neural Networks, 1995.
[4] H. S. Seung, Haim Sompolinsky, and N. Tishby, Statistical Mechanics of Learning from Examples, Physical Review A, 1992.
[5] Tommaso Castellani and Andrea Cavagna, Spin-Glass Theory for Pedestrians, Journal of Statistical Mechanics: Theory and Experiment, 2005.
