paper 23 ：Kullback–Leibler divergence KL散度（2）

Kullback–Leibler divergence KL散度

In probability theory and information theory, the Kullback–Leibler divergence[1][2][3] (also information divergence,information gain, relative entropy, or KLIC) is a non-symmetric measure of the difference between two probability distributions P and Q. KL measures the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the "true" distribution of data, observations, or a precise calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P.

Although it is often intuited as a distance metric, the KL divergence is not a true metric – for example, the KL from P to Q is not necessarily the same as the KL from Q to P.

KL divergence is a special case of a broader class of divergences called f-divergences. Originally introduced by Solomon Kullbackand Richard Leibler in 1951 as the directed divergence between two distributions, it is not the same as a divergence incalculus. However, the KL divergence can be derived from the Bregman divergence.

注意P通常指数据集，我们已有的数据集，Q表示理论结果，所以KL divergence 的物理含义就是当用Q来编码P中的采样时，比用P来编码P中的采用需要多用的位数！

KL散度，也有人称为KL距离，但是它并不是严格的距离概念，其不满足三角不等式

KL散度是不对称的，当然，如果希望把它变对称，

Ds(p1, p2) = [D(p1, p2) + D(p2, p1)] / 2

下面是KL散度的离散和连续定义！

注意的一点是p(x) 和q(x)分别是pq两个随机变量的PDF,D(P||Q)是一个数值，而不是一个函数，看下图！

注意：KL Area to be Integrated!

KL 散度一个很强大的性质：

The Kullback–Leibler divergence is always non-negative,

a result known as Gibbs' inequality, with D_KL(P||Q) zero if and only if P = Q.

计算KL散度的时候，注意问题是在稀疏数据集上KL散度计算通常会出现分母为零的情况！

Matlab中的函数：KLDIV给出了两个分布的KL散度

Description

KLDIV Kullback-Leibler or Jensen-Shannon divergence between two distributions.

KLDIV(X,P1,P2) returns the Kullback-Leibler divergence between two distributions specified over the M variable values in vector X. P1 is a length-M vector of probabilities representing distribution 1, and P2 is a length-M vector of probabilities representing distribution 2. Thus, the probability of value X(i) is P1(i) for distribution 1 and P2(i) for distribution 2. The Kullback-Leibler divergence is given by:

KL(P1(x),P2(x)) = sum[P1(x).log(P1(x)/P2(x))]

If X contains duplicate values, there will be an warning message, and these values will be treated as distinct values. (I.e., the actual values do not enter into the computation, but the probabilities for the two duplicate values will be considered as probabilities corresponding to two unique values.) The elements of probability vectors P1 and P2 must each sum to 1 +/- .00001.

A "log of zero" warning will be thrown for zero-valued probabilities. Handle this however you wish. Adding 'eps' or some other small value to all probabilities seems reasonable. (Renormalize if necessary.)

KLDIV(X,P1,P2,'sym') returns a symmetric variant of the Kullback-Leibler divergence, given by [KL(P1,P2)+KL(P2,P1)]/2. See Johnson and Sinanovic (2001).

KLDIV(X,P1,P2,'js') returns the Jensen-Shannon divergence, given by [KL(P1,Q)+KL(P2,Q)]/2, where Q = (P1+P2)/2. See the Wikipedia article for "Kullback–Leibler divergence". This is equal to 1/2 the so-called "Jeffrey divergence." See Rubner et al. (2000).

EXAMPLE: Let the event set and probability sets be as follow:
   X = [1 2 3 3 4]';
   P1 = ones(5,1)/5;
   P2 = [0 0 .5 .2 .3]' + eps;
Note that the event set here has duplicate values (two 3's). These will be treated as DISTINCT events by KLDIV. If you want these to be treated as the SAME event, you will need to collapse their probabilities together before running KLDIV. One way to do this is to use UNIQUE to find the set of unique events, and then iterate over that set, summing probabilities for each instance of each unique event. Here, we just leave the duplicate values to be treated independently (the default):
   KL = kldiv(X,P1,P2);
   KL =
        19.4899

Note also that we avoided the log-of-zero warning by adding 'eps' to all probability values in P2. We didn't need to renormalize because we're still within the sum-to-one tolerance.

REFERENCES:
1) Cover, T.M. and J.A. Thomas. "Elements of Information Theory," Wiley, 1991.
2) Johnson, D.H. and S. Sinanovic. "Symmetrizing the Kullback-Leibler distance." IEEE Transactions on Information Theory (Submitted).
3) Rubner, Y., Tomasi, C., and Guibas, L. J., 2000. "The Earth Mover's distance as a metric for image retrieval." International Journal of Computer Vision, 40(2): 99-121.
4) <a href="http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback–Leiblerdivergence</a>. Wikipedia, The Free Encyclopedia.

paper 23 ：Kullback–Leibler divergence KL散度（2）的更多相关文章

KL散度(Kullback–Leibler divergence)
KL散度是度量两个分布之间差异的函数.在各种变分方法中,都有它的身影. 转自:https://zhuanlan.zhihu.com/p/22464760 一维高斯分布的KL散度多维高斯分布的KL散度 ...
paper 22：kl-divergence（KL散度）实现代码
这个函数很重要: function KL = kldiv(varValue,pVect1,pVect2,varargin) %KLDIV Kullback-Leibler or Jensen-Shan ...
【原】浅谈KL散度（相对熵）在用户画像中的应用
最近做用户画像,用到了KL散度,发现效果还是不错的,现跟大家分享一下,为了文章的易读性,不具体讲公式的计算,主要讲应用,不过公式也不复杂,具体可以看链接. 首先先介绍一下KL散度是啥.KL散度全称Ku ...
浅谈KL散度
一.第一种理解相对熵(relative entropy)又称为KL散度(Kullback–Leibler divergence,简称KLD),信息散度(information divergence) ...
ELBO 与 KL散度
浅谈KL散度一.第一种理解相对熵(relative entropy)又称为KL散度(Kullback–Leibler divergence,简称KLD),信息散度(information dive ...
交叉熵cross entropy和相对熵（kl散度）
交叉熵可在神经网络(机器学习)中作为损失函数,p表示真实标记的分布,q则为训练后的模型的预测标记分布,交叉熵损失函数可以衡量真实分布p与当前训练得到的概率分布q有多么大的差异. 相对熵(relativ ...
KL散度的理解（GAN网络的优化）
原文地址Count Bayesie 这篇文章是博客Count Bayesie上的文章Kullback-Leibler Divergence Explained 的学习笔记,原文对 KL散度的概念诠释 ...
KL散度与JS散度
1.KL散度 KL散度( Kullback–Leibler divergence)是描述两个概率分布P和Q差异的一种测度.对于两个概率分布P.Q,二者越相似,KL散度越小. KL散度的性质:P表示真实 ...
KL散度非负性证明
1 KL散度 KL散度(Kullback–Leibler divergence) 定义如下: $D_{K L}=\sum\limits_{i=1}^{n} P\left(x_{i}\right) \t ...

随机推荐

ManualResetEvent和AutoResetEvent的区别实例
ManualResetEvent和AutoResetEvent的作用可以理解为在线程执行中插入停顿点flag终止程序运行,然后通过设置flag的状态来使得程序继续运行. 两者的区别是:ManualRe ...
WordPress显示备案号
备案时,需要显示备案号,而wordpress默认模板本身不带这个信息,为了更快速应付备案,解决方案如下: 根据wp-config.php的提示 .......... /** * zh_CN本地化设置: ...
Android笔记：真机调试
参考:android通过USB使用真机调试程序手机:华为Y511-U00 1.将手机USB调试模式连接到PC(直接用数据线连接到PC一般可以自动切换) 2.连接成功后提示安装驱动,我选择的是自动安装 ...
c#中的linq一
c#中的linq 测试数据: using System; using System.Collections.Generic; using System.Linq; using System.Text; ...
Tuning SQL via case when statement
原SQL如下:SQL的主要问题是红色部分居然通过标量查询,反复的查找与SQL相同的基表,很显然这个可以用case when来简化. select a.TRAN_ID,a.AMOUNT,a.BALANC ...
jenkins+git实现docker持续部署
jenkins所做的事情很简单,就拿我现在的情况来说吧,(1).每次开发完成,我都会push到我的远程仓库:(2).我再将我push到远程仓库的代码pull到我的测试服务器上:(3).在测试服务器上, ...
WPF中model属性即时改变
新建一个model作为说明即可,以便查阅. 添加引用:using System.ComponentModel ; public class Test:INotifyPropertyChanged { ...
Android仿QQ窗口的抖动的动画效果
就是仿照QQ窗口的抖动效果,在项目的res下创建anim文件夹,再创建两个xml文件:cycle.xml . myanim.xml cycle.xml : <?xml version ...
Spring Boot 1 创建Demo
Spring Boot的主要优点: 为所有Spring开发者更快的入门开箱即用,提供各种默认配置来简化项目配置内嵌式容器简化Web项目没有冗余代码生成和XML配置的要求入门操作: 1.打开ht ...
Java 收集的代码 transient
public class Main { public static void main(String[] args) { ((NULL)null).haha(); } } class NULL { p ...

paper 23 ：Kullback–Leibler divergence KL散度（2）

Kullback–Leibler divergence KL散度

paper 23 ：Kullback–Leibler divergence KL散度（2）的更多相关文章

随机推荐

热门专题