The EMD is based on the minimal cost that must be paid to transform one distribution into the other.Intuitively,given two distributions,one can be seen as a mass of earth properly spread in space,the other as a collection of holes in that same space.Then,the EMD measures the least amount of work needed to fill the holes with earth.Here,a unit of work corresponds to transporting a unit of earth by a unit of ground distance.

  This can be formalized as the following linear programming problem:

    Let P={(p1,wp1),...,(pm,wpm)}

be the first signture with m clusters,where pis the cluster representative and wpi is the weight of the cluster;

    Q={(q1,wq1),...,(qn,wqn)}

the second signature with n cluster; and

    D=[dij]

the ground distance matrix where dij is the ground distance between cluster pi and qj .

  We want to find a flow

    F=[fij]

with fij the flow between pi and qj, that minimizes the overall cost 

  

subject to the following constranits:

Constraint (1) allows moving "supplis" from P to Q and not vice versa. Constraint (2) limits the amount of supplies that can be sent by the clusters in P to their weights.Constaint (3) limits the clusters in Q to receive no more supplies than their weights; and constraint (4) forces to move the maximum amount of supplies possible. We call this amount the total flow. Once the transportation problem is solved, and we hve found the optimal flow F, the earth mover's distance is defined as the resulting work normalied by the total flow:

The normalization factor is the total weight of the smaller signature, because of constraint (4). This factor is needed when the two signatures have different total weight, in order to avoid favoring smaller signatures. In general, the ground distance dij can be any distance and will be chosen according to the problem at hand.

The Earth Mover's Distance的更多相关文章

  1. Earth Mover's Distance (EMD)

    原文: http://d.hatena.ne.jp/aidiary/20120804/1344058475作者: sylvan5翻译: Myautsai和他的朋友们(Google Translate. ...

  2. [转]Earth Mover's Distance (EMD)

    转自:http://www.sigvc.org/bbs/forum.php?mod=viewthread&tid=981 Earth Mover's Distance (EMD)原文: htt ...

  3. Distributed Sentence Similarity Base on Word Mover's Distance

    Algorithm: Refrence from one ICML15 paper: Word Mover's Distance. 1. First use Google's word2vec too ...

  4. 唐诗掠影:基于词移距离(Word Mover's Distance)的唐诗诗句匹配实践

    词移距离(Word Mover's Distance)是在词向量的基础上发展而来的用来衡量文档相似性的度量.   词移距离的具体介绍参考http://blog.csdn.net/qrlhl/artic ...

  5. CV界的明星人物们

    CV界的明星人物们 来自:http://blog.csdn.net/necrazy/article/details/9380151,另外根据自己关注的地方,加了点东西. 今天在cvchina论坛上看到 ...

  6. paper 99:CV界的明星人物经典介绍

            CV人物1:Jianbo Shi史建波毕业于UC Berkeley,导师是Jitendra Malik.其最有影响力的研究成果:图像分割.其于2000年在PAMI上多人合作发表”Nor ...

  7. paper 23 :Kullback–Leibler divergence KL散度(2)

    Kullback–Leibler divergence KL散度 In probability theory and information theory, the Kullback–Leibler ...

  8. paper 22:kl-divergence(KL散度)实现代码

    这个函数很重要: function KL = kldiv(varValue,pVect1,pVect2,varargin) %KLDIV Kullback-Leibler or Jensen-Shan ...

  9. ### Paper about Event Detection

    Paper about Event Detection. #@author: gr #@date: 2014-03-15 #@email: forgerui@gmail.com 看一些相关的论文. 1 ...

随机推荐

  1. 为什么网页通常把JS调用放在底部?

    JS是单线程,浏览器是多线程.当我们在浏览器的地址栏里输入一个url地址,访问新页面时,页面展示的快慢是由一个单线程控制,这个线程叫做UI线程.UI线程会根据页面里资源(资源是html文件.图片.cs ...

  2. [工作中的设计模式]备忘录模式memento

    一.模式解析 备忘录对象是一个用来存储另外一个对象内部状态的快照的对象.备忘录模式的用意是在不破坏封装的条件下,将一个对象的状态捕捉(Capture)住,并外部化,存储起来,从而可以在将来合适的时候把 ...

  3. HDU5724 Chess(SG定理)

    题目 Source http://acm.hdu.edu.cn/showproblem.php?pid=5724 Description Alice and Bob are playing a spe ...

  4. ural 1339. Babies

    1339. Babies Time limit: 1.0 secondMemory limit: 64 MB O tempora! O mores! Present-day babies progre ...

  5. BZOJ2679 : [Usaco2012 Open]Balanced Cow Subsets

    考虑折半搜索,每个数的系数只能是-1,0,1之中的一个,因此可以先通过$O(3^\frac{n}{2})$的搜索分别搜索出两边每个状态的和以及数字的选择情况. 然后将后一半的状态按照和排序,$O(2^ ...

  6. POJ 3320 (尺取法+Hash)

    题目链接: http://poj.org/problem?id=3320 题目大意:一本书有P页,每页有个知识点,知识点可以重复.问至少连续读几页,使得覆盖全部知识点. 解题思路: 知识点是有重复的, ...

  7. topcoder SRM 610 DIV2 DivideByZero

    题目的意思是给你一组数,然后不断的进行除法(注意是大数除以小数),然后将得到的结果加入这组数种然后继续进行除法, 直到没有新添加的数为止 此题按照提议模拟即可 注意要保持元素的不同 int Count ...

  8. Codeforces Testing Round #10 B. Balancer

    水题,只要遍历一遍,不够平均数的,从后面的借,比平均数多的,把多余的数添加到后面即可,注意数据范围 #include <iostream> #include <vector> ...

  9. There was an internal API error.

    1.一开始以为是用了 iOS8的API, 抱着不相信的态度,我搜到了一条原因,可能是没有选择对开发者账号或需要 cmd+shift+k 来 clean 一下,无效. 2.又继续盯着这个问题,才发现我的 ...

  10. wxPython学习

    http://www.cnblogs.com/coderzh/archive/2008/11/23/1339310.html 一个简单的实例: #!/usr/bin/python import wx ...