This short tutorial shows how to compute Fisher vector and VLAD encodings with the VLFeat MATLAB interface.

These encodings serve a similar purpose: summarizing a number of local feature descriptors (e.g. SIFT) in a single vectorial statistic. Like bag of visual words, they assign local descriptors to elements of a visual dictionary, obtained with vector quantization (k-means) in the case of VLAD or with a Gaussian mixture model (GMM) for Fisher vectors. However, rather than storing only visual word occurrences, these representations store statistics of the differences between the dictionary elements and the local features pooled in them.
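To make the contrast concrete, here is a minimal pure-Python sketch (not VLFeat code; the helper names `sq_dist`, `nearest`, `bow_histogram` and `residual_sums` are hypothetical) that computes, for the same hard assignments, a bag-of-visual-words histogram and a VLAD-style sum of differences per visual word.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, centers):
    """Index of the visual word (center) closest to descriptor x."""
    return min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))


def bow_histogram(descriptors, centers):
    """Bag of visual words: count the descriptors assigned to each word."""
    hist = [0] * len(centers)
    for x in descriptors:
        hist[nearest(x, centers)] += 1
    return hist


def residual_sums(descriptors, centers):
    """VLAD-style statistic: per word, sum of (descriptor - word) differences."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    for x in descriptors:
        k = nearest(x, centers)
        for d in range(dim):
            sums[k][d] += x[d] - centers[k][d]
    return sums


# Toy 2D dictionary and descriptors (made-up values for illustration).
centers = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [-0.1, 0.2], [0.9, 1.2]]
print(bow_histogram(descriptors, centers))  # [2, 1]
print(residual_sums(descriptors, centers))
```

The histogram only records that two descriptors landed on the first word; the residual sums also record where, relative to that word, they landed.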

Fisher encoding

The Fisher encoding uses a GMM to construct a visual word dictionary. To illustrate how a GMM is constructed, consider a number of two-dimensional data points (see also the GMM tutorial). In practice, these points would be a collection of SIFT or other local image features. The following code fits a GMM to the points:

numFeatures = 5000 ;
dimension = 2 ;
data = rand(dimension,numFeatures) ;

numClusters = 30 ;
[means, covariances, priors] = vl_gmm(data, numClusters) ;
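The Fisher encoding relies on soft assignments: the posterior probability q_k(x) that a point x was generated by mixture component k. Below is a minimal pure-Python sketch of that computation, assuming diagonal covariances given as per-dimension variances (our understanding is that vl_gmm fits diagonal-covariance mixtures). Note the layout difference: here each mean is one row of a list, whereas the MATLAB outputs store one component per column. The function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, means, variances, priors):
    """Soft assignments q_k(x) = p(k | x), normalized with log-sum-exp."""
    logs = [math.log(p) + log_gaussian_diag(x, m, v)
            for m, v, p in zip(means, variances, priors)]
    top = max(logs)  # subtract the max before exp() to avoid underflow
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Two unit-variance components with equal priors (illustrative values).
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
priors = [0.5, 0.5]
print(posteriors([1.5, 1.5], means, variances, priors))  # [0.5, 0.5]
print(posteriors([0.0, 0.0], means, variances, priors))
```

A point halfway between the two means is shared equally; a point at the first mean is assigned almost entirely to the first component.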

Next, we create another random set of vectors, which will be encoded using the Fisher vector representation and the GMM just obtained:

numDataToBeEncoded = 1000;
dataToBeEncoded = rand(dimension,numDataToBeEncoded);

The Fisher vector encoding of these vectors is obtained by calling the vl_fisher function with the outputs of the vl_gmm function:

encoding = vl_fisher(dataToBeEncoded, means, covariances, priors);

The vector encoding is the Fisher vector representation of the data in dataToBeEncoded.
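To show what such an encoding contains, the sketch below computes an unnormalized Fisher vector in pure Python, following the standard first- and second-order formulas: for component k and dimension d, u_kd = 1/(N sqrt(pi_k)) sum_i q_ik (x_id - mu_kd)/sigma_kd and v_kd = 1/(N sqrt(2 pi_k)) sum_i q_ik [((x_id - mu_kd)/sigma_kd)^2 - 1]. It is an illustration under assumptions (diagonal covariances given as variances, one row per component, hypothetical function names), and the ordering of the entries is our own choice, so it is not guaranteed to match vl_fisher's output layout or scaling.

```python
import math


def posteriors(x, means, variances, priors):
    """Soft assignments q_k(x) for a diagonal-covariance GMM."""
    logs = []
    for m, v, p in zip(means, variances, priors):
        ll = -0.5 * sum(math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi
                        for xi, mi, vi in zip(x, m, v))
        logs.append(math.log(p) + ll)
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


def fisher_vector(data, means, variances, priors):
    """Unnormalized Fisher vector laid out as [u_1, v_1, ..., u_K, v_K].

    Its length is 2 * D * K for D-dimensional data and K components.
    """
    n, dim = len(data), len(data[0])
    u = [[0.0] * dim for _ in means]  # first-order (mean) statistics
    v = [[0.0] * dim for _ in means]  # second-order (variance) statistics
    for x in data:
        q = posteriors(x, means, variances, priors)
        for k, (m, var) in enumerate(zip(means, variances)):
            for d in range(dim):
                z = (x[d] - m[d]) / math.sqrt(var[d])  # whitened difference
                u[k][d] += q[k] * z
                v[k][d] += q[k] * (z * z - 1.0)
    enc = []
    for k, p in enumerate(priors):
        enc += [s / (n * math.sqrt(p)) for s in u[k]]
        enc += [s / (n * math.sqrt(2.0 * p)) for s in v[k]]
    return enc


# One 1D component and a single point two standard deviations from the mean:
# u = 2 / 1 = 2.0 and v = (2**2 - 1) / sqrt(2).
print(fisher_vector([[2.0]], [[0.0]], [[1.0]], [1.0]))
```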

Note that Fisher vectors support several normalization options that can substantially affect the performance of the representation.
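A common choice, used by the "improved" Fisher vector of Perronnin et al., is to apply a signed square root to every entry and then L2-normalize the whole vector. The sketch below shows these two steps in pure Python on an arbitrary vector; it illustrates the idea and is not VLFeat's implementation (check `help vl_fisher` for the options it actually exposes). The function names are hypothetical.

```python
import math


def signed_sqrt(enc):
    """Power normalization with exponent 1/2, keeping each entry's sign."""
    return [math.copysign(math.sqrt(abs(e)), e) for e in enc]


def l2_normalize(enc, eps=1e-12):
    """Scale the vector to unit Euclidean norm (eps guards against zero)."""
    norm = math.sqrt(sum(e * e for e in enc))
    return [e / max(norm, eps) for e in enc]


enc = [4.0, -9.0, 0.0]
print(signed_sqrt(enc))  # [2.0, -3.0, 0.0]
print(l2_normalize(signed_sqrt(enc)))
```

The signed square root reduces the influence of large entries before the vector is rescaled to unit length.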

VLAD encoding

The Vector of Locally Aggregated Descriptors (VLAD) is similar to Fisher vectors, but (i) it does not store second-order information about the features and (ii) it typically uses k-means instead of a GMM to generate the feature vocabulary (although the latter is also an option).
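One practical consequence of dropping second-order information is a smaller encoding. Assuming D-dimensional descriptors and K visual words, a Fisher vector stores a mean and a variance statistic per component and dimension (2DK numbers), while VLAD stores only first-order residuals (DK numbers). The helper names below are hypothetical.

```python
def fisher_length(dimension, num_clusters):
    """Fisher vector size: first- and second-order stats per component."""
    return 2 * dimension * num_clusters


def vlad_length(dimension, num_clusters):
    """VLAD size: one first-order residual vector per visual word."""
    return dimension * num_clusters


# The toy setting of this tutorial: 2D points, 30 clusters.
print(fisher_length(2, 30), vlad_length(2, 30))  # 120 60
# A typical SIFT setting: 128D descriptors, 256 visual words.
print(fisher_length(128, 256), vlad_length(128, 256))  # 65536 32768
```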

Consider the same 2D data matrix, data, used in the previous section to train the GMM for the Fisher vector representation. To compute VLAD, we first need to obtain a visual word dictionary. This time, we use k-means:

numClusters = 30 ;
centers = vl_kmeans(data, numClusters);
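For readers who want to see what this dictionary-learning step does, here is a minimal pure-Python version of Lloyd's k-means: alternate between assigning each point to its nearest center and moving each center to the mean of its points. It uses a naive initialization (the first k points) and a fixed number of iterations; vl_kmeans has its own initialization and algorithm choices, so this is only a sketch of the idea. The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, k, iterations=20):
    """Lloyd's algorithm; returns a list of k centers (one row each)."""
    centers = [list(p) for p in data[:k]]  # naive init: first k points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        # Assignment step: each point goes to its nearest center.
        for x in data:
            j = min(range(k), key=lambda c: sq_dist(x, centers[c]))
            groups[j].append(x)
        # Update step: move each non-empty cluster's center to its mean.
        for j, members in enumerate(groups):
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(kmeans(points, 2))
```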

Now consider the data dataToBeEncoded and use the vl_vlad function to compute the encoding. Unlike vl_fisher, vl_vlad requires the data-to-cluster assignments to be passed in. This makes it possible to use a fast vector quantization technique (e.g. a kd-tree) as well as to switch from soft to hard assignment.

In this example, we use a kd-tree for quantization:

kdtree = vl_kdtreebuild(centers) ;
nn = vl_kdtreequery(kdtree, centers, dataToBeEncoded) ;
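The quantity the kd-tree returns can also be computed by brute force: for each vector, the index of the closest center. The pure-Python sketch below does exactly that; when the kd-tree search is exact it should agree with the kd-tree result up to ties, but it scans every center and is therefore slower for large dictionaries. Two caveats: the function name is hypothetical, and indices here are 0-based, whereas MATLAB indices (including those in nn) are 1-based.

```python
def nearest_centers(data, centers):
    """For each vector, the 0-based index of its nearest center."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
            for x in data]


centers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
data = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]]
print(nearest_centers(data, centers))  # [1, 2, 0]
```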

Now nn contains, for each vector in the matrix dataToBeEncoded, the index of its nearest center. The next step is to create an assignment matrix:

assignments = zeros(numClusters,numDataToBeEncoded);
assignments(sub2ind(size(assignments), nn, 1:length(nn))) = 1;
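The same one-hot construction, sketched in pure Python with 0-based indices (hypothetical function name): entry (k, i) is 1 when vector i is assigned to center k and 0 otherwise, so every column sums to 1. A soft assignment could instead spread each column's unit mass over several centers.

```python
def hard_assignments(nn, num_clusters):
    """num_clusters x len(nn) matrix with a single 1 per column."""
    a = [[0.0] * len(nn) for _ in range(num_clusters)]
    for i, k in enumerate(nn):
        a[k][i] = 1.0  # vector i is assigned to center k
    return a


for row in hard_assignments([1, 2, 0, 1], 3):
    print(row)
# [0.0, 0.0, 1.0, 0.0]
# [1.0, 0.0, 0.0, 1.0]
# [0.0, 1.0, 0.0, 0.0]
```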

It is now possible to encode the data using the vl_vlad function:

enc = vl_vlad(dataToBeEncoded,centers,assignments);
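Conceptually, an unnormalized VLAD accumulates, for each center c_k, the assignment-weighted differences v_k = sum_i a_ki (x_i - c_k), and concatenates v_1, ..., v_K into one vector of length D*K. The pure-Python sketch below implements that formula (data and centers as lists of rows, hypothetical function name). vl_vlad may also normalize its output depending on its options, so its numbers will generally differ from this raw sum.

```python
def vlad(data, centers, assignments):
    """Unnormalized VLAD: concatenation of v_k = sum_i a[k][i] * (x_i - c_k)."""
    enc = []
    for k, c in enumerate(centers):
        v_k = [0.0] * len(c)
        for i, x in enumerate(data):
            w = assignments[k][i]
            if w:  # skip vectors not assigned to this center
                for d in range(len(c)):
                    v_k[d] += w * (x[d] - c[d])
        enc.extend(v_k)
    return enc


centers = [[0.0, 0.0], [1.0, 1.0]]
data = [[0.5, 0.0], [0.0, 0.5], [1.0, 2.0]]
assignments = [[1.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]]
print(vlad(data, centers, assignments))  # [0.5, 0.5, 0.0, 1.0]
```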

Note that, like Fisher vectors, VLAD supports several normalization options that can substantially affect the performance of the representation.
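One widely used scheme is intra-normalization (L2-normalizing each per-center block v_k separately, as proposed by Arandjelović and Zisserman) followed by L2 normalization of the whole vector. The sketch below applies both to an already computed VLAD vector; the block size D and the function names are assumptions of this illustration, and vl_vlad's own normalization options may differ (see `help vl_vlad`).

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean norm (eps guards against zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def intra_normalize(enc, dim):
    """L2-normalize each consecutive block of `dim` entries (one per center)."""
    out = []
    for start in range(0, len(enc), dim):
        out.extend(l2_normalize(enc[start:start + dim]))
    return out


enc = [3.0, 4.0, 0.0, 0.0, 0.0, 2.0]  # three 2D blocks, one per center
blocks = intra_normalize(enc, 2)
print(blocks)  # [0.6, 0.8, 0.0, 0.0, 0.0, 1.0]
print(l2_normalize(blocks))
```

Intra-normalization keeps a center with many assigned descriptors from dominating the final encoding.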

from: http://www.vlfeat.org/overview/encodings.html
