Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata

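I recently read a paper by Wei Wu of MSRA, "Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata". It is about Ranking Relevance. Below is a brief account of my understanding of the paper; anyone interested in this area is welcome to discuss.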

1. Abstract

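The paper focuses on using the query-doc click-through bipartite graph, together with the metadata of queries and docs (organized as multiple types of features), to learn query-doc similarity (query-query and doc-doc similarity are covered along the way).

To compute this similarity, the authors use two different linear mappings: one maps queries from the query feature space, the other maps docs from the doc feature space, and both land in the same latent space. The similarity of a query and a doc is then the dot product of their vectors in this latent space. Learning the similarity is thus formalized as learning the mappings, and the learning objective is to maximize the query-doc similarity observed on the enriched click-through bipartite graph (which can be measured by the click count of each query-doc pair). Each pair of linear mappings is specific to one type of feature and yields the similarity function for that type; with multiple types of features, the final similarity function is a linear combination of the per-type similarity functions.

The learning procedure relies on Singular Value Decomposition (SVD) and Multi-view Partial Least Squares (M-PLS).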

2. Introduction

The authors review several earlier approaches to computing query-doc similarity.

1) feature based methods: Vector Space Model (VSM), BM25, Language Models for Information Retrieval (LMIR), and so on.

2) graph based methods: mining query-doc similarity from a click-through bipartite graph, and so on.

This paper combines the two.

3. Problem Formulation

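For each type of feature, a query or document is represented as a vector, $q^i \in \mathbb{R}^{s_{q_i}}$ and $d^i \in \mathbb{R}^{s_{d_i}}$ for type $i$. The linear mappings can then be viewed as two matrices of size $s_{q_i} \times k_i$ and $s_{d_i} \times k_i$ ($L_{Q_i}$ and $L_{D_i}$). These two transformation matrices turn the original-space vectors of a query and a doc into the $k_i$-dimensional latent-space vectors $L_{Q_i}^\top q^i$ and $L_{D_i}^\top d^i$. For this type of feature, the similarity function is therefore $\mathrm{sim}_i(q^i, d^i) = (q^i)^\top L_{Q_i} L_{D_i}^\top d^i$. We can treat the click count of a query-doc pair in the click-through bipartite graph as the magnitude of the query-doc similarity, and learn the linear mappings and the linear combination weights $\alpha_i$ by maximizing the similarity of the observed query-doc pairs.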
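To make the mapping concrete, here is a minimal NumPy sketch of the per-type similarity and its weighted combination. The sizes, the random placeholder matrices, and the function names `similarity` and `combined_similarity` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: query feature dim, doc feature dim, latent dim.
s_q, s_d, k = 6, 8, 3

q = rng.random(s_q)                    # query vector for one feature type
d = rng.random(s_d)                    # doc vector for the same feature type
L_Q = rng.standard_normal((s_q, k))    # query-side mapping (random placeholder)
L_D = rng.standard_normal((s_d, k))    # doc-side mapping (random placeholder)


def similarity(q, d, L_Q, L_D):
    """Dot product of query and doc after mapping both into the latent space."""
    return float((L_Q.T @ q) @ (L_D.T @ d))


def combined_similarity(qs, ds, mappings, alphas):
    """Linear combination of the per-type similarities with weights alphas."""
    return sum(
        a * similarity(q_i, d_i, LQ, LD)
        for a, q_i, d_i, (LQ, LD) in zip(alphas, qs, ds, mappings)
    )


sim_one_type = similarity(q, d, L_Q, L_D)
# Two feature types that happen to share the same vectors, weighted 0.6 / 0.8.
sim_two_types = combined_similarity(
    [q, q], [d, d], [(L_Q, L_D), (L_Q, L_D)], [0.6, 0.8]
)
```

The latent-space dot product and the bilinear form $(q^i)^\top L_{Q_i} L_{D_i}^\top d^i$ are the same number, so a learned pair of mappings never needs to be multiplied out into one large matrix.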

The final learning problem can be expressed as:
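A reconstruction of this objective from the description above, in notation assumed for this summary ($t_{u,v}$ is the click count of query $q_u$ and doc $d_v$, $l$ is the number of feature types, $\alpha_i$ is the weight of type $i$, and $L_{Q_i}$, $L_{D_i}$ are the two mappings for type $i$), would be:

```latex
\max_{\{L_{Q_i},\, L_{D_i},\, \alpha_i\}} \;
\sum_{u,v} t_{u,v} \sum_{i=1}^{l} \alpha_i \,
(q_u^i)^\top L_{Q_i} L_{D_i}^\top d_v^i
```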

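There is a problem at this point: the objective to be maximized is unbounded, because nothing constrains the coefficients. The next section describes how to put constraints on them.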
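The unboundedness is easy to see numerically. In the rough sketch below (toy random data, single feature type, all names hypothetical), scaling both mappings by 10 multiplies the objective by 100, so without constraints the maximum does not exist.

```python
import numpy as np

rng = np.random.default_rng(1)
n_q, n_d, s_q, s_d, k = 4, 5, 6, 7, 2   # hypothetical toy sizes

Q = rng.random((n_q, s_q))              # one row per query
D = rng.random((n_d, s_d))              # one row per doc
T = rng.integers(0, 5, size=(n_q, n_d)).astype(float)  # click counts t[u, v]
L_Q = rng.standard_normal((s_q, k))
L_D = rng.standard_normal((s_d, k))


def objective(L_Q, L_D):
    """Sum over all pairs of t[u, v] * q_u^T L_Q L_D^T d_v."""
    S = (Q @ L_Q) @ (D @ L_D).T         # S[u, v] = similarity of query u, doc v
    return float((T * S).sum())


base = objective(L_Q, L_D)
if base < 0:                            # flip one mapping so the value is non-negative
    L_Q = -L_Q
    base = -base

scaled = objective(10 * L_Q, 10 * L_D)  # 10 * 10 = 100 times the base value
```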

4. Multi-view Partial Least Squares

4.1 Constrained Optimization Problem

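1) Normalize the feature vectors: $\|q^i\| = 1$, $\|d^i\| = 1$ for every query vector $q^i$ and doc vector $d^i$.

2) Constrain the mapping matrices to be orthonormal.

3) Constrain the linear combination weights with an L2 norm bound.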

The learning method is then reformulated as:
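Putting the three constraints together gives the constrained problem that the text below calls formula (2). The version here is a reconstruction in assumed notation ($t_{u,v}$ is the click count, $l$ the number of feature types, $k_i$ the latent dimension of type $i$); the non-negativity of $\alpha_i$ is an additional assumption, and the feature vectors are taken to be already normalized:

```latex
\begin{aligned}
\max_{\{L_{Q_i},\, L_{D_i},\, \alpha_i\}} \quad
  & \sum_{u,v} t_{u,v} \sum_{i=1}^{l} \alpha_i \,
    (q_u^i)^\top L_{Q_i} L_{D_i}^\top d_v^i \\
\text{s.t.} \quad
  & L_{Q_i}^\top L_{Q_i} = I_{k_i}, \qquad L_{D_i}^\top L_{D_i} = I_{k_i}, \\
  & \alpha_i \ge 0, \qquad \sum_{i=1}^{l} \alpha_i^2 \le 1
\end{aligned}
```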

4.2 Globally Optimal Solution

The globally optimal solution is obtained in two steps. First, for each type of feature, solve for the optimal linear mappings via SVD. Second, solve for the optimal combination weights.

Formula (2) above can be rewritten as:
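One way to carry out this rewrite, in assumed notation, is to collect the click-weighted outer products of each feature type into a matrix $M_i$ and use the identity $a^\top B c = \mathrm{tr}(B^\top a c^\top)$:

```latex
M_i = \sum_{u,v} t_{u,v} \, q_u^i (d_v^i)^\top ,
\qquad
\sum_{u,v} t_{u,v} \sum_{i=1}^{l} \alpha_i \,
(q_u^i)^\top L_{Q_i} L_{D_i}^\top d_v^i
= \sum_{i=1}^{l} \alpha_i \,
\mathrm{tr}\!\left( L_{Q_i}^\top M_i L_{D_i} \right)
```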

The optimization problem is:
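Assuming the weights are non-negative, each feature type can be optimized on its own, which suggests a per-type problem of the following form (a reconstruction in assumed notation, with $M_i$ the click-weighted sum of outer products $q_u^i (d_v^i)^\top$):

```latex
\max_{L_{Q_i},\, L_{D_i}} \;
\mathrm{tr}\!\left( L_{Q_i}^\top M_i L_{D_i} \right)
\quad \text{s.t.} \quad
L_{Q_i}^\top L_{Q_i} = L_{D_i}^\top L_{D_i} = I_{k_i}
```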

The globally optimal solution is obtained via SVD.
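The reason SVD works is a standard result: over all pairs of matrices with orthonormal columns, $\mathrm{tr}(L_Q^\top M L_D)$ is largest when $L_Q$ and $L_D$ hold the top-$k$ left and right singular vectors of $M$, and the maximum equals the sum of the top-$k$ singular values. A small numerical check on a random stand-in matrix (sizes and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.random((6, 8))      # stand-in for the matrix M_i of one feature type
k = 3                       # latent dimension

# Top-k left / right singular vectors as the two mappings.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
L_Q, L_D = U[:, :k], Vt[:k].T
best = np.trace(L_Q.T @ M @ L_D)    # equals the sum of the top-k singular values


def random_orthonormal(rows, cols):
    """Random matrix with orthonormal columns, via a reduced QR factorization."""
    Q_mat, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return Q_mat


# No random feasible pair should beat the SVD solution.
trials = [
    np.trace(random_orthonormal(6, k).T @ M @ random_orthonormal(8, k))
    for _ in range(1000)
]
```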

Formula (2) can therefore be written as:
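With the optimal mappings plugged in, only the weights remain. Writing $\lambda_j^i$ for the $j$-th largest singular value of $M_i$ (assumed notation, as is the non-negativity constraint), the remaining problem would read:

```latex
\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i \sum_{j=1}^{k_i} \lambda_j^i
\quad \text{s.t.} \quad
\alpha_i \ge 0, \qquad \sum_{i=1}^{l} \alpha_i^2 \le 1
```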

and the combination weights are given by:
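Under an L2 bound of the form $\sum_i \alpha_i^2 \le 1$ (an assumption about the exact constraint), maximizing a linear function of $\alpha$ is a Cauchy-Schwarz argument: the best $\alpha$ is parallel to the vector of per-type singular value sums. With $\lambda_j^i$ the $j$-th largest singular value of $M_i$, that gives:

```latex
\alpha_i =
\frac{\sum_{j=1}^{k_i} \lambda_j^i}
     {\sqrt{\sum_{i'=1}^{l} \left( \sum_{j=1}^{k_{i'}} \lambda_j^{i'} \right)^{2}}}
```

Feature types whose top singular values are larger, that is, types that explain the clicks better, receive larger weights.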

4.3 Learning Algorithm

1) For each type of feature, solve the SVD of $M_i$ to learn the linear mappings.

2) Calculate the combination weights using (5).
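The two steps can be sketched end to end in NumPy. This is a rough illustration under stated assumptions, not the authors' implementation: dense matrices, toy random data, L2-normalized feature rows, $M_i$ built as the click-weighted sum of outer products, and weights normalized to unit L2 norm. The names `mpls_fit` and `mpls_similarity` are hypothetical.

```python
import numpy as np


def normalize_rows(X):
    """Scale each row to unit L2 norm; all-zero rows are left as they are."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def mpls_fit(Qs, Ds, T, ks):
    """Qs[i]: (n_q, s_q_i) query features, Ds[i]: (n_d, s_d_i) doc features,
    T: (n_q, n_d) click counts, ks[i]: latent dimension of feature type i."""
    mappings, gains = [], []
    for Q, D, k in zip(Qs, Ds, ks):
        Q, D = normalize_rows(Q), normalize_rows(D)
        M = Q.T @ T @ D                          # sum over (u, v) of t[u, v] q_u d_v^T
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        mappings.append((U[:, :k], Vt[:k].T))    # step 1: top-k singular vectors
        gains.append(s[:k].sum())                # objective value of this type
    gains = np.array(gains)
    alphas = gains / np.linalg.norm(gains)       # step 2: assumes some clicks exist
    return mappings, alphas


def mpls_similarity(q_feats, d_feats, mappings, alphas):
    """Weighted sum of latent-space dot products over all feature types."""
    total = 0.0
    for q, d, (L_Q, L_D), a in zip(q_feats, d_feats, mappings, alphas):
        q, d = q / np.linalg.norm(q), d / np.linalg.norm(d)
        total += a * float((L_Q.T @ q) @ (L_D.T @ d))
    return total


# Toy data: 5 queries, 6 docs, two feature types with different dimensions.
rng = np.random.default_rng(3)
n_q, n_d = 5, 6
Qs = [rng.random((n_q, 4)), rng.random((n_q, 7))]
Ds = [rng.random((n_d, 5)), rng.random((n_d, 3))]
T = rng.integers(0, 10, size=(n_q, n_d)).astype(float)
ks = [2, 3]

mappings, alphas = mpls_fit(Qs, Ds, T, ks)
score = mpls_similarity([Q[0] for Q in Qs], [D[0] for D in Ds], mappings, alphas)
```

Each `ks[i]` has to be no larger than the smaller of the two feature dimensions of that type, since $M_i$ has only that many singular values. On a real click log, $T$ and the feature matrices would be sparse, and a truncated sparse SVD would be the natural replacement for the dense call.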

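This article belongs to 笨兔勿应 and is published at http://www.cnblogs.com/bentuwuying. Please credit the source when reposting; using this article for commercial purposes without the author's consent will be pursued legally.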
