Paper Title

Real-time Attention Based Look-alike Model for Recommender System

Basic algorithm and main steps

Basic ideas

RALM is a similarity-based look-alike model consisting of two parts: user representation learning and look-alike learning. Its novel points are the attention merge layer, local and global attention, and online asynchronous seed clustering.

1. Offline Training

1. User Representation Learning

The task is treated as a multi-class classification problem: choose the item the user is interested in from millions of candidates.

(1) Calculate the probability of picking the $ i$-th item as a negative example

$ p(x_i) = \frac{\log(k+2)-\log(k+1)}{\log(D+1)} $

$ D $: the maximum rank among all items (items are ranked by their frequency of appearance)

$ k $: the rank of the $ i$-th item.

(2) Negative sampling: sample with a positive-to-negative ratio of 1:10
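Steps (1) and (2) can be sketched together in Python with NumPy. This is a minimal sketch, not the paper's code. It assumes that ranks $ k $ start at 0 for the most frequent item and that $ D $ equals the number of items; under that reading the numerators telescope to $ \log(D+1) $ and the probabilities sum to exactly 1. The helper names, the 1:10 default, and the choice to redraw a negative that equals the positive item are illustrative assumptions.

```python
from typing import Optional

import numpy as np


def log_uniform_probs(num_items: int) -> np.ndarray:
    """p(k) = (log(k+2) - log(k+1)) / log(D+1) for ranks k = 0..D-1.

    Assumption: items are sorted by frequency, rank 0 = most frequent,
    and D = num_items, so the probabilities sum to 1.
    """
    k = np.arange(num_items)
    return (np.log(k + 2) - np.log(k + 1)) / np.log(num_items + 1)


def sample_negatives(positive: int, num_items: int, ratio: int = 10,
                     rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Hypothetical helper: draw `ratio` negative item ranks for one positive.

    Frequent items are sampled more often. A draw equal to the positive
    item is rejected and redrawn (an assumption, not from the paper).
    """
    if rng is None:
        rng = np.random.default_rng()
    probs = log_uniform_probs(num_items)
    negatives = []
    while len(negatives) < ratio:
        candidate = int(rng.choice(num_items, p=probs))
        if candidate != positive:
            negatives.append(candidate)
    return np.array(negatives)
```

For example, `sample_negatives(3, 1000, rng=np.random.default_rng(0))` returns ten item ranks, biased toward frequent items, to pair with the positive item 3.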

(3) Embedding layer

$ P(c=i|U,X_i) = \frac{e^{x_i u}}{\sum \limits_{j \in X}e^{x_j u}} $

The cross-entropy loss: $ L = -\sum \limits_{j \in X} y_j \log P(c=j|U,X_j) $

$ u $: a high-dimensional embedding of the user

$ x_j $: the embedding of item $ j $

$ y_j \in \{0, 1\} $: the label

After training converges, the model outputs the representation of user interests.
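A minimal NumPy sketch of the softmax and cross-entropy loss above. It assumes the candidate set $ X $ is one positive item plus its sampled negatives, and that $ u $ and every $ x_j $ have the same dimension; the function name is illustrative. Subtracting the maximum logit is a standard numerical-stability trick and leaves the softmax probabilities unchanged.

```python
import numpy as np


def softmax_cross_entropy(u: np.ndarray, item_emb: np.ndarray,
                          labels: np.ndarray) -> float:
    """L = -sum_j y_j * log P(c=j | U, X_j) over the candidate set X.

    u:        user embedding, shape (d,)
    item_emb: candidate item embeddings x_j stacked as rows, shape (|X|, d)
    labels:   0/1 labels y_j, shape (|X|,)
    """
    logits = item_emb @ u                              # x_j . u per candidate
    logits = logits - logits.max()                     # stabilise exp()
    log_probs = logits - np.log(np.exp(logits).sum())  # log softmax
    return float(-(labels * log_probs).sum())
```

With one positive and ten negatives (the 1:10 ratio), an all-zero user vector gives every candidate probability 1/11, so the loss is $ \log 11 $; training pushes $ x_j u $ up for the positive and lowers the loss.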

(4) Attention merge layer

Learn user-related weights for multiple fields.

Each of the \(n\) fields is embedded into a vector \(h \in R^m\) of the same length \(m\), and the vectors are concatenated along dimension 2, giving a matrix \(H \in R^{n \times m}\). The weights are then computed as:

$ u = \tanh(W_1 H) $

$ a_i = \frac{e^{W_2 u_i^T}}{\sum_{j=1}^{n} e^{W_2 u_j^T}} $

\(W_1 \in R^{k \times n}\) and \(W_2 \in R^k\): weight matrices; \(k\): the size of the attention unit

$ u \in R^n $: the activation unit for the fields; \(a \in R^n\): the weights of the fields

Merged vector $ M \in R^m $: $ M = aH $

$ M $ is then fed into the MLP layer to produce the universal user embedding.
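The attention merge layer can be sketched in NumPy as below. The shapes listed above do not line up literally ($ \tanh(W_1 H) $ with $ W_1 \in R^{k \times n} $ is $ k \times m $), so this sketch assumes a projection `W1` of shape $ m \times k $ applied to each field row, giving one $ k $-dimensional activation $ u_i $ and one scalar score $ W_2 u_i^T $ per field. The MLP that follows is omitted, and all names are illustrative.

```python
import numpy as np


def attention_merge(H: np.ndarray, W1: np.ndarray, W2: np.ndarray):
    """Merge n field embeddings (rows of H, each of length m) into one vector M.

    H:  (n, m) stacked field embeddings
    W1: (m, k) projection to the attention unit (shape is an assumption)
    W2: (k,)   attention vector
    Returns (M, a): merged vector of shape (m,), field weights of shape (n,).
    """
    u = np.tanh(H @ W1)                        # (n, k): activation u_i per field
    scores = u @ W2                            # (n,):   W_2 u_i^T per field
    scores = scores - scores.max()             # numerical stability
    a = np.exp(scores) / np.exp(scores).sum()  # softmax over fields
    M = a @ H                                  # (m,):   M = aH
    return M, a
```

Because `a` is computed from each user's own field embeddings, the share of each field in $ M $ changes from user to user; with `W2` set to zeros the weights become uniform and $ M $ reduces to the plain average of the fields.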

2. Look-alike Learning

(1) Transforming matrix

Project the embedding matrix from $ n \times m $ to $ n \times h $.

(2) Local attention

Activates local interests, i.e., mines the seed information that is personalized to the target user.

$ E_{local_s} = E_s \, \mathrm{softmax}(\tanh(E_s^T W_l E_u)) $

\(W_l \in R^{h \times h}\): the attention matrix

\(E_s\): the seed users; $ E_u $: the target user

Note: the seed users are first clustered into $ k $ clusters with the K-means algorithm, and the mean of the seed vectors in each cluster is computed.
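The local attention step, including the K-means note, can be sketched in NumPy. Assumptions: seed vectors have already been projected to $ h $ dimensions, the $ k $ cluster means (rather than the raw seeds) are the columns of $ E_s $ fed into the formula, and the small K-means loop is a plain Lloyd iteration written only for illustration (a real system would use a library implementation). The function names are hypothetical.

```python
from typing import Optional

import numpy as np


def kmeans_means(seeds: np.ndarray, k: int, iters: int = 20,
                 rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Cluster seed vectors (rows, shape (N, h)); return the k cluster means as columns (h, k)."""
    if rng is None:
        rng = np.random.default_rng(0)
    centers = seeds[rng.choice(len(seeds), size=k, replace=False)]  # copy of k seeds
    for _ in range(iters):
        dist = ((seeds[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)            # nearest center per seed
        for c in range(k):
            members = seeds[assign == c]
            if len(members):                    # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return centers.T


def local_attention(E_s: np.ndarray, W_l: np.ndarray, E_u: np.ndarray) -> np.ndarray:
    """E_local_s = E_s softmax(tanh(E_s^T W_l E_u)).

    E_s: (h, k) seed cluster means as columns
    W_l: (h, h) local attention matrix
    E_u: (h,)   target user vector
    """
    scores = np.tanh(E_s.T @ W_l @ E_u)   # (k,): one score per cluster
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over clusters
    return E_s @ weights                  # (h,): seed vector tailored to E_u
```

Attending over $ k $ cluster means instead of every seed keeps the per-request cost tied to $ k $, and the result depends on $ E_u $, so different target users see different weightings of the same seeds.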

(3) Global attention

$ E_{global_s} = E_s \, \mathrm{softmax}(E_s^T \tanh(W_g E_s)) $
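A NumPy sketch of the global attention, under stated assumptions. Read literally, $ E_s^T \tanh(W_g E_s) $ is an $ n \times n $ matrix, so the formula alone does not say how it collapses to one weight per seed. This sketch assumes $ W_g \in R^{h \times h} $ and scores seed $ i $ with the diagonal term $ e_i^T \tanh(W_g e_i) $ before the softmax; that reduction is an interpretation, not necessarily the paper's implementation. Note that $ E_u $ does not appear: the weights depend only on the seeds.

```python
import numpy as np


def global_attention(E_s: np.ndarray, W_g: np.ndarray) -> np.ndarray:
    """Target-independent seed summary, E_s softmax(E_s^T tanh(W_g E_s)).

    E_s: (h, n) seed vectors as columns
    W_g: (h, h) global attention matrix (shape assumed)
    Assumption: seed i is scored by the diagonal term e_i^T tanh(W_g e_i).
    """
    T = np.tanh(W_g @ E_s)                   # (h, n)
    scores = np.einsum("hi,hi->i", E_s, T)   # diagonal of E_s^T T, shape (n,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over seeds
    return E_s @ weights                     # (h,)
```

Using `einsum` computes only the $ n $ diagonal entries instead of the full $ n \times n $ product.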

(4) Calculate the similarity between the seeds and the target user

$ score_{u,s} = \alpha \cdot cosine(E_u,E_{global_s}) + \beta \cdot cosine(E_u, E_{local_s}) $
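The score is a weighted sum of two cosine similarities. A minimal sketch, assuming $ \alpha $ and $ \beta $ are scalars supplied by the caller (how they are set is not stated here) and that $ E_u $, $ E_{global_s} $ and $ E_{local_s} $ are already-computed vectors of equal length; the function names are illustrative.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """Cosine similarity; eps guards against division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def lookalike_score(E_u: np.ndarray, E_global: np.ndarray, E_local: np.ndarray,
                    alpha: float, beta: float) -> float:
    """score_{u,s} = alpha * cos(E_u, E_global_s) + beta * cos(E_u, E_local_s)."""
    return alpha * cosine(E_u, E_global) + beta * cosine(E_u, E_local)
```

For instance, with $ E_u = (1, 0) $, a global vector pointing the same way and a local vector orthogonal to it, the score is just $ \alpha $.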

(5) Iterative training

2. Online Asynchronous Processing

Updates the seed embedding database in real time. It consists of a user feedback monitor and seed clustering.

3. Online Serving

$ score_{u,s} = \alpha \cdot cosine(E_u,E_{global_s}) + \beta \cdot cosine(E_u, E_{local_s}) $

Motivation

  • The "Matthew effect" is increasingly evident in recent recommender systems: many competitive long-tail items struggle to get timely exposure because they lack behavior features.
  • Traditional look-alike models, widely used in online advertising, are not suitable for recommender systems because of the strict requirements on both real-time performance and effectiveness.

Contribution

  • Improves the effectiveness of user representation learning by using attention to capture interests across various fields.
  • Improves the robustness and adaptivity of seed representation learning by using local and global attention.
  • Realizes a real-time, high-performance look-alike model.

My own idea

Relations to what I have read

  • Method of concatenating feature fields. In other CTR papers I have read, different feature fields are concatenated directly. This causes overfitting on strongly relevant fields (such as tags of interest) and underfitting on weakly relevant fields (such as shopping interests), so the recommendations end up determined by a few strongly relevant fields. Such models cannot learn comprehensively from multi-field features, and their recommendations lack diversity. This paper instead uses the attention merge layer to learn effective relations among the different fields of user features.
  • Besides, it uses high-order continuous features instead of categorical features. In my opinion, if we use low-order categorical features to describe a user group, we can only construct the features with statistical methods, which loses most of the group's information. The high-order continuous features produced by representation learning, however, implicitly contain the crosses of many lower-order user features, so they describe users more comprehensively. Moreover, high-order features generalize better, so the model is not limited to memorizing historical data.

Shortcomings and possible changes, as I see them

  • The paper seems to use only a few features for representation learning, which may limit its effectiveness to some extent.
