Supplement: Item-Based Collaborative Filtering Recommendation Algorithms
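Preface
This post picks up where the previous one left off and continues reading through the paper. The previous post covers the earlier part of the paper.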
Related Work
In this section we briefly present some of the research literature related to collaborative filtering, recommender systems, data mining, and personalization.
Tapestry [10] is one of the earliest implementations of collaborative filtering-based recommender systems. This system relied on the explicit opinions of people from a close-knit community, such as an office workgroup. However, recommender systems for large communities cannot depend on each person knowing the others. Later, several ratings-based automated recommender systems were developed. The GroupLens research system [19,16] provides a pseudonymous collaborative filtering solution for Usenet news and movies. Ringo [27] and Video Recommender [14] are email and web-based systems that generate recommendations on music and movies, respectively. A special issue of Communications of the ACM [20] presents a number of different recommender systems.
Other technologies have also been applied to recommender systems, including Bayesian networks, clustering, and Horting. Bayesian networks create a model based on a training set with a decision tree at each node and edges representing user information. The model can be built off-line over a matter of hours or days. The resulting model is very small, very fast, and essentially as accurate as nearest neighbor methods [6]. Bayesian networks may prove practical for environments in which knowledge of user preferences changes slowly relative to the time needed to build the model, but they are not suitable for environments in which user preference models must be updated rapidly or frequently.
Clustering techniques work by identifying groups of users who appear to have similar preferences. Once the clusters are created, predictions for an individual can be made by averaging the opinions of the other users in that cluster. Some clustering techniques represent each user with partial participation in several clusters. The prediction is then an average across the clusters, weighted by degree of participation. Clustering techniques usually produce less-personal recommendations than other methods, and in some cases, the clusters have worse accuracy than nearest neighbor algorithms [6]. Once the clustering is complete, however, performance can be very good, since the size of the group that must be analyzed is much smaller. Clustering techniques can also be applied as a "first step" for shrinking the candidate set in a nearest neighbor algorithm or for distributing nearest-neighbor computation across several recommender engines. While dividing the population into clusters may hurt the accuracy of recommendations to users near the fringes of their assigned cluster, pre-clustering may be a worthwhile trade-off between accuracy and throughput.
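The two prediction rules described above (averaging within a single cluster, and averaging across clusters weighted by degree of participation) can be sketched roughly as follows. This is only an illustration: the function names, the toy ratings, and the membership weights are all hypothetical, and how the clusters themselves are formed is left out.

```python
def predict_hard(ratings, cluster_of, user, item):
    """Average the item's ratings from the other users in the same cluster."""
    peers = [
        r[item]
        for u, r in ratings.items()
        if u != user and cluster_of[u] == cluster_of[user] and item in r
    ]
    return sum(peers) / len(peers) if peers else None


def predict_soft(cluster_means, membership, item):
    """Average per-cluster mean ratings, weighted by degree of participation."""
    num = den = 0.0
    for cluster, weight in membership.items():
        mean = cluster_means[cluster].get(item)
        if mean is not None:  # skip clusters with no opinion on the item
            num += weight * mean
            den += weight
    return num / den if den else None


# Hypothetical toy data: user -> {item: rating}, and a hard cluster assignment.
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m3": 2},
    "carol": {"m1": 2, "m3": 5},
    "dave": {"m2": 4},
}
cluster_of = {"alice": 0, "bob": 0, "carol": 1, "dave": 0}
hard = predict_hard(ratings, cluster_of, "dave", "m1")

# Hypothetical soft membership: the user belongs 75% to cluster 0, 25% to cluster 1.
cluster_means = {0: {"m1": 4.5}, 1: {"m1": 2.0}}
soft = predict_soft(cluster_means, {0: 0.75, 1: 0.25}, "m1")
```

Only the users in one cluster (or one mean per cluster) are touched at prediction time, which is where the throughput advantage mentioned in the text comes from.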
Horting is a graph-based technique in which nodes are users, and edges between nodes indicate the degree of similarity between two users [1]. Predictions are produced by walking the graph to nearby nodes and combining the opinions of the nearby users. Horting differs from nearest neighbor in that the graph may be walked through other users who have not rated the item in question, thus exploring transitive relationships that nearest neighbor algorithms do not consider. In one study using synthetic data, Horting produced better predictions than a nearest neighbor algorithm [1].
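To make the idea of transitive relationships concrete, here is a deliberately simplified graph walk. It is not the Horting algorithm from [1]; it only illustrates how a walk can pass through a user who has not rated the item and still reach one who has. The function name, the breadth-first strategy, the path weight (product of edge similarities), and the toy graph are all assumptions made for this sketch.

```python
from collections import deque


def graph_predict(edges, ratings, user, item, max_hops=2):
    """Walk the similarity graph and combine ratings of reachable users."""
    # weight[u] = product of edge similarities along the first path found to u
    weight = {user: 1.0}
    queue = deque([(user, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbor, sim in edges.get(node, {}).items():
            if neighbor not in weight:
                weight[neighbor] = weight[node] * sim
                queue.append((neighbor, hops + 1))
    num = den = 0.0
    for other, w in weight.items():
        if other != user and item in ratings.get(other, {}):
            num += w * ratings[other][item]
            den += w
    return num / den if den else None


# Hypothetical graph: a is similar to b, b is similar to c; only c rated item "x".
edges = {"a": {"b": 0.9}, "b": {"a": 0.9, "c": 0.8}, "c": {"b": 0.8}}
ratings = {"a": {}, "b": {}, "c": {"x": 4}}

one_hop = graph_predict(edges, ratings, "a", "x", max_hops=1)
two_hop = graph_predict(edges, ratings, "a", "x", max_hops=2)
```

With one hop the walk behaves like a nearest neighbor lookup and finds no rating, because b has not rated the item. With two hops it reaches c through b, which is the kind of transitive link the text describes.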
Schafer et al. [26] present a detailed taxonomy and examples of recommender systems used in E-commerce, and show how they can provide one-to-one personalization and at the same time capture customer loyalty. Although these systems have been successful in the past, their widespread use has exposed some of their limitations, such as sparsity in the data set and problems associated with high dimensionality. The sparsity problem in recommender systems has been addressed in [23,11]. The problems associated with high dimensionality in recommender systems have been discussed in [4], and the application of dimensionality reduction techniques to address these issues has been investigated in [24].
Our work explores the extent to which item-based recommenders, a new class of recommender algorithms, are able to solve these problems.
Contributions
This paper has three primary research contributions:
- Analysis of the item-based prediction algorithms and identification of different ways to implement their subtasks.
- Formulation of a precomputed model of item similarity to increase the online scalability of item-based recommendations.
- An experimental comparison of the quality of several different item-based algorithms to the classic user-based (nearest neighbor) algorithms.
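The second contribution, a precomputed model of item similarity, can be pictured as an offline step that stores the top-k most similar items for each item, plus a cheap online lookup. The sketch below is only an assumption-laden illustration: this section does not fix a similarity measure or a prediction formula, so cosine similarity over co-rating users and a similarity-weighted average are used here for brevity, and every name and rating is hypothetical.

```python
import math
from collections import defaultdict


def build_item_model(ratings, k=2):
    """Offline step: for each item, keep its k most similar items."""
    by_item = defaultdict(dict)  # item -> {user: rating}
    for user, items in ratings.items():
        for item, r in items.items():
            by_item[item][user] = r
    model = {}
    for i, ri in by_item.items():
        sims = []
        for j, rj in by_item.items():
            if i == j:
                continue
            common = ri.keys() & rj.keys()  # users who rated both items
            if not common:
                continue
            dot = sum(ri[u] * rj[u] for u in common)
            # Assumes strictly positive ratings, so the norms are never zero.
            norm = math.sqrt(sum(ri[u] ** 2 for u in common)) * math.sqrt(
                sum(rj[u] ** 2 for u in common)
            )
            sims.append((dot / norm, j))
        sims.sort(reverse=True)
        model[i] = [(j, s) for s, j in sims[:k]]
    return model


def predict(model, user_ratings, item):
    """Online step: similarity-weighted average over the user's rated neighbors."""
    num = den = 0.0
    for j, s in model.get(item, []):
        if j in user_ratings:
            num += s * user_ratings[j]
            den += abs(s)
    return num / den if den else None


# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 3, "b": 3, "c": 2},
    "u3": {"b": 5, "c": 4},
}
model = build_item_model(ratings, k=2)
pred = predict(model, {"b": 4, "c": 2}, "a")
```

The online step only reads the stored neighbor list and the target user's own ratings, so its cost does not depend on the total number of users. That is the scalability argument behind precomputing the model.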