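Item-Based Collaborative Filtering Recommendation Algorithms
Preface
Collaborative filtering recommender systems come in several variants, including user-based and item-based ones. Today we read a paper on item-based collaborative filtering.
The paper is titled "Item-Based Collaborative Filtering Recommendation Algorithms".
Abstract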
Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative filtering techniques. Item-based techniques first analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users.
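The last sentence of the abstract describes a two-step process: first relate items to each other using the user-item matrix, then use those relations to predict a user's rating. Below is a minimal Python sketch of that idea, assuming a tiny dense rating matrix where 0 means "not rated", cosine similarity between item columns, and a weighted-sum prediction (two of the options the abstract names). The matrix and the function names are hypothetical, chosen only for illustration; this is not the paper's code.

```python
import math

# Hypothetical toy user-item rating matrix: rows are users, columns are items.
# 0 means "not rated" (an assumption made for this sketch).
RATINGS = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
]

def cosine(a, b):
    """Cosine similarity between two item column vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def item_similarities(ratings):
    """Step 1: analyze the user-item matrix to relate items to each other."""
    cols = list(zip(*ratings))  # transpose: one tuple of ratings per item
    n = len(cols)
    return [[cosine(cols[i], cols[j]) for j in range(n)] for i in range(n)]

def predict(ratings, sim, user, item):
    """Step 2: weighted sum of the user's own ratings on other items,
    weighted by how similar each of those items is to the target item."""
    num = den = 0.0
    for j, r in enumerate(ratings[user]):
        if j != item and r > 0:
            num += sim[item][j] * r
            den += abs(sim[item][j])
    return num / den if den else 0.0

sim = item_similarities(RATINGS)
print(round(predict(RATINGS, sim, user=1, item=1), 2))
```

With this toy matrix, user 1's predicted rating for item 1 comes out at about 2.72. It is built only from how similar item 1 is to items 0 and 3, the two items that user already rated.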
In this paper we analyze different item-based recommendation generation algorithms. We look into different techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and different techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model). Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.
Sarwar B, Karypis G, Konstan J, et al. Item-based collaborative filtering recommendation algorithms[C]//Proceedings of the 10th international conference on World Wide Web. 2001: 285-295.
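Key points of the abstract
The abstract first lays out the problems facing traditional k-nearest-neighbor (user-based) collaborative filtering: the rapid growth of the Web demands high-quality recommendations, many recommendations per second for millions of users and items, and good coverage despite sparse data. The paper then studies techniques for computing item-item similarity and different ways of generating recommendations from those similarities, and finally evaluates them experimentally against the basic k-nearest-neighbor approach. The experiments suggest that item-based algorithms are dramatically faster than user-based algorithms while also producing better-quality recommendations.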
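To make the contrast between item-item correlation and cosine similarity concrete, here is a small sketch that computes both for the same pair of item vectors. It assumes 0 means "not rated" and that the correlation is taken only over users who rated both items; both are assumptions for this illustration, and the ratings are made up.

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two item vectors (unrated entries count as 0)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pearson_sim(a, b):
    """Pearson correlation computed only over users who rated both items (0 = unrated)."""
    pairs = [(x, y) for x, y in zip(a, b) if x > 0 and y > 0]
    if len(pairs) < 2:
        return 0.0  # not enough co-rated users to measure correlation
    ma = sum(x for x, _ in pairs) / len(pairs)
    mb = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - ma) * (y - mb) for x, y in pairs)
    va = math.sqrt(sum((x - ma) ** 2 for x, _ in pairs))
    vb = math.sqrt(sum((y - mb) ** 2 for _, y in pairs))
    return cov / (va * vb) if va and vb else 0.0

# Hypothetical ratings of two items by the same four users.
item_a = [5, 4, 5, 4]
item_b = [4, 5, 4, 5]
print(round(cosine_sim(item_a, item_b), 3), round(pearson_sim(item_a, item_b), 3))
```

For these two items the cosine similarity is about 0.976 while the correlation is -1.0. Cosine only sees that both items receive high ratings, whereas correlation notices that whenever one item is rated above its mean, the other is rated below its mean.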
Introduction
The amount of information in the world is increasing far more quickly than our ability to process it. All of us have known the feeling of being overwhelmed by the number of new books, journal articles, and conference proceedings coming out each year. Technology has dramatically reduced the barriers to publishing and distributing information. Now it is time to create the technologies that can help us sift through all the available information to find that which is most valuable to us.
One of the most promising such technologies is collaborative filtering [19,27,14,16]. Collaborative filtering works by building a database of preferences for items by users. A new user, Neo, is matched against the database to discover neighbors, which are other users who have historically had similar taste to Neo. Items that the neighbors like are then recommended to Neo, as he will probably also like them. Collaborative filtering has been very successful in both research and practice, and in both information filtering applications and E-commerce applications. However, there remain important research questions in overcoming two fundamental challenges for collaborative filtering recommender systems.
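The neighbor-based procedure described above (match Neo against the preference database, find similar users, recommend what they liked) can be sketched as follows. This is a minimal user-based example, assuming a hypothetical in-memory dictionary of ratings and cosine similarity between users; the data and the parameters `k` (number of neighbors) and `n` (number of recommendations) are illustrative only.

```python
import math

# Hypothetical preference database: user -> {item: rating}.
DB = {
    "alice": {"matrix": 5, "alien": 4, "up": 1},
    "bob":   {"matrix": 4, "alien": 5, "heat": 4},
    "carol": {"up": 5, "frozen": 4, "matrix": 1},
}

def cosine(u, v):
    """Cosine similarity of two sparse rating dicts (missing items count as 0)."""
    dot = sum(r * v[i] for i, r in u.items() if i in v)
    nu = math.sqrt(sum(r * r for r in u.values()))
    nv = math.sqrt(sum(r * r for r in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(db, target, k=2, n=3):
    """User-based CF: find the k most similar users, then score unseen items."""
    # Compare the target with every user in the database (full neighbor search).
    neighbors = sorted(
        ((cosine(target, prefs), user) for user, prefs in db.items()),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbors:
        for item, rating in db[user].items():
            if item not in target:  # only recommend items the target has not rated
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]

neo = {"matrix": 5, "alien": 5}
print(recommend(DB, neo))
```

Here Neo's two nearest neighbors are alice and bob, so the result is `["heat", "up"]`. Note that each call compares Neo with every user in `DB`; this scan over the whole user population is the neighbor-search bottleneck the paper sets out to avoid.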
The first challenge is to improve the scalability of the collaborative filtering algorithms. These algorithms are able to search tens of thousands of potential neighbors in real-time, but the demands of modern systems are to search tens of millions of potential neighbors. Further, existing algorithms have performance problems with individual users for whom the site has large amounts of information. For instance, if a site is using browsing patterns as indications of content preference, it may have thousands of data points for its most frequent visitors. These "long user rows" slow down the number of neighbors that can be searched per second, further reducing scalability.
The second challenge is to improve the quality of the recommendations for the users. Users need recommendations they can trust to help them find items they will like. Users will "vote with their feet" by refusing to use recommender systems that are not consistently accurate for them.
In some ways these two challenges are in conflict, since the less time an algorithm spends searching for neighbors, the more scalable it will be, and the worse its quality. For this reason, it is important to treat the two challenges simultaneously so the solutions discovered are both useful and practical.
In this paper, we address these issues of recommender systems by applying a different approach: item-based algorithms. The bottleneck in conventional collaborative filtering algorithms is the search for neighbors among a large user population of potential neighbors [12]. Item-based algorithms avoid this bottleneck by exploring the relationships between items first, rather than the relationships between users. Recommendations for users are computed by finding items that are similar to other items the user has liked. Because the relationships between items are relatively static, item-based algorithms may be able to provide the same quality as the user-based algorithms with less online computation.
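Because item relationships change slowly, the item-item similarities can be computed ahead of time and reused, so answering a request only involves looking up items similar to the ones the user liked. The sketch below illustrates that offline/online split under several assumptions: binary "liked" data, cosine similarity on binary vectors, and a hypothetical `build_item_model` step that keeps the `k` most similar items per item. It is an illustration of the idea, not the paper's implementation.

```python
import math
from collections import defaultdict

# Hypothetical implicit-feedback data: user -> set of items they liked.
LIKES = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"c", "d"},
}

def build_item_model(likes, k=2):
    """Offline phase: for each item keep its k most similar items.

    Similarity is cosine on binary vectors: |U_i & U_j| / sqrt(|U_i| * |U_j|),
    where U_i is the set of users who liked item i.
    """
    users_of = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            users_of[item].add(user)
    model = {}
    for i in users_of:
        sims = []
        for j in users_of:
            if i != j:
                overlap = len(users_of[i] & users_of[j])
                if overlap:
                    sims.append((overlap / math.sqrt(len(users_of[i]) * len(users_of[j])), j))
        model[i] = sorted(sims, reverse=True)[:k]
    return model

def recommend(model, liked, n=2):
    """Online phase: score items similar to what the user liked; no user search."""
    scores = defaultdict(float)
    for item in liked:
        for sim, other in model.get(item, []):
            if other not in liked:
                scores[other] += sim
    return sorted(scores, key=scores.get, reverse=True)[:n]

model = build_item_model(LIKES)  # assumed to be rebuilt periodically, not per request
print(recommend(model, {"a"}))
```

Here a user who liked only item `a` gets `["b", "c"]`: both users who liked `a` also liked `b`, so `b` scores higher than `c`. The online step never touches the user population; it only reads the precomputed table.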
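Closing remarks
That is as far as we read today. This session focused on the basic concepts and background; I will add the details next time.
2024-01-28 18:05:28 Sunday
I have been busy the past few days and forgot to upload the follow-up, so I am adding it today now that I have time.
Supplement: for the additional content, see the follow-up post "Supplement: Item-Based Collaborative Filtering Recommendation Algorithms".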