基于baseline和stochastic gradient descent的个性化推荐系统
koren论文中用到netflix 数据集, 过于大, 在普通的pc机上运行时间很长很长。考虑到写文章目地主要是已介绍总结方法为主,所以采用Movielens 数据集。
要用到的变量介绍:
Baseline estimates
object function:
梯度变化(利用stochastic gradient descent算法使上述的目标函数值,在设定的迭代次数内,降到最小)
系统评判标准:
参数设置:
迭代次数maxStep = 100, 学习速率(梯度变化速率)取0.99 还有的其他参数设置参考引用论文[2]
具体的代码实现
'''''
Created on Dec 11, 2012 @Author: Dennis Wu
@E-mail: hansel.zh@gmail.com
@Homepage: http://blog.csdn.net/wuzh670 Data set download from : http://www.grouplens.org/system/files/ml-100k.zip '''
from operator import itemgetter, attrgetter
from math import sqrt
import random def load_data(): train = {}
test = {} filename_train = 'data/ua.base'
filename_test = 'data/ua.test' for line in open(filename_train):
(userId, itemId, rating, timestamp) = line.strip().split('\t')
train.setdefault(userId,{})
train[userId][itemId] = float(rating) for line in open(filename_test):
(userId, itemId, rating, timestamp) = line.strip().split('\t')
test.setdefault(userId,{})
test[userId][itemId] = float(rating) return train, test def calMean(train):
sta = 0
num = 0
for u in train.keys():
for i in train[u].keys():
sta += train[u][i]
num += 1
mean = sta*1.0/num
return mean def initialBias(train, userNum, movieNum): mean = calMean(train)
bu = {}
bi = {}
biNum = {}
buNum = {} u = 1
while u < (userNum+1):
su = str(u)
for i in train[su].keys():
bi.setdefault(i,0)
biNum.setdefault(i,0)
bi[i] += (train[su][i] - mean)
biNum[i] += 1
u += 1 i = 1
while i < (movieNum+1):
si = str(i)
biNum.setdefault(si,0)
if biNum[si] >= 1:
bi[si] = bi[si]*1.0/(biNum[si]+25)
else:
bi[si] = 0.0
i += 1 u = 1
while u < (userNum+1):
su = str(u)
for i in train[su].keys():
bu.setdefault(su,0)
buNum.setdefault(su,0)
bu[su] += (train[su][i] - mean - bi[i])
buNum[su] += 1
u += 1 u = 1
while u < (userNum+1):
su = str(u)
buNum.setdefault(su,0)
if buNum[su] >= 1:
bu[su] = bu[su]*1.0/(buNum[su]+10)
else:
bu[su] = 0.0
u += 1 return bu,bi,mean def sgd(train, test, userNum, movieNum): bu, bi, mean = initialBias(train, userNum, movieNum) alpha1 = 0.002
beta1 = 0.1
slowRate = 0.99
step = 0
preRmse = 1000000000.0
nowRmse = 0.0
while step < 100:
rmse = 0.0
n = 0
for u in train.keys():
for i in train[u].keys():
pui = 1.0 * (mean + bu[u] + bi[i])
eui = train[u][i] - pui
rmse += pow(eui,2)
n += 1
bu[u] += alpha1 * (eui - beta1 * bu[u])
bi[i] += alpha1 * (eui - beta1 * bi[i]) nowRmse = sqrt(rmse*1.0/n)
print 'step: %d Rmse: %s' % ((step+1), nowRmse)
if (nowRmse < preRmse):
preRmse = nowRmse
alpha1 *= slowRate
step += 1
return bu, bi, mean def calRmse(test, bu, bi, mean): rmse = 0.0
n = 0
for u in test.keys():
for i in test[u].keys():
pui = 1.0 * (mean + bu[u] + bi[i])
eui = pui - test[u][i]
rmse += pow(eui,2)
n += 1
rmse = sqrt(rmse*1.0 / n)
return rmse; if __name__ == "__main__": # load data
train, test = load_data() # baseline + stochastic gradient descent
bu, bi, mean = sgd(train, test, 943, 1682) # compute the rmse of test set
print 'the Rmse of test test is: %s' % calRmse(test, bu, bi, mean)
实验结果
REFERENCES
1.Y. Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Proc. 14th ACM SIGKDD Int. Conf. On Knowledge Discovery and Data Mining (KDD’08), pp. 426–434, 2008.
2. Y.Koren. The BellKor Solution to the Netflix Grand Prize 2009
基于baseline和stochastic gradient descent的个性化推荐系统的更多相关文章
- 基于baseline、svd和stochastic gradient descent的个性化推荐系统
文章主要介绍的是koren 08年发的论文[1], 2.3部分内容(其余部分会陆续补充上来).koren论文中用到netflix 数据集, 过于大, 在普通的pc机上运行时间很长很长.考虑到写文章目 ...
- FITTING A MODEL VIA CLOSED-FORM EQUATIONS VS. GRADIENT DESCENT VS STOCHASTIC GRADIENT DESCENT VS MINI-BATCH LEARNING. WHAT IS THE DIFFERENCE?
FITTING A MODEL VIA CLOSED-FORM EQUATIONS VS. GRADIENT DESCENT VS STOCHASTIC GRADIENT DESCENT VS MIN ...
- Stochastic Gradient Descent
一.从Multinomial Logistic模型说起 1.Multinomial Logistic 令为维输入向量; 为输出label;(一共k类); 为模型参数向量: Multinomial Lo ...
- Stochastic Gradient Descent 随机梯度下降法-R实现
随机梯度下降法 [转载时请注明来源]:http://www.cnblogs.com/runner-ljt/ Ljt 作为一个初学者,水平有限,欢迎交流指正. 批量梯度下降法在权值更新前对所有样本汇总 ...
- 机器学习-随机梯度下降(Stochastic gradient descent)
sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频) https://study.163.com/course/introduction.htm?courseId=1005269003& ...
- 几种梯度下降方法对比(Batch gradient descent、Mini-batch gradient descent 和 stochastic gradient descent)
https://blog.csdn.net/u012328159/article/details/80252012 我们在训练神经网络模型时,最常用的就是梯度下降,这篇博客主要介绍下几种梯度下降的变种 ...
- Stochastic Gradient Descent收敛判断及收敛速度的控制
要判断Stochastic Gradient Descent是否收敛,可以像Batch Gradient Descent一样打印出iteration的次数和Cost的函数关系图,然后判断曲线是否呈现下 ...
- Gradient Descent 和 Stochastic Gradient Descent(随机梯度下降法)
Gradient Descent(Batch Gradient)也就是梯度下降法是一种常用的的寻找局域最小值的方法.其主要思想就是计算当前位置的梯度,取梯度反方向并结合合适步长使其向最小值移动.通过柯 ...
- 随机梯度下降法(Stochastic gradient descent, SGD)
BGD(Batch gradient descent)批量梯度下降法:每次迭代使用所有的样本(样本量小) Mold 一直在更新 SGD(Stochastic gradientdescent)随机 ...
随机推荐
- bzoj 3579: 破冰派对
题意: 给你一个图,问你有多少个方案把他分成连个新的图.使得一个图是一个团,另外一个是独立集 一些闲话: 以前做过一次这个题..当时听说爆搜可以过,就无脑莽过去了.. 也没有思考为什么爆搜能过,或者有 ...
- Windows系统查看xxx.dll、xxx.lib文件的导出函数、依赖文件等信息的方法
1.查看xxx.dll或xxx.exe文件的导出函数.依赖文件等信息,使用Depends软件即可. 2.查看xxx.lib文件的导出函数.依赖文件等信息,使用Visual Studio附带工具dump ...
- C++之指针与数组区别
C++/C程序中,数组要么在静态存储区被创建(如全局数组),要么在栈上被创建.数组名对应着(而不是指向)一块内存,其地址与容量在生命期内保持不变,只有数组的内容可以改变.指针可以随时指向任意类型的内存 ...
- leetcode-第10周双周赛-5081-歩进数
题目描述: 自己的提交:参考全排列 class Solution: def countSteppingNumbers(self, low: int, high: int) -> List[int ...
- [JZOJ3234] 阴阳
题目 题目大意 有一棵树,每条边有\(0\)或\(1\)两种颜色. 求满足存在\((u,v)\)路径上的点\(x\)使得\((u,x)\)和\((x,v)\)路径上的两种颜色出现次数相同的点对\((u ...
- Windows ping
用法: ping [-t] [-a] [-n count] [-l size] [-f] [-i TTL] [-v TOS] [-r count] [-s count] [[-j ...
- identifier of an instance of xx.entity was altered from xxKey@249e3cb2 to xxKey@74e8f4a3; nested exception is org.hibernate.HibernateException: identifier of an instance of xxentity was altered from错误
用entityManager保存数据时报错如下 identifier of an instance of xx.entity was altered from xxKey@249e3cb2 to xx ...
- 图片压缩(js压缩,底部有vue压缩图片依赖使用的教程链接)
directTurnIntoBase64(fileObj, callback) { var r = new FileReader(); // 转成base64 r.onload = function( ...
- flutter中的BuildContext
https://www.jianshu.com/p/509b77b26b78
- JS继承(简单理解版)
童鞋们,我们今天聊聊js的继承,关于继承,平时开发基本用不到,但是面试没有不考的,我就想问,这是人干的事吗? 好吧,迫于社会主义核心价值观,我们今天就来简单说一说js的继承,谁让它是面向对象编程很重要 ...