SimRank implemented in Python
1. Data
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
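Each line of the log is one click, so a pair that was clicked many times simply appears many times. The log is easier to read as click counts. The following is a minimal sketch, assuming the same "query,ad" line format; the helper name `count_clicks` and the three-line inline sample are made up for illustration and are not part of the original script.

```python
from collections import Counter


def count_clicks(lines):
    """Count how often each (query, ad) pair occurs in the log lines."""
    pairs = (tuple(line.strip().split(",")) for line in lines if line.strip())
    return Counter(pairs)


# Tiny inline sample in the same format as the log (not the full data set).
sample = ["pc,hp.com", "pc,hp.com", "camera,bestbuy.com"]
counts = count_clicks(sample)
print(counts[("pc", "hp.com")])  # 2
```

Applied to the full log above, the counts are: pc/hp.com 10, camera/hp.com 20, camera/bestbuy.com 5, digital camera/hp.com 30, digital camera/bestbuy.com 7, tv/bestbuy.com 15, flower/teleflora.com 16, flower/orchids.com 15.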
2. Python implementation of SimRank
import numpy as np
from numpy import matrix

# Read the click log: one "query,ad" pair per line; blank lines are skipped.
with open('sample1 (1).txt', 'r') as log_fp:
    logs = [log.strip() for log in log_fp.readlines() if log.strip()]
# print(logs)
logs_tuple = [tuple(log.split(",")) for log in logs]
# print(logs_tuple)

# set() ordering is not fixed, so the index order can differ between runs.
queries = list(set([log[0] for log in logs_tuple]))
# print(queries)  # ['digital camera', 'flower', 'pc', 'camera', 'tv']
ads = list(set([log[1] for log in logs_tuple]))
# print(ads)  # ['hp.com', 'teleflora.com', 'bestbuy.com', 'orchids.com']

# graph[i, j] counts the clicks from queries[i] to ads[j].
graph = np.matrix(np.zeros([len(queries), len(ads)]))
# print(graph)  # 5 x 4 zero matrix (5 queries, 4 ads)

for log in logs_tuple:
    query = log[0]
    ad = log[1]
    q_i = queries.index(query)
    a_j = ads.index(ad)
    graph[q_i, a_j] += 1
print(graph)

# Both similarity matrices start as the identity: every node is similar only to itself.
query_sim = matrix(np.identity(len(queries)))
print(query_sim)
ad_sim = matrix(np.identity(len(ads)))
print(ad_sim)


def get_ads_num(query):
    # Row of click counts for this query (1 x len(ads) matrix).
    q_i = queries.index(query)
    return graph[q_i]


def get_queries_num(ad):
    # Row of click counts for this ad (1 x len(queries) matrix).
    a_j = ads.index(ad)
    return graph.transpose()[a_j]


def get_ads(query):
    # Ads that were clicked at least once for this query.
    series = get_ads_num(query).tolist()[0]
    return [ads[x] for x in range(len(series)) if series[x] > 0]


def get_queries(ad):
    # Queries that led to at least one click on this ad.
    series = get_queries_num(ad).tolist()[0]
    return [queries[x] for x in range(len(series)) if series[x] > 0]


def query_simrank(q1, q2, c):
    if q1 == q2:
        return 1
    # Normalise by the total click counts of q1 and q2 (not by the number of distinct ads).
    prefix = c / (get_ads_num(q1).sum() * get_ads_num(q2).sum())
    postfix = 0
    for ad_i in get_ads(q1):
        for ad_j in get_ads(q2):
            i = ads.index(ad_i)
            j = ads.index(ad_j)
            postfix += ad_sim[i, j]
    return prefix * postfix


def ad_simrank(a1, a2, c):
    if a1 == a2:
        return 1
    # Same rule as query_simrank, with the roles of queries and ads swapped.
    prefix = c / (get_queries_num(a1).sum() * get_queries_num(a2).sum())
    postfix = 0
    for query_i in get_queries(a1):
        for query_j in get_queries(a2):
            i = queries.index(query_i)
            j = queries.index(query_j)
            postfix += query_sim[i, j]
    return prefix * postfix


def simrank(c=0.8, times=1):
    global query_sim, ad_sim

    for run in range(times):
        # Both new matrices are computed from the previous iteration's values.
        new_query_sim = matrix(np.identity(len(queries)))
        for qi in queries:
            for qj in queries:
                i = queries.index(qi)
                j = queries.index(qj)
                new_query_sim[i, j] = query_simrank(qi, qj, c)

        new_ad_sim = matrix(np.identity(len(ads)))
        for ai in ads:
            for aj in ads:
                i = ads.index(ai)
                j = ads.index(aj)
                new_ad_sim[i, j] = ad_simrank(ai, aj, c)

        query_sim = new_query_sim
        ad_sim = new_ad_sim


if __name__ == '__main__':
    print(queries)
    print(ads)
    simrank()
    print(query_sim)
    print(ad_sim)
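Read as a formula, the update performed by query_simrank can be written as below. The notation is introduced here for illustration and is not from the original post: $W(q)$ is the total click count of query $q$, $E(q)$ is the set of ads clicked for $q$, and $C$ is the decay factor `c`. The worked values use the click counts of the data in section 1 and the identity matrix as the starting ad similarity.

```latex
s(q_1, q_2) = \frac{C}{W(q_1)\, W(q_2)} \sum_{a \in E(q_1)} \sum_{b \in E(q_2)} s(a, b),
\qquad s(q, q) = 1

% tv: 15 clicks, all on bestbuy.com; camera: 5 on bestbuy.com + 20 on hp.com = 25
s(\text{tv}, \text{camera}) = \frac{0.8}{15 \cdot 25} \cdot 1 \approx 0.00213333

% camera (25 clicks) and digital camera (7 + 30 = 37 clicks) share bestbuy.com and hp.com
s(\text{camera}, \text{digital camera}) = \frac{0.8}{25 \cdot 37} \cdot 2 \approx 0.00172973
```

ad_simrank applies the same rule with queries and ads swapped. Note that the normalisation uses total click counts while the double sum runs over distinct neighbours, so this differs from the usual SimRank normalisation by neighbour counts.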
Running the script with the default arguments (c=0.8, times=1) prints:
[[15. 0. 0. 0.]
[ 0. 0. 10. 0.]
[ 5. 0. 20. 0.]
[ 7. 0. 30. 0.]
[ 0. 16. 0. 15.]]
[[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
['tv', 'pc', 'camera', 'digital camera', 'flower']
['bestbuy.com', 'teleflora.com', 'hp.com', 'orchids.com']
[[1. 0. 0.00213333 0.00144144 0. ]
[0. 1. 0.0032 0.00216216 0. ]
[0.00213333 0.0032 1. 0.00172973 0. ]
[0.00144144 0.00216216 0.00172973 1. 0. ]
[0. 0. 0. 0. 1. ]]
[[1.00000000e+00 0.00000000e+00 9.87654321e-04 0.00000000e+00]
[0.00000000e+00 1.00000000e+00 0.00000000e+00 3.33333333e-03]
[9.87654321e-04 0.00000000e+00 1.00000000e+00 0.00000000e+00]
[0.00000000e+00 3.33333333e-03 0.00000000e+00 1.00000000e+00]]
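The nested loops in simrank() can also be expressed as matrix products, which makes one iteration easy to check against the matrices printed above. This is a sketch under assumptions, not the original author's code: the function name `simrank_step` is made up, the click matrix is copied by hand from the printed graph (rows tv, pc, camera, digital camera, flower; columns bestbuy.com, teleflora.com, hp.com, orchids.com), and every query and ad is assumed to have at least one click so that no division by zero occurs.

```python
import numpy as np


def simrank_step(graph, query_sim, ad_sim, c=0.8):
    """One iteration of the update used above, written with matrix products."""
    graph = np.asarray(graph, dtype=float)
    query_sim = np.asarray(query_sim, dtype=float)
    ad_sim = np.asarray(ad_sim, dtype=float)
    adj = (graph > 0).astype(float)   # 1 where a query/ad pair has any click
    q_tot = graph.sum(axis=1)         # total clicks per query
    a_tot = graph.sum(axis=0)         # total clicks per ad
    # adj @ ad_sim @ adj.T sums ad_sim over all pairs of clicked ads.
    new_q = c * (adj @ ad_sim @ adj.T) / np.outer(q_tot, q_tot)
    new_a = c * (adj.T @ query_sim @ adj) / np.outer(a_tot, a_tot)
    np.fill_diagonal(new_q, 1.0)      # a node is always fully similar to itself
    np.fill_diagonal(new_a, 1.0)
    return new_q, new_a


# Click matrix copied from the printed graph above.
graph = np.array([
    [15, 0, 0, 0],
    [0, 0, 10, 0],
    [5, 0, 20, 0],
    [7, 0, 30, 0],
    [0, 16, 0, 15],
], dtype=float)

new_q, new_a = simrank_step(graph, np.eye(5), np.eye(4))
print(np.round(new_q, 8))
print(np.round(new_a, 8))
```

With identity matrices as the starting point, the two results should agree with the query and ad similarity matrices printed by the script, for example 0.8 / (15 * 25) for tv versus camera.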