1. Data

pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
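The listing above is a click log with one `query,ad` record per line, so it is easier to read as counts per pair. The following is a minimal sketch, not part of the original post: the `pair_counts` table was tallied by hand from the listing, and `build_log_lines` is a hypothetical helper that expands those counts back into log lines.

```python
from collections import Counter

# (query, ad) -> number of click records, tallied from the listing above
pair_counts = {
    ("pc", "hp.com"): 10,
    ("camera", "hp.com"): 20,
    ("camera", "bestbuy.com"): 5,
    ("digital camera", "hp.com"): 30,
    ("digital camera", "bestbuy.com"): 7,
    ("tv", "bestbuy.com"): 15,
    ("flower", "teleflora.com"): 16,
    ("flower", "orchids.com"): 15,
}


def build_log_lines(counts):
    """Expand the per-pair counts into 'query,ad' lines (hypothetical helper)."""
    lines = []
    for (query, ad), n in counts.items():
        lines.extend([f"{query},{ad}"] * n)
    return lines


log_lines = build_log_lines(pair_counts)

# Re-count the generated lines the same way the log would be parsed
recount = Counter(tuple(line.split(",")) for line in log_lines)
print(len(log_lines))
for pair, n in recount.items():
    print(pair, n)
```

Writing `"\n".join(log_lines)` to the file opened by the script in section 2 should recreate the sample input, assuming the file contains nothing but these records.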

2. SimRank implementation in Python

import numpy as np
from numpy import matrix

# Each line of the log file is one click record in the form "query,ad"
with open('sample1 (1).txt', 'r') as log_fp:
    logs = [log.strip() for log in log_fp if log.strip()]  # skip blank lines
# print(logs)
logs_tuple = [tuple(log.split(",")) for log in logs]
# print(logs_tuple)

# set() has no fixed order, so the index order of queries and ads can differ between runs
queries = list(set([log[0] for log in logs_tuple]))
# print(queries)  # ['digital camera', 'flower', 'pc', 'camera', 'tv']
ads = list(set([log[1] for log in logs_tuple]))
# print(ads)  # ['hp.com', 'teleflora.com', 'bestbuy.com', 'orchids.com']

# Bipartite click graph: graph[i, j] = number of clicks from queries[i] to ads[j]
graph = np.matrix(np.zeros([len(queries), len(ads)]))
# print(graph)  # 5 x 4 zero matrix
for log in logs_tuple:
    query = log[0]
    ad = log[1]
    q_i = queries.index(query)
    a_j = ads.index(ad)
    graph[q_i, a_j] += 1
print(graph)

# Similarity matrices start as identities: every node is only similar to itself
query_sim = matrix(np.identity(len(queries)))
print(query_sim)
ad_sim = matrix(np.identity(len(ads)))
print(ad_sim)


def get_ads_num(query):
    # Row of click counts for this query (1 x len(ads))
    q_i = queries.index(query)
    return graph[q_i]


def get_queries_num(ad):
    # Row of click counts for this ad (1 x len(queries))
    a_j = ads.index(ad)
    return graph.transpose()[a_j]


def get_ads(query):
    # Ads that received at least one click from this query
    series = get_ads_num(query).tolist()[0]
    return [ads[x] for x in range(len(series)) if series[x] > 0]


def get_queries(ad):
    # Queries that produced at least one click on this ad
    series = get_queries_num(ad).tolist()[0]
    return [queries[x] for x in range(len(series)) if series[x] > 0]


def query_simrank(q1, q2, c):
    if q1 == q2:
        return 1
    # Note: the normalizer is the product of total click counts,
    # not the product of neighbor counts used in textbook SimRank
    prefix = c / (get_ads_num(q1).sum() * get_ads_num(q2).sum())
    postfix = 0
    for ad_i in get_ads(q1):
        for ad_j in get_ads(q2):
            i = ads.index(ad_i)
            j = ads.index(ad_j)
            postfix += ad_sim[i, j]
    return prefix * postfix


def ad_simrank(a1, a2, c):
    if a1 == a2:
        return 1
    prefix = c / (get_queries_num(a1).sum() * get_queries_num(a2).sum())
    postfix = 0
    for query_i in get_queries(a1):
        for query_j in get_queries(a2):
            i = queries.index(query_i)
            j = queries.index(query_j)
            postfix += query_sim[i, j]
    return prefix * postfix


def simrank(c=0.8, times=1):
    global query_sim, ad_sim

    for run in range(times):
        # Both new matrices are computed from the previous iteration's values
        new_query_sim = matrix(np.identity(len(queries)))
        for qi in queries:
            for qj in queries:
                i = queries.index(qi)
                j = queries.index(qj)
                new_query_sim[i, j] = query_simrank(qi, qj, c)

        new_ad_sim = matrix(np.identity(len(ads)))
        for ai in ads:
            for aj in ads:
                i = ads.index(ai)
                j = ads.index(aj)
                new_ad_sim[i, j] = ad_simrank(ai, aj, c)

        query_sim = new_query_sim
        ad_sim = new_ad_sim


if __name__ == '__main__':
    print(queries)
    print(ads)
    simrank()
    print(query_sim)
    print(ad_sim)

Output:

[[15.  0.  0.  0.]
 [ 0.  0. 10.  0.]
 [ 5.  0. 20.  0.]
 [ 7.  0. 30.  0.]
 [ 0. 16.  0. 15.]]
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
['tv', 'pc', 'camera', 'digital camera', 'flower']
['bestbuy.com', 'teleflora.com', 'hp.com', 'orchids.com']
[[1.         0.         0.00213333 0.00144144 0.        ]
 [0.         1.         0.0032     0.00216216 0.        ]
 [0.00213333 0.0032     1.         0.00172973 0.        ]
 [0.00144144 0.00216216 0.00172973 1.         0.        ]
 [0.         0.         0.         0.         1.        ]]
[[1.00000000e+00 0.00000000e+00 9.87654321e-04 0.00000000e+00]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 3.33333333e-03]
 [9.87654321e-04 0.00000000e+00 1.00000000e+00 0.00000000e+00]
 [0.00000000e+00 3.33333333e-03 0.00000000e+00 1.00000000e+00]]
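
The last two matrices above can be checked by hand. For camera and digital camera, the total click counts are 25 and 37, both queries reach hp.com and bestbuy.com, and with an identity `ad_sim` only the two matching ad pairs contribute, giving 0.8 / (25 * 37) * 2 ≈ 0.00172973. The sketch below restates one iteration in matrix form under the assumption that it mirrors the loops in the script: it divides by total click counts, as the script does, whereas textbook SimRank divides by the number of neighbors. The name `simrank_step` and the hard-coded `W` (copied from the printed click graph, in the printed query and ad order) are illustrative, not part of the original code.

```python
import numpy as np

queries = ['tv', 'pc', 'camera', 'digital camera', 'flower']
ads = ['bestbuy.com', 'teleflora.com', 'hp.com', 'orchids.com']

# Click graph copied from the printed output: rows are queries, columns are ads
W = np.array([
    [15, 0, 0, 0],
    [0, 0, 10, 0],
    [5, 0, 20, 0],
    [7, 0, 30, 0],
    [0, 16, 0, 15],
], dtype=float)


def simrank_step(W, query_sim, ad_sim, c=0.8):
    """One synchronous update of both similarity matrices (illustrative sketch)."""
    B = (W > 0).astype(float)      # unweighted adjacency: 1 where a click exists
    q_tot = W.sum(axis=1)          # total clicks per query
    a_tot = W.sum(axis=0)          # total clicks per ad
    # (B @ ad_sim @ B.T)[i, j] sums ad_sim over all neighbor pairs of queries i and j
    new_q = c * (B @ ad_sim @ B.T) / np.outer(q_tot, q_tot)
    new_a = c * (B.T @ query_sim @ B) / np.outer(a_tot, a_tot)
    # A node is always fully similar to itself
    np.fill_diagonal(new_q, 1.0)
    np.fill_diagonal(new_a, 1.0)
    return new_q, new_a


query_sim, ad_sim = simrank_step(W, np.identity(len(queries)), np.identity(len(ads)))
print(query_sim[queries.index('camera'), queries.index('digital camera')])
print(ad_sim[ads.index('bestbuy.com'), ads.index('hp.com')])
```

Calling `simrank_step` repeatedly, feeding each result back in, would correspond to a larger `times` value in the script; this sketch assumes every query and every ad has at least one click, so no total is zero.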
