1、数据

pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
pc,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,hp.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
camera,bestbuy.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,hp.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
digital camera,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
tv,bestbuy.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,teleflora.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com
flower,orchids.com

2、simrank 的python实现

import numpy as np
from numpy import matrix with open('sample1 (1).txt','r') as log_fp:
logs = [log.strip() for log in log_fp.readlines()]
# print(logs)
logs_tuple = [tuple(log.split(",")) for log in logs]
# print (logs_tuple) queries = list(set([log[0] for log in logs_tuple]))
# print(queries) #['digital camera', 'flower', 'pc', 'camera', 'tv']
ads = list(set([log[1] for log in logs_tuple]))
# print(ads)#['hp.com', 'teleflora.com', 'bestbuy.com', 'orchids.com'] graph = np.matrix(np.zeros([len(queries),len(ads)]))
# print(graph) #6行4列的0矩阵 for log in logs_tuple:
query = log[0]
ad = log[1]
q_i = queries.index(query)
a_j = ads.index(ad)
graph[q_i,a_j] +=1
print(graph) query_sim = matrix(np.identity(len(queries)))
print(query_sim)
ad_sim = matrix(np.identity(len(ads)))
print(ad_sim) def get_ads_num(query):
q_i = queries.index(query)
return graph[q_i] def get_queries_num(ad):
a_j = ads.index(ad)
return graph.transpose()[a_j] def get_ads(query):
series = get_ads_num(query).tolist()[0]
return [ads[x] for x in range(len(series)) if series[x] > 0] def get_queries(ad):
series = get_queries_num(ad).tolist()[0]
return [queries[x] for x in range(len(series)) if series[x] > 0] def query_simrank(q1,q2,c):
if q1 == q2 :
return 1
prefix = c/(get_ads_num(q1).sum() *get_ads_num(q2).sum())
postfix = 0
for ad_i in get_ads(q1):
for ad_j in get_ads(q2):
i = ads.index(ad_i)
j = ads.index(ad_j)
postfix += ad_sim[i,j]
return prefix*postfix def ad_simrank(a1,a2,c):
if a1 == a2 :
return 1
prefix = c/(get_queries_num(a1).sum()*get_queries_num(a2).sum())
postfix = 0
for query_i in get_queries(a1):
for query_j in get_queries(a2):
i = queries.index(query_i)
j = queries.index(query_j)
postfix += query_sim[i,j]
return prefix*postfix def simrank(c=0.8,times = 1):
global query_sim,ad_sim for run in range(times):
new_query_sim = matrix(np.identity(len(queries)))
for qi in queries:
for qj in queries:
i = queries.index(qi)
j = queries.index(qj)
new_query_sim[i,j] =query_simrank(qi,qj,c) new_ad_sim = matrix(np.identity(len(ads)))
for ai in ads:
for aj in ads :
i = ads.index(ai)
j = ads.index(aj)
new_ad_sim[i,j] =ad_simrank(ai,aj,c) query_sim = new_query_sim
ad_sim = new_ad_sim if __name__ == '__main__':
print (queries)
print(ads)
simrank()
print(query_sim)
print(ad_sim)
[[15.  0.  0.  0.]
[ 0. 0. 10. 0.]
[ 5. 0. 20. 0.]
[ 7. 0. 30. 0.]
[ 0. 16. 0. 15.]]
[[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
['tv', 'pc', 'camera', 'digital camera', 'flower']
['bestbuy.com', 'teleflora.com', 'hp.com', 'orchids.com']
[[1. 0. 0.00213333 0.00144144 0. ]
[0. 1. 0.0032 0.00216216 0. ]
[0.00213333 0.0032 1. 0.00172973 0. ]
[0.00144144 0.00216216 0.00172973 1. 0. ]
[0. 0. 0. 0. 1. ]]
[[1.00000000e+00 0.00000000e+00 9.87654321e-04 0.00000000e+00]
[0.00000000e+00 1.00000000e+00 0.00000000e+00 3.33333333e-03]
[9.87654321e-04 0.00000000e+00 1.00000000e+00 0.00000000e+00]
[0.00000000e+00 3.33333333e-03 0.00000000e+00 1.00000000e+00]]

simrank python实现的更多相关文章

  1. Python中的多进程与多线程(一)

    一.背景 最近在Azkaban的测试工作中,需要在测试环境下模拟线上的调度场景进行稳定性测试.故而重操python旧业,通过python编写脚本来构造类似线上的调度场景.在脚本编写过程中,碰到这样一个 ...

  2. Python高手之路【六】python基础之字符串格式化

    Python的字符串格式化有两种方式: 百分号方式.format方式 百分号的方式相对来说比较老,而format方式则是比较先进的方式,企图替换古老的方式,目前两者并存.[PEP-3101] This ...

  3. Python 小而美的函数

    python提供了一些有趣且实用的函数,如any all zip,这些函数能够大幅简化我们得代码,可以更优雅的处理可迭代的对象,同时使用的时候也得注意一些情况   any any(iterable) ...

  4. JavaScript之父Brendan Eich,Clojure 创建者Rich Hickey,Python创建者Van Rossum等编程大牛对程序员的职业建议

    软件开发是现时很火的职业.据美国劳动局发布的一项统计数据显示,从2014年至2024年,美国就业市场对开发人员的需求量将增长17%,而这个增长率比起所有职业的平均需求量高出了7%.很多人年轻人会选择编 ...

  5. 可爱的豆子——使用Beans思想让Python代码更易维护

    title: 可爱的豆子--使用Beans思想让Python代码更易维护 toc: false comments: true date: 2016-06-19 21:43:33 tags: [Pyth ...

  6. 使用Python保存屏幕截图(不使用PIL)

    起因 在极客学院讲授<使用Python编写远程控制程序>的课程中,涉及到查看被控制电脑屏幕截图的功能. 如果使用PIL,这个需求只需要三行代码: from PIL import Image ...

  7. Python编码记录

    字节流和字符串 当使用Python定义一个字符串时,实际会存储一个字节串: "abc"--[97][98][99] python2.x默认会把所有的字符串当做ASCII码来对待,但 ...

  8. Apache执行Python脚本

    由于经常需要到服务器上执行些命令,有些命令懒得敲,就准备写点脚本直接浏览器调用就好了,比如这样: 因为线上有现成的Apache,就直接放它里面了,当然访问安全要设置,我似乎别的随笔里写了安全问题,这里 ...

  9. python开发编译器

    引言 最近刚刚用python写完了一个解析protobuf文件的简单编译器,深感ply实现词法分析和语法分析的简洁方便.乘着余热未过,头脑清醒,记下一点总结和心得,方便各位pythoner参考使用. ...

随机推荐

  1. 解决Acunetix 12中文汉化的方法

    最近下载一款测试软件acunetix,苦于满屏英文的苦恼,看不懂,于是乎就问度娘,结果度娘就是给中文破解包: 我是12版的,网上提供的都是11版的,没法用.怎么办呢?还好我是做测试的,平时做兼容性测试 ...

  2. apache cgi 程序: End of script output before headers

    测试linux Apache cgi程序: #include <stdio.h> int main(){ printf("abc"); ; } 目录:/var/www/ ...

  3. Vultr CentOS下后台跑node

    在Mac或者Windows下简直易如反掌.几行命令搞定的事情,但因为使用的是远程SSH连接纯命令行处理,所以需要记录下来怎么弄. 比如, 1. 怎么在什么都没有的CentOS里下载Node安装包? 2 ...

  4. 阶段1 语言基础+高级_1-3-Java语言高级_04-集合_08 Map集合_1_Map集合概述

    map集合是双列集合 map有两个泛型.左边K也叫作键 右边V是value

  5. AUTOGUI生成的一个简易文本编辑器

    ; Generated by AutoGUI #SingleInstance Force #NoEnv SetWorkingDir %A_ScriptDir% SetBatchLines - #Inc ...

  6. python 递归,深度优先搜索与广度优先搜索算法模拟实现

    一.递归原理小案例分析 (1)# 概述 递归:即一个函数调用了自身,即实现了递归 凡是循环能做到的事,递归一般都能做到! (2)# 写递归的过程 1.写出临界条件 2.找出这一次和上一次关系 3.假设 ...

  7. Fiddler抓百度请求

    fiddler是一个很好的抓包工具,默认是抓http请求的,对于pc上的https请求,会提示网页不安全,这时候需要在浏览器上安装证书. 一.网页不安全 1.用fiddler抓包时候,打开百度网页:h ...

  8. 剑指offer--day01

    1.1题目:二维数组中的查找:在一个二维数组中(每个一维数组的长度相同),每一行都按照从左到右递增的顺序排序,每一列都按照从上到下递增的顺序排序.请完成一个函数,输入这样的一个二维数组和一个整数,判断 ...

  9. New start-开始我的学习记录吧

    不知道从何说起,就从眼下的感想开始吧. 转行是一件不容易的事情! 今天是来北京学习Java的第41天.小测验了两次,一次51分,一次54分. 下午有学长过来分享了他的成长经历,感触很多.不是灌鸡汤,也 ...

  10. C# TCPListener

    1: 有两个地方必须做异常处理,异常类型为IOException 服务器读客户端发来的信息时: LeafTCPClient client = (LeafTCPClient)ar.AsyncState; ...