Repost: Comparing Python drivers for connecting to Neo4j: py2neo, neo4j-driver and neo4jrestclient
Comparing Neo4j driver, py2neo and neo4jrestclient with some basic commands using the Panama Papers Data
1. Before we begin
In our last thrilling post, we installed Neo4j and downloaded the Panama Papers Data. Today, before diving into the dirty world of tax evasion, we want to benchmark the performance of three Python modules: the Neo4j Python driver, py2neo and neo4jrestclient. If you haven't done so already, install all three modules with the following commands.
pip3 install neo4j-driver
pip3 install py2neo
pip3 install neo4jrestclient
Or whatever way you are accustomed to.
2. Loading the database to python
The first step, before doing anything, is to start Neo4j with the Panama Papers data. If you forgot how to do this, please refer to our last post or check “Benchmark.ipynb” in the following repository. It has all the code necessary to replicate the experiment.
The next step is to connect to the database so that it can be queried from Python. In py2neo this is done with the following commands.
from py2neo import Graph, Node, Relationship
gdb = Graph(user="neo4j", password="YOURPASS")
Similarly in neo4jrestclient.
from neo4jrestclient.client import GraphDatabase
from neo4jrestclient import client
gdb2 = GraphDatabase("http://localhost:7474", username="neo4j", password="YOURPASS")
Finally in Neo4j Python driver.
from neo4j.v1 import GraphDatabase, basic_auth
driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "YOURPASS"))
sess = driver.session()
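The sections below quote wall-clock timings for each client. How the original notebook measured them is not shown here, so the following is only a minimal sketch of one way to time a call. The names `timed` and `fake_query` are hypothetical; `fake_query` stands in for a real call such as `gdb.data(q)` or `sess.run(q)`.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def fake_query(n):
    # Hypothetical stand-in for a database call; it just builds a list.
    return list(range(n))

rows, seconds = timed(fake_query, 1000)
print(f"fetched {len(rows)} rows in {seconds:.6f} s")
```

If a client hands back its results lazily, the loop that consumes them probably belongs inside the timed function as well, otherwise only the time to send the query is measured.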
3. Getting node labels and label-attribute pairs
The first thing we would like to do when we encounter any new graph database is to see what node labels and relation types it contains. So the first step in our experiment is to get all the distinct node labels and the attributes associated with each label.
In py2neo this is done with the following code, which takes about 100 ms. I am glad to see that py2neo has a built-in object that stores the node labels and their attributes.
INPUT CODE py2neo:
import pandas as pd
# Get distinct node labels
NodeLabel = list(gdb.node_labels)
print(NodeLabel)
# For each node label, collect its attributes
# (get_indexes lists the indexed property keys for a label)
Node = []
Attr = []
for nl in NodeLabel:
    for i in gdb.schema.get_indexes(nl):
        Node.append(nl)
        Attr.append(format(i))
NodeLabelAttribute = pd.DataFrame(
    {'NodeLabel': Node, 'Attribute': Attr})
NodeLabelAttribute.head(5)
However, things get a little nastier with neo4jrestclient and the Neo4j Python driver. neo4jrestclient has a way to access the node labels but not the attributes, so we have to query the attributes from the graph database. Not surprisingly, this querying step takes quite a lot of time: about 12 sec in total for neo4jrestclient.
INPUT CODE neo4jrestclient:
# Get Distinct Node Labels
import re
import pandas as pd
def extract(text):
    # Pull every single-quoted substring out of the label's string form
    matches = re.findall(r'\'(.+?)\'', text)
    return ",".join(matches)
NodeLabel = [extract(str(x)) for x in list(gdb2.labels)]
print(NodeLabel)
# For each node label print attributes
Node, Attr = ([] for i in range(2))
for nl in NodeLabel:
    q = "MATCH (n:" + str(nl) + ")\n" + "RETURN distinct keys(n)"
    temp = list(gdb2.query(q))
    # Flatten rows -> columns -> keys, then drop duplicates
    temp = list(set(sum(sum(temp, []), [])))
    for i in range(len(temp)):
        Node.append(nl)
    Attr.extend(temp)
NodeLabelAttribute = pd.DataFrame(
    {'NodeLabel': Node, 'Attribute': Attr})
NodeLabelAttribute.head(5)
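The `sum(sum(temp,[]),[])` expression above flattens a doubly nested list: a list of rows, each row a list of columns, and the single column a list of property keys. As a rough alternative, the same flattening can be sketched with `itertools.chain`. The function name `flatten_keys` and the sample rows are made up for illustration and only assume the nesting just described.

```python
from itertools import chain

def flatten_keys(rows):
    """Flatten rows shaped like [[['a', 'b']], [['b', 'c']]] into sorted unique keys."""
    # First pass: rows -> columns (each column is a list of property keys).
    columns = chain.from_iterable(rows)
    # Second pass: columns -> individual property keys.
    keys = chain.from_iterable(columns)
    return sorted(set(keys))

# Hypothetical sample shaped like a "RETURN distinct keys(n)" result.
sample_rows = [[["name", "node_id"]], [["node_id", "countries"]]]
print(flatten_keys(sample_rows))  # ['countries', 'name', 'node_id']
```

Unlike `sum(..., [])`, this does not build a new intermediate list at every step, which may matter for large results.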
For the Neo4j Python driver you have to query the node labels as well, which brings the total to about 20 sec.
INPUT CODE Neo4j Python Driver:
import pandas as pd
# Get distinct node labels
q = """
MATCH (n)
RETURN distinct labels(n)
"""
res = sess.run(q)
NodeLabel = []
for r in res:
    temp = r["labels(n)"]
    if temp != "":
        NodeLabel.extend(temp)
NodeLabel = list(filter(None, NodeLabel))
# For each node label print attributes
Node, Attr = ([] for i in range(2))
for nl in NodeLabel:
    q = "MATCH (n:" + str(nl) + ")\n" + "RETURN distinct keys(n)"
    res = sess.run(q)
    temp = []
    for r in res:
        temp.extend(r["keys(n)"])
    temp2 = list(set(temp))
    Attr.extend(temp2)
    for i in range(len(temp2)):
        Node.append(nl)
NodeLabelAttribute = pd.DataFrame(
    {'NodeLabel': Node, 'Attribute': Attr})
NodeLabelAttribute.head(5)
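The per-label queries above splice the label straight into the Cypher text, which would break for a label containing a space or another special character. Below is a minimal sketch of one way to guard against that, assuming Cypher's backtick quoting for identifiers (the same quoting the relation-type queries later in this post use). The names `quote_identifier` and `keys_query` and the label `My Label` are hypothetical.

```python
def quote_identifier(name):
    """Wrap a label or relation type in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"

def keys_query(label):
    """Build the per-label attribute query with the label quoted."""
    return "MATCH (n:" + quote_identifier(label) + ")\nRETURN DISTINCT keys(n)"

# "My Label" is an invented label containing a space.
print(keys_query("My Label"))
```

The resulting string can be passed to any of the three clients in place of the hand-built `q`.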
4. Relation types and length of each edge list
The next thing we would like to do is make a list of all the relation types in the database and see which relation type has the longest edge list.
In py2neo this can be done with the following code, which takes about 4 min.
# Get distinct relation types
RelaType = sorted(list(gdb.relationship_types))
print("There are " + str(len(RelaType)) + " relations in total")
# Calculate length of edge list for each type
res = []
# for i in range(10):  # use this loop header instead for a quick test
for i in range(len(RelaType)):
    q = "MATCH (n)-[:`" + RelaType[i] + "`]-(m)\n" + "RETURN count(n)"
    res.append(gdb.data(q)[0]["count(n)"])
RelaType = pd.DataFrame({'RelaType': RelaType[:len(res)], 'count(n)': res})
RelaType.head(5)
In neo4jrestclient, the same thing can be implemented with the following code. Note that, again, since neo4jrestclient has no built-in method to get the distinct relation types, we have to query them from the graph database first. In total this takes about 4 min 21 s, so it is slightly slower than py2neo.
INPUT CODE neo4jrestclient:
# Get Distinct Relations
q = """
START r=rel(*)
RETURN distinct(type(r))
"""
RelaType = sorted(sum(list(gdb2.query(q)), []))
print("There are " + str(len(RelaType)) + " relations in total")
res = []
for i in range(len(RelaType)):
    q = "MATCH (n)-[:`" + RelaType[i] + "`]-(m)\n" + "RETURN count(n)"
    res.append(gdb2.query(q)[0][0])
RelaType = pd.DataFrame({'RelaType': RelaType, 'count(n)': res})
RelaType
Things get even more tedious with the Neo4j Python driver, where we have to query the relation types as well. However, the following code takes about 4 min 10 sec, so the additional query to get the list of relation types doesn't seem to hurt much.
INPUT CODE Neo4j Python Driver:
# Get Distinct Relations
q = """
START r=rel(*)
RETURN distinct(type(r))
"""
RelaType = []
res = sess.run(q)
for r in res:
    RelaType.append(r["(type(r))"])
RelaType = sorted(RelaType)
print("There are " + str(len(RelaType)) + " relations in total")
res2 = []
# for i in range(10):  # use this loop header instead for a quick test
for i in range(len(RelaType)):
    q = "MATCH (n)-[:`" + RelaType[i] + "`]-(m)\n" + "RETURN count(n)"
    res = sess.run(q)
    for r in res:
        res2.append(r["count(n)"])
RelaType = pd.DataFrame({'RelaType': RelaType[:len(res2)], 'count(n)': res2})
RelaType.head(5)
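One thing worth keeping in mind when reading these counts: the pattern `(n)-[:T]-(m)` has no arrow, so each stored relationship should match once from each end and the reported numbers are probably twice the number of relationships of that type. The toy example below mimics this in plain Python; the edge list and the type names `KNOWS` and `OWNS` are invented and contain no self-loops.

```python
from collections import Counter

# Hypothetical toy edge list: (start_node, relation_type, end_node).
edges = [
    (1, "KNOWS", 2),
    (3, "KNOWS", 2),
    (2, "OWNS", 4),
]

# Directed view: every stored relationship is counted once.
directed = Counter(rel_type for _, rel_type, _ in edges)

# Undirected view: each relationship is seen from both of its end nodes.
undirected = Counter()
for start, rel_type, end in edges:
    undirected[rel_type] += 1  # matched as (start)-[r]-(end)
    undirected[rel_type] += 1  # matched as (end)-[r]-(start)

print(directed.most_common())    # [('KNOWS', 2), ('OWNS', 1)]
print(undirected.most_common())  # [('KNOWS', 4), ('OWNS', 2)]
```

Since the ranking of relation types is the same either way, this does not change which edge list is longest. A single aggregated query along the lines of `MATCH ()-[r]->() RETURN type(r), count(r)` might also replace the per-type loop, though that was not part of the benchmark here.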
5. Calculate degree distribution of all nodes
So far so good. My first impression, before ever touching the three modules, was that py2neo is the newer, cooler option. So it was good to see that py2neo was user-friendly as well as well-performing. But as the following example shows, there seem to be situations where neo4jrestclient and the Neo4j Python driver are much faster than py2neo.
In this experiment we gather information on the degree of every node in our graph database. In py2neo this can be done with the following code, which takes about 1 min 14 s.
INPUT CODE py2neo:
q = """
MATCH (n)-[r]-(m)
RETURN n.node_id,n.name, count(r)
ORDER BY count(r) desc
"""
res = gdb.data(q)
NodeDegree = pd.DataFrame(res)
NodeDegree.head(5)
OUTPUT
   count(r)  n.name                                   n.node_id
0     37338  None                                        236724
1     36374  Portcullis TrustNet (BVI) Limited            54662
2     14902  MOSSACK FONSECA & CO. (BAHAMAS) LIMITED   23000136
3      9719  UBS TRUSTEES (BAHAMAS) LTD.               23000147
4      8302  CREDIT SUISSE TRUST LIMITED               23000330
In neo4jrestclient the same thing could be performed with the following code. Now this takes about 18 sec which is about 4 times faster than py2neo!
INPUT CODE neo4jrestclient:
q = """
MATCH (n)-[r]-(m)
RETURN n.node_id, n.name, count(r)
ORDER BY count(r) desc
"""
res = list(gdb2.query(q))
NodeDegree = pd.DataFrame(res)
NodeDegree.columns = ["n.node_id","n.name","count(r)"]
NodeDegree.head(5)
The same holds for the Neo4j Python driver, which takes about 25 sec.
INPUT CODE Neo4j Python Driver:
Match = "MATCH (n)-[r]-(m)\n"
Ret = ["n.node_id", "n.name", "count(r)"]
Opt = "ORDER BY count(r) desc"
q = Match + "RETURN " + ', '.join(Ret) + "\n" + Opt
res = sess.run(q)
res2 = []
# for r in islice(res, 5):  # use this loop header instead for a quick test
for r in res:
    res2.append([r[x] for x in range(len(Ret))])
NodeDegree = pd.DataFrame(res2)
NodeDegree.columns = Ret
NodeDegree.head(5)
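The queries above return one degree per node. To get from there to an actual degree distribution (how many nodes have each degree), the degrees still need to be tallied. Here is a minimal sketch using only the standard library; the `(node_id, degree)` pairs are invented and merely shaped like the query result.

```python
from collections import Counter

# Hypothetical (node_id, degree) pairs, shaped like the query result above.
node_degrees = [(101, 3), (102, 1), (103, 1), (104, 2), (105, 1)]

# Degree distribution: number of nodes observed for each degree value.
distribution = Counter(degree for _, degree in node_degrees)

for degree, n_nodes in sorted(distribution.items()):
    print(degree, n_nodes)
```

With the pandas frame built above, `NodeDegree["count(r)"].value_counts()` should give the same tally.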
6. Conclusion
At the moment I am not sure where the difference comes from. Apart from the cases where a built-in object already holds some basic information, we are using exactly the same queries, so I would not expect much difference.
On the positive side, as this post shows, there isn't much difference in coding style among the three modules. After all, we are using the same query language (i.e. Cypher) to send orders to Neo4j, and it is not a pain in the ass to switch from one module to another.
My recommendation? py2neo is definitely not an option. Although it is user-friendly in many respects, it is too slow for counting queries such as the degree computation in section 5. neo4jrestclient is not bad, but it sometimes returns a nested list structure that we have to deal with using some trick (e.g. “sum(temp,[])”), which I want to avoid. So I think I would go with the Neo4j Python driver. After all, it is the only official release supported by Neo4j. What is your recommendation?