Traversals in TinkerPop: Graph Traversal Strategies

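Traversal Strategies

A TraversalStrategy analyzes a traversal and, if the traversal meets its criteria, can mutate it accordingly. Traversal strategies are executed at compile time and form the foundation of the Gremlin traversal machine's compiler. There are five categories of strategies: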

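decoration: application-level features that can be embedded into the traversal logic
optimization: a more efficient way to express the traversal at the TinkerPop3 level
provider optimization: a more efficient way to express the traversal at the graph system/language/driver level
finalization: final adjustments/cleanup/analyses that are needed before the traversal is executed
verification: checks for traversals that are not legal for the application or the traversal engine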
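To make the category ordering concrete, here is a minimal plain-Java sketch (Java 16+, no TinkerPop dependency). It assumes a traversal can be modelled as a list of step names and that a strategy is simply a rewrite function tagged with one of the five categories; `Strategy`, `Category` and the three sample rules are hypothetical stand-ins, not TinkerPop's TraversalStrategy API. The one behaviour it mirrors is that registered strategies are applied category by category: decoration, optimization, provider optimization, finalization, verification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;

// The five categories, declared in the order in which strategies are applied.
enum Category { DECORATION, OPTIMIZATION, PROVIDER_OPTIMIZATION, FINALIZATION, VERIFICATION }

// A toy strategy: a name, a category and a rewrite over a traversal modelled as step names.
record Strategy(String name, Category category, UnaryOperator<List<String>> rewrite) {}

class StrategyPipeline {
    // "Compile" a traversal: sort strategies by category, then let each one rewrite the steps.
    static List<String> compile(List<String> steps, List<Strategy> strategies) {
        List<Strategy> ordered = new ArrayList<>(strategies);
        ordered.sort(Comparator.comparing(Strategy::category)); // enum order = category order
        List<String> current = new ArrayList<>(steps);
        for (Strategy s : ordered) {
            current = s.rewrite().apply(current);
        }
        return current;
    }

    // Decoration (hypothetical rule): restrict the start step to partition "a".
    static final Strategy PARTITION = new Strategy("partition", Category.DECORATION, steps -> {
        List<String> out = new ArrayList<>(steps);
        out.add(1, "has(_partition,a)");
        return out;
    });

    // Optimization (hypothetical rule): identity() does nothing, so drop it.
    static final Strategy REMOVE_IDENTITY = new Strategy("removeIdentity", Category.OPTIMIZATION, steps -> {
        List<String> out = new ArrayList<>(steps);
        out.removeIf("identity()"::equals);
        return out;
    });

    // Provider optimization (hypothetical rule): fold has() filters that directly follow
    // V() into the start step, the way a backend might answer them from an index.
    static final Strategy FOLD_HAS = new Strategy("foldHas", Category.PROVIDER_OPTIMIZATION, steps -> {
        List<String> out = new ArrayList<>(steps);
        while (out.size() > 1 && out.get(0).startsWith("V()") && out.get(1).startsWith("has(")) {
            String filter = out.remove(1);
            out.set(0, out.get(0) + "+" + filter);
        }
        return out;
    });

    public static void main(String[] args) {
        // Registered in reverse order on purpose; compile() still applies them by category.
        List<String> compiled = compile(List.of("V()", "identity()", "out()"),
                List.of(FOLD_HAS, REMOVE_IDENTITY, PARTITION));
        System.out.println(compiled);
    }
}
```

Because the decoration runs first, the provider optimization can fold the has() filter that the decoration added; `main` prints `[V()+has(_partition,a), out()]`.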

Note

The explain() step shows the user how each registered strategy mutates the traversal.

For example:

gremlin> g.V().has('name','marko').explain()

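ElementIdStrategy

ElementIdStrategy provides control over element identifiers. Some Graph implementations, such as TinkerGraph, allow a custom identifier to be specified when an element is created: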

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> v = g.addV().property(id,'42a').next()
==>v[42a]
gremlin> g.V('42a')
==>v[42a]

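Source: http://tinkerpop.apache.org/docs/3.2.6/reference/#traversalstrategy

Other Graph implementations, such as Neo4j, generate element identifiers automatically and do not allow them to be assigned. As a helper, ElementIdStrategy makes identifier assignment possible on such graphs by means of vertex and edge indices.

For example: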

gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[Community [/tmp/neo4j]]
gremlin> strategy = ElementIdStrategy.build().create()
==>ElementIdStrategy
gremlin> g = graph.traversal().withStrategies(strategy)
==>graphtraversalsource[neo4jgraph[Community [/tmp/neo4j]], standard]
gremlin> g.addV().property(id, '42a').id()
==>42a

Note

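The key used to store the assigned identifier should be indexed in the underlying graph database. If it is not indexed, looking up elements by these identifiers will perform a linear scan.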
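The idea behind ElementIdStrategy and the indexing advice above can be sketched in plain Java: the caller's identifier is stored as an ordinary property, so finding an element by it is either an index lookup or a scan over every element. `IdStore`, `Element` and the `customId` key are hypothetical names for this illustration only; they are not ElementIdStrategy's implementation or its default property key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// A toy element: the backend assigns nativeId, the caller-chosen id is kept as a property.
record Element(long nativeId, Map<String, Object> properties) {}

class IdStore {
    static final String ID_KEY = "customId"; // hypothetical key that holds the assigned identifier

    private final List<Element> elements = new ArrayList<>();
    private final Map<Object, Element> index = new HashMap<>(); // stands in for a database index on ID_KEY
    private final boolean indexed;
    private long nextNativeId = 0;

    IdStore(boolean indexed) {
        this.indexed = indexed;
    }

    // Roughly what addV().property(id, userId) turns into: a native id plus an id property.
    Element add(Object userId) {
        Element e = new Element(nextNativeId++, Map.of(ID_KEY, userId));
        elements.add(e);
        if (indexed) {
            index.put(userId, e);
        }
        return e;
    }

    // Roughly what g.V(userId) turns into: a hash lookup with an index, a full scan without one.
    Optional<Element> find(Object userId) {
        if (indexed) {
            return Optional.ofNullable(index.get(userId));
        }
        return elements.stream()
                .filter(e -> userId.equals(e.properties().get(ID_KEY)))
                .findFirst();
    }
}
```

Both stores return the same element; only the cost of `find` differs, which is the point of the note.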

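EventStrategy

The purpose of EventStrategy is to raise events to one or more MutationListener objects as changes to the underlying Graph occur within a traversal. This is useful for logging changes, for triggering actions based on changes, and for any application that needs to be notified of mutating operations during a traversal. If the transaction is rolled back, the event queue is reset.

The following events are raised to the MutationListener: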

New vertex
New edge
Vertex property changed
Edge property changed
Vertex property removed
Edge property removed
Vertex removed
Edge removed

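To start processing events from a traversal, first implement the MutationListener interface. One such implementation is ConsoleMutationListener, which writes a line to the console for each event. For example: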

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> l = new ConsoleMutationListener(graph)
==>MutationListener[tinkergraph[vertices:6 edges:6]]
gremlin> strategy = EventStrategy.build().addListener(l).create()
==>EventStrategy
gremlin> g = graph.traversal().withStrategies(strategy)
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.addV().property('name','stephen')
Vertex [v[13]] added to graph [tinkergraph[vertices:7 edges:6]]
==>v[13]
gremlin> g.E().drop()
Edge [e[7][1-knows->2]] removed from graph [tinkergraph[vertices:7 edges:6]]
Edge [e[8][1-knows->4]] removed from graph [tinkergraph[vertices:7 edges:5]]
Edge [e[9][1-created->3]] removed from graph [tinkergraph[vertices:7 edges:4]]
Edge [e[10][4-created->5]] removed from graph [tinkergraph[vertices:7 edges:3]]
Edge [e[11][4-created->3]] removed from graph [tinkergraph[vertices:7 edges:2]]
Edge [e[12][6-created->3]] removed from graph [tinkergraph[vertices:7 edges:1]]

Note

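EventStrategy is not meant for tracking global mutations across separate processes. In other words, a mutation made in one JVM process is not raised as an event in a different JVM process.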
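The listener contract and the reset-on-rollback rule can be modelled in a few lines of plain Java. This sketch assumes one simple design in which mutations append events to a queue, commit delivers them to every registered listener, and rollback discards them; `Listener`, `EventQueue` and `RecordingListener` are hypothetical stand-ins for MutationListener and EventStrategy's internals, and only two of the eight event types are modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener with two of the eight callbacks listed above.
interface Listener {
    void vertexAdded(String vertex);
    void edgeRemoved(String edge);
}

class EventQueue {
    private final List<Listener> listeners = new ArrayList<>();
    private final List<Runnable> pending = new ArrayList<>(); // raised but not yet delivered

    void addListener(Listener l) { listeners.add(l); }

    // Called by mutating steps: queue the notification instead of firing it immediately.
    void raiseVertexAdded(String v) { pending.add(() -> listeners.forEach(l -> l.vertexAdded(v))); }
    void raiseEdgeRemoved(String e) { pending.add(() -> listeners.forEach(l -> l.edgeRemoved(e))); }

    // Commit: deliver every queued event in order, then clear the queue.
    void commit() {
        pending.forEach(Runnable::run);
        pending.clear();
    }

    // Rollback: reset the queue so listeners never see the discarded changes.
    void rollback() { pending.clear(); }
}

// Records what it sees, similar in spirit to ConsoleMutationListener printing each event.
class RecordingListener implements Listener {
    final List<String> seen = new ArrayList<>();
    public void vertexAdded(String vertex) { seen.add("added " + vertex); }
    public void edgeRemoved(String edge) { seen.add("removed " + edge); }
}
```

Buffering until commit is only one way to honour the rollback rule; in the non-transactional TinkerGraph session above, each event is printed as soon as it happens.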

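PartitionStrategy

PartitionStrategy partitions the vertices and edges of a graph into String-named partitions (i.e. buckets, subgraphs, etc.).

There are three primary configurations in PartitionStrategy:

Partition Key - the property key whose String value denotes the partition an element belongs to.
Write Partition - a String denoting the partition that all future written elements will be in.
Read Partitions - a Set<String> of the partitions that can be read from.

An example of using PartitionStrategy: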

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> strategyA = PartitionStrategy.build().partitionKey("_partition").writePartition("a").readPartitions("a").create()
==>PartitionStrategy
gremlin> strategyB = PartitionStrategy.build().partitionKey("_partition").writePartition("b").readPartitions("b").create()
==>PartitionStrategy
gremlin> gA = graph.traversal().withStrategies(strategyA)
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> gA.addV() // this vertex has a property of {_partition:"a"}
==>v[13]
gremlin> gB = graph.traversal().withStrategies(strategyB)
==>graphtraversalsource[tinkergraph[vertices:7 edges:6], standard]
gremlin> gB.addV() // this vertex has a property of {_partition:"b"}
==>v[15]
gremlin> gA.V()
==>v[13]
gremlin> gB.V()
==>v[15]

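By writing elements to particular partitions and then restricting the read partitions, a developer can create multiple graphs within a single address space. Moreover, because references between partitions are supported, these graphs can be merged (i.e. partitions can be joined).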
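Conceptually, PartitionStrategy adds the write partition to everything a traversal creates and filters every read by the read partitions. The plain-Java sketch below models that with a shared list of vertex property maps; `PartitionView` is a hypothetical class whose constructor arguments mirror the partitionKey, writePartition and readPartitions settings, not TinkerPop's implementation. A view that reads from several partitions illustrates joining partitions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// One "traversal source" over a shared vertex list, configured like a PartitionStrategy.
class PartitionView {
    private final List<Map<String, Object>> vertices; // shared backing store, i.e. the graph
    private final String partitionKey;
    private final String writePartition;
    private final Set<String> readPartitions;

    PartitionView(List<Map<String, Object>> vertices, String partitionKey,
                  String writePartition, Set<String> readPartitions) {
        this.vertices = vertices;
        this.partitionKey = partitionKey;
        this.writePartition = writePartition;
        this.readPartitions = readPartitions;
    }

    // addV(): every new vertex is stamped with the write partition.
    int addV() {
        Map<String, Object> v = new HashMap<>();
        int id = vertices.size();
        v.put("id", id);
        v.put(partitionKey, writePartition);
        vertices.add(v);
        return id;
    }

    // V(): only vertices whose partition is readable are visible; vertices without
    // the partition key at all are hidden, as the modern-graph vertices are from gA above.
    List<Integer> V() {
        return vertices.stream()
                .filter(v -> {
                    Object p = v.get(partitionKey);
                    return p != null && readPartitions.contains(p);
                })
                .map(v -> (Integer) v.get("id"))
                .collect(Collectors.toList());
    }
}
```

Usage: create `a` and `b` views over one `ArrayList`, call `addV()` on each, and `a.V()` sees only its own vertex while a view with read partitions `{"a", "b"}` sees both.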

Note

If the Graph supports meta-properties and includeMetaProperties is set to true when the PartitionStrategy is built, partitioning is also extended to VertexProperty elements.

ReadOnlyStrategy

ReadOnlyStrategy does what its name implies: a traversal with this strategy applied will throw an IllegalStateException if the traversal contains any mutating steps.
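A verification-style strategy like this only needs to inspect the compiled steps and reject the whole traversal if any of them would change the graph. The plain-Java sketch below assumes a traversal is represented by its step names and uses a hand-picked set of mutating steps; both are illustrative assumptions, since the real strategy works on step objects rather than names, and the exception message is made up.

```java
import java.util.List;
import java.util.Set;

class ReadOnlyCheck {
    // Hypothetical names for steps that change the graph.
    static final Set<String> MUTATING_STEPS = Set.of("addV", "addE", "property", "drop");

    // Leave a read-only traversal alone; reject one that contains a mutating step.
    static void verify(List<String> steps) {
        for (String step : steps) {
            if (MUTATING_STEPS.contains(step)) {
                throw new IllegalStateException("mutating step in a read-only traversal: " + step);
            }
        }
    }

    public static void main(String[] args) {
        verify(List.of("V", "has", "values")); // passes silently
        try {
            verify(List.of("V", "has", "drop")); // like g.V().has(...).drop()
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```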

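SubgraphStrategy

SubgraphStrategy is similar to PartitionStrategy in that it restricts a traversal to certain vertices, edges and vertex properties.

The following example runs the same query without and then with a SubgraphStrategy. The strategy keeps a vertex property only if that property has no endTime meta-property, so only the current locations remain visible.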

gremlin> graph = TinkerFactory.createTheCrew()
==>tinkergraph[vertices:6 edges:14]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:14], standard]
gremlin> g.V().as('a').values('location').as('b').
           select('a','b').by('name').by() // without SubgraphStrategy
==>[a:marko,b:san diego]
==>[a:marko,b:santa cruz]
==>[a:marko,b:brussels]
==>[a:marko,b:santa fe]
==>[a:stephen,b:centreville]
==>[a:stephen,b:dulles]
==>[a:stephen,b:purcellville]
==>[a:matthias,b:bremen]
==>[a:matthias,b:baltimore]
==>[a:matthias,b:oakland]
==>[a:matthias,b:seattle]
==>[a:daniel,b:spremberg]
==>[a:daniel,b:kaiserslautern]
==>[a:daniel,b:aachen]
gremlin> g = g.withStrategies(SubgraphStrategy.build().vertexProperties(hasNot('endTime')).create()) // hide vertex properties that have an endTime
==>graphtraversalsource[tinkergraph[vertices:6 edges:14], standard]
gremlin> g.V().as('a').values('location').as('b').
           select('a','b').by('name').by() // same query with SubgraphStrategy applied
==>[a:marko,b:santa fe]
==>[a:stephen,b:purcellville]
==>[a:matthias,b:seattle]
==>[a:daniel,b:aachen]

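Source: http://tinkerpop.apache.org/docs/3.2.6/reference/#_subgraphstrategy

The following example uses all three filters: vertex, edge and vertex property. Vertices must have lived (according to their location properties) in more than three places or have no location at all; edges must be labeled "develops"; vertex properties must either not be a location or be a current location (one without an endTime meta-property).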
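Why does only santa fe survive for marko? In "The Crew" graph the location values carry startTime/endTime meta-properties, and vertexProperties(hasNot('endTime')) is tested against each location VertexProperty, so only a location without an endTime remains. The plain-Java sketch below restates that filter; `VertexProp` is a hypothetical record that keeps only the meta-property keys, and treating santa fe as marko's one location without an endTime is inferred from the filtered output above.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// A vertex property value plus the keys of its meta-properties (values omitted).
record VertexProp(String value, Set<String> metaKeys) {}

class SubgraphFilter {
    // Same idea as vertexProperties(hasNot('endTime')): keep a property only if it
    // has no endTime meta-property.
    static List<String> visibleValues(List<VertexProp> props) {
        return props.stream()
                .filter(p -> !p.metaKeys().contains("endTime"))
                .map(VertexProp::value)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<VertexProp> marko = List.of(
                new VertexProp("san diego", Set.of("startTime", "endTime")),
                new VertexProp("santa cruz", Set.of("startTime", "endTime")),
                new VertexProp("brussels", Set.of("startTime", "endTime")),
                new VertexProp("santa fe", Set.of("startTime")));
        System.out.println(visibleValues(marko)); // [santa fe]
    }
}
```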

gremlin> graph = TinkerFactory.createTheCrew()
==>tinkergraph[vertices:6 edges:14]
gremlin> g = graph.traversal().withStrategies(SubgraphStrategy.build().
           vertices(or(hasNot('location'),properties('location').count().is(gt(3)))).
           edges(hasLabel('develops')).
           vertexProperties(or(hasLabel(neq('location')),hasNot('endTime'))).create())
==>graphtraversalsource[tinkergraph[vertices:6 edges:14], standard]
gremlin> g.V().valueMap(true)
==>[name:[marko],label:person,location:[santa fe],id:1]
==>[name:[matthias],label:person,location:[seattle],id:8]
==>[name:[gremlin],label:software,id:10]
==>[name:[tinkergraph],label:software,id:11]
gremlin> g.E().valueMap(true)
==>[label:develops,id:13,since:2009]
==>[label:develops,id:14,since:2010]
==>[label:develops,id:21,since:2012]
gremlin> g.V().outE().inV().path().by('name').by(label).by('name')
==>[marko,develops,gremlin]
==>[marko,develops,tinkergraph]
==>[matthias,develops,gremlin]

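The vertex filter in this example, or(hasNot('location'), properties('location').count().is(gt(3))), keeps a vertex if it has no location at all or more than three of them. A plain-Java restatement of that predicate, applied to the location counts read off the first, unfiltered query in this section, shows why only marko and matthias survive among the people while the two software vertices pass untouched. `CrewVertexFilter` is a hypothetical helper, not part of TinkerPop.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CrewVertexFilter {
    // or(hasNot('location'), properties('location').count().is(gt(3)))
    static boolean keep(int locationCount) {
        return locationCount == 0 || locationCount > 3;
    }

    static List<String> visible(Map<String, Integer> locationCounts) {
        return locationCounts.entrySet().stream()
                .filter(e -> keep(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Counts from the unfiltered query above; software vertices have no location property.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("marko", 4);
        counts.put("stephen", 3);
        counts.put("matthias", 4);
        counts.put("daniel", 3);
        counts.put("gremlin", 0);
        counts.put("tinkergraph", 0);
        System.out.println(visible(counts)); // [marko, matthias, gremlin, tinkergraph]
    }
}
```

The edge filter then only has "develops" edges to work with, and the path query can only traverse edges whose endpoints both passed the vertex filter.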