Spark GraphX Graph Processing: A Programming Example

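The example builds the following property graph. Each vertex is a user whose attribute is (username, position), and each edge is labeled with the relationship between two users:

Vertices: 3 (rxin, student), 7 (jgonzal, postdoc), 5 (franklin, prof), 2 (istoica, prof)
Edges: 3 -> 7 collab, 5 -> 3 advisor, 2 -> 5 colleague, 5 -> 7 pi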

The Scala program:

import org.apache.spark._
import org.apache.spark.graphx._
// To make some of the examples work we will also need RDD
import org.apache.spark.rdd.RDD

object Test {
  def main(args: Array[String]): Unit = {
    // Initialize the SparkContext (local mode, 2 worker threads)
    val sc: SparkContext = new SparkContext("local[2]", "Spark Graphx")

    // Create an RDD of vertices: (vertex ID, (username, position))
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
        (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))

    // Create an RDD of edges describing the relationships between users
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))

    // Default user attribute: assigned to any vertex that appears in an edge
    // but has no entry in the users RDD
    val defaultUser = ("John Doe", "Missing")

    // Build the graph
    val graph = Graph(users, relationships, defaultUser)

    // Print the vertices, then every edge as "srcAttr -----> dstAttr    attr: edgeAttr"
    graph.vertices.collect().foreach(println(_))
    graph.triplets
      .map(triplet => s"${triplet.srcAttr} -----> ${triplet.dstAttr}    attr: ${triplet.attr}")
      .collect().foreach(println(_))

    // Count the users whose position is "postdoc"
    val cnt1 = graph.vertices.filter { case (id, (name, pos)) => pos == "postdoc" }.count()
    println("Number of postdoc users: " + cnt1)

    // Count the edges whose source vertex ID is greater than the destination ID (src > dst)
    val cnt2 = graph.edges.filter(e => e.srcId > e.dstId).count()
    println("Number of edges with src > dst: " + cnt2)

    // Compute the in-degree of each vertex
    val inDegrees: VertexRDD[Int] = graph.inDegrees
    inDegrees.collect().foreach(println(_))

    sc.stop()
  }
}
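The program above only counts in-degrees. The sketch below runs a few more read-only queries on the same data. It finds the vertex with the largest in-, out-, and total degree by reducing the `VertexRDD`. It also renders each triplet as a sentence. It assumes spark-core and spark-graphx are on the classpath. The object and helper names (`GraphQueries`, `buildGraph`, `maxOf`, `describe`) are hypothetical and chosen for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical helper object: extra queries on the example graph.
object GraphQueries {

  // Builds the same four-user property graph as the main program.
  def buildGraph(sc: SparkContext): Graph[(String, String), String] = {
    val users: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
      (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
      (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
    val relationships: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
      Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
    Graph(users, relationships, ("John Doe", "Missing"))
  }

  // Returns the (vertexId, degree) pair with the largest degree.
  // If several vertices tie, which one is returned is not defined.
  def maxOf(degrees: VertexRDD[Int]): (VertexId, Int) =
    degrees.reduce((a, b) => if (a._2 >= b._2) a else b)

  // Turns each triplet into a sentence such as "franklin is the advisor of rxin".
  def describe(graph: Graph[(String, String), String]): Array[String] =
    graph.triplets
      .map(t => s"${t.srcAttr._1} is the ${t.attr} of ${t.dstAttr._1}")
      .collect()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "GraphQueries")
    val graph = buildGraph(sc)
    println(s"max in-degree:  ${maxOf(graph.inDegrees)}")
    println(s"max out-degree: ${maxOf(graph.outDegrees)}")
    println(s"max degree:     ${maxOf(graph.degrees)}")
    describe(graph).foreach(println)
    sc.stop()
  }
}
```

Vertices with no incoming edges (here vertex 2) are absent from `inDegrees`. The same holds for `outDegrees`. Keep this in mind when joining these results back onto the graph.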

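In the program above, every vertex referenced by an edge also appears in `users`. As a result, `defaultUser` is never actually used. The sketch below adds a hypothetical edge `7 -> 8` labeled "mentor", where vertex 8 has no user entry, so GraphX fills in `("John Doe", "Missing")` for it. It then uses `subgraph` with a vertex predicate to drop such placeholder vertices. The object and method names are illustrative assumptions, and spark-graphx is assumed to be on the classpath.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical demo of the default vertex attribute.
object DefaultVertexDemo {
  val defaultUser: (String, String) = ("John Doe", "Missing")

  // Vertex 8L is referenced by an edge but has no entry in `users`,
  // so Graph(...) assigns it `defaultUser`. The "mentor" edge is made up
  // solely to trigger this behavior.
  def build(sc: SparkContext): Graph[(String, String), String] = {
    val users: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
      (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc"))))
    val relationships: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(3L, 7L, "collab"), Edge(7L, 8L, "mentor")))
    Graph(users, relationships, defaultUser)
  }

  // Keeps only vertices with a real position. subgraph also drops every
  // edge whose source or destination fails the vertex predicate.
  def withoutMissing(graph: Graph[(String, String), String]): Graph[(String, String), String] =
    graph.subgraph(vpred = (_, attr) => attr._2 != "Missing")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "DefaultVertexDemo")
    val graph = build(sc)
    graph.vertices.collect().foreach(println) // includes (8,(John Doe,Missing))
    val valid = withoutMissing(graph)
    println(s"vertices: ${valid.numVertices}, edges: ${valid.numEdges}")
    sc.stop()
  }
}
```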
The built-in operators of the property graph are summarized as follows:

/** Summary of the functionality in the property graph */
class Graph[VD, ED] {
  // Information about the Graph ===================================================================
  val numEdges: Long
  val numVertices: Long
  val inDegrees: VertexRDD[Int]
  val outDegrees: VertexRDD[Int]
  val degrees: VertexRDD[Int]

  // Views of the graph as collections =============================================================
  val vertices: VertexRDD[VD]
  val edges: EdgeRDD[ED]
  val triplets: RDD[EdgeTriplet[VD, ED]]

  // Functions for caching graphs ==================================================================
  def persist(newLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED]
  def cache(): Graph[VD, ED]
  def unpersistVertices(blocking: Boolean = true): Graph[VD, ED]

  // Change the partitioning heuristic  ============================================================
  def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED]

  // Transform vertex and edge attributes ==========================================================
  def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED]
  def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
  def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2]
  def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
  def mapTriplets[ED2](map: (PartitionID, Iterator[EdgeTriplet[VD, ED]]) => Iterator[ED2])
    : Graph[VD, ED2]

  // Modify the graph structure ====================================================================
  def reverse: Graph[VD, ED]
  def subgraph(
      epred: EdgeTriplet[VD, ED] => Boolean = (x => true),
      vpred: (VertexId, VD) => Boolean = ((v, d) => true))
    : Graph[VD, ED]
  def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
  def groupEdges(merge: (ED, ED) => ED): Graph[VD, ED]

  // Join RDDs with the graph ======================================================================
  def joinVertices[U](table: RDD[(VertexId, U)])(mapFunc: (VertexId, VD, U) => VD): Graph[VD, ED]
  def outerJoinVertices[U, VD2](other: RDD[(VertexId, U)])
      (mapFunc: (VertexId, VD, Option[U]) => VD2)
    : Graph[VD2, ED]

  // Aggregate information about adjacent triplets =================================================
  def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexId]]
  def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexId, VD)]]
  def aggregateMessages[Msg: ClassTag](
      sendMsg: EdgeContext[VD, ED, Msg] => Unit,
      mergeMsg: (Msg, Msg) => Msg,
      tripletFields: TripletFields = TripletFields.All)
    : VertexRDD[Msg]

  // Iterative graph-parallel computation ==========================================================
  def pregel[A](initialMsg: A, maxIterations: Int, activeDirection: EdgeDirection)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A)
    : Graph[VD, ED]

  // Basic graph algorithms ========================================================================
  def pageRank(tol: Double, resetProb: Double = 0.15): Graph[Double, Double]
  def connectedComponents(): Graph[VertexId, ED]
  def triangleCount(): Graph[Int, ED]
  def stronglyConnectedComponents(numIter: Int): Graph[VertexId, ED]
}
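Two of the operators in the summary above combine naturally: `outerJoinVertices` and `reverse`. The sketch below joins `outDegrees` onto the example graph. Because vertices with no outgoing edges have no entry in `outDegrees`, the joined value arrives as an `Option` and defaults to 0. Applying the same function to `graph.reverse` yields in-degrees, because reversing every edge swaps in- and out-degree. `JoinDemo` and its helpers are hypothetical names, and spark-graphx is assumed to be on the classpath.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical demo of outerJoinVertices and reverse.
object JoinDemo {
  // Same four-user graph as the main program.
  def buildGraph(sc: SparkContext): Graph[(String, String), String] = {
    val users: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
      (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
      (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
    val relationships: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
      Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
    Graph(users, relationships, ("John Doe", "Missing"))
  }

  // Replaces each (name, position) attribute with (name, outDegree).
  // Vertices missing from outDegrees get None, mapped here to 0.
  def withOutDegree(graph: Graph[(String, String), String]): Graph[(String, Int), String] =
    graph.outerJoinVertices(graph.outDegrees) { (_, attr, degOpt) =>
      (attr._1, degOpt.getOrElse(0))
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "JoinDemo")
    val graph = buildGraph(sc)
    withOutDegree(graph).vertices.collect().foreach(println)         // out-degrees
    withOutDegree(graph.reverse).vertices.collect().foreach(println) // in-degrees
    sc.stop()
  }
}
```

`outerJoinVertices` keeps every vertex, even those absent from the joined RDD. In contrast, `joinVertices` leaves such vertices' attributes unchanged and cannot change the attribute type.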

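The neighborhood-aggregation and iterative operators in the summary above can be illustrated on the same edge structure. The sketch below builds the graph with `Graph.fromEdgeTuples`, which sets every vertex attribute to the given default and every edge attribute to 1. It then does two things:

- It re-derives in-degrees with `aggregateMessages`: each edge sends 1 to its destination, and the messages are summed per vertex.
- It computes hop distances from vertex 2 with `pregel`, in the single-source shortest-path style.

`MessageDemo` and its method names are hypothetical, and spark-graphx is assumed to be on the classpath. This is a minimal sketch, not a tuned implementation.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

// Hypothetical demo of aggregateMessages and pregel.
object MessageDemo {
  // Structure-only version of the example graph: vertex attributes are the
  // default value 1, and fromEdgeTuples sets each edge attribute to 1.
  def buildGraph(sc: SparkContext): Graph[Int, Int] =
    Graph.fromEdgeTuples(sc.parallelize(Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L))), 1)

  // Each edge sends 1 to its destination and messages are summed per vertex.
  // TripletFields.None declares that sendMsg reads no vertex attributes.
  def inDegreesViaMessages(graph: Graph[Int, Int]): VertexRDD[Int] =
    graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _, TripletFields.None)

  // Hop distance from `source`; unreachable vertices stay at +infinity.
  def hopsFrom(graph: Graph[Int, Int], source: VertexId): VertexRDD[Double] = {
    val init = graph.mapVertices((id, _) => if (id == source) 0.0 else Double.PositiveInfinity)
    init.pregel(Double.PositiveInfinity)(
      (_, dist, msg) => math.min(dist, msg), // vertex program: keep the shorter distance
      t =>                                   // send: offer dst a shorter path through src
        if (t.srcAttr + 1 < t.dstAttr) Iterator((t.dstId, t.srcAttr + 1))
        else Iterator.empty,
      (a, b) => math.min(a, b)               // merge: keep the smallest offer
    ).vertices
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "MessageDemo")
    val graph = buildGraph(sc)
    inDegreesViaMessages(graph).collect().foreach(println)
    hopsFrom(graph, 2L).collect().foreach(println)
    sc.stop()
  }
}
```

Like `inDegrees`, the result of `aggregateMessages` contains only vertices that received at least one message, so vertex 2 does not appear in it.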
Reference:

http://spark.apache.org/docs/latest/graphx-programming-guide.html
