Spark GraphX Graph Processing: A Programming Example

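The graph built in this example is shown below (the figure is not reproduced here): four user vertices connected by four directed, labeled edges.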

The Scala program is as follows:

import org.apache.spark._
import org.apache.spark.graphx._
// To make some of the examples work we will also need RDD
import org.apache.spark.rdd.RDD

object Test {
  def main(args: Array[String]): Unit = {
    // Initialize the SparkContext (local mode with 2 threads)
    val sc: SparkContext = new SparkContext("local[2]", "Spark Graphx")

    // Create an RDD for the vertices: (id, (name, position))
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
        (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))

    // Create an RDD for the edges, each labeled with a relationship
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))

    // Define a default user. It is assigned to any vertex that an edge
    // refers to but that is missing from the vertex RDD.
    val defaultUser = ("John Doe", "Missing")

    // Build the graph
    val graph = Graph(users, relationships, defaultUser)

    // Print the vertices and the edge triplets of the graph
    graph.vertices.collect().foreach(println(_))
    graph.triplets
      .map(triplet => triplet.srcAttr + "----->" + triplet.dstAttr + "    attr:" + triplet.attr)
      .collect().foreach(println(_))

    // Count the users whose position is "postdoc"
    val cnt1 = graph.vertices.filter { case (id, (name, pos)) => pos == "postdoc" }.count
    println("Number of postdocs among all users: " + cnt1)

    // Count the edges whose source id is greater than the destination id (src > dst)
    val cnt2 = graph.edges.filter(e => e.srcId > e.dstId).count
    println("Number of edges with src > dst: " + cnt2)

    // Compute the in-degree of each vertex
    val inDegrees: VertexRDD[Int] = graph.inDegrees
    inDegrees.collect().foreach(println(_))

    // Release the SparkContext when done
    sc.stop()
  }
}
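As a sanity check on what the three statistics above should report, the same data can be evaluated with plain Scala collections, without Spark. This is only a minimal sketch: the object name `GraphStatsSketch` and its field names are hypothetical and do not come from the original program, and local collections stand in for the RDDs.

```scala
// Hypothetical helper object, not part of the original program:
// it mirrors the GraphX statistics using local collections only.
object GraphStatsSketch {
  // Same vertex data as the GraphX program: id -> (name, position)
  val users: Map[Long, (String, String)] = Map(
    3L -> ("rxin", "student"),
    7L -> ("jgonzal", "postdoc"),
    5L -> ("franklin", "prof"),
    2L -> ("istoica", "prof"))

  // Same edge data as the GraphX program: (srcId, dstId, relationship)
  val relationships: Seq[(Long, Long, String)] = Seq(
    (3L, 7L, "collab"), (5L, 3L, "advisor"),
    (2L, 5L, "colleague"), (5L, 7L, "pi"))

  // Users whose position is "postdoc"
  val postdocCount: Int = users.values.count { case (_, pos) => pos == "postdoc" }

  // Edges whose source id is greater than the destination id
  val srcGreaterThanDst: Int = relationships.count { case (src, dst, _) => src > dst }

  // In-degree per vertex: group edges by destination and count them.
  // Vertices without incoming edges do not appear in the result.
  val inDegrees: Map[Long, Int] =
    relationships.groupBy(_._2).map { case (dst, edges) => dst -> edges.size }

  def main(args: Array[String]): Unit = {
    println("postdocs: " + postdocCount)
    println("edges with src > dst: " + srcGreaterThanDst)
    inDegrees.toSeq.sortBy(_._1).foreach(println)
  }
}
```

With this data there is one postdoc (jgonzal), one edge with src > dst (5 to 3), and vertex 7 has in-degree 2 while vertices 3 and 5 have in-degree 1. Vertex 2 has no incoming edges and is therefore absent from the in-degree result, which is also how `graph.inDegrees` behaves.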

The main built-in graph operations are summarized below:

/** Summary of the functionality in the property graph */
class Graph[VD, ED] {
  // Information about the Graph ===================================================================
  val numEdges: Long
  val numVertices: Long
  val inDegrees: VertexRDD[Int]
  val outDegrees: VertexRDD[Int]
  val degrees: VertexRDD[Int]

  // Views of the graph as collections =============================================================
  val vertices: VertexRDD[VD]
  val edges: EdgeRDD[ED]
  val triplets: RDD[EdgeTriplet[VD, ED]]

  // Functions for caching graphs ==================================================================
  def persist(newLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED]
  def cache(): Graph[VD, ED]
  def unpersistVertices(blocking: Boolean = true): Graph[VD, ED]

  // Change the partitioning heuristic  ============================================================
  def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED]

  // Transform vertex and edge attributes ==========================================================
  def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED]
  def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
  def mapEdges[ED2](map: (PartitionID, Iterator[Edge[ED]]) => Iterator[ED2]): Graph[VD, ED2]
  def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
  def mapTriplets[ED2](map: (PartitionID, Iterator[EdgeTriplet[VD, ED]]) => Iterator[ED2])
    : Graph[VD, ED2]

  // Modify the graph structure ====================================================================
  def reverse: Graph[VD, ED]
  def subgraph(
      epred: EdgeTriplet[VD,ED] => Boolean = (x => true),
      vpred: (VertexId, VD) => Boolean = ((v, d) => true))
    : Graph[VD, ED]
  def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
  def groupEdges(merge: (ED, ED) => ED): Graph[VD, ED]

  // Join RDDs with the graph ======================================================================
  def joinVertices[U](table: RDD[(VertexId, U)])(mapFunc: (VertexId, VD, U) => VD): Graph[VD, ED]
  def outerJoinVertices[U, VD2](other: RDD[(VertexId, U)])
      (mapFunc: (VertexId, VD, Option[U]) => VD2)
    : Graph[VD2, ED]

  // Aggregate information about adjacent triplets =================================================
  def collectNeighborIds(edgeDirection: EdgeDirection): VertexRDD[Array[VertexId]]
  def collectNeighbors(edgeDirection: EdgeDirection): VertexRDD[Array[(VertexId, VD)]]
  def aggregateMessages[Msg: ClassTag](
      sendMsg: EdgeContext[VD, ED, Msg] => Unit,
      mergeMsg: (Msg, Msg) => Msg,
      tripletFields: TripletFields = TripletFields.All)
    : VertexRDD[Msg]

  // Iterative graph-parallel computation ==========================================================
  def pregel[A](initialMsg: A, maxIterations: Int, activeDirection: EdgeDirection)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A)
    : Graph[VD, ED]

  // Basic graph algorithms ========================================================================
  def pageRank(tol: Double, resetProb: Double = 0.15): Graph[Double, Double]
  def connectedComponents(): Graph[VertexId, ED]
  def triangleCount(): Graph[Int, ED]
  def stronglyConnectedComponents(numIter: Int): Graph[VertexId, ED]
}
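To make a few of these operators concrete, the sketch below applies `subgraph`, `outerJoinVertices`, and `aggregateMessages` to the same four-user graph. It assumes Spark with the GraphX module on the classpath and a local master; the object name `GraphOpsSketch`, its method names, and the `UserGraph` alias are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical object and method names, used only for illustration.
object GraphOpsSketch {
  type UserGraph = Graph[(String, String), String]

  // Rebuild the article's graph: vertices carry (name, position), edges carry a label.
  def buildGraph(sc: SparkContext): UserGraph = {
    val users: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
      (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
      (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
    val relationships: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
      Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
    Graph(users, relationships, ("John Doe", "Missing"))
  }

  // subgraph: keep only "prof" vertices; an edge survives only if both endpoints do.
  def profEdges(graph: UserGraph): Seq[(VertexId, VertexId)] =
    graph.subgraph(vpred = (_, attr) => attr._2 == "prof")
      .edges.map(e => (e.srcId, e.dstId)).collect().toSeq

  // outerJoinVertices: attach each user's out-degree, using 0 when there is none.
  def outDegreeByName(graph: UserGraph): Map[String, Int] =
    graph.outerJoinVertices(graph.outDegrees) { (_, attr, deg) => (attr._1, deg.getOrElse(0)) }
      .vertices.map(_._2).collect().toMap

  // aggregateMessages: every edge sends 1 to its destination and messages are summed,
  // which recomputes the in-degree of each vertex that has incoming edges.
  def inDegreeById(graph: UserGraph): Map[VertexId, Int] =
    graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _).collect().toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("GraphOpsSketch")
    val sc = new SparkContext(conf)
    try {
      val graph = buildGraph(sc).cache()
      println("edges between profs: " + profEdges(graph))
      println("out-degree by name: " + outDegreeByName(graph))
      println("in-degree by id: " + inDegreeById(graph))
    } finally {
      sc.stop()
    }
  }
}
```

The helpers return local collections rather than RDDs so that the results are easy to inspect; on a large graph one would keep working with the returned `Graph` or `VertexRDD` instead of calling `collect()`.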

Reference:

http://spark.apache.org/docs/latest/graphx-programming-guide.html
