Learning GraphX's Pregel (BSP Model: Message-Passing Mechanism)

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.graphx

import scala.reflect.ClassTag

import org.apache.spark.Logging

/**
 * Implements a Pregel-like bulk-synchronous message-passing API.
 *
 * Unlike the original Pregel API, the GraphX Pregel API factors the sendMessage computation over
 * edges, enables the message sending computation to read both vertex attributes, and constrains
 * messages to the graph structure.  These changes allow for substantially more efficient
 * distributed execution while also exposing greater flexibility for graph-based computation.
 *
 * @example We can use the Pregel abstraction to implement PageRank:
 * {{{
 * val pagerankGraph: Graph[Double, Double] = graph
 *   // Associate the degree with each vertex
 *   .outerJoinVertices(graph.outDegrees) {
 *     (vid, vdata, deg) => deg.getOrElse(0)
 *   }
 *   // Set the weight on the edges based on the degree
 *   .mapTriplets(e => 1.0 / e.srcAttr)
 *   // Set the vertex attributes to the initial pagerank values
 *   .mapVertices((id, attr) => 1.0)
 *
 * // `resetProb` (the reset probability) and `numIter` are assumed to be defined by the caller.
 * def vertexProgram(id: VertexId, attr: Double, msgSum: Double): Double =
 *   resetProb + (1.0 - resetProb) * msgSum
 * def sendMessage(edge: EdgeTriplet[Double, Double]): Iterator[(VertexId, Double)] =
 *   Iterator((edge.dstId, edge.srcAttr * edge.attr))
 * def messageCombiner(a: Double, b: Double): Double = a + b
 * val initialMessage = 0.0
 * // Execute Pregel for a fixed number of iterations.
 * Pregel(pagerankGraph, initialMessage, numIter)(
 *   vertexProgram, sendMessage, messageCombiner)
 * }}}
 */
object Pregel extends Logging {

  /**
   * Execute a Pregel-like iterative vertex-parallel abstraction.  The
   * user-defined vertex program `vprog` is executed in parallel on
   * each vertex that receives inbound messages, computing a new
   * value for the vertex.  The `sendMsg` function is then invoked on
   * the edges selected by `activeDirection` (on all edges in the first
   * iteration) and is used to compute optional messages to the edge's
   * vertices. The `mergeMsg` function is a commutative and
   * associative function used to combine messages destined to the
   * same vertex.
   *
   * On the first iteration all vertices receive the `initialMsg`, and
   * on subsequent iterations, if a vertex does not receive a message,
   * the vertex program is not invoked on it.
   *
   * This function iterates until there are no remaining messages, or
   * for at most `maxIterations` iterations.
   *
   * @tparam VD the vertex data type
   * @tparam ED the edge data type
   * @tparam A the Pregel message type
   *
   * @param graph the input graph.
   *
   * @param initialMsg the message each vertex will receive on the first iteration
   *
   * @param maxIterations the maximum number of iterations to run for
   *
   * @param activeDirection the direction of edges incident to a vertex that received a message in
   * the previous round on which to run `sendMsg`. For example, if this is `EdgeDirection.Out`, only
   * out-edges of vertices that received a message in the previous round will run. The default is
   * `EdgeDirection.Either`, which will run `sendMsg` on edges where either side received a message
   * in the previous round. If this is `EdgeDirection.Both`, `sendMsg` will only run on edges where
   * *both* vertices received a message.
   *
   * @param vprog the user-defined vertex program, which runs on each
   * vertex, receives the inbound message, and computes a new vertex
   * value.  On the first iteration the vertex program is invoked on
   * all vertices and is passed the initial message (`initialMsg`).  On
   * subsequent iterations the vertex program is only invoked on those
   * vertices that receive messages.
   *
   * @param sendMsg a user-supplied function that is applied to the edges
   * (selected by `activeDirection`) of vertices that received messages
   * in the current iteration
   *
   * @param mergeMsg a user-supplied function that takes two incoming
   * messages of type A and merges them into a single message of type
   * A.  ''This function must be commutative and associative, and
   * ideally the size of A should not increase.''
   *
   * @return the resulting graph at the end of the computation
   */
  def apply[VD: ClassTag, ED: ClassTag, A: ClassTag]
     (graph: Graph[VD, ED],
      initialMsg: A,
      maxIterations: Int = Int.MaxValue,
      activeDirection: EdgeDirection = EdgeDirection.Either)
     (vprog: (VertexId, VD, A) => VD,
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A)
    : Graph[VD, ED] =
  {
    var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
    // compute the messages
    var messages = g.mapReduceTriplets(sendMsg, mergeMsg)
    var activeMessages = messages.count()
    // Loop
    var prevG: Graph[VD, ED] = null
    var i = 0
    while (activeMessages > 0 && i < maxIterations) {
      // Receive the messages. Vertices that didn't get any messages do not appear in newVerts.
      val newVerts = g.vertices.innerJoin(messages)(vprog).cache()
      // Update the graph with the new vertices.
      prevG = g
      g = g.outerJoinVertices(newVerts) { (vid, old, newOpt) => newOpt.getOrElse(old) }
      g.cache()

      val oldMessages = messages
      // Send new messages. Vertices that didn't get any messages don't appear in newVerts, so don't
      // get to send messages. We must cache messages so it can be materialized on the next line,
      // allowing us to uncache the previous iteration.
      messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDirection))).cache()
      // The call to count() materializes `messages`, `newVerts`, and the vertices of `g`. This
      // hides oldMessages (depended on by newVerts), newVerts (depended on by messages), and the
      // vertices of prevG (depended on by newVerts, oldMessages, and the vertices of g).
      activeMessages = messages.count()

      logInfo("Pregel finished iteration " + i)

      // Unpersist the RDDs hidden by newly-materialized RDDs
      oldMessages.unpersist(blocking=false)
      newVerts.unpersist(blocking=false)
      prevG.unpersistVertices(blocking=false)
      prevG.edges.unpersist(blocking=false)
      // count the iteration
      i += 1
    }
    g
  } // end of apply

} // end of object Pregel
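The superstep loop in `apply` can be illustrated without a Spark cluster. The following is a minimal single-machine sketch of the same control flow, written under these assumptions. It is not GraphX code: `LocalPregel`, `Edge`, `Triplet`, `run`, and `LocalPregelDemo` are hypothetical names. The graph is a plain `Map` of vertex attributes plus a `Seq` of edges. Edge selection is hard-wired to the default `EdgeDirection.Either` behaviour, and caching, unpersisting, and partitioning are omitted. The demo computes single-source shortest paths, with `math.min` serving as the commutative, associative `mergeMsg`.

```scala
// Hypothetical single-machine sketch of the superstep loop in Pregel.apply.
// LocalPregel, Edge, Triplet and run are illustrative names, not GraphX API.
object LocalPregel {
  final case class Edge[ED](srcId: Long, dstId: Long, attr: ED)
  final case class Triplet[VD, ED](srcId: Long, srcAttr: VD, dstId: Long, dstAttr: VD, attr: ED)

  def run[VD, ED, A](
      vertices: Map[Long, VD],
      edges: Seq[Edge[ED]],
      initialMsg: A,
      maxIterations: Int = Int.MaxValue)(
      vprog: (Long, VD, A) => VD,
      sendMsg: Triplet[VD, ED] => Iterator[(Long, A)],
      mergeMsg: (A, A) => A): Map[Long, VD] = {

    // Run sendMsg on the selected edges and combine messages per target vertex with mergeMsg.
    // Assumes every edge endpoint is a key of the vertex map.
    def computeMessages(g: Map[Long, VD], active: Edge[ED] => Boolean): Map[Long, A] =
      edges
        .filter(active)
        .flatMap(e => sendMsg(Triplet(e.srcId, g(e.srcId), e.dstId, g(e.dstId), e.attr)))
        .groupBy(_._1)
        .map { case (id, msgs) => id -> msgs.map(_._2).reduce(mergeMsg) }

    // First iteration: every vertex runs vprog on initialMsg, then sendMsg runs on all edges.
    var g = vertices.map { case (id, attr) => id -> vprog(id, attr, initialMsg) }
    var messages = computeMessages(g, _ => true)
    var i = 0
    while (messages.nonEmpty && i < maxIterations) {
      // Only vertices that received a message run vprog (the innerJoin step in apply).
      val newVerts = messages.map { case (id, msg) => id -> vprog(id, g(id), msg) }
      g = g ++ newVerts
      // Default EdgeDirection.Either: an edge is active if either endpoint just received a message.
      messages = computeMessages(g, e => newVerts.contains(e.srcId) || newVerts.contains(e.dstId))
      i += 1
    }
    g
  }
}

object LocalPregelDemo {
  import LocalPregel._

  // Single-source shortest paths from vertex 1 on a small hypothetical weighted graph.
  def shortestPaths(): Map[Long, Double] = {
    val sourceId = 1L
    val edges = Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0), Edge(3L, 4L, 1.0))
    val init = (1L to 4L).map(id => id -> (if (id == sourceId) 0.0 else Double.PositiveInfinity)).toMap
    run(init, edges, Double.PositiveInfinity)(
      (_, dist, msg) => math.min(dist, msg),   // vprog: keep the shorter distance
      t =>                                     // sendMsg: offer a shorter path to dst
        if (t.srcAttr + t.attr < t.dstAttr) Iterator((t.dstId, t.srcAttr + t.attr))
        else Iterator.empty,
      (a, b) => math.min(a, b))                // mergeMsg: commutative and associative
  }

  def main(args: Array[String]): Unit =
    println(shortestPaths().toSeq.sortBy(_._1).mkString(", "))
}
```

In the demo, only the two edges leaving vertex 1 produce messages in the first iteration. Vertex 3 is first offered 5.0 over the direct edge and is later lowered to 3.0 through vertex 2. The loop stops once no active edge can improve a distance, so `main` prints `(1,0.0), (2,1.0), (3,3.0), (4,4.0)`.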
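The `activeDirection` rules can also be written as a small predicate. This is a hypothetical stand-alone model: `Direction` and `ActiveEdges.isActive` are not GraphX names. For an edge `srcId -> dstId`, it decides whether `sendMsg` runs, given the set of vertices that received a message in the previous round. The `Out`, `Either`, and `Both` cases follow the `activeDirection` description in the Scaladoc above. The `In` case (the edge's destination received a message) is not spelled out there, so treat it as an assumption by symmetry with `Out`.

```scala
// Hypothetical local model of edge directions; not the GraphX EdgeDirection class.
sealed trait Direction
object Direction {
  case object In extends Direction
  case object Out extends Direction
  case object Either extends Direction
  case object Both extends Direction
}

object ActiveEdges {
  // Should sendMsg run on edge srcId -> dstId, given the vertices that received a message last round?
  def isActive(srcId: Long, dstId: Long, received: Set[Long], dir: Direction): Boolean = dir match {
    case Direction.Out    => received(srcId)                    // out-edges of active vertices
    case Direction.In     => received(dstId)                    // in-edges of active vertices (assumed)
    case Direction.Either => received(srcId) || received(dstId) // the default in Pregel.apply
    case Direction.Both   => received(srcId) && received(dstId) // both endpoints must be active
  }
}
```

Consider the edge `1 -> 2` when only vertex 2 received a message. The predicate is true for `In` and `Either` and false for `Out` and `Both`.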
