/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.spark.graphx import scala.reflect.ClassTag
import org.apache.spark.Logging /**
* Implements a Pregel-like bulk-synchronous message-passing API.
*
* Unlike the original Pregel API, the GraphX Pregel API factors the sendMessage computation over
* edges, enables the message sending computation to read both vertex attributes, and constrains
* messages to the graph structure. These changes allow for substantially more efficient
* distributed execution while also exposing greater flexibility for graph-based computation.
*
* @example We can use the Pregel abstraction to implement PageRank:
* {{{
* val pagerankGraph: Graph[Double, Double] = graph
* // Associate the degree with each vertex
* .outerJoinVertices(graph.outDegrees) {
* (vid, vdata, deg) => deg.getOrElse(0)
* }
* // Set the weight on the edges based on the degree
* .mapTriplets(e => 1.0 / e.srcAttr)
* // Set the vertex attributes to the initial pagerank values
* .mapVertices((id, attr) => 1.0)
*
* def vertexProgram(id: VertexId, attr: Double, msgSum: Double): Double =
* resetProb + (1.0 - resetProb) * msgSum
* def sendMessage(id: VertexId, edge: EdgeTriplet[Double, Double]): Iterator[(VertexId, Double)] =
* Iterator((edge.dstId, edge.srcAttr * edge.attr))
* def messageCombiner(a: Double, b: Double): Double = a + b
* val initialMessage = 0.0
* // Execute Pregel for a fixed number of iterations.
* Pregel(pagerankGraph, initialMessage, numIter)(
* vertexProgram, sendMessage, messageCombiner)
* }}}
*
*/
object Pregel extends Logging { /**
* Execute a Pregel-like iterative vertex-parallel abstraction. The
* user-defined vertex-program `vprog` is executed in parallel on
* each vertex receiving any inbound messages and computing a new
* value for the vertex. The `sendMsg` function is then invoked on
* all out-edges and is used to compute an optional message to the
* destination vertex. The `mergeMsg` function is a commutative
* associative function used to combine messages destined to the
* same vertex.
*
* On the first iteration all vertices receive the `initialMsg` and
* on subsequent iterations if a vertex does not receive a message
* then the vertex-program is not invoked.
*
* This function iterates until there are no remaining messages, or
* for `maxIterations` iterations.
*
* @tparam VD the vertex data type
* @tparam ED the edge data type
* @tparam A the Pregel message type
*
* @param graph the input graph.
*
* @param initialMsg the message each vertex will receive at the on
* the first iteration
*
* @param maxIterations the maximum number of iterations to run for
*
* @param activeDirection the direction of edges incident to a vertex that received a message in
* the previous round on which to run `sendMsg`. For example, if this is `EdgeDirection.Out`, only
* out-edges of vertices that received a message in the previous round will run. The default is
* `EdgeDirection.Either`, which will run `sendMsg` on edges where either side received a message
* in the previous round. If this is `EdgeDirection.Both`, `sendMsg` will only run on edges where
* *both* vertices received a message.
*
* @param vprog the user-defined vertex program which runs on each
* vertex and receives the inbound message and computes a new vertex
* value. On the first iteration the vertex program is invoked on
* all vertices and is passed the default message. On subsequent
* iterations the vertex program is only invoked on those vertices
* that receive messages.
*
* @param sendMsg a user supplied function that is applied to out
* edges of vertices that received messages in the current
* iteration
*
* @param mergeMsg a user supplied function that takes two incoming
* messages of type A and merges them into a single message of type
* A. ''This function must be commutative and associative and
* ideally the size of A should not increase.''
*
* @return the resulting graph at the end of the computation
*
*/
def apply[VD: ClassTag, ED: ClassTag, A: ClassTag]
(graph: Graph[VD, ED],
initialMsg: A,
maxIterations: Int = Int.MaxValue,
activeDirection: EdgeDirection = EdgeDirection.Either)
(vprog: (VertexId, VD, A) => VD,
sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
mergeMsg: (A, A) => A)
: Graph[VD, ED] =
{
var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
// compute the messages
var messages = g.mapReduceTriplets(sendMsg, mergeMsg)
var activeMessages = messages.count()
// Loop
var prevG: Graph[VD, ED] = null
var i = 0
while (activeMessages > 0 && i < maxIterations) {
// Receive the messages. Vertices that didn't get any messages do not appear in newVerts.
val newVerts = g.vertices.innerJoin(messages)(vprog).cache()
// Update the graph with the new vertices.
prevG = g
g = g.outerJoinVertices(newVerts) { (vid, old, newOpt) => newOpt.getOrElse(old) }
g.cache() val oldMessages = messages
// Send new messages. Vertices that didn't get any messages don't appear in newVerts, so don't
// get to send messages. We must cache messages so it can be materialized on the next line,
// allowing us to uncache the previous iteration.
messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDirection))).cache()
// The call to count() materializes `messages`, `newVerts`, and the vertices of `g`. This
// hides oldMessages (depended on by newVerts), newVerts (depended on by messages), and the
// vertices of prevG (depended on by newVerts, oldMessages, and the vertices of g).
activeMessages = messages.count() logInfo("Pregel finished iteration " + i) // Unpersist the RDDs hidden by newly-materialized RDDs
oldMessages.unpersist(blocking=false)
newVerts.unpersist(blocking=false)
prevG.unpersistVertices(blocking=false)
prevG.edges.unpersist(blocking=false)
// count the iteration
i += 1
} g
} // end of apply } // end of class Pregel

GraphX之Pregel(BSP模型-消息传递机制)学习的更多相关文章

  1. Android学习笔记-事件处理之Handler消息传递机制

    内容摘要:Android Handler消息传递机制的学习总结.问题记录 Handler消息传递机制的目的: 1.实现线程间通信(如:Android平台只允许主线程(UI线程)修改Activity里的 ...

  2. Android学习之Handler消息传递机制

    Android只允许UI线程修改Activity里的UI组件.当Android程序第一次启动时,Android会同时启动一条主线程(Main Thread),主线程主要负责处理与UI相关的事件,如用户 ...

  3. 从BSP模型到Apache Hama

    一.什么是BSP模型 概述 BSP(Bulk Synchronous Parallel,整体同步并行计算模型)是一种并行计算模型,由英国计算机科学家Viliant在上世纪80年代提出.Google发布 ...

  4. BSP模型

    http://www.uml.org.cn/yunjisuan/201212191.asp Hama中最关键的就是BSP(Bulk Synchronous Parallel-"大型" ...

  5. 我理解的Hanlder--android消息传递机制

    每一个学习Android的同学都会觉得Handler是一个神奇的东西,我也一样,开始我以为我懂了Handler的机制,后来发现自己是一知半解,昨天想想,我能否自己实现一个Handler,让子线程与Ac ...

  6. Chrome 消息传递机制

    Chrome插件开发入门(二)——消息传递机制 Blog | Qiushi Chen 2014-03-31 9538 阅读 Chrome 插件 由于插件的js运行环境有区别,所以消息传递机制是一个重要 ...

  7. Chrome插件开发入门(二)——消息传递机制

    Chrome插件开发入门(二)——消息传递机制   由于插件的js运行环境有区别,所以消息传递机制是一个重要内容.阅读了很多博文,大家已经说得很清楚了,直接转一篇@姬小光 的博文,总结的挺好.后面附一 ...

  8. iOS开发——OC篇&消息传递机制(KVO/NOtification/Block/代理/Target-Action)

     iOS开发中消息传递机制(KVO/NOtification/Block/代理/Target-Action)   今晚看到了一篇好的文章,所以就搬过来了,方便自己以后学习 虽然这一期的主题是关于Fou ...

  9. (Android数据传递)Intent消息传递机制 “Intent”“数据传递”

    Intent类的继承关系:   需要注意的是,该类实现了Parcelable(用于数据传递)和Cloneable接口. Intent是一种(系统级别的)消息传递机制,可以在应用程序内使用,也可以在应用 ...

随机推荐

  1. 错误:libstdc++.so.6: wrong ELF class

    1.安装mysql的时候报错缺少GLIBCXX_3.4.15 2.按照网上的下载了libstdc++.so.6.0.17 放到/usr/lib64 下 删除之前的libstdc++.so.6的链接 重 ...

  2. sed (未完,待续)

    sed [options] 'command' file(s) options: -e<script>或--expression=<script>:以选项中的指定的script ...

  3. 688. Knight Probability in Chessboard棋子留在棋盘上的概率

    [抄题]: On an NxN chessboard, a knight starts at the r-th row and c-th column and attempts to make exa ...

  4. node.js中使用yargs来处理命令行参数

    yargs库能够方便的处理命令行参数. 一.安装 yargs npm install yargs --save 二.读取命令行参数 const yargs = require('yargs'); le ...

  5. 五、secureCRT远程连接工具的使用

    1.secureCRT实现远程传输文件到服务器机器 alt+p ,进入sftp模式,输入命令:put 文件所在的本机位置

  6. Python开发——变量

    变量的作用 把程序运行的中间结果,临时保存到内存里,以备后面的代码继续调用 变量的声明 name = “yuan” 变量的定义规则 1.变量名只能是  字母.数字或下划线的任意组合 2.变量名的第一个 ...

  7. Elasticsearch tshark 封包分析 (转)

    Elasticsearch tshark 封包分析 使用wireshark能解決許多網路問題,將側錄下來的封包傳至Elasticsearch上方便分析製作及時報表.tshark為wireshark的命 ...

  8. powershell获取windows子文件夹的大小

    $startFolder = "E:\Migration\" $colItems = (Get-ChildItem $startFolder | Where-Object {$_. ...

  9. c++11 线程池学习笔记 (一) 任务队列

    学习内容来自一下地址 http://www.cnblogs.com/qicosmos/p/4772486.html github https://github.com/qicosmos/cosmos ...

  10. mui.fire()触发自定义事件

    导读:添加自定义事件监听操作和标准js事件监听类似,可直接通过window对象添加,通过mui.fire()方法可触发目标窗口的自定义事件. 监听自定义事件 添加自定义事件监听操作和标准js事件监听类 ...