Spark 2.0 shuffle service

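The shuffle service is a core part of Spark. This article covers the path that does not use ExternalShuffleClient and walks through the overall architecture of the block service. ShuffleClient is the foundation of the whole framework; it has two methods, init and fetchBlocks.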

/** Provides an interface for reading shuffle files, either from an Executor or external service. */
public abstract class ShuffleClient implements Closeable {

  /**
   * Initializes the ShuffleClient, specifying this Executor's appId.
   * Must be called before any other method on the ShuffleClient.
   */
  public void init(String appId) { }

  /**
   * Fetch a sequence of blocks from a remote node asynchronously.
   *
   * Note that this API takes a sequence so the implementation can batch requests, and does not
   * return a future so the underlying implementation can invoke onBlockFetchSuccess as soon as
   * the data of a block is fetched, rather than waiting for all blocks to be fetched.
   */
  public abstract void fetchBlocks(
      String host,
      int port,
      String execId,
      String[] blockIds,
      BlockFetchingListener listener);
}
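The contract above (call init first, pass a batch of block IDs, receive one callback per block instead of a single future) can be illustrated with a small self-contained sketch. Everything here is hypothetical and not part of Spark: DemoFetchListener stands in for BlockFetchingListener, DemoShuffleClient keeps blocks in an in-memory map instead of talking to a remote node, and callbacks are invoked synchronously for simplicity, whereas a real client would fire them from its network threads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for BlockFetchingListener, using byte[] instead of ManagedBuffer.
interface DemoFetchListener {
    void onSuccess(String blockId, byte[] data);
    void onFailure(String blockId, Throwable error);
}

// Hypothetical in-memory client: no network, blocks live in a map.
class DemoShuffleClient {
    private final Map<String, byte[]> blocks = new HashMap<>();
    private String appId; // set by init(); null means "not initialized"

    void init(String appId) { this.appId = appId; }

    void putBlock(String blockId, byte[] data) { blocks.put(blockId, data); }

    // Takes a batch of ids and reports each block individually as soon as it is resolved.
    void fetchBlocks(String[] blockIds, DemoFetchListener listener) {
        if (appId == null) {
            throw new IllegalStateException("init(appId) must be called first");
        }
        for (String id : blockIds) {
            byte[] data = blocks.get(id);
            if (data != null) {
                listener.onSuccess(id, data);
            } else {
                listener.onFailure(id, new IllegalArgumentException("block not found: " + id));
            }
        }
    }
}

class DemoShuffleClientExample {
    // Fetches the given ids and returns a log of callback events in arrival order.
    static List<String> run(String... ids) {
        DemoShuffleClient client = new DemoShuffleClient();
        client.init("app-1");
        client.putBlock("shuffle_0_0_0", new byte[] {1, 2, 3});
        List<String> events = new ArrayList<>();
        client.fetchBlocks(ids, new DemoFetchListener() {
            @Override public void onSuccess(String blockId, byte[] data) {
                events.add("ok:" + blockId + ":" + data.length);
            }
            @Override public void onFailure(String blockId, Throwable error) {
                events.add("fail:" + blockId);
            }
        });
        return events;
    }
}
```

Calling DemoShuffleClientExample.run("shuffle_0_0_0", "missing") records one success event and one failure event, in that order, which shows why a per-block callback lets the caller start consuming the first block without waiting for the rest of the batch.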

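The BlockFetchingListener interface has two callbacks. onBlockFetchSuccess is called once for each successfully fetched block. After it returns, the data is released automatically; if the data is passed to another thread, the receiver must call retain() and release() itself, or copy the data into a new buffer. onBlockFetchFailure is called at least once per block when fetching that block fails.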

public interface BlockFetchingListener extends EventListener {

  /**
   * Called once per successfully fetched block. After this call returns, data will be released
   * automatically. If the data will be passed to another thread, the receiver should retain()
   * and release() the buffer on their own, or copy the data to a new buffer.
   */
  void onBlockFetchSuccess(String blockId, ManagedBuffer data);

  /**
   * Called at least once per block upon failures.
   */
  void onBlockFetchFailure(String blockId, Throwable exception);
}
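The "copy the data to a new buffer" option can be sketched as follows. This is an illustration under stated assumptions, not Spark code: a plain java.nio.ByteBuffer stands in for ManagedBuffer, the class and method names (BufferCopyExample, copyOf, demo) are hypothetical, and "release" is simulated by zeroing a reused buffer once the callback has returned.

```java
import java.nio.ByteBuffer;

class BufferCopyExample {
    // Copies the readable bytes of src into a fresh heap buffer that the caller owns.
    static ByteBuffer copyOf(ByteBuffer src) {
        ByteBuffer copy = ByteBuffer.allocate(src.remaining());
        copy.put(src.duplicate()); // duplicate() leaves src's position untouched
        copy.flip();
        return copy;
    }

    // Simulates a transport that reuses one buffer and wipes it after the callback returns.
    // Returns {first byte seen through the kept reference, first byte seen through the copy}.
    static byte[] demo() {
        ByteBuffer pooled = ByteBuffer.allocate(4);
        pooled.put(new byte[] {7, 7, 7, 7});
        pooled.flip();

        // Inside the "callback": one consumer keeps the reference, the other copies.
        ByteBuffer keptByReference = pooled;
        ByteBuffer keptByCopy = copyOf(pooled);

        // The callback has returned: the transport "releases" the buffer by zeroing it.
        for (int i = 0; i < pooled.capacity(); i++) {
            pooled.put(i, (byte) 0);
        }

        return new byte[] { keptByReference.get(0), keptByCopy.get(0) };
    }
}
```

In this sketch the kept reference observes the wiped contents while the copy still holds the original bytes, which is the hazard the listener's Javadoc warns about when data outlives the callback.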

BlockTransferService extends ShuffleClient and provides common implementations of some methods.

private[spark]
abstract class BlockTransferService extends ShuffleClient with Closeable with Logging {

  /**
   * Initialize the transfer service by giving it the BlockDataManager that can be used to fetch
   * local blocks or put local blocks.
   */
  def init(blockDataManager: BlockDataManager): Unit

  /**
   * Tear down the transfer service.
   */
  def close(): Unit

  /**
   * Port number the service is listening on, available only after [[init]] is invoked.
   */
  def port: Int

  /**
   * Host name the service is listening on, available only after [[init]] is invoked.
   */
  def hostName: String

  /**
   * Fetch a sequence of blocks from a remote node asynchronously,
   * available only after [[init]] is invoked.
   *
   * Note that this API takes a sequence so the implementation can batch requests, and does not
   * return a future so the underlying implementation can invoke onBlockFetchSuccess as soon as
   * the data of a block is fetched, rather than waiting for all blocks to be fetched.
   */
  override def fetchBlocks(
      host: String,
      port: Int,
      execId: String,
      blockIds: Array[String],
      listener: BlockFetchingListener): Unit

  /**
   * Upload a single block to a remote node, available only after [[init]] is invoked.
   */
  def uploadBlock(
      hostname: String,
      port: Int,
      execId: String,
      blockId: BlockId,
      blockData: ManagedBuffer,
      level: StorageLevel,
      classTag: ClassTag[_]): Future[Unit]

  /**
   * A special case of [[fetchBlocks]], as it fetches only one block and is blocking.
   *
   * It is also only available after [[init]] is invoked.
   */
  def fetchBlockSync(host: String, port: Int, execId: String, blockId: String): ManagedBuffer = {
    // A monitor for the thread to wait on.
    val result = Promise[ManagedBuffer]()
    fetchBlocks(host, port, execId, Array(blockId),
      new BlockFetchingListener {
        override def onBlockFetchFailure(blockId: String, exception: Throwable): Unit = {
          result.failure(exception)
        }
        override def onBlockFetchSuccess(blockId: String, data: ManagedBuffer): Unit = {
          // Copy the data into a new buffer, since it is released once this callback returns.
          val ret = ByteBuffer.allocate(data.size.toInt)
          ret.put(data.nioByteBuffer())
          ret.flip()
          result.success(new NioManagedBuffer(ret))
        }
      })
    ThreadUtils.awaitResult(result.future, Duration.Inf)
  }

  /**
   * Upload a single block to a remote node, available only after [[init]] is invoked.
   *
   * This method is similar to [[uploadBlock]], except this one blocks the thread
   * until the upload finishes.
   */
  def uploadBlockSync(
      hostname: String,
      port: Int,
      execId: String,
      blockId: BlockId,
      blockData: ManagedBuffer,
      level: StorageLevel,
      classTag: ClassTag[_]): Unit = {
    val future = uploadBlock(hostname, port, execId, blockId, blockData, level, classTag)
    ThreadUtils.awaitResult(future, Duration.Inf)
  }
}
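fetchBlockSync shows a general pattern: a blocking call built on top of a callback-based asynchronous API by completing a promise inside the listener and waiting on it. A rough Java analogue of that pattern is sketched below, using CompletableFuture in place of Scala's Promise. The names SyncDemoListener, SyncFetchExample, fetchAsync and fetchSync are hypothetical, the "remote fetch" is faked with a background thread, and block IDs starting with "bad" are assumed to fail purely for demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for BlockFetchingListener, using byte[] instead of ManagedBuffer.
interface SyncDemoListener {
    void onSuccess(String blockId, byte[] data);
    void onFailure(String blockId, Throwable error);
}

class SyncFetchExample {
    // Hypothetical async fetch: answers on a background thread; ids starting with "bad" fail.
    static void fetchAsync(ExecutorService pool, String blockId, SyncDemoListener listener) {
        pool.execute(() -> {
            if (blockId.startsWith("bad")) {
                listener.onFailure(blockId, new IllegalStateException("fetch failed: " + blockId));
            } else {
                listener.onSuccess(blockId, blockId.getBytes(StandardCharsets.UTF_8));
            }
        });
    }

    // Blocks the caller until the single block arrives or the fetch fails.
    static byte[] fetchSync(String blockId) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<byte[]> result = new CompletableFuture<>();
            fetchAsync(pool, blockId, new SyncDemoListener() {
                @Override public void onSuccess(String id, byte[] data) {
                    result.complete(data.clone()); // copy before the callback returns
                }
                @Override public void onFailure(String id, Throwable error) {
                    result.completeExceptionally(error);
                }
            });
            return result.get(); // waits without a timeout, similar in spirit to Duration.Inf
        } finally {
            pool.shutdown();
        }
    }
}
```

As in the Scala original, the success callback copies the data before handing it to the waiting thread. A failure surfaces to the caller of fetchSync as an ExecutionException whose cause is the original error.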

NettyBlockTransferService extends BlockTransferService.
