Differences:

Under the hood, repartition simply calls coalesce with shuffle = true, so it always performs a shuffle:

def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
  coalesce(numPartitions, shuffle = true)
}

The shuffle parameter of coalesce defaults to false, so coalesce does not shuffle unless you ask it to:

def coalesce(numPartitions: Int, shuffle: Boolean = false)(implicit ord: Ordering[T] = null)
    : RDD[T] = withScope {
  if (shuffle) {
    /** Distributes elements evenly across output partitions, starting from a random partition. */
    val distributePartition = (index: Int, items: Iterator[T]) => {
      var position = (new Random(index)).nextInt(numPartitions)
      items.map { t =>
        // Note that the hash code of the key will just be the key itself. The HashPartitioner
        // will mod it with the number of total partitions.
        position = position + 1
        (position, t)
      }
    } : Iterator[(Int, T)]

    // include a shuffle step so that our upstream tasks are still distributed
    new CoalescedRDD(
      new ShuffledRDD[Int, T, T](mapPartitionsWithIndex(distributePartition),
      new HashPartitioner(numPartitions)),
      numPartitions).values
  } else {
    new CoalescedRDD(this, numPartitions)
  }
}
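To see why the shuffle branch spreads elements evenly, the key assignment can be reproduced without Spark. The following is a minimal sketch, not Spark code: the object DistributeSketch and its helpers assignKeys, targetPartition and outputSizes are hypothetical names, and targetPartition is only a stand-in for what HashPartitioner does with an Int key (a non-negative modulo of the key's hash code).

```scala
import scala.util.Random

// Hypothetical stand-alone model of the shuffle branch of coalesce; no Spark required.
object DistributeSketch {
  // Mirrors distributePartition: start at a position seeded by the partition index,
  // then increment the position once per element to produce consecutive keys.
  def assignKeys[T](index: Int, items: Iterator[T], numPartitions: Int): Iterator[(Int, T)] = {
    var position = new Random(index).nextInt(numPartitions)
    items.map { t =>
      position = position + 1
      (position, t)
    }
  }

  // Stand-in for HashPartitioner: an Int key hashes to itself, then a non-negative mod.
  def targetPartition(key: Int, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Counts how many elements of one input partition land in each output partition.
  def outputSizes[T](index: Int, items: Seq[T], numPartitions: Int): Map[Int, Int] =
    assignKeys(index, items.iterator, numPartitions)
      .map { case (key, _) => targetPartition(key, numPartitions) }
      .toSeq
      .groupBy(identity)
      .map { case (partition, hits) => partition -> hits.size }

  def main(args: Array[String]): Unit = {
    // Ten elements from input partition 0, spread over four output partitions.
    val sizes = outputSizes(index = 0, items = 1 to 10, numPartitions = 4)
    println(sizes.toSeq.sortBy(_._1))
  }
}
```

Because the keys are consecutive integers, the elements of each input partition go round-robin over the output partitions, so the output sizes for one input partition differ by at most one. The random start only changes which output partition receives the first element.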

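Use cases:

If you are reducing the number of partitions, consider coalesce, which avoids a shuffle. However, each remaining partition then holds more data, so if memory is insufficient this can cause an out-of-memory error.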
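The choice between the two calls can be checked by looking at partition counts. This is a minimal sketch, assuming a local Spark installation on the classpath; the object name CoalesceVsRepartition, the application name, the local[4] master and the sample data are illustrative assumptions, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; requires the Spark libraries on the classpath.
object CoalesceVsRepartition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("coalesce-vs-repartition") // illustrative name
      .master("local[4]")                 // assumption: run locally with 4 threads
      .getOrCreate()
    val sc = spark.sparkContext

    // Sample RDD with 8 partitions.
    val rdd = sc.parallelize(1 to 1000, numSlices = 8)

    // Reducing partitions: coalesce merges existing partitions without a shuffle.
    val fewer = rdd.coalesce(2)
    println(s"coalesce(2): ${fewer.getNumPartitions} partitions")

    // Without a shuffle, coalesce cannot increase the count; it stays at 8.
    val notMore = rdd.coalesce(16)
    println(s"coalesce(16): ${notMore.getNumPartitions} partitions")

    // repartition shuffles, so it can increase the count as well as reduce it.
    val more = rdd.repartition(16)
    println(s"repartition(16): ${more.getNumPartitions} partitions")

    // Same thing as repartition(16), written out explicitly.
    val moreExplicit = rdd.coalesce(16, shuffle = true)
    println(s"coalesce(16, shuffle = true): ${moreExplicit.getNumPartitions} partitions")

    spark.stop()
  }
}
```

With coalesce(2), each of the two resulting tasks reads four of the original partitions, which is where the memory pressure mentioned above comes from.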
