Spark会产生shuffle的算子

去重

def distinct()

def distinct(numPartitions: Int)

聚合

def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]

def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]

def groupBy[K](f: T => K, p: Partitioner):RDD[(K, Iterable[V])]

def groupByKey(partitioner: Partitioner):RDD[(K, Iterable[V])]

def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner): RDD[(K, U)]

def aggregateByKey[U: ClassTag](zeroValue: U, numPartitions: Int): RDD[(K, U)]

def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C): RDD[(K, C)]

def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, numPartitions: Int): RDD[(K, C)]

def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, partitioner: Partitioner, mapSideCombine: Boolean = true, serializer: Serializer = null): RDD[(K, C)]

排序

def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length): RDD[(K, V)]

def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]

重分区

def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty)

def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null)

集合或者表操作

def intersection(other: RDD[T]): RDD[T]

def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]

def intersection(other: RDD[T], numPartitions: Int): RDD[T]

def subtract(other: RDD[T], numPartitions: Int): RDD[T]

def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]

def subtractByKey[W: ClassTag](other: RDD[(K, W)]): RDD[(K, V)]

def subtractByKey[W: ClassTag](other: RDD[(K, W)], numPartitions: Int): RDD[(K, V)]

def subtractByKey[W: ClassTag](other: RDD[(K, W)], p: Partitioner): RDD[(K, V)]

def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]

def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]

Spark会产生shuffle的算子的更多相关文章

spark中产生shuffle的算子
Spark中产生shuffle的算子作用算子名能否替换,由谁替换去重 distinct() 不能聚合 reduceByKey() groupByKey groupBy() groupByKe ...
spark性能调优（二）彻底解密spark的Hash Shuffle
装载:http://www.cnblogs.com/jcchoiling/p/6431969.html 引言 Spark HashShuffle 是它以前的版本,现在1.6x 版本默应是 Sort-B ...
spark教程(13)-shuffle介绍
shuffle 简介 shuffle 描述了数据从 map task 输出到 reduce task 输入的过程,shuffle 是连接 map 和 reduce 的桥梁: shuffle 性能的高低 ...
Spark中的各种action算子操作（java版）
在我看来,Spark编程中的action算子的作用就像一个触发器,用来触发之前的transformation算子.transformation操作具有懒加载的特性,你定义完操作之后并不会立即加载,只有 ...
Spark—RDD编程常用转换算子代码实例
Spark-RDD编程常用转换算子代码实例 Spark rdd 常用 Transformation 实例: 1.def map[U: ClassTag](f: T => U): RDD[U] ...
大数据入门第二十二天——spark（二）RDD算子（2）与spark其它特性
一.JdbcRDD与关系型数据库交互虽然略显鸡肋,但这里还是记录一下(点开JdbcRDD可以看到限制比较死,基本是鸡肋.但好在我们可以通过自定义的JdbcRDD来帮助我们完成与关系型数据库的交互.这 ...
大数据入门第二十二天——spark（二）RDD算子（1）
一.RDD概述 1.什么是RDD RDD(Resilient Distributed Dataset)叫做分布式数据集,是Spark中最基本的数据抽象,它代表一个不可变.可分区.里面的元素可并行计算的 ...
Spark性能调优-RDD算子调优篇（深度好文，面试常问，建议收藏）
RDD算子调优不废话,直接进入正题! 1. RDD复用在对RDD进行算子时,要避免相同的算子和计算逻辑之下对RDD进行重复的计算,如下图所示: 对上图中的RDD计算架构进行修改,得到如下图所示的优 ...
Spark(四)【RDD编程算子】
目录测试准备一.Value类型转换算子 map(func) mapPartitions(func) mapPartitions和map的区别 mapPartitionsWithIndex(func ...

随机推荐

salesforce 公开网页，免登陆
設定→カスタマイズ→コミュニテ→新建一个visfualforce的コミュニテビルド->カスタマイズ->コミュニティ->すべてのコミュニティ->新規コミュニティ然后选中新规的 ...
wpf 动态得到label的宽度（无刷新情况）
var l =newLabel(){Content="Hello"}; l.Measure(newSize(double.PositiveInfinity,double.Posit ...
LG4169 [Violet]天使玩偶/SJY摆棋子
题意 Ayu 在七年前曾经收到过一个天使玩偶,当时她把它当作时间囊埋在了地下.而七年后的今天,Ayu 却忘了她把天使玩偶埋在了哪里,所以她决定仅凭一点模糊的记忆来寻找它. 我们把 Ayu 生活的小镇 ...
Apache Accumulo
Apache Accumulo 是一个可靠的.可伸缩的.高性能的排序分布式的 Key-Value 存储解决方案, 基于单元访问控制以及可定制的服务器端处理.Accumulo使用 Google BigT ...
macOS -- 为什么XAMPP启动后输localhost跳转到http://localhost/dashboard？
在XAMPP环境下,当我们在地址栏输入'localhost'的时候,进入的不是htdocs根目录下,而是直接跳转到了http://localhost/dashboard?下. 这是因为在xamppfi ...
fail2ban的介绍
fail2ban的介绍 http://www.jb51.net/article/48591.htm http://lilinji.blog.51cto.com/5441000/1784726 fail ...
Request.UrlReferrer详解
使用前需要进行判断: if (Request != null && Request.UrlReferrer != null && Request.UrlReferrer ...
揭晓UX（用户体验）最大的秘密
我是佩恩和特勒的粉丝已经多年了.我第一次在现实中看到他们是在上个月,被他们的表演完全迷住了. 我真的很喜欢佩恩和特勒,他们经常“回拉窗帘”,并揭示他们是怎么完成他们的魔法.其他魔术师营造神秘主义和虚假 ...
高德js API根据出行方式和出现策略由起始点经纬度实现路线规划
<!doctype html> <html> <head> <meta charset="utf-8"> <meta http ...
dubbo相关的知识点总结
dubbo最近提交到了apache,成为了apache的孵化项目,又开始活跃起来了.就官方在git上面的说明文档和其他资料,学习总结以下dubbo的一些知识点. .The dubbo protocol ...

Spark会产生shuffle的算子

去重

聚合

排序

重分区

集合或者表操作

Spark会产生shuffle的算子的更多相关文章

随机推荐

热门专题