Implementing common Spark operators in Java: coalesce
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.VoidFunction;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * The coalesce operator: merges N partitions into N-M partitions.
 * Partition merging (reduction) works especially well after a filter,
 * where it helps avoid data skew from many sparse, uneven partitions.
 */
public class CoalesceOperator {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("coalesce");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<String> names = Arrays.asList("w1", "w2", "w3", "w4", "w5");
        JavaRDD<String> nameRdd = sc.parallelize(names, 4);

        // Tag each element with its original partition index ("1[index]:value").
        JavaRDD<String> nameFirstRdd = nameRdd.mapPartitionsWithIndex(new Function2<Integer, Iterator<String>, Iterator<String>>() {
            @Override
            public Iterator<String> call(Integer index, Iterator<String> iterator) throws Exception {
                List<String> list = new ArrayList<>();
                while (iterator.hasNext()) {
                    list.add("1[" + index + "]:" + iterator.next());
                }
                return list.iterator();
            }
        }, true);

        // Reduce the 4 partitions to 2 partitions.
        JavaRDD<String> tempRdd = nameFirstRdd.coalesce(2);

        // Tag each element again with its new partition index ("2[index]:value").
        JavaRDD<String> result = tempRdd.mapPartitionsWithIndex(new Function2<Integer, Iterator<String>, Iterator<String>>() {
            @Override
            public Iterator<String> call(Integer index, Iterator<String> iterator) throws Exception {
                List<String> list = new ArrayList<>();
                while (iterator.hasNext()) {
                    list.add("2[" + index + "]:" + iterator.next());
                }
                return list.iterator();
            }
        }, false);

        // Print each element so the before/after partition indices are visible.
        result.foreach(new VoidFunction<String>() {
            @Override
            public void call(String s) throws Exception {
                System.err.println(s);
            }
        });

        sc.close();
    }
}
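The class comment above recommends coalescing right after a filter. Below is a minimal sketch of that pattern, reusing the sc context from the main example; the dataset size, the 100-to-4 partition counts, and the modulo predicate are illustrative assumptions, not part of the original code:

// requires: import org.apache.spark.api.java.function.Function;
// A filter can leave many nearly empty partitions behind; coalesce packs
// the survivors into fewer partitions without triggering a shuffle.
// All numbers here are illustrative assumptions.
List<Integer> numbers = new ArrayList<>();
for (int i = 0; i < 1000; i++) {
    numbers.add(i);
}
JavaRDD<Integer> bigRdd = sc.parallelize(numbers, 100);    // 100 partitions

JavaRDD<Integer> filteredRdd = bigRdd.filter(new Function<Integer, Boolean>() {
    @Override
    public Boolean call(Integer n) throws Exception {
        return n % 97 == 0;  // keeps ~1% of the data, so most partitions end up nearly empty
    }
});

// Still 100 (mostly empty) partitions after the filter; compact them to 4.
JavaRDD<Integer> compactRdd = filteredRdd.coalesce(4);
System.out.println(compactRdd.getNumPartitions());         // 4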

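One design detail worth noting: without the shuffle flag, coalesce can only reduce the partition count, which is why the comment above describes it as merging N partitions into N-M. A small sketch of that behavior, again assuming the sc context from the main example:

JavaRDD<String> rdd = sc.parallelize(Arrays.asList("w1", "w2", "w3", "w4", "w5"), 4);

// Shrinking works without a shuffle: parent partitions are merged locally.
System.out.println(rdd.coalesce(2).getNumPartitions());        // 2

// Asking for more partitions without a shuffle is a no-op: still 4.
System.out.println(rdd.coalesce(8).getNumPartitions());        // 4

// With shuffle = true the data is redistributed, so growing works;
// repartition(n) is shorthand for coalesce(n, true).
System.out.println(rdd.coalesce(8, true).getNumPartitions());  // 8
System.out.println(rdd.repartition(8).getNumPartitions());     // 8

The no-shuffle path is what makes coalesce cheaper than repartition for reductions: each merged partition reads its parent partitions in place instead of redistributing data across the cluster.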