import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.VoidFunction;
import java.util.*; /**
* mapPartitions 算子
* 针对partition的操作,一次会处理一个partition的所有数据
*/
public class MapPartitionsOperator { public static void main(String[] args){
SparkConf conf = new SparkConf().setMaster("local").setAppName("mapPartitions");
JavaSparkContext sc = new JavaSparkContext(conf);
List<String> names = Arrays.asList("w1","w2","w3","w4");
JavaRDD<String> nameRdd = sc.parallelize(names,2); final Map<String,Integer> scoreMap = new HashMap<>();
scoreMap.put("w1",1);
scoreMap.put("w2",2);
scoreMap.put("w3",3);
scoreMap.put("w4",4); JavaRDD<Integer> result = nameRdd.mapPartitions(new FlatMapFunction<Iterator<String>, Integer>() {
private static final long serialVersionUID = 1L; @Override
public Iterator<Integer> call(Iterator<String> iterator) throws Exception{
List<Integer> list = new ArrayList<>();
while(iterator.hasNext()){
String name = iterator.next();
int score = scoreMap.get(name);
list.add(score);
}
return list.iterator();
}
}); result.foreach(new VoidFunction<Integer>() {
@Override
public void call(Integer integer) throws Exception {
System.err.println("mapPartitions算子:"+integer);
}
}); result.foreachPartition(new VoidFunction<Iterator<Integer>>() {
@Override
public void call(Iterator<Integer> integerIterator) throws Exception {
while (integerIterator.hasNext()){
System.err.println("mapPartitions算子遍历:"+integerIterator.next());
}
}
}); }
}

微信扫描下图二维码加入博主知识星球,获取更多大数据、人工智能、算法等免费学习资料哦!

java实现spark常用算子之mapPartitions的更多相关文章

  1. java实现spark常用算子之mapPartitionsWithIndex

    import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...

  2. java实现spark常用算子之Union

    import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...

  3. java实现spark常用算子之TakeSample

    import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...

  4. java实现spark常用算子之SaveAsTextFile

    import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...

  5. java实现spark常用算子之Repartitions

    import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...

  6. java实现spark常用算子之map

    import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...

  7. java实现spark常用算子之intersection

    import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...

  8. java实现spark常用算子之frist

    import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...

  9. java实现spark常用算子之flatmap

    import org.apache.spark.SparkConf;import org.apache.spark.api.java.JavaRDD;import org.apache.spark.a ...

随机推荐

  1. Linux上Python的安装升级

    1.下载 cd /usr/local/src/ wget https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tgz 2.安装,在/usr/loc ...

  2. MyBatis 整合 Druid

    pom.xml 依赖 <?xml version="1.0" encoding="UTF-8"?> <project xmlns=" ...

  3. mysql 创建相同的表结构

    前言: 项目中用到分表存储,需要创建100张表,每个表的结构相同,原始操作,一个个复制粘贴,修改名字.今天DBA给了意见 create table a like b 将b的表结构和索引都复制  cre ...

  4. LC 954. Array of Doubled Pairs

    Given an array of integers A with even length, return true if and only if it is possible to reorder ...

  5. vue画图运用echarts

    <template> <div class="tubiao"> <div id="main" style="width: ...

  6. centos 7 删除 virbr0 虚拟网卡

    出现虚拟网卡是因为安装时启用了 libvirtd 服务后生成的关闭方法virsh net-list名称               状态     自动开始  持久------------------- ...

  7. kvm网络虚拟化(vlan,bond,vlan+bond)(3)

    一.Linux Bridge网桥管理 网络虚拟化是虚拟化技术中最复杂的部分,也是非常重要的资源. VM2 的虚拟网卡 vnet1 也连接到了 br0 上. 现在 VM1 和 VM2 之间可以通信,同时 ...

  8. 使用jdk自带工具jvisualvm 分析内存dump文件

    1.获取dump文件 使用 以下命令 创建 进程PID = 16231的 dump文件,命名为 order.hprof jmap -dump:format=b,file=order.hprof 162 ...

  9. Java ——注释 命名

    注释 1.类在每个类前面必须加上类注释,注释模板如下:/*** Copyright (C), 2006-2010, ChengDu Lovo info. Co., Ltd.* FileName: Te ...

  10. Vue实现点击时间获取时间段查询功能

    二话不说,先上图 实现如上代码: //获取本周第一天 showWeekFirstDay: function () { let Nowdate = new Date(); let WeekFirstDa ...