Spark accumulators

java

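 import java.util.Arrays;
 import java.util.List;

 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.VoidFunction;
 import org.apache.spark.util.LongAccumulator;

 /**
  * An accumulator lets multiple tasks operate on one shared variable, so that
  * several nodes can update the same variable. The only operation an
  * accumulator offers is accumulation (add).
  * Only the driver can read the accumulator's value.
  *
  * @author Tele
  */
 public class AccumulatorDemo {
     private static SparkConf conf = new SparkConf().setMaster("local").setAppName("AccumulatorDemo");
     private static JavaSparkContext jsc = new JavaSparkContext(conf);

     public static void main(String[] args) {
         List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6);
         JavaRDD<Integer> rdd = jsc.parallelize(list);

         /*
          * Legacy API (deprecated since Spark 2.0):
          *
          * Accumulator<Integer> accumulator = jsc.accumulator(10);
          * rdd.foreach(new VoidFunction<Integer>() {
          *     private static final long serialVersionUID = 1L;
          *
          *     @Override
          *     public void call(Integer t) throws Exception {
          *         accumulator.add(t);
          *     }
          * });
          * System.out.println(accumulator.value());
          */

         // Create the accumulator on the driver, seed it with 100, then register it under a name.
         LongAccumulator la = new LongAccumulator();
         la.setValue(100L);
         jsc.sc().register(la, "numeric accumulator");

         rdd.foreach(new VoidFunction<Integer>() {
             private static final long serialVersionUID = 1L;

             @Override
             public void call(Integer t) throws Exception {
                 // Do not read la.value() inside the operator; only the driver sees the merged total.
                 la.add(t);
             }
         });

         // Read the result on the driver: 100 + (1 + 2 + ... + 6) = 121
         System.out.println(la.value());
         jsc.close();
     }
 }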
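The Java demo seeds the accumulator with 100 on the driver and then adds 1 through 6 from inside tasks. The following is a minimal plain-Java sketch of how that total can come about, assuming each task works on a zeroed copy of the accumulator and the driver merges the partial sums after each task finishes. `ToyLongAccumulator`, `zeroCopy`, `merge`, and `run` are hypothetical names made up for illustration and are not Spark classes or methods; the split into two partitions is also an assumption.

```java
import java.util.Arrays;
import java.util.List;

final class AccumulatorModel {

    /** Toy stand-in for a long accumulator; this is NOT Spark's LongAccumulator. */
    static final class ToyLongAccumulator {
        private long sum;

        void add(long v) {
            sum += v;
        }

        long value() {
            return sum;
        }

        /** Copy handed to a task: it starts from zero, not from the driver's current value. */
        ToyLongAccumulator zeroCopy() {
            return new ToyLongAccumulator();
        }

        /** Driver-side merge of a finished task's partial sum. */
        void merge(ToyLongAccumulator other) {
            sum += other.sum;
        }
    }

    /** Simulates one task per partition, each adding locally, with the driver merging the results. */
    static long run(long initial, List<List<Integer>> partitions) {
        ToyLongAccumulator driverAcc = new ToyLongAccumulator();
        driverAcc.add(initial); // seed value set on the driver before any task runs
        for (List<Integer> partition : partitions) {
            ToyLongAccumulator taskAcc = driverAcc.zeroCopy();
            for (int x : partition) {
                taskAcc.add(x); // tasks only add; they never read the driver's total
            }
            driverAcc.merge(taskAcc);
        }
        return driverAcc.value(); // only the driver-side object holds the full total
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5, 6));
        System.out.println(run(100L, partitions)); // 100 + 21 = 121
    }
}
```

In this model the seed value is counted exactly once, because task copies start from zero. This is also why the comment in the demo advises against reading the value inside an operator: a task-side copy would hold only that task's partial sum.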

scala

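 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.util.LongAccumulator

 object AccumulatorDemo {
   def main(args: Array[String]): Unit = {
     val conf = new SparkConf().setMaster("local").setAppName("accumulator")
     val sc = new SparkContext(conf)

     val arr = Array(1, 2, 3, 4, 5)
     val rdd = sc.parallelize(arr, 1)

     // Create the accumulator on the driver, seed it with 100, then register it.
     val accumulator = new LongAccumulator
     accumulator.add(100L)
     sc.register(accumulator)

     // Tasks only add to the accumulator.
     rdd.foreach(x => accumulator.add(x.toLong))

     // Read the result on the driver: 100 + (1 + 2 + ... + 5) = 115
     println(accumulator.value)
     sc.stop()
   }
 }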
