Learning the Spark SQL examples (3)

UserDefinedTypedAggregation.scala (the input, buffer and output types are user-defined)



import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

object UserDefinedTypedAggregation {

  case class Employee(name: String, salary: Long)
  case class Average(var sum: Long, var count: Long)

  object MyAverage extends Aggregator[Employee, Average, Double] {
    // A zero value for this aggregation. Should satisfy the property that any b + zero = b
    def zero: Average = Average(0L, 0L)

    // Combine two values to produce a new value. For performance, the function may modify `buffer`
    // and return it instead of constructing a new object
    def reduce(buffer: Average, employee: Employee): Average = {
      buffer.sum += employee.salary
      buffer.count += 1
      buffer
    }

    // Merge two intermediate values
    def merge(b1: Average, b2: Average): Average = {
      b1.sum += b2.sum
      b1.count += b2.count
      b1
    }

    // Transform the output of the reduction
    def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count

    // Specifies the Encoder for the intermediate value type
    def bufferEncoder: Encoder[Average] = Encoders.product

    // Specifies the Encoder for the final output value type
    def outputEncoder: Encoder[Double] = Encoders.scalaDouble
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Spark SQL user-defined Datasets aggregation example")
      .master("local")
      .getOrCreate()
    import spark.implicits._

    // employees.json ships with Spark under examples/src/main/resources; adjust the path to your installation
    val ds = spark.read.json("/Users/hadoop/app/spark/examples/src/main/resources/employees.json").as[Employee]
    ds.show()

    // Convert the aggregator to a TypedColumn and give it a name
    val averageSalary = MyAverage.toColumn.name("average_salary")
    val result = ds.select(averageSalary)
    result.show()

    spark.stop()
  }
}
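Because `reduce` and `merge` modify their first argument in place, each partial aggregation has to start from its own fresh `zero`. The plain-Scala sketch below has no Spark dependency. It replays MyAverage's four methods by hand to show the zero/reduce/merge/finish contract. Each simulated partition is folded with `reduce`, starting from its own `zero`. The partial buffers are then combined with `merge`, and `finish` produces the average. The `simulate` helper, the way the data is partitioned, and the sample salaries are all hypothetical illustrations. They are not how Spark actually schedules the aggregation.

```scala
// Plain-Scala sketch (no Spark) of the zero/reduce/merge/finish contract used by MyAverage.
// `simulate`, the partitioning and the sample data are hypothetical, not Spark internals.
object AggregatorContractSketch {
  case class Employee(name: String, salary: Long)
  case class Average(var sum: Long, var count: Long)

  // Same logic as MyAverage, minus the Spark Encoder members.
  object MyAverageLogic {
    def zero: Average = Average(0L, 0L)

    def reduce(buffer: Average, employee: Employee): Average = {
      buffer.sum += employee.salary
      buffer.count += 1
      buffer
    }

    def merge(b1: Average, b2: Average): Average = {
      b1.sum += b2.sum
      b1.count += b2.count
      b1
    }

    def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
  }

  // Hypothetical driver: fold every "partition" from a fresh zero,
  // then merge the partial buffers (again from a fresh zero) and finish.
  def simulate(partitions: Seq[Seq[Employee]]): Double = {
    import MyAverageLogic._
    val partials = partitions.map(_.foldLeft(zero)(reduce))
    finish(partials.foldLeft(zero)(merge))
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      Employee("a", 3000L), Employee("b", 4500L),
      Employee("c", 3500L), Employee("d", 4000L)
    )
    println(simulate(Seq(data)))                       // one partition: prints 3750.0
    println(simulate(Seq(data.take(1), data.drop(1)))) // two partitions: prints 3750.0
    println(simulate(Seq(data, Seq.empty)))            // empty partition folds to zero: prints 3750.0
  }
}
```

`zero` is a `def`, so every `foldLeft(zero)` in this sketch receives a new `Average`. As a result, the in-place updates never leak from one partition into another. If `zero` were a single shared `val`, the mutations would accumulate across partitions and the result would be wrong. An empty input ends with `0.0 / 0`, so `finish` returns `NaN`.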
