Spark2 Dataset之collect_set与collect_list
collect_set去除重复元素;collect_list不去除重复元素
select gender,
concat_ws(',', collect_set(children)),
concat_ws(',', collect_list(children))
from Affairs
group by gender
// 创建视图
data.createOrReplaceTempView("Affairs") val df3= spark.sql("select gender,concat_ws(',',collect_set(children)),concat_ws(',',collect_list(children)) from Affairs group by gender")
df3: org.apache.spark.sql.DataFrame = [gender: string, concat_ws(,, collect_set(children)): string ... 1 more field] df3.show // collect_set去除重复元素;collect_list不去除重复元素
+------+-----------------------------------+------------------------------------+
|gender|concat_ws(,, collect_set(children))|concat_ws(,, collect_list(children))|
+------+-----------------------------------+------------------------------------+
|female| no,yes| no,yes,no,no,yes|
| male| no,yes| no,yes,no,yes,no|
+------+-----------------------------------+------------------------------------+
Spark2 Dataset之collect_set与collect_list的更多相关文章
- Spark2 DataSet 创建新行之flatMap
val dfList = List(("Hadoop", "Java,SQL,Hive,HBase,MySQL"), ("Spark", & ...
- Spark2 Dataset行列操作和执行计划
Dataset是一个强类型的特定领域的对象,这种对象可以函数式或者关系操作并行地转换.每个Dataset也有一个被称为一个DataFrame的类型化视图,这种DataFrame是Row类型的Datas ...
- Spark2 Dataset DataFrame空值null,NaN判断和处理
import org.apache.spark.sql.SparkSession import org.apache.spark.sql.Dataset import org.apache.spark ...
- Spark2 Dataset分析函数--排名函数row_number,rank,dense_rank,percent_rank
select gender, age, row_number() over(partition by gender order by age) as rowNumber, ...
- Spark2 Dataset多维度统计cube与rollup
val df6 = spark.sql("select gender,children,max(age),avg(age),count(age) from Affairs group by ...
- Spark2 Dataset统计指标:mean均值,variance方差,stddev标准差,corr(Pearson相关系数),skewness偏度,kurtosis峰度
val df4=spark.sql("SELECT mean(age),variance(age),stddev(age),corr(age,yearsmarried),skewness(a ...
- Spark2 Dataset之视图与SQL
// 创建视图 data.createOrReplaceTempView("Affairs") val df1 = spark.sql("SELECT * FROM Af ...
- Spark2 Dataset聚合操作
data.groupBy("gender").agg(count($"age"),max($"age").as("maxAge&q ...
- Spark2 Dataset去重、差集、交集
import org.apache.spark.sql.functions._ // 对整个DataFrame的数据去重 data.distinct() data.dropDuplicates() / ...
随机推荐
- 如何用BarTender将日期变量和序列号变量放一起打印成条码?
刚接触BarTender 2016的小伙伴们可能对条码的数据源还不太搞的定,例如有时需要将日期变量和序列号变量放一起打印成条码,那如何简单达到目的呢?下面,小编教大家解决这一问题的三大步骤. 1.在B ...
- springMVC中如何访问WebContent中的资源文件
一.问题: 我的工程目录如下: WebContent |-css |-js |-imgs |-META-INF |-WEB-INF |-jsp |-login.jsp 如何在login.jsp中引用i ...
- go 类型转换
https://studygolang.com/articles/3400 https://studygolang.com/articles/6633
- windows 7 64位出现Oracle中文乱码
提示oracle客户端无法连接指定字符 安装好客户端之后,如图 将数据库dbhome_1中的network文件夹全部复制到客户端,如图 然后在设置环境变量:F:\app\Administrator\p ...
- Simply Syntax(思维)
Simply Syntax Time Limit: 1000MS Memory Limit: 10000K Total Submissions: 5551 Accepted: 2481 Des ...
- feed流拉取,读扩散,究竟是啥?
from:https://mp.weixin.qq.com/s?__biz=MjM5ODYxMDA5OQ==&mid=2651961214&idx=1&sn=5e80ad6f2 ...
- Android基础部分再学习---activity的状态保存
主要是bundle 这个參数 參考地址:http://blog.csdn.net/lonelyroamer/article/details/18715975 学习Activity的生命周期,我们知 ...
- 【Android】录音暂停和继续
https://www.2cto.com/kf/201410/347839.html http://blog.csdn.net/wanli_smile/article/details/7715030 ...
- System.exit(0)会跳过finally块的执行
public class test { public static void main(String[] args) { try { System.exit(0); System.out.printl ...
- [NodeJS] Node.js 编码转换
Node.js 自带的 toString() 方法不支持 gbk,因此中文转换的时候需要加载第三方库,推荐以下两个编码转换库,iconv-lite 和 encoding. iconv, iconv-l ...