Java implementation of the common Spark operator: join
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;

/**
 * The join(otherDataSet, [numTasks]) operator:
 * Like the other key-based operators, it combines two RDDs by key and, for each
 * key, computes the Cartesian product of the values from the two RDDs.
 *
 * In short: group by key, then take the Cartesian product of the matching values.
*/
public class JoinOperator {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("join");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Tuple2<String, String>> stus = Arrays.asList(
                new Tuple2<>("w1", "1"),
                new Tuple2<>("w2", "2"),
                new Tuple2<>("w3", "3"),
                new Tuple2<>("w2", "22"),
                new Tuple2<>("w1", "11")
        );

        List<Tuple2<String, String>> scores = Arrays.asList(
                new Tuple2<>("w1", "a1"),
                new Tuple2<>("w2", "a2"),
                new Tuple2<>("w2", "a22"),
                new Tuple2<>("w1", "a11"),
                new Tuple2<>("w3", "a3")
        );

        JavaPairRDD<String, String> stusRdd = sc.parallelizePairs(stus);
        JavaPairRDD<String, String> scoresRdd = sc.parallelizePairs(scores);

        // join pairs up the values of matching keys from the two RDDs
        JavaPairRDD<String, Tuple2<String, String>> result = stusRdd.join(scoresRdd);

        result.foreach(new VoidFunction<Tuple2<String, Tuple2<String, String>>>() {
            @Override
            public void call(Tuple2<String, Tuple2<String, String>> tuple) throws Exception {
                System.err.println(tuple._1 + ":" + tuple._2);
            }
        });

        sc.close();
    }
}
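Because join takes the Cartesian product of the values for each key, w1 and w2 (two values on each side) each produce four output records and w3 produces one, nine in total. Run locally, the foreach prints something like the following; the order of the records is not deterministic:

w1:(1,a1)
w1:(1,a11)
w1:(11,a1)
w1:(11,a11)
w2:(2,a2)
w2:(2,a22)
w2:(22,a2)
w2:(22,a22)
w3:(3,a3)

On Java 8, VoidFunction is a functional interface, so the same traversal can also be written as a lambda, a minimal equivalent of the anonymous class above:

// equivalent to the anonymous VoidFunction above
result.foreach(tuple -> System.err.println(tuple._1 + ":" + tuple._2));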