[Spark][Python]DataFrame的左右连接例子
[Spark][Python]DataFrame的左右连接例子
$ hdfs dfs -cat people.json
{"name":"Alice","pcode":"94304"}
{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}
$ hdfs dfs -cat pcodes.json
{"pcode":"10036","city":"New York","state":"NY"}
{"pcode":"87501","city":"Santa Fe","state":"NM"}
{"pcode":"94304","city":"Palo Alto","state":"CA"}
{"pcode":"94104","city":"San Francisco","state":"CA"}
$pyspark
sqlContext = HiveContext(sc)
peopleDF = sqlContext.read.json("people.json")
peopleDF.limit(5).show()
+----+-------+-----+-----+
| age| name|pcode| pcoe|
+----+-------+-----+-----+
|null| Alice|94304| null|
| 30|Brayden|94304| null|
| 19| Carla| null|10036|
| 46| Diana| null| null|
|null|Etienne|94104| null|
+----+-------+-----+-----+
sqlContext = HiveContext(sc)
pcodesDF = sqlContext.read.json("pcodes.json")
pcodesDF.limit(5).show()
+-------------+-----+-----+
| city|pcode|state|
+-------------+-----+-----+
| New York|10036| NY|
| Santa Fe|87501| NM|
| Palo Alto|94304| CA|
|San Francisco|94104| CA|
+-------------+-----+-----+
mydf000 = peopleDF.join(pcodesDF,"pcode")
mydf000.limit(5).show()
+-----+----+-------+----+-------------+-----+
|pcode| age| name|pcoe| city|state|
+-----+----+-------+----+-------------+-----+
|94304|null| Alice|null| Palo Alto| CA|
|94304| 30|Brayden|null| Palo Alto| CA|
|94104|null|Etienne|null|San Francisco| CA|
+-----+----+-------+----+-------------+-----+
mydf001=peopleDF.join(pcodesDF,"pcode","leftsemi")
mydf001.limit(5).show()
+-----+----+-------+----+
|pcode| age| name|pcoe|
+-----+----+-------+----+
|94304|null| Alice|null|
|94304| 30|Brayden|null|
|94104|null|Etienne|null|
+-----+----+-------+----+
mydf002=peopleDF.join(pcodesDF,"pcode","left_outer")
mydf002.limit(5).show()
+-----+----+-------+-----+-------------+-----+
|pcode| age| name| pcoe| city|state|
+-----+----+-------+-----+-------------+-----+
|94304|null| Alice| null| Palo Alto| CA|
|94304| 30|Brayden| null| Palo Alto| CA|
| null| 19| Carla|10036| null| null|
| null| 46| Diana| null| null| null|
|94104|null|Etienne| null|San Francisco| CA|
+-----+----+-------+-----+-------------+-----+
mydf003=peopleDF.join(pcodesDF,"pcode","right_outer")
mydf003.limit(5).show()
+-----+----+-------+----+-------------+-----+
|pcode| age| name|pcoe| city|state|
+-----+----+-------+----+-------------+-----+
|10036|null| null|null| New York| NY|
|87501|null| null|null| Santa Fe| NM|
|94304|null| Alice|null| Palo Alto| CA|
|94304| 30|Brayden|null| Palo Alto| CA|
|94104|null|Etienne|null|San Francisco| CA|
+-----+----+-------+----+-------------+-----+
[Spark][Python]DataFrame的左右连接例子的更多相关文章
- [Spark][Python][DataFrame][RDD]DataFrame中抽取RDD例子
[Spark][Python][DataFrame][RDD]DataFrame中抽取RDD例子 sqlContext = HiveContext(sc) peopleDF = sqlContext. ...
- [Spark][Python][DataFrame][RDD]从DataFrame得到RDD的例子
[Spark][Python][DataFrame][RDD]从DataFrame得到RDD的例子 $ hdfs dfs -cat people.json {"name":&quo ...
- [Spark][Python][DataFrame][Write]DataFrame写入的例子
[Spark][Python][DataFrame][Write]DataFrame写入的例子 $ hdfs dfs -cat people.json {"name":" ...
- [Spark][Python][DataFrame][SQL]Spark对DataFrame直接执行SQL处理的例子
[Spark][Python][DataFrame][SQL]Spark对DataFrame直接执行SQL处理的例子 $cat people.json {"name":" ...
- [Spark][Python]DataFrame where 操作例子
[Spark][Python]DataFrame中取出有限个记录的例子 的 继续 [15]: myDF=peopleDF.where("age>21") In [16]: m ...
- [Spark][Python]DataFrame select 操作例子
[Spark][Python]DataFrame中取出有限个记录的例子 的 继续 In [4]: peopleDF.select("age")Out[4]: DataFrame[a ...
- [Spark][Python]DataFrame中取出有限个记录的例子
[Spark][Python]DataFrame中取出有限个记录的例子: sqlContext = HiveContext(sc) peopleDF = sqlContext.read.json(&q ...
- [Spark][Python]DataFrame select 操作例子II
[Spark][Python]DataFrame中取出有限个记录的 继续 In [4]: peopleDF.select("age","name") In ...
- [Spark][Python][RDD][DataFrame]从 RDD 构造 DataFrame 例子
[Spark][Python][RDD][DataFrame]从 RDD 构造 DataFrame 例子 from pyspark.sql.types import * schema = Struct ...
随机推荐
- SQLServer 2005Windows验证如何改为混合模式验证
SQL Server 2005 Windows验证如何改为混合模式验证[摘] by:授客 QQ:1033553122 默认情况下,SQL Server 2005 Express是采用集成的Window ...
- Ubuntu16.04升级 Ubuntu18.04
1.更新资源 $ sudo apt-get update $ sudo apt-get upgrade $ sudo apt dist-upgrade 2.安装update-manager-core ...
- maven(八),阿里云国内镜像,提高jar包下载速度
镜像 maven默认会从中央仓库下载jar包,这个仓库在国外,而且全世界的人都会从这里下载,所以下载速度肯定是非常慢的.镜像就相当于是中央仓库的一个副本,内容和中央仓库完全一样,目前有不少国内镜像,其 ...
- python第四十八天--高级FTP
高级FTP服务器1. 用户加密认证2. 多用户同时登陆3. 每个用户有自己的家目录且只能访问自己的家目录4. 对用户进行磁盘配额.不同用户配额可不同5. 用户可以登陆server后,可切换目录6. 查 ...
- jQuery 实现图片动画代码
向下移动动画 $(".image").click(function(){ $(this).animate({height:'0px'}) }); <!doctype html ...
- CentOS 7下systemd是如何stop mysql服务的
[背景] 有同事在研究mongo的服务启动方式,讨论到mysql5.7的服务管理时一起做了下面测试. MySQL5.7是用systemd来管理service的,它的配置文件/usr/lib/sys ...
- 如何使用C语言的面向对象
我们都知道,C++才是面向对象的语言,但是C语言是否能使用面向对象的功能? (1)继承性 typedef struct _parent { int data_parent; }Parent; type ...
- YUM仓库服务与PXE网络装机
1.yum:基于RPM包构建软件更新机制自动解决依赖关系,软件包由软件包库提供 提供方式:ftp服务:ftp://IP地址/仓库目录 Http服务:http :// IP地址/仓库目录 本地目录:f ...
- Linux初学 - Centos7忘记root密码的解决办法
开机进入启动界面后,要按照屏幕的下方的操作提示迅速按下“e”键. 按下“e”键后即来到启动文件界面,这时按键盘上面的方向键“下”,一直到文件底部,在"LANG=zh_cn.UTF-8&quo ...
- 记录:一个SQL SERVER奇怪的问题。
今天遇到了一个奇怪的问题.始终没搞清楚是怎么回事.先记一下 1.首先有张表a,包含字段 编号.日期(varchar(250)),数值 发生日期字段有非正常日期字符串,有NULL,空字符串,可能是误触键 ...