[Spark][Python] DataFrame left and right join examples

$ hdfs dfs -cat people.json

{"name":"Alice","pcode":"94304"}
{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}

$ hdfs dfs -cat pcodes.json

{"pcode":"10036","city":"New York","state":"NY"}
{"pcode":"87501","city":"Santa Fe","state":"NM"}
{"pcode":"94304","city":"Palo Alto","state":"CA"}
{"pcode":"94104","city":"San Francisco","state":"CA"}

$ pyspark

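# HiveContext is pre-imported by some pyspark shells; the explicit import is harmless
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
# Load the people records; the schema is inferred from the JSON keys
peopleDF = sqlContext.read.json("people.json")
peopleDF.limit(5).show()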

+----+-------+-----+-----+
| age|   name|pcode| pcoe|
+----+-------+-----+-----+
|null|  Alice|94304| null|
|  30|Brayden|94304| null|
|  19|  Carla| null|10036|
|  46|  Diana| null| null|
|null|Etienne|94104| null|
+----+-------+-----+-----+
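
Note the extra `pcoe` column in the output above: Carla's record in people.json spells the key `pcoe` instead of `pcode`, so her `pcode` is null. The inferred schema contains every key seen in any record, and fields a record lacks come back as null. The following is a minimal plain-Python sketch of that behaviour, not Spark's actual implementation. `infer_columns` and `to_rows` are hypothetical helper names, and the alphabetical sort is an assumption chosen to mirror the column order shown above.

```python
import json

# The five records from people.json, including the misspelled "pcoe" key.
PEOPLE_JSON = """\
{"name":"Alice","pcode":"94304"}
{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}
"""

def infer_columns(records):
    """Union of all keys seen in any record, sorted alphabetically."""
    return sorted({key for rec in records for key in rec})

def to_rows(records, columns):
    """Build one tuple per record; fields the record lacks become None."""
    return [tuple(rec.get(col) for col in columns) for rec in records]

records = [json.loads(line) for line in PEOPLE_JSON.splitlines()]
columns = infer_columns(records)
rows = to_rows(records, columns)

print(columns)
for row in rows:
    print(row)
```

Because Carla's postal code sits in the `pcoe` column, every join on `pcode` below treats her like Diana, who has no postal code at all.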

# Reuse the existing sqlContext to load the postal-code lookup table
pcodesDF = sqlContext.read.json("pcodes.json")
pcodesDF.limit(5).show()

+-------------+-----+-----+
|         city|pcode|state|
+-------------+-----+-----+
|     New York|10036|   NY|
|     Santa Fe|87501|   NM|
|    Palo Alto|94304|   CA|
|San Francisco|94104|   CA|
+-------------+-----+-----+

# Inner join (the default): only people whose pcode matches a row in pcodesDF
mydf000 = peopleDF.join(pcodesDF, "pcode")
mydf000.limit(5).show()

+-----+----+-------+----+-------------+-----+
|pcode| age|   name|pcoe|         city|state|
+-----+----+-------+----+-------------+-----+
|94304|null|  Alice|null|    Palo Alto|   CA|
|94304|  30|Brayden|null|    Palo Alto|   CA|
|94104|null|Etienne|null|San Francisco|   CA|
+-----+----+-------+----+-------------+-----+

# Left semi join: same matching rows as the inner join, but only peopleDF columns
mydf001 = peopleDF.join(pcodesDF, "pcode", "leftsemi")
mydf001.limit(5).show()

+-----+----+-------+----+
|pcode| age|   name|pcoe|
+-----+----+-------+----+
|94304|null|  Alice|null|
|94304|  30|Brayden|null|
|94104|null|Etienne|null|
+-----+----+-------+----+

# Left outer join: every person is kept; city and state are null where no pcode matches
mydf002 = peopleDF.join(pcodesDF, "pcode", "left_outer")
mydf002.limit(5).show()

+-----+----+-------+-----+-------------+-----+
|pcode| age|   name| pcoe|         city|state|
+-----+----+-------+-----+-------------+-----+
|94304|null|  Alice| null|    Palo Alto|   CA|
|94304|  30|Brayden| null|    Palo Alto|   CA|
| null|  19|  Carla|10036|         null| null|
| null|  46|  Diana| null|         null| null|
|94104|null|Etienne| null|San Francisco|   CA|
+-----+----+-------+-----+-------------+-----+

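# Right outer join: every postal code is kept; person columns are null where nobody lives there
mydf003 = peopleDF.join(pcodesDF, "pcode", "right_outer")
mydf003.limit(5).show()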

+-----+----+-------+----+-------------+-----+
|pcode| age|   name|pcoe|         city|state|
+-----+----+-------+----+-------------+-----+
|10036|null|   null|null|     New York|   NY|
|87501|null|   null|null|     Santa Fe|   NM|
|94304|null|  Alice|null|    Palo Alto|   CA|
|94304|  30|Brayden|null|    Palo Alto|   CA|
|94104|null|Etienne|null|San Francisco|   CA|
+-----+----+-------+----+-------------+-----+
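
To see why each join type keeps or drops particular rows, the semantics can be imitated in plain Python. This is a rough sketch under stated assumptions, not how Spark executes joins. The `join` function below is a hypothetical toy nested-loop join, the `pcoe` column is left out for brevity, and row order is not meant to match Spark's output. A null (None) key never matches anything, which is why Carla and Diana drop out of the inner and left semi joins.

```python
# Same data as people.json / pcodes.json; a missing pcode is stored as None.
people = [
    {"name": "Alice", "age": None, "pcode": "94304"},
    {"name": "Brayden", "age": 30, "pcode": "94304"},
    {"name": "Carla", "age": 19, "pcode": None},  # her key was misspelled "pcoe"
    {"name": "Diana", "age": 46, "pcode": None},
    {"name": "Etienne", "age": None, "pcode": "94104"},
]
pcodes = [
    {"pcode": "10036", "city": "New York", "state": "NY"},
    {"pcode": "87501", "city": "Santa Fe", "state": "NM"},
    {"pcode": "94304", "city": "Palo Alto", "state": "CA"},
    {"pcode": "94104", "city": "San Francisco", "state": "CA"},
]

def join(left, right, key, how="inner"):
    """Toy nested-loop equi-join; a None key never matches."""
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]
    out, matched_right = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right)
                if l[key] is not None and l[key] == r[key]]
        matched_right.update(hits)
        if how == "leftsemi":
            if hits:
                out.append(dict(l))  # left columns only, once per left row
        elif hits:
            out.extend({**l, **right[i]} for i in hits)
        elif how == "left_outer":
            # Unmatched left row: pad the right-side columns with None.
            out.append({**l, **dict.fromkeys(right_cols)})
    if how == "right_outer":
        # Unmatched right rows: pad the left-side columns with None.
        for i, r in enumerate(right):
            if i not in matched_right:
                out.append({**dict.fromkeys(left_cols), **r})
    return out

for how in ("inner", "leftsemi", "left_outer", "right_outer"):
    result = join(people, pcodes, "pcode", how)
    print(how, [(row["name"], row.get("city")) for row in result])
```

The left semi join acts as a filter on peopleDF rather than a merge, so it is a reasonable choice when only the existence of a match matters and none of the lookup columns are needed.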
