[Spark][Python] DataFrame select operation example
A continuation of the earlier post "[Spark][Python] Taking a limited number of records from a DataFrame".
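Because this session carries on from that post, peopleDF is assumed to already exist. A minimal sketch of that setup, run inside the Spark 1.x pyspark shell (where sc is predefined; the HDFS path matches the one visible in the job logs below):

# Sketch of the setup from the previous post (Spark 1.x pyspark shell).
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
peopleDF = sqlContext.read.json("hdfs://localhost:8020/user/training/people.json")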
In [4]: peopleDF.select("age")
Out[4]: DataFrame[age: bigint]
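Note that Out[4] shows only the schema, DataFrame[age: bigint]: select is a transformation, so no Spark job runs yet. Several columns can be selected the same way; a small sketch (agesAndNames is an illustrative name):

# Selecting several columns at once; nothing executes until an action runs.
agesAndNames = peopleDF.select("age", "name")
agesAndNames.take(3)   # the action is what actually triggers a job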
In [5]: myDF=people.select("age")
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-b5b723b62a49> in <module>()
----> 1 myDF=people.select("age")
NameError: name 'people' is not defined

The NameError is just a typo: the DataFrame was created as peopleDF, not people. Retrying with the correct name:

In [6]: myDF=peopleDF.select("age")
take(3) is an action, so Spark now actually runs a job; the INFO lines below show a single-stage job reading people.json from HDFS.

In [7]: myDF.take(3)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 230.1 KB, free 871.7 KB)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 21.4 KB, free 893.1 KB)
17/10/05 05:13:02 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:55073 (size: 21.4 KB, free: 208.7 MB)
17/10/05 05:13:02 INFO spark.SparkContext: Created broadcast 5 from take at <ipython-input-7-745486715568>:1
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 251.1 KB, free 1144.2 KB)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 21.6 KB, free 1165.8 KB)
17/10/05 05:13:02 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on localhost:55073 (size: 21.6 KB, free: 208.7 MB)
17/10/05 05:13:02 INFO spark.SparkContext: Created broadcast 6 from take at <ipython-input-7-745486715568>:1
17/10/05 05:13:03 INFO mapred.FileInputFormat: Total input paths to process : 1
17/10/05 05:13:03 INFO spark.SparkContext: Starting job: take at <ipython-input-7-745486715568>:1
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Got job 2 (take at <ipython-input-7-745486715568>:1) with 1 output partitions
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (take at <ipython-input-7-745486715568>:1)
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Missing parents: List()
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[14] at take at <ipython-input-7-745486715568>:1), which has no missing parents
17/10/05 05:13:03 INFO storage.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 4.3 KB, free 1170.2 KB)
17/10/05 05:13:03 INFO storage.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 2.5 KB, free 1172.6 KB)
17/10/05 05:13:03 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on localhost:55073 (size: 2.5 KB, free: 208.7 MB)
17/10/05 05:13:03 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1006
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[14] at take at <ipython-input-7-745486715568>:1)
17/10/05 05:13:03 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
17/10/05 05:13:03 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2149 bytes)
17/10/05 05:13:03 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
17/10/05 05:13:03 INFO rdd.HadoopRDD: Input split: hdfs://localhost:8020/user/training/people.json:0+179
17/10/05 05:13:03 INFO codegen.GenerateUnsafeProjection: Code generated in 113.719806 ms
17/10/05 05:13:03 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 2235 bytes result sent to driver
17/10/05 05:13:03 INFO scheduler.DAGScheduler: ResultStage 2 (take at <ipython-input-7-745486715568>:1) finished in 0.493 s
17/10/05 05:13:03 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 487 ms on localhost (1/1)
17/10/05 05:13:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Job 2 finished: take at <ipython-input-7-745486715568>:1, took 0.737231 s
Out[7]: [Row(age=None), Row(age=30), Row(age=19)]
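take(3) brings the first three rows back to the driver as a regular Python list of Row objects; the None comes from a record whose age field is absent in the JSON. Rows behave like named tuples, as this small sketch shows:

# take() returns a plain Python list of Row objects.
rows = myDF.take(3)
for row in rows:
    print(row.age)       # prints None, 30, 19 for the data above
# row["age"] is equivalent to row.age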