A continuation of "[Spark][Python] Example of taking a limited number of records from a DataFrame"

In [4]: peopleDF.select("age")
Out[4]: DataFrame[age: bigint]

In [5]: myDF=people.select("age")
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-b5b723b62a49> in <module>()
----> 1 myDF=people.select("age")

NameError: name 'people' is not defined

In [6]: myDF=peopleDF.select("age")

In [7]: myDF.take(3)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 230.1 KB, free 871.7 KB)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 21.4 KB, free 893.1 KB)
17/10/05 05:13:02 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:55073 (size: 21.4 KB, free: 208.7 MB)
17/10/05 05:13:02 INFO spark.SparkContext: Created broadcast 5 from take at <ipython-input-7-745486715568>:1
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 251.1 KB, free 1144.2 KB)
17/10/05 05:13:02 INFO storage.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 21.6 KB, free 1165.8 KB)
17/10/05 05:13:02 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on localhost:55073 (size: 21.6 KB, free: 208.7 MB)
17/10/05 05:13:02 INFO spark.SparkContext: Created broadcast 6 from take at <ipython-input-7-745486715568>:1
17/10/05 05:13:03 INFO mapred.FileInputFormat: Total input paths to process : 1
17/10/05 05:13:03 INFO spark.SparkContext: Starting job: take at <ipython-input-7-745486715568>:1
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Got job 2 (take at <ipython-input-7-745486715568>:1) with 1 output partitions
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (take at <ipython-input-7-745486715568>:1)
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Missing parents: List()
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[14] at take at <ipython-input-7-745486715568>:1), which has no missing parents
17/10/05 05:13:03 INFO storage.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 4.3 KB, free 1170.2 KB)
17/10/05 05:13:03 INFO storage.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 2.5 KB, free 1172.6 KB)
17/10/05 05:13:03 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on localhost:55073 (size: 2.5 KB, free: 208.7 MB)
17/10/05 05:13:03 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1006
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[14] at take at <ipython-input-7-745486715568>:1)
17/10/05 05:13:03 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
17/10/05 05:13:03 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2149 bytes)
17/10/05 05:13:03 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
17/10/05 05:13:03 INFO rdd.HadoopRDD: Input split: hdfs://localhost:8020/user/training/people.json:0+179
17/10/05 05:13:03 INFO codegen.GenerateUnsafeProjection: Code generated in 113.719806 ms
17/10/05 05:13:03 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 2235 bytes result sent to driver
17/10/05 05:13:03 INFO scheduler.DAGScheduler: ResultStage 2 (take at <ipython-input-7-745486715568>:1) finished in 0.493 s
17/10/05 05:13:03 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 487 ms on localhost (1/1)
17/10/05 05:13:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
17/10/05 05:13:03 INFO scheduler.DAGScheduler: Job 2 finished: take at <ipython-input-7-745486715568>:1, took 0.737231 s
Out[7]: [Row(age=None), Row(age=30), Row(age=19)]
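
In the session above, `select("age")` returns a new DataFrame at once and produces no log output, while `take(3)` starts a Spark job and returns the first three rows, with `None` where a record has no `age` field. The sketch below models that behaviour in plain Python so it can be run without a Spark installation. It is only an analogy, not how Spark implements these calls. The sample records and the helper names `select` and `take` are hypothetical, since the real contents of `people.json` are not shown in this post.

```python
import json
from itertools import islice

# Hypothetical JSON-lines data; the real people.json is not shown in the post.
SAMPLE = """\
{"name": "A"}
{"name": "B", "age": 30}
{"name": "C", "age": 19}
{"name": "D", "age": 46}
"""


def select(records, *columns):
    """Lazily project each record onto the given columns.

    A field that is missing from a record becomes None, which mirrors
    the Row(age=None) seen in the transcript.
    """
    for rec in records:
        yield {col: rec.get(col) for col in columns}


def take(records, n):
    """Consume only the first n records of a (possibly lazy) iterable."""
    return list(islice(records, n))


# Nothing is parsed or projected yet: both steps below are lazy generators.
records = (json.loads(line) for line in SAMPLE.splitlines() if line.strip())
projected = select(records, "age")

# take() is the step that actually pulls data, and it stops after 3 records.
ages = take(projected, 3)
print(ages)  # [{'age': None}, {'age': 30}, {'age': 19}]
```

The fourth sample record is never read, which is the point of taking a limited number of rows instead of collecting everything.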

