[Spark][Python] Example: reading a MySQL table from Spark into a DataFrame:

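# sqlContext is predefined in the pyspark shell; the MySQL JDBC driver (Connector/J) must be on Spark's classpath.
mydf001 = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/loudacre") \
    .option("dbtable", "accounts").option("user", "training").option("password", "training") \
    .load()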
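The four `.option(...)` calls can also be collected in a dict and passed in one call with PySpark's `DataFrameReader.options(**opts)`. The sketch below is a minimal illustration; `build_jdbc_options` is a hypothetical helper, not part of Spark, and the optional `port` parameter is an addition the original snippet does not use.

```python
def build_jdbc_options(host, database, table, user, password, port=None):
    """Hypothetical helper: gather the JDBC reader options used above into one dict."""
    # MySQL Connector/J URL form: jdbc:mysql://host[:port]/database
    hostport = host if port is None else f"{host}:{port}"
    return {
        "url": f"jdbc:mysql://{hostport}/{database}",
        # dbtable accepts anything valid in a FROM clause,
        # e.g. a parenthesized subquery with an alias: "(SELECT ...) AS t"
        "dbtable": table,
        "user": user,
        "password": password,
    }


opts = build_jdbc_options("localhost", "loudacre", "accounts", "training", "training")
print(opts["url"])  # jdbc:mysql://localhost/loudacre
```

In the pyspark shell this would be used as `mydf001 = sqlContext.read.format("jdbc").options(**opts).load()`. Supplying the password from somewhere like `os.environ` rather than a literal keeps it out of the shell history.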

In [10]: mydf001=sqlContext.read.format("jdbc").option("url","jdbc:mysql://localhost/loudacre")\
....: .option("dbtable","accounts").option("user","training").option("password","training").load()
17/10/03 05:59:53 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
17/10/03 05:59:53 INFO hive.HiveContext: Initializing metastore client version 1.1.0 using Spark classes.
17/10/03 05:59:53 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.7.0
17/10/03 05:59:53 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.7.0
17/10/03 05:59:56 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/10/03 05:59:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/10/03 05:59:56 INFO hive.metastore: Connected to metastore.
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/c2d22d09-7425-4bb3-94c3-39cb32267c7d_resources
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d/_tmp_space.db
17/10/03 05:59:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
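The INFO lines above come from the shell's HiveContext connecting to the Hive metastore on first use; note that none of them says "Starting job". `load()` is lazy: it only resolves the table's schema, and rows are fetched later by an action such as `count()`. Spark's JDBC source obtains the schema with a query of the form `SELECT * FROM <table> WHERE 1=0` and reads the column metadata of the (empty) result. A minimal sketch of that probe, using Python's built-in `sqlite3` as a stand-in for MySQL; the `accounts` columns here are hypothetical.

```python
import sqlite3


def probe_columns(conn, table):
    """Return the column names of `table` without transferring any rows."""
    # WHERE 1=0 never matches, so the result is empty but still carries column metadata
    cur = conn.execute(f"SELECT * FROM {table} WHERE 1=0")
    names = [col[0] for col in cur.description]
    return names, cur.fetchall()


# In-memory stand-in for the MySQL accounts table (hypothetical columns)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_num INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                 [(1, "Alice", "Smith"), (2, "Carlos", "Diaz")])

names, rows = probe_columns(conn, "accounts")
print(names)  # ['acct_num', 'first_name', 'last_name']
print(rows)   # []
```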

In [11]:

In [11]: type(mydf001)
Out[11]: pyspark.sql.dataframe.DataFrame

In [12]: mydf001.count()
17/10/03 06:00:29 INFO spark.SparkContext: Starting job: count at NativeMethodAccessorImpl.java:-2
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Registering RDD 2 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Got job 0 (count at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 11.0 KB, free 11.0 KB)
17/10/03 06:00:31 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.2 KB, free 16.1 KB)
17/10/03 06:00:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36793 (size: 5.2 KB, free: 208.8 MB)
17/10/03 06:00:31 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:31 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:31 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/03 06:00:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 1911 bytes)
17/10/03 06:00:31 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/03 06:00:32 INFO codegen.GenerateMutableProjection: Code generated in 425.82589 ms
17/10/03 06:00:32 INFO codegen.GenerateUnsafeProjection: Code generated in 78.278589 ms
17/10/03 06:00:33 INFO codegen.GenerateMutableProjection: Code generated in 84.676206 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeRowJoiner: Code generated in 60.144399 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeProjection: Code generated in 95.977074 ms
17/10/03 06:00:34 INFO jdbc.JDBCRDD: closed connection
17/10/03 06:00:34 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1334 bytes result sent to driver
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3081 ms on localhost (1/1)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/03 06:00:34 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at NativeMethodAccessorImpl.java:-2) finished in 3.163 s
17/10/03 06:00:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/10/03 06:00:34 INFO scheduler.DAGScheduler: running: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/10/03 06:00:34 INFO scheduler.DAGScheduler: failed: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 12.1 KB, free 28.3 KB)
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.6 KB, free 33.9 KB)
17/10/03 06:00:34 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:36793 (size: 5.6 KB, free: 208.8 MB)
17/10/03 06:00:34 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1999 bytes)
17/10/03 06:00:34 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 32 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 52.636353 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 49.757505 ms
17/10/03 06:00:35 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1666 bytes result sent to driver
17/10/03 06:00:35 INFO scheduler.DAGScheduler: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2) finished in 0.795 s
17/10/03 06:00:35 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 789 ms on localhost (1/1)
17/10/03 06:00:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/10/03 06:00:35 INFO scheduler.DAGScheduler: Job 0 finished: count at NativeMethodAccessorImpl.java:-2, took 6.451521 s
Out[12]: 129761
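In the `count()` log, the first stage runs a single task (`Adding task set 0.0 with 1 tasks`): without partitioning options, the JDBC source reads the whole table through one connection in one partition. For larger tables the reader accepts the options `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions`, and turns them into one WHERE clause per partition. The function below is a simplified sketch of that splitting, modeled loosely on Spark 1.x's `JDBCRelation.columnPartition`; newer Spark versions differ in details such as NULL handling. The column name `acct_num` and the bounds 0 and 100 are assumptions for illustration only.

```python
def column_partition(column, lower_bound, upper_bound, num_partitions):
    """Simplified sketch: one WHERE clause per JDBC partition (None = no filter).

    Assumes non-negative integer bounds. The bounds only set the stride; the first
    and last partitions are open-ended, so rows outside the bounds are still read.
    """
    if num_partitions <= 1:
        return [None]
    # Divide before subtracting (as Spark 1.x does) to avoid overflow on large bounds
    stride = upper_bound // num_partitions - lower_bound // num_partitions
    clauses = []
    current = lower_bound
    for i in range(num_partitions):
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None
        if lower and upper:
            clauses.append(f"{lower} AND {upper}")
        else:
            clauses.append(lower or upper)
    return clauses


for clause in column_partition("acct_num", 0, 100, 4):
    print(clause)
# acct_num < 25
# acct_num >= 25 AND acct_num < 50
# acct_num >= 50 AND acct_num < 75
# acct_num >= 75
```

On the Spark side the equivalent read would add `.option("partitionColumn", "acct_num").option("lowerBound", "0").option("upperBound", "100").option("numPartitions", "4")` before `.load()`, with bounds chosen to match the real key range.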

In [13]:
