Spark SQL External Data Sources JDBC官方实现写测试
通过Spark SQL External Data Sources JDBC实现将RDD的数据写入到MySQL数据库中。
jdbc.scala重要API介绍:
/**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* This will run a `CREATE TABLE` and a bunch of `INSERT INTO` statements.
* If you pass `true` for `allowExisting`, it will drop any table with the
* given name; if you pass `false`, it will throw if the table already
* exists.
*/
def createJDBCTable(url: String, table: String, allowExisting: Boolean) /**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* Assumes the table already exists and has a compatible schema. If you
* pass `true` for `overwrite`, it will `TRUNCATE` the table before
* performing the `INSERT`s.
*
* The table must already exist on the database. It must have a schema
* that is compatible with the schema of this RDD; inserting the rows of
* the RDD in order via the simple statement
* `INSERT INTO table VALUES (?, ?, ..., ?)` should not fail.
*/
def insertIntoJDBC(url: String, table: String, overwrite: Boolean)
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._ val sqlContext = new SQLContext(sc)
import sqlContext._ #数据准备
val url = "jdbc:mysql://hadoop000:3306/test?user=root&password=root" val arr2x2 = Array[Row](Row.apply("dave", 42), Row.apply("mary", 222))
val arr1x2 = Array[Row](Row.apply("fred", 3))
val schema2 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: Nil) val arr2x3 = Array[Row](Row.apply("dave", 42, 1), Row.apply("mary", 222, 2))
val schema3 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: StructField("seq", IntegerType) :: Nil) import org.apache.spark.sql.jdbc._ ================================CREATE======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2) srdd.createJDBCTable(url, "person", false)
sqlContext.jdbcRDD(url, "person").collect.foreach(println)
[dave,42]
[mary,222] ==============================CREATE with overwrite========================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person2", false)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[mary,222,2]
[dave,42,1] val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd2.createJDBCTable(url, "person2", true)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[fred,3] ================================CREATE then INSERT to append======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd.createJDBCTable(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42] srdd2.insertIntoJDBC(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42]
[fred,3] ================================CREATE then INSERT to truncate======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2) srdd.createJDBCTable(url, "person4", false)
sqlContext.jdbcRDD(url, "person4").collect.foreach(println)
[dave,42]
[mary,222] srdd2.insertIntoJDBC(url, "person4", true)
[fred,3] ================================Incompatible INSERT to append======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person5", false)
srdd2.insertIntoJDBC(url, "person5", true)
java.sql.SQLException: Column count doesn't match value count at row 1
Spark SQL External Data Sources JDBC官方实现写测试的更多相关文章
- Spark SQL External Data Sources JDBC官方实现读测试
在最新的master分支上官方提供了Spark JDBC外部数据源的实现,先尝为快. 通过spark-shell测试: import org.apache.spark.sql.SQLContext v ...
- Spark SQL External Data Sources JDBC简易实现
在spark1.2版本中最令我期待的功能是External Data Sources,通过该API可以直接将External Data Sources注册成一个临时表,该表可以和已经存在的表等通过sq ...
- Spark SQL 之 Data Sources
#Spark SQL 之 Data Sources 转载请注明出处:http://www.cnblogs.com/BYRans/ 数据源(Data Source) Spark SQL的DataFram ...
- Spark(3) - External Data Source
Introduction Spark provides a unified runtime for big data. HDFS, which is Hadoop's filesystem, is t ...
- Spark SQL External DataSource简介
随着Spark1.2的发布,Spark SQL开始正式支持外部数据源.这使得Spark SQL支持了更多的类型数据源,如json, parquet, avro, csv格式.只要我们愿意,我们可以开发 ...
- How to: Provide Credentials for the Dashboards Module when Using External Data Sources
XAF中使用dashboard模块时,如果使用了sql数据源,可以使用此方法提供连接信息 https://www.devexpress.com/Support/Center/Question/Deta ...
- 【转载】Spark SQL之External DataSource外部数据源
http://blog.csdn.net/oopsoom/article/details/42061077 一.Spark SQL External DataSource简介 随着Spark1.2的发 ...
- Apache Spark 2.2.0 中文文档 - Spark SQL, DataFrames and Datasets Guide | ApacheCN
Spark SQL, DataFrames and Datasets Guide Overview SQL Datasets and DataFrames 开始入门 起始点: SparkSession ...
- What’s new for Spark SQL in Apache Spark 1.3(中英双语)
文章标题 What’s new for Spark SQL in Apache Spark 1.3 作者介绍 Michael Armbrust 文章正文 The Apache Spark 1.3 re ...
随机推荐
- Python排序算法
不觉已经有半年没写了,时间真是容易荒废,这半年过了个春节,去拉萨旅行.本职工作也很忙,没有开展系统的学习和总结. 今年开始静下心来从基础开始学习,主要分为三部分,算法.线性代数.概率统计. 首先学习算 ...
- 图表控件== 百度 echarts的入门学习
花了3天的时间 去学习跟试用之前两款的图表控件 hightcharts(商业,人性化,新手非常方便试用,图表少了点) 跟chartjs==>搭配vue更好 控件,整体而言都还可以. http:/ ...
- 将excel2003文档文件转换为excel2007格式
在sharepoint 2010 中,excel2007或excel 2010文档格式,支持web app 应用,能够在浏览器在线打开,查看,但excel 2003格式的文档只能用office客户端打 ...
- SharePoint 2016 开发 工具Preview发布
博客地址:http://blog.csdn.net/FoxDave 之前装了SharePoint,但是并不能在Visual Studio 2015里面做开发,因为没有相应的office tool. 但 ...
- VS2015打开工程 未能正确加载“”包的问题
启动vs2015专业版时,出现类似于这样的提示框,有好几个,点击是或否,但下次打开还是会出现.寻找了网上的一些解决办法,例如用vs命令窗口或其他,但都无疾而终,下面提供的这个办法,顺利解决此问题 1. ...
- JVM-程序编译与代码早期(编译期)优化
早期(编译期)优化 一.Javac编译器 1.Javac的源代码与调试 Javac的源代码放在JDK_SRC_HOME/langtools/src/shares/classes/com/sun/too ...
- Spring 核心概念以及入门教程
初始Spring 在学习Spring之前我们首先要了解一下企业级应用.企业级应用是指那些为商业组织,大型企业而创建并部署的解决方案及应用. 这些大型企业级应用的结构复杂,涉及的外部资源众多,事务密集, ...
- brute-force search
#include <pcl/search/brute_force.h> #include <pcl/common/common.h> #include <iostream ...
- Linux线程-创建
Linux的线程实现是在内核以外来实现的,内核本身并不提供线程创建.但是内核为提供线程[也就是轻量级进程]提供了两个系统调用__clone()和fork (),这两个系统调用都为准备一些参数,最终都用 ...
- Android-->Genymotion虚拟机(模拟器)的配置
--> Genymotion 是一套完整的工具,它提供了Android虚拟环境.它简直就是开发者.测试人员.推销者甚至是游戏玩家的福音. 我只能说非常好用,模拟器中顶级,具体好处可以度娘. -- ...