This post uses the Spark SQL External Data Sources JDBC implementation to write RDD data into a MySQL database.

Key APIs in jdbc.scala:

/**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* This will run a `CREATE TABLE` and a bunch of `INSERT INTO` statements.
* If you pass `true` for `allowExisting`, it will drop any table with the
* given name; if you pass `false`, it will throw if the table already
* exists.
*/
def createJDBCTable(url: String, table: String, allowExisting: Boolean)

/**
* Save this RDD to a JDBC database at `url` under the table name `table`.
* Assumes the table already exists and has a compatible schema. If you
* pass `true` for `overwrite`, it will `TRUNCATE` the table before
* performing the `INSERT`s.
*
* The table must already exist on the database. It must have a schema
* that is compatible with the schema of this RDD; inserting the rows of
* the RDD in order via the simple statement
* `INSERT INTO table VALUES (?, ?, ..., ?)` should not fail.
*/
def insertIntoJDBC(url: String, table: String, overwrite: Boolean)
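
The spark-shell runs below assume the MySQL JDBC driver is already on the classpath (for example via SPARK_CLASSPATH or spark-shell's --driver-class-path; which jar you use is up to you). A one-line check inside the shell confirms the driver loads:

// throws ClassNotFoundException if mysql-connector-java is not on the classpath
Class.forName("com.mysql.jdbc.Driver")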
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val sqlContext = new SQLContext(sc)
import sqlContext._

// data preparation
val url = "jdbc:mysql://hadoop000:3306/test?user=root&password=root"
val arr2x2 = Array[Row](Row.apply("dave", 42), Row.apply("mary", 222))
val arr1x2 = Array[Row](Row.apply("fred", 3))
val schema2 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: Nil)
val arr2x3 = Array[Row](Row.apply("dave", 42, 1), Row.apply("mary", 222, 2))
val schema3 = StructType(StructField("name", StringType) :: StructField("id", IntegerType) :: StructField("seq", IntegerType) :: Nil)

import org.apache.spark.sql.jdbc._
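
Before creating any tables, it is worth sanity-checking the schema that applySchema attaches; a quick sketch (printSchema prints the field tree to stdout):

// sanity check the attached schema before writing anything to MySQL
val checked = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
checked.printSchema()
// expected shape: name (string), id (integer)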
================================CREATE======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
srdd.createJDBCTable(url, "person", false)
sqlContext.jdbcRDD(url, "person").collect.foreach(println)
[dave,42]
[mary,222]
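
To double-check the write outside of Spark, a plain-JDBC query works too. This is just a verification sketch, not part of the test itself; it reuses the `url` above (which already embeds the user and password) and assumes the MySQL driver is on the classpath:

import java.sql.DriverManager

// open a direct connection and read back the rows createJDBCTable just wrote
val conn = DriverManager.getConnection(url)
val rs = conn.createStatement().executeQuery("SELECT name, id FROM person")
while (rs.next()) println(rs.getString("name") + "," + rs.getInt("id"))
conn.close()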
==============================CREATE with overwrite========================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person2", false)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[mary,222,2]
[dave,42,1] val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd2.createJDBCTable(url, "person2", true)
sqlContext.jdbcRDD(url, "person2").collect.foreach(println)
[fred,3]

Passing `true` for `allowExisting` dropped the old three-column person2 table before recreating it, which is why only the new two-column row remains.

================================CREATE then INSERT to append======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)
srdd.createJDBCTable(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42]

srdd2.insertIntoJDBC(url, "person3", false)
sqlContext.jdbcRDD(url, "person3").collect.foreach(println)
[mary,222]
[dave,42]
[fred,3]

================================CREATE then INSERT to truncate======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr1x2), schema2)

srdd.createJDBCTable(url, "person4", false)
sqlContext.jdbcRDD(url, "person4").collect.foreach(println)
[dave,42]
[mary,222]

srdd2.insertIntoJDBC(url, "person4", true)
sqlContext.jdbcRDD(url, "person4").collect.foreach(println)
[fred,3]

================================Incompatible INSERT to append======================================
val srdd = sqlContext.applySchema(sc.parallelize(arr2x2), schema2)
val srdd2 = sqlContext.applySchema(sc.parallelize(arr2x3), schema3)
srdd.createJDBCTable(url, "person5", false)
srdd2.insertIntoJDBC(url, "person5", true)

java.sql.SQLException: Column count doesn't match value count at row 1
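
The failure comes from the generated `INSERT INTO person5 VALUES (?, ?, ?)` carrying three parameters while person5 only has two columns. A minimal defensive sketch (not part of the Spark API; it just reuses the names from this session) that compares column counts before inserting:

// read the target table's schema back and compare field counts before inserting
val target = sqlContext.jdbcRDD(url, "person5")
if (srdd2.schema.fields.length != target.schema.fields.length) {
  println(s"schema mismatch: RDD has ${srdd2.schema.fields.length} columns, " +
    s"person5 has ${target.schema.fields.length}")
} else {
  srdd2.insertIntoJDBC(url, "person5", true)
}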
