Using Spark to write in-memory data into a Hive table

hive-site.xml (place it on the application's classpath, e.g. src/main/resources, or in Spark's conf/ directory):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
    <!-- Hive metastore service, used by Spark SQL -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://master:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
    <!-- JDBC URL of the MySQL database "metastore" that stores the Hive metadata.
         createDatabaseIfNotExist=true after the "?" tells the MySQL JDBC driver
         to create the database if it does not exist yet -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <!-- MySQL JDBC driver class (with MySQL Connector/J 8.x the class is com.mysql.cj.jdbc.Driver) -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <!-- MySQL username and password -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
        <description>Whether to print the names of the columns in query output.</description>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
        <description>Whether to include the current database in the Hive prompt.</description>
    </property>
</configuration>
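Spark only uses the metastore configured above if hive-site.xml is actually found; when it is not, the Spark SQL documentation notes that enableHiveSupport() creates a local metastore_db in the current directory instead, so tables silently end up in the wrong place. As a quick sanity check, here is a minimal sketch that reads the <property> entries of the file into a name-to-value map using only the JDK's DOM parser. HiveSiteReader, readProperties and fromClasspath are hypothetical names introduced for illustration; they are not part of Spark or Hive.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import scala.io.Source

// Hypothetical helper (not part of Spark or Hive): lists the name -> value pairs
// of a Hadoop-style *-site.xml such as hive-site.xml, using only the JDK.
object HiveSiteReader {

  /** Parses the XML text and keeps every <property> that has both a <name> and a <value>. */
  def readProperties(xml: String): Map[String, String] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).flatMap { i =>
      val p = props.item(i).asInstanceOf[Element]
      // Trimmed text of the first descendant element with the given tag, if present
      def text(tag: String): Option[String] =
        Option(p.getElementsByTagName(tag).item(0)).map(_.getTextContent.trim)
      for (name <- text("name"); value <- text("value")) yield name -> value
    }.toMap
  }

  /** Reads hive-site.xml from the root of the classpath, or returns None if it is absent. */
  def fromClasspath(): Option[Map[String, String]] =
    Option(getClass.getResourceAsStream("/hive-site.xml")).map { in =>
      try readProperties(Source.fromInputStream(in, "UTF-8").mkString)
      finally in.close()
    }
}
```

For example, HiveSiteReader.fromClasspath().flatMap(_.get("hive.metastore.uris")) should give Some("thrift://master:9083") when the file above is on the classpath, and None when Spark cannot see it either.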

Example code:

package spark_sql

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}
// test.ProductData is the author's random-data helper; it is not included in this post
// import test.ProductData

/**
  * @Program: spark01
  * @Author: 努力就是魅力
  * @Since: 2018-10-19 08:30
  *         Description:
  *
  *         Uses Spark to write in-memory data into a Hive table. This is a complete, runnable example.
  *
  *    Result of querying the Hive table:
  *         hive (hadoop10)> select * from data_block;
  *         OK
  *         data_block.ip	data_block.time	data_block.phonenum
  *         40.234.66.122	2018-10-12 09:35:21
  *         5.150.203.160	2018-10-03 14:41:09	13389202989
  *
  **/

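case class Datablock(ip: String, time: String, phoneNum: String)

object WriteTabletoHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("WriteTableToHive")
      .config("spark.sql.warehouse.dir", "D:\\reference-data\\spark01\\spark-warehouse")
      .enableHiveSupport()
      .getOrCreate()

    // Needed for .toDS() on a Seq of case class instances
    import spark.implicits._

    // Explicit schema built from a list of field names (not used below:
    // the Dataset's schema is derived from the Datablock case class)
    val schemaString = "ip time phoneNum"
    val fields = schemaString.split(" ")
      .map(fieldName => StructField(fieldName, StringType, nullable = true))
    val schema = StructType(fields)

    // Random row via the author's ProductData helper (not included in the post):
    // val datablockDS = Seq(Datablock(ProductData.getRandomIp, ProductData.getRecentAMonthRandomTime("yyyy-MM-dd HH:mm:ss"), ProductData.getRandomPhoneNumber)).toDS()
    // Fixed row, so the example runs without that helper
    val datablockDS = Seq(Datablock("192.168.40.122", "2018-01-01 12:25:25", "18866556699")).toDS()
    datablockDS.show()

    // Make sure the target database exists; saveAsTable in append mode
    // creates hadoop10.data_block on the first run and appends afterwards
    spark.sql("CREATE DATABASE IF NOT EXISTS hadoop10")

    // Register the Dataset as a temporary view, then append its rows to the Hive table
    datablockDS.toDF().createOrReplaceTempView("dataBlock")
    spark.sql("select * from dataBlock")
      .write.mode("append")
      .saveAsTable("hadoop10.data_block")

    spark.stop()
  }
}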
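The commented-out line in main calls test.ProductData, a helper the post does not include. If you want random rows instead of the fixed one, a hypothetical stand-in with the same method names could look like the sketch below. The data formats (an IPv4 address, a time within the last 30 days formatted with the given pattern, an 11-digit mobile-style number starting with 13 to 19) are assumptions inferred from the sample Hive output, not the author's actual implementation.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Random

// Hypothetical stand-in for the author's test.ProductData (not shown in the post).
// Method names follow the commented-out call in main; the formats are assumptions.
object ProductData {
  private val rnd = new Random()

  /** Random IPv4 address: first octet 1..255, remaining octets 0..255. */
  def getRandomIp: String =
    Seq(1 + rnd.nextInt(255), rnd.nextInt(256), rnd.nextInt(256), rnd.nextInt(256)).mkString(".")

  /** Random moment within the last 30 days, formatted with the given pattern. */
  def getRecentAMonthRandomTime(pattern: String): String = {
    val secondsBack = (rnd.nextDouble() * 30 * 24 * 3600).toLong
    LocalDateTime.now().minusSeconds(secondsBack).format(DateTimeFormatter.ofPattern(pattern))
  }

  /** Random 11-digit number: "1", then a digit 3..9, then 9 random digits. */
  def getRandomPhoneNumber: String =
    "1" + (3 + rnd.nextInt(7)) + (1 to 9).map(_ => rnd.nextInt(10)).mkString
}
```

To use it, put the object in a package named test so that import test.ProductData resolves, then comment out the fixed-row datablockDS line and uncomment the random one. Wrapping the Datablock(...) call in Seq.fill(n)(...) instead of Seq(...) would generate n rows per run.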
