Incremental extraction from MySQL to Hive with Oozie 4.3.0 and Sqoop 1.4.6

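1. Prepare the data source

The source is the MySQL table bigdata, which holds the following data:

2. Prepare the target table

The target is the table bigdata in the Hive database dw_stg.

Its storage path is hdfs://localhost:9000/user/hive/warehouse/dw_stg.db/bigdata

The Hive DDL is as follows: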

create external table bigdata(

class_id string comment 'course ID',

class_name string comment 'course name',

class_month int comment 'course duration',

teacher string comment 'course teacher',

update_time string comment 'update date'

)

partitioned by(dt string comment 'date (year, month, day)')

row format delimited

fields terminated by '\001'

lines terminated by '\n'

stored as textfile;

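Notes: the field delimiter is \001 and the line delimiter is \n; the table is partitioned by dt, formatted as yyyyMMdd.

Create the bigdata table above in Hive, in the dw_stg database.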
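
To make the storage format concrete, here is a minimal Python sketch of what one row looks like in the text file behind this table, and how the partition directory name is formed. The helper names and the sample row are hypothetical and are not part of the original setup; the sketch only mirrors the delimiters and the dt format stated above.

```python
from datetime import date

FIELD_SEP = "\x01"  # the character Hive writes as '\001' (octal 1)
LINE_SEP = "\n"


def encode_row(fields):
    """Join the column values with \\x01 and end the record with \\n."""
    return FIELD_SEP.join(str(f) for f in fields) + LINE_SEP


def decode_row(line):
    """Split one stored line back into its column values (all strings)."""
    return line.rstrip(LINE_SEP).split(FIELD_SEP)


def partition_dir(table_path, day):
    """Build the partition directory, with dt formatted as yyyyMMdd."""
    return f"{table_path}/dt={day.strftime('%Y%m%d')}"


# Hypothetical sample row: class_id, class_name, class_month, teacher, update_time
row = ["c001", "hadoop", 3, "teacher_a", "2018-01-23 12:00:00"]
line = encode_row(row)
columns = decode_row(line)
path = partition_dir("/user/hive/warehouse/dw_stg.db/bigdata", date(2018, 1, 24))
```

Because the delimiter is a non-printing control character, it is unlikely to collide with characters inside the column values, which is presumably why it was chosen here.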

3. Write the Oozie files

3.1 Configure job.properties

# Common variables

timezone=Asia/Shanghai

jobTracker=dwtest-name1:8032

nameNode=hdfs://dwtest-name1:9000

queueName=default

warehouse=/user/hive/warehouse

dw_stg=${warehouse}/dw_stg.db

dw_mdl=${warehouse}/dw_mdl.db

dw_dm=${warehouse}/dw_dm.db

app_home=/user/oozie/app

oozie.use.system.libpath=true

# coordinator

oozie.coord.application.path=${nameNode}${app_home}/bigdata/coordinator.xml

workflow=${nameNode}${app_home}/bigdata

# source

connection=jdbc:mysql://192.168.1.100:3306/test

username=test

password=test

source_table=bigdata

# target

target_path=${dw_stg}/bigdata

# Job start time and end time

start=2018-01-24T10:00+0800

end=2199-01-01T01:00+0800
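
The properties above reference each other with ${name}. The following Python sketch expands those references for a subset of the file so the final paths are easy to check. It only imitates the effect of the substitution and is not how Oozie implements it; the resolve function is a hypothetical helper.

```python
import re

REF = re.compile(r"\$\{([^}]+)\}")


def resolve(props):
    """Expand ${name} references until nothing changes; unknown names are left as-is."""
    resolved = dict(props)
    for _ in range(len(resolved)):  # enough passes for any non-circular chain
        changed = False
        for key, value in list(resolved.items()):
            new_value = REF.sub(lambda m: resolved.get(m.group(1), m.group(0)), value)
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:
            break
    return resolved


# A subset of the job.properties entries shown above
props = {
    "nameNode": "hdfs://dwtest-name1:9000",
    "warehouse": "/user/hive/warehouse",
    "dw_stg": "${warehouse}/dw_stg.db",
    "app_home": "/user/oozie/app",
    "oozie.coord.application.path": "${nameNode}${app_home}/bigdata/coordinator.xml",
    "workflow": "${nameNode}${app_home}/bigdata",
    "target_path": "${dw_stg}/bigdata",
}

resolved = resolve(props)
for name in ("oozie.coord.application.path", "workflow", "target_path"):
    print(name, "=", resolved[name])
```

The resolved target_path, /user/hive/warehouse/dw_stg.db/bigdata, matches the storage path of the Hive table from step 2.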

3.2 Configure coordinator.xml

<coordinator-app name="coord_bigdata" frequency="${coord:days(1)}" start="${start}" end="${end}" timezone="${timezone}" xmlns="uri:oozie:coordinator:0.5">

    <action>

        <workflow>

            <app-path>${workflow}</app-path>

            <configuration>

                <property>

                    <name>startTime</name>

                    <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), 'yyyy-MM-dd 00:00:00')}</value>

                </property>

                <property>

                    <name>endTime</name>

                    <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), 0, 'DAY'), 'yyyy-MM-dd 00:00:00')}</value>

                </property>

                <property>

                    <name>outputPath</name>

                    <value>${target_path}/dt=${coord:formatTime(coord:dateOffset(coord:nominalTime(), 0, 'DAY'), 'yyyyMMdd')}/</value>

                </property>

            </configuration>

        </workflow>

    </action>

</coordinator-app>

Notes:

Start of the incremental window (startTime): the day before the nominal time. For a nominal time on 2018-01-24 the value is 2018-01-23 00:00:00

${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), 'yyyy-MM-dd 00:00:00')}

End of the incremental window (endTime): the value is 2018-01-24 00:00:00

${coord:formatTime(coord:dateOffset(coord:nominalTime(), 0, 'DAY'), 'yyyy-MM-dd 00:00:00')}

The output path must include the partition field dt: the value is /user/hive/warehouse/dw_stg.db/bigdata/dt=20180124/

${target_path}/dt=${coord:formatTime(coord:dateOffset(coord:nominalTime(), 0, 'DAY'), 'yyyyMMdd')}/
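
The three values can be reproduced outside Oozie to check the date arithmetic. This is a minimal Python sketch that assumes the nominal time is already expressed in the timezone Oozie uses for formatting; incremental_window is a hypothetical helper, not an Oozie API.

```python
from datetime import datetime, timedelta


def incremental_window(nominal_time, target_path):
    """Mirror the three coordinator properties for one nominal time."""
    previous_day = nominal_time - timedelta(days=1)
    return {
        # dateOffset(nominalTime, -1, 'DAY'), time of day forced to 00:00:00
        "startTime": previous_day.strftime("%Y-%m-%d 00:00:00"),
        # dateOffset(nominalTime, 0, 'DAY'), time of day forced to 00:00:00
        "endTime": nominal_time.strftime("%Y-%m-%d 00:00:00"),
        # the partition directory is named after the nominal day
        "outputPath": f"{target_path}/dt={nominal_time.strftime('%Y%m%d')}/",
    }


# First materialization, per start=2018-01-24T10:00+0800 in job.properties
nominal = datetime(2018, 1, 24, 10, 0)
window = incremental_window(nominal, "/user/hive/warehouse/dw_stg.db/bigdata")
for name, value in window.items():
    print(name, "=", value)
```

Note that with these expressions the partition dt=20180124 holds the rows whose update_time falls within 2018-01-23, because the window is half-open: startTime inclusive, endTime exclusive.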

3.3 Configure workflow.xml

 <?xml version="1.0" encoding="UTF-8"?>

 <!--

   Licensed to the Apache Software Foundation (ASF) under one

   or more contributor license agreements.  See the NOTICE file

   distributed with this work for additional information

   regarding copyright ownership.  The ASF licenses this file

   to you under the Apache License, Version 2.0 (the

   "License"); you may not use this file except in compliance

   with the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software

   distributed under the License is distributed on an "AS IS" BASIS,

   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

   See the License for the specific language governing permissions and

   limitations under the License.

 -->

 <workflow-app xmlns="uri:oozie:workflow:0.4" name="wf_bigdata">

     <start to="sqoop-node"/>

     <action name="sqoop-node">

         <sqoop xmlns="uri:oozie:sqoop-action:0.2">

             <job-tracker>${jobTracker}</job-tracker>

             <name-node>${nameNode}</name-node>

             <prepare>

                 <delete path="${nameNode}${outputPath}"/>

             </prepare>

             <configuration>

                 <property>

                     <name>mapred.job.queue.name</name>

                     <value>${queueName}</value>

                 </property>

             </configuration>

             <arg>import</arg>

             <arg>--connect</arg>

             <arg>${connection}</arg>

             <arg>--username</arg>

             <arg>${username}</arg>

             <arg>--password</arg>

             <arg>${password}</arg>

             <arg>--verbose</arg>

             <arg>--query</arg>

             <arg>select class_id,class_name,class_month,teacher,update_time from ${source_table} where $CONDITIONS and update_time &gt;= '${startTime}' and update_time &lt; '${endTime}'</arg>

             <arg>--fields-terminated-by</arg>

             <arg>\001</arg>

             <arg>--target-dir</arg>

             <arg>${outputPath}</arg>

             <arg>-m</arg>

             <arg>1</arg>

         </sqoop>

         <ok to="end"/>

         <error to="fail"/>

     </action>

     <kill name="fail">

         <message>Sqoop free form failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>

     </kill>

     <end name="end"/>

 </workflow-app>

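4. Upload the files

Upload the three files above to the Oozie app directory on HDFS (/user/oozie/app/bigdata, per app_home in job.properties), as follows:

5. Run the job

oozie job -config job.properties -run

6. Check the job status

7. Query the Hive table

Run msck repair table bigdata to register the new partition automatically, then query the table. The test showed no problems.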

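8. Pitfalls encountered during development:

8.1 The field delimiter in workflow.xml must not be wrapped in single quotes. <arg>\001</arg> is correct; <arg>'\001'</arg> is wrong.

8.2 Because the Sqoop arguments are configured in XML, a bare less-than sign "<" in a condition causes an error: the XML file fails validation.

The fix is to write &lt; instead of "<". It is best to also write &gt; instead of ">".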
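
The escaping rule can be checked with Python's standard library. The sketch below escapes a condition like the one in workflow.xml, confirms that an XML parser returns the original text, and shows that the unescaped form is rejected. The condition string uses the example dates from section 3.2 and is illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

condition = (
    "update_time >= '2018-01-23 00:00:00' "
    "and update_time < '2018-01-24 00:00:00'"
)

# escape() rewrites &, < and > as &amp;, &lt; and &gt;
escaped = escape(condition)
arg_xml = f"<arg>{escaped}</arg>"

# A parser turns the entities back into the original characters,
# so the query text is unchanged once the XML has been read.
parsed = ET.fromstring(arg_xml).text

# The same text with a bare "<" is not well-formed XML.
try:
    ET.fromstring(f"<arg>{condition}</arg>")
    raw_is_valid = True
except ET.ParseError:
    raw_is_valid = False

print(escaped)
print(parsed == condition, raw_is_valid)
```

Strictly speaking, a bare ">" is legal in XML text, so escaping it is optional; writing &gt; simply keeps both comparison operators in the same style.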
