Sqoop: MySQL, Hive and HDFS import/export operations

Preface: setting up the environment. This post uses the CDH builds of Hadoop + Hive + Sqoop, plus MySQL.

Download:
    hadoop-2.5.0-cdh5.3.6.tar.gz
    hive-0.13.1-cdh5.3.6.tar.gz
    sqoop-1.4.5-cdh5.3.6.tar.gz

Configure Hadoop

    *-env.sh (3 files: hadoop-env.sh, mapred-env.sh, yarn-env.sh): set JAVA_HOME to the JDK path

    core-site.xml
        fs.defaultFS
        hadoop.tmp.dir

    hdfs-site.xml
        dfs.replication

    mapred-site.xml
        mapreduce.framework.name--yarn
        mapreduce.jobhistory.address         # port 10020
        mapreduce.jobhistory.webapp.address  # port 19888

    yarn-site.xml
        yarn.resourcemanager.hostname
        yarn.nodemanager.aux-services--mapreduce_shuffle
        yarn.log-aggregation-enable--true
        yarn.log-aggregation.retain-seconds--108600

    slaves
        host addresses of the worker nodes
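The list above only names the keys that need values. As a rough sketch, the shell functions below write two of these files, core-site.xml and yarn-site.xml, from name/value pairs. The host name is the one used elsewhere in this post; the NameNode port 8020 and the hadoop.tmp.dir path are assumptions, and write_site_xml / write_hadoop_conf are hypothetical helpers, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: generate Hadoop *-site.xml files from name/value pairs.
# Assumptions: NameNode RPC port 8020 and the hadoop.tmp.dir path below.

# Print one <property> element.
hadoop_prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# write_site_xml FILE NAME VALUE [NAME VALUE ...]
write_site_xml() {
  local file="$1"
  shift
  {
    printf '<?xml version="1.0"?>\n<configuration>\n'
    while (( $# >= 2 )); do
      hadoop_prop "$1" "$2"
      shift 2
    done
    printf '</configuration>\n'
  } > "$file"
}

# write_hadoop_conf DIR: write core-site.xml and yarn-site.xml into DIR.
write_hadoop_conf() {
  local dir="$1"
  local host="hadoop09-linux-01.ibeifeng.com"
  write_site_xml "$dir/core-site.xml" \
    fs.defaultFS "hdfs://$host:8020" \
    hadoop.tmp.dir /opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6/data/tmp
  write_site_xml "$dir/yarn-site.xml" \
    yarn.resourcemanager.hostname "$host" \
    yarn.nodemanager.aux-services mapreduce_shuffle \
    yarn.log-aggregation-enable true \
    yarn.log-aggregation.retain-seconds 108600
}
```

For example, running write_hadoop_conf etc/hadoop from the Hadoop directory would overwrite both files in place, so back up the originals first. hdfs-site.xml and mapred-site.xml can be produced the same way with write_site_xml.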

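PS: format the NameNode, then start HDFS and YARN:

    $ bin/hdfs namenode -format
    $ sbin/start-dfs.sh
    $ sbin/start-yarn.sh

Then create the HDFS directories Hive needs and make them group-writable:

    $ bin/hdfs dfs -mkdir /tmp
    $ bin/hdfs dfs -mkdir -p /user/hive/warehouse
    $ bin/hdfs dfs -chmod g+w /tmp
    $ bin/hdfs dfs -chmod g+w /user/hive/warehouse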

Configure Hive

    hive-env.sh
        HADOOP_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
        export HIVE_CONF_DIR=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/conf

    hive-log4j.properties
        hive.log.threshold=ALL
        hive.root.logger=INFO,DRFA
        hive.log.dir=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs
        hive.log.file=hive.log

    hive-site.xml  # deploy MySQL beforehand
        javax.jdo.option.ConnectionURL--jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/chd_metastore?createDatabaseIfNotExist=true
        javax.jdo.option.ConnectionDriverName--com.mysql.jdbc.Driver
        javax.jdo.option.ConnectionUserName
        javax.jdo.option.ConnectionPassword
        hive.cli.print.header--true
        hive.cli.print.current.db--true
        hive.fetch.task.conversion--more
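hive-site.xml is an XML file, so if the metastore URL above ever gains a second parameter (for example a hypothetical &useUnicode=true), the & has to be written as &amp; or the file is no longer valid XML. A minimal sketch that escapes values before printing the four metastore properties; xml_escape and metastore_props are hypothetical names, and the output is meant to be pasted between the <configuration> tags.

```shell
#!/usr/bin/env bash
# Sketch: print the metastore <property> elements of hive-site.xml with XML-escaped values.

# Escape &, < and > for XML text. In sed replacements a bare "&" means
# "the matched text", so literal ampersands are written as \&.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# metastore_props URL USER PASSWORD
metastore_props() {
  local url="$1" user="$2" password="$3"
  local -a kv=(
    javax.jdo.option.ConnectionURL "$url"
    javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver
    javax.jdo.option.ConnectionUserName "$user"
    javax.jdo.option.ConnectionPassword "$password"
  )
  local i
  for (( i = 0; i < ${#kv[@]}; i += 2 )); do
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
      "${kv[i]}" "$(xml_escape "${kv[i+1]}")"
  done
}
```

The user name and password are whatever account you created in MySQL for the metastore; none are given in this post.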

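PS: in the Hive directory, create the log directory:

    $ mkdir logs

Put the prepared MySQL JDBC driver jar into Hive's lib directory, then start the CLI:

    $ bin/hive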

Configure Sqoop

    sqoop-env.sh
        export HADOOP_COMMON_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
        export HADOOP_MAPRED_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
        export HIVE_HOME=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6

Put the prepared MySQL JDBC driver jar into Sqoop's lib directory as well.
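Hive and Sqoop both need the MySQL driver on their own lib path, and a missing jar only shows up later as a failure to load com.mysql.jdbc.Driver. A small check, assuming the jar follows the usual mysql-connector-java*.jar naming (adjust the pattern if yours differs); has_mysql_jar and check_mysql_jars are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: confirm the MySQL JDBC driver jar is present in each lib directory.
# Assumption: the jar name matches mysql-connector-java*.jar.

has_mysql_jar() {
  local libdir="$1" jar
  for jar in "$libdir"/mysql-connector-java*.jar; do
    [[ -f "$jar" ]] && return 0   # the glob matched a real file
  done
  return 1                        # no match: the glob stayed literal
}

# check_mysql_jars DIR...: print ok/MISSING per directory; non-zero if any is missing.
check_mysql_jars() {
  local dir missing=0
  for dir in "$@"; do
    if has_mysql_jar "$dir"; then
      echo "ok      $dir"
    else
      echo "MISSING $dir"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_mysql_jars /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/lib "$SQOOP_HOME/lib", where SQOOP_HOME is assumed to point at your Sqoop install.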

I. Prepare the data

# Create a database and a table in MySQL and insert a few rows

  mysql> create database if not exists student default character set utf8 collate utf8_general_ci;
  mysql> use student;
  mysql> create table if not exists stu_info(id int(10) primary key not null auto_increment, name varchar(20) not null) default character set utf8 collate utf8_general_ci;
  mysql> insert into stu_info(name) values('李建');
  mysql> insert into stu_info(name) values('张明');
  mysql> insert into stu_info(name) values('赵兴');
  mysql> insert into stu_info(name) values('陈琦');
  mysql> insert into stu_info(name) values('刘铭');
  mysql> select id,name from stu_info;
  +----+--------+
  | id | name   |
  +----+--------+
  |  1 | 李建   |
  |  2 | 张明   |
  |  3 | 赵兴   |
  |  4 | 陈琦   |
  |  5 | 刘铭   |
  +----+--------+
  5 rows in set (0.00 sec)

II. Use Sqoop to import the MySQL table into HDFS

bin/sqoop import \
--connect jdbc:mysql://10.0.0.108:3306/student \
--username root \
--password root \
--table stu_info \
--target-dir /student \
--num-mappers 1 \
--fields-terminated-by '\t'

III. Use Sqoop to import the MySQL table into Hive

Method 1

1. Create the database and the table in Hive (switch to the database first; otherwise the table is created in default):

    create database if not exists student;
    use student;
    create table if not exists stu_info(id int,name string) row format delimited fields terminated by '\t';

2. bin/sqoop import \
    --connect jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/student \
    --username root --password root \
    --table stu_info \
    --delete-target-dir \
    --target-dir /user/hive/warehouse/student.db/stu_info \
    --hive-import \
    --hive-database student \
    --hive-table stu_info \
    --hive-overwrite \
    --num-mappers 1 \
    --fields-terminated-by '\t'

Method 2

1. Use sqoop create-hive-table. The custom database (student) must already exist in Hive; otherwise the table does not end up under the intended database path.

2. bin/sqoop create-hive-table \
    --connect jdbc:mysql://10.0.0.108:3306/student \
    --username root --password root \
    --table stu_info \
    --hive-table student.stu_info

3. bin/sqoop import --connect jdbc:mysql://10.0.0.108:3306/student \
    --username root --password root \
    --table stu_info \
    --hive-import \
    --hive-database student \
    --hive-table stu_info \
    --hive-overwrite \
    --num-mappers 1 \
    --fields-terminated-by '\t' \
    --delete-target-dir \
    --target-dir /user/hive/warehouse/student.db/stu_info

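4. Querying the table in Hive returns NULL for every column,

    yet the files on HDFS look correct. So Hive cannot parse the data, and the cause is the field delimiter.

    Re-run the import with --fields-terminated-by '\001' instead.  # \001 is Ctrl+A, Hive's default field delimiter; Sqoop's own default for plain-text imports is ","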
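The NULLs follow from the two sides using different separators: the files were written with tabs, while the table created in step 2 expects ^A (\001). Hive then reads each line as a single field, the whole line cannot be parsed as the int id, and the name column has no value, so both come back NULL. A small sketch for checking this on a local copy of a data file (for example one fetched with bin/hdfs dfs -get): it counts lines whose field count differs from the expected number of columns for a given delimiter. count_bad_lines is a hypothetical helper; awk interprets escapes such as \t and \001 in the -F argument.

```shell
#!/usr/bin/env bash
# Sketch: report how many lines do NOT split into the expected number of fields.
# Usage: count_bad_lines FILE DELIM EXPECTED_FIELDS
# DELIM goes to awk -F, so escapes like '\t' or '\001' are interpreted by awk.
count_bad_lines() {
  local file="$1" delim="$2" expected="$3"
  awk -F "$delim" -v n="$expected" 'NF != n { bad++ } END { print bad + 0 }' "$file"
}
```

For stu_info (two columns), count_bad_lines part-m-00000 '\001' 2 printing a non-zero count while count_bad_lines part-m-00000 '\t' 2 prints 0 is exactly the mismatch described in step 4.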

IV. Export data from HDFS or Hive to a MySQL table

1. Prepare the database and the table in MySQL.

2. Here I simply reuse the student database:

    create table if not exists stu_info_export like stu_info;

3. Set the input delimiter to match the delimiter used by the HDFS/Hive data:

    bin/sqoop export \
    --connect jdbc:mysql://10.0.0.108/student \
    --username root --password root \
    --table stu_info_export \
    --export-dir /user/hive/warehouse/student.db/stu_info \
    --num-mappers 1 \
    --input-fields-terminated-by '\001'
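Because --input-fields-terminated-by has to match what is actually in the export directory (tab for the table from method 1, ^A for the one from method 2), it can help to look at one line before exporting. A rough sketch, assuming a local copy of one data file; it only recognizes the three delimiters that appear in this post, and guess_delimiter is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: guess the field delimiter of a Sqoop/Hive text file from its first
# line and print it in the escaped form Sqoop's delimiter options accept.
guess_delimiter() {
  local first=""
  # read fails at EOF without a trailing newline but still fills $first
  IFS= read -r first < "$1" || [[ -n "$first" ]] || return 1
  case "$first" in
    *$'\001'*) printf '%s\n' '\001' ;;  # Hive default (Ctrl+A)
    *$'\t'*)   printf '%s\n' '\t'   ;;
    *,*)       printf '%s\n' ','    ;;  # Sqoop default for plain HDFS imports
    *)         return 1 ;;
  esac
}
```

The result can then be passed straight through, e.g. --input-fields-terminated-by "$(guess_delimiter part-m-00000)"; a comma inside a tab- or ^A-delimited line does not confuse it because those are checked first.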

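V. sqoop --options-file

In production, incremental data migration is usually driven by --options-file plus a shell script.

    $ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST

    $ sqoop --options-file /users/homer/work/import.txt --table TEST

The second command is equivalent to the first when import.txt contains the options below.

Note: the file starts with the tool name (import/export), followed by one option or one value per line, e.g.:

    import
    --connect
    jdbc:mysql://localhost/db
    --username
    foo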
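A sketch of a complete options file for the stu_info import, written from a shell function so it can be regenerated per table. Sqoop ignores blank lines and lines starting with #. The connection values are the examples from this post; write_import_options is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: write an options file for the stu_info import. The tool name comes
# first, then one option or one value per line; comments and blank lines are allowed.
write_import_options() {
  local file="$1"
  cat > "$file" <<'EOF'
# Options file for importing student.stu_info
import

--connect
jdbc:mysql://10.0.0.108:3306/student
--username
root
--password
root
EOF
  chmod 600 "$file"   # the file holds a password
}
```

Table-specific options stay on the command line, e.g. bin/sqoop --options-file import.txt --table stu_info --target-dir /student --num-mappers 1 (the path import.txt is just an example).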

VI. Using sqoop job

$ bin/sqoop job --delete <job_id>
$ bin/sqoop job --list
$ bin/sqoop job --show <job_id>
$ bin/sqoop job --exec <job_id>
$ bin/sqoop job --create <job_id> -- <job-info>

$ bin/sqoop job --create stu_info -- \
import \
--connect jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/sqoop \
--username root \
--password root \
--table tohdfs \
--target-dir /sqoop \
--num-mappers 1 \
--fields-terminated-by '\t' \
--check-column id \
--incremental append \
--last-value 11

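PS: incremental import (cannot be combined with --delete-target-dir)

    --check-column id
    --incremental append|lastmodified   # append: rows whose check column exceeds --last-value; lastmodified: rows whose timestamp changed
    --last-value 11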
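A saved job records the new last value itself after each --exec; a plain script driven by an options file has to remember it between runs. A minimal sketch, assuming a hypothetical local state file per table; obtaining the next value (Sqoop prints it at the end of an incremental run) and saving it is left to the caller. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep --last-value for an append-mode incremental import in a state file.

# Print the stored last value, or 0 on the first run.
read_last_value() {
  local state="$1"
  if [[ -s "$state" ]]; then
    cat "$state"
  else
    echo 0
  fi
}

# Print the incremental options, one argument per line.
incremental_args() {
  local column="$1" last="$2"
  printf '%s\n' --check-column "$column" --incremental append --last-value "$last"
}

# save_last_value STATE VALUE: record the value to use on the next run.
save_last_value() {
  printf '%s\n' "$2" > "$1"
}
```

Usage, with a hypothetical state path:

    state=/var/lib/sqoop/stu_info.last
    mapfile -t inc < <(incremental_args id "$(read_last_value "$state")")
    bin/sqoop --options-file import.txt --table stu_info --target-dir /student "${inc[@]}"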

Also:

    --columns field1,field2,field3   # import only these columns
    --query <sql>                    # must contain $CONDITIONS; cannot be combined with --table
    --where <condition>              # no $CONDITIONS needed
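With --query, Sqoop replaces $CONDITIONS in each mapper's copy of the query with that mapper's split condition, so the token has to reach Sqoop unexpanded: use single quotes, or write \$CONDITIONS inside double quotes. A free-form query import also needs --target-dir, and --split-by when more than one mapper is used. A sketch against stu_info; the id > 2 filter, the /student_query directory and the helper name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: arguments for a free-form query import, one per line.
# Single quotes keep the shell from expanding $CONDITIONS.
query_import_args() {
  local query='select id, name from stu_info where id > 2 and $CONDITIONS'
  printf '%s\n' \
    import \
    --connect jdbc:mysql://10.0.0.108:3306/student \
    --username root --password root \
    --query "$query" \
    --split-by id \
    --num-mappers 2 \
    --target-dir /student_query
}
```

Usage: mapfile -t args < <(query_import_args) and then bin/sqoop "${args[@]}". Collecting the arguments in an array keeps the query a single argument even though it contains spaces.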
