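Sqoop Installation and Usage
1. References
Reference documentation:
http://sqoop.apache.org/docs/1.99.7/admin/Installation.html
2. Installation Environment
2.1. Environment
A pseudo-distributed environment running CentOS 7.4 and Hadoop 2.7.5.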
| Hostname | NameNode | SecondaryNameNode | DataNodes |
| centoshadoop.smartmap.com | 192.168.1.80 | 192.168.1.80 | 192.168.1.80 |
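Hadoop is installed in /opt/hadoop/hadoop-2.7.5.
3. Installation
Sqoop consists of two parts: a client and a server. The server must be installed on one node of the cluster, and that node then serves as the entry point for all Sqoop clients. Hadoop must be installed on the server node. Clients can be installed on any number of machines, and those machines do not need Hadoop.
3.1. Download Sqoop
3.2. Extract Sqoop
Extract the downloaded sqoop-1.99.7-bin-hadoop200.tar.gz into /opt/hadoop/sqoop-1.99.7.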
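The extraction step can be scripted. The following is a minimal sketch, assuming the archive unpacks into a single top-level directory and that a tar supporting --strip-components (GNU tar, the default on CentOS) is available. The function name extract_sqoop is hypothetical and not part of Sqoop.

```shell
# Hypothetical helper: unpack a Sqoop tarball into a target directory.
# Assumes the archive contains exactly one top-level directory.
extract_sqoop() {
    archive="$1"   # e.g. sqoop-1.99.7-bin-hadoop200.tar.gz
    dest="$2"      # e.g. /opt/hadoop/sqoop-1.99.7
    mkdir -p "$dest" || return 1
    # --strip-components=1 drops the archive's top-level directory,
    # so bin/, conf/ and the rest land directly under $dest.
    tar -xzf "$archive" -C "$dest" --strip-components=1
}
```

A call such as `extract_sqoop sqoop-1.99.7-bin-hadoop200.tar.gz /opt/hadoop/sqoop-1.99.7` would then produce the directory layout used in the rest of this article.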
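4. Configuration
4.1. Modify the Hadoop Configuration
4.1.1. Edit core-site.xml
[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/core-site.xml
Add the following proxy-user properties inside the <configuration> element: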
<property>
<name>hadoop.proxyuser.sqoop2.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.sqoop2.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
4.1.2. Edit container-executor.cfg
[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/container-executor.cfg
allowed.system.users=sqoop2,hadoop,root
4.1.3. Edit yarn-site.xml
[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/yarn-site.xml
4.2. Create a Directory for Third-Party Database Drivers
[root@centoshadoop sqoop-1.99.7]# mkdir /opt/hadoop/sqoop-1.99.7/jdbcDriver
[root@centoshadoop sqoop-1.99.7]# chmod g+rwx,o+rwx /opt/hadoop/sqoop-1.99.7/jdbcDriver
4.3. Upload the Driver to the JDBC Driver Directory
Upload mysql-connector-java-5.1.46.jar to /opt/hadoop/sqoop-1.99.7/jdbcDriver.
4.4. Edit the profile File
vi /etc/profile
# export JAVA_HOME=/opt/java/jdk1.7.0_45
export JAVA_HOME=/opt/java/jdk1.8.0_171
# export CLASSPATH=.:$JAVA_HOME/lib
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PIG_HOME=/opt/hadoop/pig-0.17.0
export HIVE_HOME=/opt/hadoop/hive-2.3.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH=$PATH:$PIG_HOME/bin
export PATH=$PATH:$HIVE_HOME/bin
export SQOOP_HOME=/opt/hadoop/sqoop-1.99.7
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/server/lib:$SQOOP_HOME/shell/lib:$SQOOP_HOME/tools/lib
export SQOOP_SERVER_EXTRA_LIB=/opt/hadoop/sqoop-1.99.7/jdbcDriver/
# Note: the next assignment replaces the CLASSPATH value set above
export CLASSPATH=.:$JAVA_HOME/lib:$HIVE_HOME/lib
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:$HADOOP_HOME/lib/native
export JAVA_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH
export HADOOP_USER_NAME=hadoop
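Since several later steps depend on these variables pointing at real directories, a quick check after editing the profile can save debugging time. The following is a minimal sketch; check_dirs is a hypothetical helper, not something shipped with Hadoop or Sqoop.

```shell
# Hypothetical helper: report every argument that is not an existing directory.
# Returns 0 when all directories exist, 1 otherwise.
check_dirs() {
    missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done
    return "$missing"
}
```

After reloading the profile with `. /etc/profile`, it could be called as `check_dirs "$JAVA_HOME" "$HADOOP_HOME" "$SQOOP_HOME" "$SQOOP_SERVER_EXTRA_LIB"`.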
4.5. Upgrade the JDK to 1.8
Switch the JDK to version 1.8 and update every variable that depends on JAVA_HOME.
4.6. Edit the Sqoop Configuration Files
[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/conf/sqoop.properties
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/hadoop-2.7.5/etc/hadoop/
[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/bin/sqoop.sh
export HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
export HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
export HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn
5. Start Hadoop
5.1. Start YARN and HDFS
cd /opt/hadoop/hadoop-2.7.5/sbin
start-all.sh
5.2. Start the History Server
cd /opt/hadoop/hadoop-2.7.5/sbin
mr-jobhistory-daemon.sh start historyserver
6. Verify the Sqoop Installation and Configuration
[root@centoshadoop sqoop-1.99.7]# sqoop2-tool verify
Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
1 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server.
6 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
7. Initialize the Sqoop Repository
[root@centoshadoop sqoop-1.99.7]# sqoop2-tool upgrade
Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
0 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.
[root@centoshadoop sqoop-1.99.7]#
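When the verify and upgrade steps run from a setup script rather than by hand, the "has finished correctly" line in the transcripts can serve as a success marker. The following is a minimal sketch, assuming a failed run does not print that line (this article does not show a failing run, so that is an assumption). The name tool_finished_ok is hypothetical.

```shell
# Hypothetical helper: read tool output on stdin and succeed only if it
# contains the "has finished correctly" line seen in a successful run.
# Assumption: a failed sqoop2-tool run does not print this line.
tool_finished_ok() {
    grep -q "has finished correctly"
}
```

One possible use is `sqoop2-tool verify 2>&1 | tool_finished_ok && sqoop2-tool upgrade 2>&1 | tool_finished_ok`, which stops before the upgrade if verification did not report success.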
8. Using Sqoop
8.1. Server
8.1.1. Start and Stop sqoop2-server
[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/
[root@centoshadoop bin]# sqoop2-server start
Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Starting the Sqoop2 server...
[root@centoshadoop bin]# sqoop2-server stop
Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Stopping the Sqoop2 server...
Sqoop2 server stopped.
[root@centoshadoop bin]# jps
29780 NameNode
30085 SecondaryNameNode
30247 ResourceManager
31767 SqoopJettyServer
29913 DataNode
32107 Jps
30364 NodeManager
29245 JobHistoryServer
[root@centoshadoop bin]#
8.2. Client
8.2.1. Start the Interactive Sqoop Shell
[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/
[root@centoshadoop bin]# sqoop2-shell
sqoop:000> :exit
8.2.2. Enable Verbose Error Output
sqoop:000> set option --name verbose --value true
Verbose option was changed to true
8.2.3. Connect to the Sqoop Server
sqoop:000> set server -host 192.168.1.80
Server is set successfully
# set server --host ubuntu02 --port 12000 --webapp sqoop
sqoop:000> show version --all
client version:
Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
server version:
Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
[v1]
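The connection and inspection commands do not have to be typed interactively: the Sqoop 2 shell can also read commands from a script file passed as an argument. The following is a minimal sketch that generates such a script using only commands that appear in this article. The function name write_sqoop_script is hypothetical, and port 12000 with web application name sqoop are assumed from the server URL shown elsewhere in this article.

```shell
# Hypothetical helper: write a Sqoop shell script that connects to a server
# and lists its version, links and jobs.
# Assumes the server listens on port 12000 under the webapp name "sqoop".
write_sqoop_script() {
    host="$1"      # Sqoop server host, e.g. 192.168.1.80
    outfile="$2"   # path of the script file to create
    cat > "$outfile" <<EOF
# Lines starting with # are comments in a Sqoop shell script
set option --name verbose --value true
set server --host $host --port 12000 --webapp sqoop
show version --all
show link --all
show job
EOF
}
```

For example, `write_sqoop_script 192.168.1.80 /tmp/check.sqoop` followed by `sqoop2-shell /tmp/check.sqoop` should run the commands in order without prompting.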
8.2.4. Create the MySQL Link
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name                   | Version | Class                                                      | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000>
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql-link
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false
Username: root
Password: ***
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose:   # enter a single space
New link was successfully created with validation status OK and name mysql-link
sqoop:000>
8.2.5. Create the HDFS Link
sqoop:000> create link -connector hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs-link
HDFS cluster
URI: hdfs://192.168.1.80:9000
Conf directory:
Additional configs: /opt/hadoop/hadoop-2.7.5/etc/hadoop
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs-link
sqoop:000>
8.2.6. Show the Links
sqoop:000> show link --all
2 link(s) to show:
link with name mysql-link (Enabled: true, Created by root at 5/23/18 3:48 PM, Updated by root at 5/23/18 3:48 PM)
Using Connector generic-jdbc-connector with name {1}
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false
Username: root
Password:
Fetch Size:
Connection Properties:
protocol = tcp
SQL Dialect
Identifier enclose: `
link with name hdfs-link (Enabled: true, Created by root at 5/23/18 4:00 PM, Updated by root at 5/23/18 4:00 PM)
Using Connector hdfs-connector with name {1}
HDFS cluster
URI: hdfs://192.168.1.80:9000
Conf directory:
Additional configs::
sqoop:000>
8.2.7. Create a Job That Imports MySQL Data into HDFS
8.2.7.1. Prepare the Data
CREATE DATABASE sqooptest CHARACTER SET utf8;
USE sqooptest;
CREATE TABLE topic (
  id int(11) NOT NULL AUTO_INCREMENT,
  name varchar(50) DEFAULT NULL,
  address varchar(100) DEFAULT NULL,
  time datetime DEFAULT NULL,
  content varchar(500) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=UTF8;
INSERT INTO topic(name, address, time, content) VALUES ('11', '111', '2018-05-01 08:00:00', 'hello world');
INSERT INTO topic(name, address, time, content) VALUES ('11', '111', '2018-05-01 08:00:00', NULL);
8.2.7.2. Create the Job (MySQL -> HDFS)
sqoop:000> create job -f mysql-link -t hdfs-link
Creating job for links with from name mysql-link and to name hdfs-link
Please fill following values to create new job object
Name: mysql2hdfs
Database source
Schema name: sqooptest   # required: database name
Table name: topic   # required: table name
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column: id
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /sqoop/sqooptest/mysqltable/   # required
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name mysql2hdfs
sqoop:000>
8.2.7.3. Start the Job
[root@centoshadoop sbin]# hadoop fs -mkdir -p /sqoop/sqooptest/mysqltable
[root@centoshadoop sbin]# hadoop fs -chmod -R 777 /sqoop
sqoop:000> show job
+----+-------------------------+-------------------------------------+----------------------------+---------+
| Id | Name                    | From Connector                      | To Connector               | Enabled |
+----+-------------------------+-------------------------------------+----------------------------+---------+
| 1  | mysql2hdfs              | mysql-link (generic-jdbc-connector) | hdfs-link (hdfs-connector) | true    |
+----+-------------------------+-------------------------------------+----------------------------+---------+
sqoop:000>
sqoop:000> set option --name verbose --value true
Verbose option was changed to true
sqoop:000> start job -name mysql2hdfs
Submission details
Job Name: mysql2hdfs
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2018-05-24 08:57:14 CST
Lastly updated by: root
External ID: job_1527123411152_0002
http://centoshadoop:8088/proxy/application_1527123411152_0002/
Source Connector schema: Schema{name= sqooptest . topic ,columns=[
FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
Text{name=name,nullable=true,type=TEXT,charSize=null},
Text{name=address,nullable=true,type=TEXT,charSize=null},
Date{name=time,nullable=true,type=DATE_TIME,hasFraction=true,hasTimezone=false},
Text{name=content,nullable=true,type=TEXT,charSize=null}]}
2018-05-24 08:57:14 CST: BOOTING - Progress is not available
sqoop:000>
8.2.7.4. View the Results
[root@centoshadoop sbin]# hadoop fs -ls /sqoop/sqooptest/mysqltable
Found 1 items
-rw-r--r-- 1 root supergroup 97 2018-05-24 08:57 /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt
[root@centoshadoop sbin]# hadoop fs -cat /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt
3,'11','111','2018-05-01 08:00:00.000','hello world'
4,'11','111','2018-05-01 08:00:00.000',NULL
[root@centoshadoop sbin]#
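A simple sanity check after an import is to compare the number of exported records with the number of rows in the source table. The following is a minimal sketch, assuming the TEXT_FILE output holds one record per line, which only holds when no text field contains an embedded newline. The name count_records is hypothetical.

```shell
# Hypothetical helper: count non-empty lines on stdin.
# Assumes one exported record per line (no embedded newlines in fields).
count_records() {
    awk 'NF { n++ } END { print n + 0 }'
}
```

It could be fed from HDFS with `hadoop fs -cat '/sqoop/sqooptest/mysqltable/*.txt' | count_records` and the result compared against `SELECT COUNT(*) FROM topic;` in MySQL.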