1.  References

Reference documentation:

http://sqoop.apache.org/

http://sqoop.apache.org/docs/1.99.7/admin/Installation.html

2.  Environment

2.1.  Environment Details

A pseudo-distributed Hadoop 2.7.5 environment on CentOS 7.4:

Hostname: centoshadoop.smartmap.com

NameNode: 192.168.1.80

SecondaryNameNode: 192.168.1.80

DataNodes: 192.168.1.80

Hadoop installation directory: /opt/hadoop/hadoop-2.7.5

3.  Installation

Sqoop consists of two parts: a client and a server. The server must be installed on one node of the cluster; that node then serves as the entry point for all Sqoop clients. Hadoop must be installed on the server node. The client can be installed on any number of machines, and machines running only the client do not need Hadoop.

3.1.  Downloading Sqoop

http://sqoop.apache.org/

3.2.  Extracting Sqoop

Extract the downloaded sqoop-1.99.7-bin-hadoop200.tar.gz into /opt/hadoop/sqoop-1.99.7.
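The unpack step can be scripted as below. This is a sketch run against a scratch stand-in tarball so it executes anywhere; with the real download you would run the same `tar`/`mv` pair in /opt/hadoop.

```shell
# Sketch of the extraction step, demonstrated on a scratch tarball.
# With the real archive: tar -zxf sqoop-1.99.7-bin-hadoop200.tar.gz -C /opt/hadoop
#                        mv /opt/hadoop/sqoop-1.99.7-bin-hadoop200 /opt/hadoop/sqoop-1.99.7
work=$(mktemp -d)
mkdir -p "$work/sqoop-1.99.7-bin-hadoop200/bin"      # stand-in for the real archive contents
tar -czf "$work/sqoop-1.99.7-bin-hadoop200.tar.gz" -C "$work" sqoop-1.99.7-bin-hadoop200

mkdir -p "$work/opt/hadoop"
tar -zxf "$work/sqoop-1.99.7-bin-hadoop200.tar.gz" -C "$work/opt/hadoop"
mv "$work/opt/hadoop/sqoop-1.99.7-bin-hadoop200" "$work/opt/hadoop/sqoop-1.99.7"
ls "$work/opt/hadoop"                                # -> sqoop-1.99.7
```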

4.  Configuration

4.1.  Modifying the Hadoop Configuration

4.1.1.  Editing core-site.xml

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/core-site.xml

<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
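The proxy-user settings only take effect after HDFS/YARN are restarted. As a quick sanity check that all four entries landed in the file, you can grep for them; the sketch below runs against a scratch copy of the fragment so it can be executed as-is (on the cluster, point the variable at the real core-site.xml instead).

```shell
# Sanity check for the proxy-user entries, on a scratch copy of the fragment.
# On the cluster: CORE_SITE=/opt/hadoop/hadoop-2.7.5/etc/hadoop/core-site.xml
CORE_SITE=$(mktemp)
cat > "$CORE_SITE" <<'EOF'
<property><name>hadoop.proxyuser.sqoop2.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.sqoop2.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
EOF
grep -c 'hadoop.proxyuser' "$CORE_SITE"   # -> 4
```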

4.1.2.  Editing container-executor.cfg

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/container-executor.cfg

allowed.system.users=sqoop2,hadoop,root

4.1.3.  Editing yarn-site.xml

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/yarn-site.xml

4.2.  创建第三方数据库驱动目录

[root@centoshadoop sqoop-1.99.7]# mkdir
/opt/hadoop/sqoop-1.99.7/jdbcDriver

[root@centoshadoop sqoop-1.99.7]# chmod g+rwx,o+rwx
/opt/hadoop/sqoop-1.99.7/jdbcDriver
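The chmod above opens the driver directory to the group and to others (0777 under the default 022 umask), so the Sqoop server process can read the driver jars regardless of which user it runs as. Demonstrated here on a scratch directory so it runs anywhere:

```shell
# Effect of the chmod above, shown on a scratch directory.
umask 022
d=$(mktemp -d)/jdbcDriver
mkdir "$d"                 # created as 755 under umask 022
chmod g+rwx,o+rwx "$d"     # now 777
stat -c '%a' "$d"          # -> 777 (GNU stat)
```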

4.3.  Uploading the Driver to the JDBC Driver Directory

Upload mysql-connector-java-5.1.46.jar into /opt/hadoop/sqoop-1.99.7/jdbcDriver.

4.4.  Editing /etc/profile

vi /etc/profile

# export JAVA_HOME=/opt/java/jdk1.7.0_45
export JAVA_HOME=/opt/java/jdk1.8.0_171
# export CLASSPATH=.:$JAVA_HOME/lib
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PIG_HOME=/opt/hadoop/pig-0.17.0
export HIVE_HOME=/opt/hadoop/hive-2.3.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH=$PATH:$PIG_HOME/bin
export PATH=$PATH:$HIVE_HOME/bin
export SQOOP_HOME=/opt/hadoop/sqoop-1.99.7
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/server/lib:$SQOOP_HOME/shell/lib:$SQOOP_HOME/tools/lib
export SQOOP_SERVER_EXTRA_LIB=/opt/hadoop/sqoop-1.99.7/jdbcDriver/
export CLASSPATH=.:$JAVA_HOME/lib:$HIVE_HOME/lib
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:$HADOOP_HOME/lib/native
export JAVA_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH
export HADOOP_USER_NAME=hadoop
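Edits to /etc/profile are not picked up by an already-running shell until it re-reads the file (`source /etc/profile`) or you log in again. The sketch below shows the mechanism with a scratch profile and one of the variables from above:

```shell
# Re-reading a profile to pick up new exports; a scratch file stands in for /etc/profile.
profile=$(mktemp)
echo 'export SQOOP_HOME=/opt/hadoop/sqoop-1.99.7' > "$profile"
unset SQOOP_HOME
source "$profile"          # on the real system: source /etc/profile
echo "$SQOOP_HOME"         # -> /opt/hadoop/sqoop-1.99.7
```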

4.5.  Upgrading the JDK to Version 1.8

Switch the JDK to version 1.8 and update every variable that references JAVA_HOME.

4.6.  Editing the Sqoop Configuration Files

[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/conf/sqoop.properties

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/hadoop-2.7.5/etc/hadoop/

[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/bin/sqoop.sh

export HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
export HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
export HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn

5.  Starting Hadoop

5.1.  Starting YARN and HDFS

cd /opt/hadoop/hadoop-2.7.5/sbin

start-all.sh

5.2.  Starting the JobHistory Server

cd /opt/hadoop/hadoop-2.7.5/sbin

mr-jobhistory-daemon.sh start historyserver

6.  Verifying the Sqoop Installation and Configuration

[root@centoshadoop sqoop-1.99.7]# sqoop2-tool verify

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
1   [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
6   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

7.  Initializing the Sqoop Repository

[root@centoshadoop sqoop-1.99.7]# sqoop2-tool upgrade

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
0   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.

[root@centoshadoop sqoop-1.99.7]#

8.  Using Sqoop

8.1.  Server

8.1.1.  Starting and Stopping sqoop2-server

[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/

[root@centoshadoop bin]# sqoop2-server start

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Starting the Sqoop2 server...

[root@centoshadoop bin]# sqoop2-server stop

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Stopping the Sqoop2 server...
Sqoop2 server stopped.

[root@centoshadoop bin]# jps

29780 NameNode
30085 SecondaryNameNode
30247 ResourceManager
31767 SqoopJettyServer
29913 DataNode
32107 Jps
30364 NodeManager
29245 JobHistoryServer

[root@centoshadoop bin]#

8.2.  Client

8.2.1.  Starting the Interactive Sqoop Shell

[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/

[root@centoshadoop bin]# sqoop2-shell

sqoop:000> :exit

8.2.2.  Enabling Verbose Error Output

sqoop:000> set option --name verbose --value true

Verbose option was changed to true

8.2.3.  Connecting to the Sqoop Server

sqoop:000> set server -host 192.168.1.80

Server is set successfully

# Full form: set server --host ubuntu02 --port 12000 --webapp sqoop

sqoop:000> show version --all

client version:
Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
server version:
Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
[v1]

8.2.4.  Creating a Link Object for MySQL

sqoop:000> show connector

+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+

sqoop:000>

sqoop:000> create link -c generic-jdbc-connector

Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql-link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false
Username: root
Password: ***
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#

SQL Dialect

Identifier enclose:       # enter a space here

New link was successfully created with validation status OK and name mysql-link

sqoop:000>

8.2.5.  Creating a Link Object for HDFS

sqoop:000> create link -connector hdfs-connector

Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs-link

HDFS cluster

URI: hdfs://192.168.1.80:9000
Conf directory:
Additional configs: /opt/hadoop/hadoop-2.7.5/etc/hadoop
There are currently 0 values in the map:
entry#

New link was successfully created with validation status OK and name hdfs-link

sqoop:000>

8.2.6.  Displaying the Link Objects

sqoop:000> show link --all

2 link(s) to show:
link with name mysql-link (Enabled: true, Created by root at 5/23/18 3:48 PM, Updated by root at 5/23/18 3:48 PM)
Using Connector generic-jdbc-connector with name {1}
  Database connection
    Driver class: com.mysql.jdbc.Driver
    Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false
    Username: root
    Password:
    Fetch Size:
    Connection Properties:
      protocol = tcp
  SQL Dialect
    Identifier enclose: `
link with name hdfs-link (Enabled: true, Created by root at 5/23/18 4:00 PM, Updated by root at 5/23/18 4:00 PM)
Using Connector hdfs-connector with name {1}
  HDFS cluster
    URI: hdfs://192.168.1.80:9000
    Conf directory:
    Additional configs::

sqoop:000>

8.2.7.  Creating a Job to Import MySQL Data into HDFS

8.2.7.1.  Preparing the Data

create database sqooptest character set utf8;

CREATE TABLE topic (
  id int(11) NOT NULL AUTO_INCREMENT,
  name varchar(50) DEFAULT NULL,
  address varchar(100) DEFAULT NULL,
  time datetime DEFAULT NULL,
  content varchar(500) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=UTF8;

INSERT INTO topic(name, address, time, content) VALUES ('11', '111', '2018-05-01 08:00:00', 'hello world');

INSERT INTO topic(name, address, time, content) VALUES ('11', '111', '2018-05-01 08:00:00', NULL);

8.2.7.2.  Creating the Job (MySQL → HDFS)

sqoop:000> create job -f mysql-link -t hdfs-link

Creating job for links with from name mysql-link and to name hdfs-link
Please fill following values to create new job object
Name: mysql2hdfs

Database source

Schema name: sqooptest   # required: the database name
Table name: topic        # required: the table name
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column: id
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /sqoop/sqooptest/mysqltable/    # required
Append mode:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#

New job was successfully created with validation status OK and name mysql2hdfs

sqoop:000>

8.2.7.3.  Starting the Job

First create the output directory in HDFS and open its permissions:

[root@centoshadoop sbin]# hadoop fs -mkdir -p /sqoop/sqooptest/mysqltable

[root@centoshadoop sbin]# hadoop fs -chmod -R 777 /sqoop

sqoop:000> show job

+----+------------+-------------------------------------+----------------------------+---------+
| Id |    Name    |           From Connector            |        To Connector        | Enabled |
+----+------------+-------------------------------------+----------------------------+---------+
| 1  | mysql2hdfs | mysql-link (generic-jdbc-connector) | hdfs-link (hdfs-connector) | true    |
+----+------------+-------------------------------------+----------------------------+---------+

sqoop:000> set option --name verbose --value true

Verbose option was changed to true

sqoop:000> start job -name mysql2hdfs

Submission details
Job Name: mysql2hdfs
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2018-05-24 08:57:14 CST
Lastly updated by: root
External ID: job_1527123411152_0002
  http://centoshadoop:8088/proxy/application_1527123411152_0002/
Source Connector schema: Schema{name=sqooptest.topic,columns=[
  FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
  Text{name=name,nullable=true,type=TEXT,charSize=null},
  Text{name=address,nullable=true,type=TEXT,charSize=null},
  Date{name=time,nullable=true,type=DATE_TIME,hasFraction=true,hasTimezone=false},
  Text{name=content,nullable=true,type=TEXT,charSize=null}]}
2018-05-24 08:57:14 CST: BOOTING  - Progress is not available

sqoop:000>

8.2.7.4.  Viewing the Results

[root@centoshadoop sbin]# hadoop fs -ls /sqoop/sqooptest/mysqltable

Found 1 items
-rw-r--r--   1 root supergroup         97 2018-05-24 08:57 /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt

[root@centoshadoop sbin]# hadoop fs -cat /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt

3,'11','111','2018-05-01 08:00:00.000','hello world'
4,'11','111','2018-05-01 08:00:00.000',NULL

[root@centoshadoop sbin]#
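The exported file is plain CSV: one row per record, text columns in single quotes, and SQL NULLs written literally as NULL. A quick local check of such a file (run here on a scratch copy of the two rows above) might look like:

```shell
# Inspecting the exported CSV locally; the two rows are copied from the
# hadoop fs -cat output above.
f=$(mktemp)
cat > "$f" <<'EOF'
3,'11','111','2018-05-01 08:00:00.000','hello world'
4,'11','111','2018-05-01 08:00:00.000',NULL
EOF
wc -l < "$f"         # -> 2 records
cut -d, -f1 "$f"     # first column is the auto-increment id
```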
