Sqoop Export HDFS

Sqoop Export use case: direct export

Direct export

First we make a copy of a table, then export the data imported in the previous post (Sqoop Import HDFS) into that copy.
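The post does not show the copy step itself. A minimal sketch of one way to do it, assuming the source table is named user_info (the name used for the HDFS directory) and that the mysql client is available; copy_table_sql is a hypothetical helper that only prints the statement:

```shell
#!/bin/sh
# Hypothetical helper: print the MySQL statement that creates an empty table
# with the same column definitions and indexes as an existing one.
copy_table_sql() {
  src=$1
  dst=$2
  printf 'CREATE TABLE IF NOT EXISTS %s LIKE %s;\n' "$dst" "$src"
}

sql=$(copy_table_sql user_info user_info_copy)
echo "$sql"
# To apply it, pipe the statement into the mysql client, for example:
#   echo "$sql" | mysql -h 202.193.60.117 -u root -p dataweb
```

CREATE TABLE ... LIKE copies only the structure, so the new table starts empty, which is what an export test needs.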

# The field delimiter must match the one used when the data was created. For the error a wrong delimiter causes, see the post on exporting Hive data to MySQL with Sqoop: "Caused by: java.lang.RuntimeException: Can't parse input data".
sqoop export \
--connect 'jdbc:mysql://202.193.60.117/dataweb?useUnicode=true&characterEncoding=utf-8' \
--username root \
--password-file /user/hadoop/.password \
--table user_info_copy \
--export-dir /user/hadoop/user_info \
--input-fields-terminated-by ","

The job output is as follows:

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job: Job job_1529567189245_0010 completed successfully

// :: INFO mapreduce.Job: Counters:

    File System Counters

        FILE: Number of bytes read=

        FILE: Number of bytes written=

        FILE: Number of read operations=

        FILE: Number of large read operations=

        FILE: Number of write operations=

        HDFS: Number of bytes read=

        HDFS: Number of bytes written=

        HDFS: Number of read operations=

        HDFS: Number of large read operations=

        HDFS: Number of write operations=

    Job Counters

        Launched map tasks=3

        Data-local map tasks=3   // 3 map tasks were used; the number of map tasks can be set explicitly, as shown in the next section

        Total time spent by all maps in occupied slots (ms)=

        Total time spent by all reduces in occupied slots (ms)=

        Total time spent by all map tasks (ms)=

        Total vcore-seconds taken by all map tasks=

        Total megabyte-seconds taken by all map tasks=

    Map-Reduce Framework

        Map input records=

        Map output records=

        Input split bytes=

        Spilled Records=

        Failed Shuffles=

        Merged Map outputs=

        GC time elapsed (ms)=

        CPU time spent (ms)=

        Physical memory (bytes) snapshot=

        Virtual memory (bytes) snapshot=

        Total committed heap usage (bytes)=

    File Input Format Counters

        Bytes Read=

    File Output Format Counters

        Bytes Written=

// :: INFO mapreduce.ExportJobBase: Transferred  bytes in 38.2702 seconds (18.1865 bytes/sec)

// :: INFO mapreduce.ExportJobBase: Exported  records.

After the export succeeds, check the database manually.

The table contents confirm that the export succeeded.
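Since a wrong --input-fields-terminated-by value is a common cause of parse errors, it can help to check the files before exporting. A rough sketch, assuming the target table has four columns (id, name, password, intStatus); check_fields is a hypothetical helper and the sample rows are made up:

```shell
#!/bin/sh
# Hypothetical check: report lines whose field count differs from the expected
# number of columns when split on the given delimiter.
check_fields() {
  delim=$1
  expected=$2
  awk -F"$delim" -v n="$expected" '
    NF != n { bad++; printf "line %d has %d fields\n", NR, NF }
    END { exit (bad > 0) }'
}

# Made-up sample rows standing in for the output of
#   hdfs dfs -cat /user/hadoop/user_info/part-m-*
sample='1,alice,secret,1
2,bob,hunter2,0'

if printf '%s\n' "$sample" | check_fields ',' 4; then
  result=ok
else
  result=mismatch
fi
echo "comma-delimited, 4 columns: $result"
```

In practice the sample would be replaced by piping the first few lines of the real export directory into check_fields.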

Specifying the number of map tasks

# -m 1 sets the number of map tasks to 1
sqoop export \
--connect 'jdbc:mysql://202.193.60.117/dataweb?useUnicode=true&characterEncoding=utf-8' \
--username root \
--password-file /user/hadoop/.password \
--table user_info_copy \
--export-dir /user/hadoop/user_info \
--input-fields-terminated-by "," \
-m 1

Clear the data in the target table first, then run the test.

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job: Job job_1529567189245_0011 completed successfully

// :: INFO mapreduce.Job: Counters:

    File System Counters

        FILE: Number of bytes read=

        FILE: Number of bytes written=

        FILE: Number of read operations=

        FILE: Number of large read operations=

        FILE: Number of write operations=

        HDFS: Number of bytes read=

        HDFS: Number of bytes written=

        HDFS: Number of read operations=

        HDFS: Number of large read operations=

        HDFS: Number of write operations=

    Job Counters

        Launched map tasks=1

        Data-local map tasks=1   // the number of map tasks is now 1

        Total time spent by all maps in occupied slots (ms)=

        Total time spent by all reduces in occupied slots (ms)=

        Total time spent by all map tasks (ms)=

        Total vcore-seconds taken by all map tasks=

        Total megabyte-seconds taken by all map tasks=

    Map-Reduce Framework

        Map input records=

        Map output records=

        Input split bytes=

        Spilled Records=

        Failed Shuffles=

        Merged Map outputs=

        GC time elapsed (ms)=

        CPU time spent (ms)=

        Physical memory (bytes) snapshot=

        Virtual memory (bytes) snapshot=

        Total committed heap usage (bytes)=

    File Input Format Counters

        Bytes Read=

    File Output Format Counters

        Bytes Written=

// :: INFO mapreduce.ExportJobBase: Transferred  bytes in 25.1976 seconds (12.9774 bytes/sec)  // the run time is also shorter than in the previous run

// :: INFO mapreduce.ExportJobBase: Exported  records.
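The two runs can be compared directly from their ExportJobBase lines. A small sketch using the two log lines as they appear in this post; elapsed_seconds is a hypothetical helper. With a data set this small, most of the elapsed time is probably job startup overhead, so the difference says little about larger exports.

```shell
#!/bin/sh
# Hypothetical helper: extract the elapsed seconds from a Sqoop
# "Transferred ... in N seconds" log line read on stdin.
elapsed_seconds() {
  sed -n 's/.*Transferred.* in \([0-9.][0-9.]*\) seconds.*/\1/p'
}

three_maps='INFO mapreduce.ExportJobBase: Transferred  bytes in 38.2702 seconds (18.1865 bytes/sec)'
one_map='INFO mapreduce.ExportJobBase: Transferred  bytes in 25.1976 seconds (12.9774 bytes/sec)'

t3=$(printf '%s\n' "$three_maps" | elapsed_seconds)
t1=$(printf '%s\n' "$one_map" | elapsed_seconds)
saved=$(awk -v a="$t3" -v b="$t1" 'BEGIN { printf "%.4f", a - b }')
echo "3 maps: ${t3}s, 1 map: ${t1}s, saved: ${saved}s"
```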

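Sqoop Export use case: insert and update

First change a few values in the rows that were already exported, then run the export again. The export restores the modified values to what is stored on HDFS.

Run the command: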

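# --update-mode defaults to updateonly (update existing rows only); allowinsert also inserts rows that do not exist yet.
# Note: the value after -m is missing in the source; supply a map count (for example 1, as in the previous section) before running.
sqoop export \
--connect 'jdbc:mysql://202.193.60.117/dataweb?useUnicode=true&characterEncoding=utf-8' \
--username root \
--password-file /user/hadoop/.password \
--table user_info_copy \
--export-dir /user/hadoop/user_info \
--input-fields-terminated-by "," \
-m  \
--update-key id \
--update-mode allowinsert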

After the job finishes, the modified values are back to their original state.
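Conceptually, with --update-key each input record is applied as an UPDATE matched on the key column, and allowinsert additionally inserts rows that have no match. The sketch below prints roughly equivalent statements for the user_info_copy columns. update_sql and upsert_sql are hypothetical helpers, and the SQL is illustrative only: the statements Sqoop actually generates depend on the connector, and the ON DUPLICATE KEY form shown assumes MySQL with id as a primary or unique key.

```shell
#!/bin/sh
# Hypothetical helpers that print illustrative SQL; Sqoop generates its own
# prepared statements internally and their exact form depends on the connector.

# updateonly: each record modifies an existing row matched on the update key.
update_sql() {
  printf 'UPDATE %s SET name=?, password=?, intStatus=? WHERE id=?;\n' "$1"
}

# allowinsert: rows with no match are inserted (MySQL-style upsert shown here;
# it relies on id being a primary or unique key).
upsert_sql() {
  printf 'INSERT INTO %s (id, name, password, intStatus) VALUES (?, ?, ?, ?) ON DUPLICATE KEY UPDATE name=VALUES(name), password=VALUES(password), intStatus=VALUES(intStatus);\n' "$1"
}

update_stmt=$(update_sql user_info_copy)
upsert_stmt=$(upsert_sql user_info_copy)
echo "updateonly : $update_stmt"
echo "allowinsert: $upsert_stmt"
```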

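Sqoop Export use case: transaction handling

Transaction handling

Before the HDFS data reaches the target table, it is first exported to a staging table (tmp). Only if that export succeeds is the data moved from the staging table to the target table.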

# The staging table must be created in advance; copying the target table's structure under a new name is enough.
sqoop export \
--connect 'jdbc:mysql://202.193.60.117/dataweb?useUnicode=true&characterEncoding=utf-8' \
--username root \
--password-file /user/hadoop/.password \
--table user_info_copy \
--staging-table user_info_tmp \
--clear-staging-table \
--export-dir /user/hadoop/user_info \
--input-fields-terminated-by ","

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job: Job job_1529567189245_0014 completed successfully

// :: INFO mapreduce.Job: Counters:

    File System Counters

        FILE: Number of bytes read=

        FILE: Number of bytes written=

        FILE: Number of read operations=

        FILE: Number of large read operations=

        FILE: Number of write operations=

        HDFS: Number of bytes read=

        HDFS: Number of bytes written=

        HDFS: Number of read operations=

        HDFS: Number of large read operations=

        HDFS: Number of write operations=

    Job Counters

        Launched map tasks=

        Data-local map tasks=

        Total time spent by all maps in occupied slots (ms)=

        Total time spent by all reduces in occupied slots (ms)=

        Total time spent by all map tasks (ms)=

        Total vcore-seconds taken by all map tasks=

        Total megabyte-seconds taken by all map tasks=

    Map-Reduce Framework

        Map input records=

        Map output records=

        Input split bytes=

        Spilled Records=

        Failed Shuffles=

        Merged Map outputs=

        GC time elapsed (ms)=

        CPU time spent (ms)=

        Physical memory (bytes) snapshot=

        Virtual memory (bytes) snapshot=

        Total committed heap usage (bytes)=

    File Input Format Counters

        Bytes Read=

    File Output Format Counters

        Bytes Written=

// :: INFO mapreduce.ExportJobBase: Transferred  bytes in 36.8371 seconds (18.894 bytes/sec)

// :: INFO mapreduce.ExportJobBase: Exported  records.

// :: INFO mapreduce.ExportJobBase: Starting to migrate data from staging table to destination.

// :: INFO manager.SqlManager: Migrated 3 records from `user_info_tmp` to `user_info_copy`
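One restriction worth keeping in mind: according to the Sqoop user guide, staging is not available when the export uses --update-key, so the staging approach and the update approach from the previous section cannot be combined in one job. A hedged sketch of a pre-flight check in a wrapper script; check_export_args is a hypothetical helper, not part of Sqoop:

```shell
#!/bin/sh
# Hypothetical pre-flight check: reject the combination of --staging-table and
# --update-key before submitting the job.
check_export_args() {
  has_staging=no
  has_update_key=no
  for arg in "$@"; do
    case $arg in
      --staging-table) has_staging=yes ;;
      --update-key)    has_update_key=yes ;;
    esac
  done
  if [ "$has_staging" = yes ] && [ "$has_update_key" = yes ]; then
    echo "error: --staging-table cannot be combined with --update-key" >&2
    return 1
  fi
  return 0
}

if check_export_args --table user_info_copy --staging-table user_info_tmp --clear-staging-table; then
  staging_only=accepted
else
  staging_only=rejected
fi
echo "staging export: $staging_only"
```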

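Column mismatch

First import the table from the database into HDFS, but only some of its columns rather than all of them (here the id column is left out), then export the data from HDFS back into the database.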

# --columns selects which columns to import
[hadoop@centpy hadoop-2.6.]$ sqoop import --connect jdbc:mysql://202.193.60.117/dataweb \
> --username root \
> --password-file /user/hadoop/.password \
> --table user_info \
> --columns name,password,intStatus \
> --target-dir /user/hadoop/user_info \
> --delete-target-dir \
> --fields-terminated-by "," \
> -m 1

[hadoop@centpy hadoop-2.6.]$ hdfs dfs -cat /user/hadoop/user_info/part-m-*
admin,,
hello,,
hahaha,haha,

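Compared with the database table, the data imported here is missing the id column. If we now export it the original way, without the --columns option, the job fails, because the fields in the files no longer line up with the table's columns. For example: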

[hadoop@centpy hadoop-2.6.]$ sqoop export \
> --connect 'jdbc:mysql://202.193.60.117/dataweb?useUnicode=true&characterEncoding=utf-8' \
> --username root \
> --password-file /user/hadoop/.password \
> --table user_info_copy \
> --export-dir /user/hadoop/user_info \
> --input-fields-terminated-by ","  \
> -m 1

To export data whose fields do not cover every column of the table, you must name the columns with the --columns option.

[hadoop@centpy hadoop-2.6.]$ sqoop export \
> --connect 'jdbc:mysql://202.193.60.117/dataweb?useUnicode=true&characterEncoding=utf-8' \
> --username root \
> --password-file /user/hadoop/.password \
> --table user_info_copy \
> --columns name,password,intStatus \
> --export-dir /user/hadoop/user_info \
> --input-fields-terminated-by ","
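The reason --columns matters is that fields are matched to columns by position. The following sketch illustrates that pairing for one record; map_fields is a hypothetical helper, the record is made up, and the table's column order (id, name, password, intStatus) is an assumption:

```shell
#!/bin/sh
# Hypothetical illustration: pair each field of a record with the column it
# would be written to, given a comma-separated column list.
map_fields() {
  columns=$1
  awk -F',' -v cols="$columns" '
    BEGIN { n = split(cols, name, ",") }
    {
      line = ""
      for (i = 1; i <= NF; i++) {
        col = (i <= n) ? name[i] : "(no column)"
        line = line (i > 1 ? " " : "") col "=" $i
      }
      print line
    }'
}

record='hahaha,haha,1'   # made-up row in the name,password,intStatus layout

# Without --columns the fields are matched against the full table definition.
without=$(printf '%s\n' "$record" | map_fields 'id,name,password,intStatus')
# With --columns name,password,intStatus they line up with the right columns.
with=$(printf '%s\n' "$record" | map_fields 'name,password,intStatus')
echo "without --columns: $without"
echo "with --columns:    $with"
```

In the first case the name lands in id and every later field shifts one column to the left, which is the mismatch that makes the plain export fail.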

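That covers the main content of this topic. It reflects my own learning process, and I hope it offers some guidance. If it was useful, a like would be appreciated; if not, please bear with me, and do point out any mistakes. Follow the blog to get updates as soon as they are posted. Thank you!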
