Exporting data from Hive to MySQL

http://abloz.com

2012.7.20

author:周海汉

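In the previous article, "Using Sqoop to transfer data between MySQL and HDFS" (用sqoop进行mysql和hdfs系统间的数据互导), I mentioned that Sqoop can move data in both directions between an RDBMS and HDFS, and can also import from MySQL into HBase. Exporting from HBase to MySQL, however, is not supported directly, only indirectly. There are two routes: export HBase to flat files on HDFS, or export it into Hive, and then export from there to MySQL. This article covers exporting from Hive to MySQL.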

1. Create the MySQL table

mysql> create table award (rowkey varchar(255), productid int, matchid varchar(255), rank varchar(255), tourneyid varchar(255), userid bigint, gameid int, gold int, loginid varchar(255), nick varchar(255), plat varchar(255));
Query OK, 0 rows affected (0.01 sec)
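Without --columns, Sqoop export inserts into every MySQL column in table order and matches input fields by position. The MySQL table therefore has to list its columns in the same order as the Hive table; the names may differ, as with Hive's key versus MySQL's rowkey. Below is a minimal Python sketch that derives the statement above from the column list. The helper mysql_ddl and the type map are hypothetical; they mirror the choices made here, with Hive string mapped to varchar(255).

```python
# Hypothetical type map mirroring the award table above (string -> varchar(255)).
HIVE_TO_MYSQL = {
    "string": "varchar(255)",
    "int": "int",
    "bigint": "bigint",
}

def mysql_ddl(table, columns):
    """Build a CREATE TABLE statement from (column, hive_type) pairs, keeping their order."""
    cols = ", ".join(f"{name} {HIVE_TO_MYSQL[hive_type]}" for name, hive_type in columns)
    return f"create table {table} ({cols});"

# Same order as the Hive table; only the first column is renamed (key -> rowkey).
AWARD_COLUMNS = [
    ("rowkey", "string"), ("productid", "int"), ("matchid", "string"),
    ("rank", "string"), ("tourneyid", "string"), ("userid", "bigint"),
    ("gameid", "int"), ("gold", "int"), ("loginid", "string"),
    ("nick", "string"), ("plat", "string"),
]

print(mysql_ddl("award", AWARD_COLUMNS))
```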

2. Attempt: map HBase into Hive as an external table, then export it to MySQL

hive> CREATE EXTERNAL TABLE hive_award(key string, productid int,matchid string, rank string, tourneyid string, userid bigint,gameid int,gold int,loginid string,nick string,plat string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:MPID,info:MatchID,info:Rank,info:TourneyID,info:UserId,info:gameID,info:gold,info:loginId,info:nickName,info:platform") TBLPROPERTIES("hbase.table.name" = "award");
hive> desc hive_award;
key string from deserializer
productid int from deserializer
matchid string from deserializer
rank string from deserializer
tourneyid string from deserializer
userid bigint from deserializer
gameid int from deserializer
gold int from deserializer
loginid string from deserializer
nick string from deserializer
plat string from deserializer
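The hbase.columns.mapping entries are matched to the Hive columns by position, with :key standing for the HBase row key. The short Python sketch below makes that pairing explicit and fails on a count mismatch; it assumes nothing beyond the mapping string in the statement above. HBase qualifiers are case-sensitive, so info:UserId and info:userid are different cells. A qualifier that does not exist in a row typically just reads as NULL in Hive.

```python
# Hive columns and mapping string copied from the CREATE EXTERNAL TABLE statement above.
HIVE_COLUMNS = ["key", "productid", "matchid", "rank", "tourneyid", "userid",
                "gameid", "gold", "loginid", "nick", "plat"]
MAPPING = (":key,info:MPID,info:MatchID,info:Rank,info:TourneyID,info:UserId,"
           "info:gameID,info:gold,info:loginId,info:nickName,info:platform")

def pair_columns(hive_columns, mapping):
    """Return (hive_column, hbase_target) pairs; ':key' denotes the HBase row key."""
    targets = mapping.split(",")
    if len(targets) != len(hive_columns):
        raise ValueError(f"{len(hive_columns)} Hive columns but {len(targets)} mapping entries")
    return list(zip(hive_columns, targets))

for hive_col, target in pair_columns(HIVE_COLUMNS, MAPPING):
    print(f"{hive_col:10} <- {target}")
```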
[zhouhh@Hadoop46 ~]$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-xr-x - zhouhh supergroup 0 2012-07-16 14:08 /user/hive/warehouse/hive_award
drwxr-xr-x - zhouhh supergroup 0 2012-07-16 14:30 /user/hive/warehouse/nnnon
drwxr-xr-x - zhouhh supergroup 0 2012-07-16 13:53 /user/hive/warehouse/test222
[zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_award --input-fields-terminated-by '\0001'
12/07/19 16:13:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/07/19 16:13:06 INFO tool.CodeGenTool: Beginning code generation
12/07/19 16:13:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `award` AS t LIMIT 1
12/07/19 16:13:06 INFO orm.CompilationManager: HADOOP_HOME is /home/zhouhh/hadoop-1.0.0/libexec/..
Note: /tmp/sqoop-zhouhh/compile/4366149f0b6dd311c5b622594744fbb0/award.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/07/19 16:13:08 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zhouhh/compile/4366149f0b6dd311c5b622594744fbb0/award.jar
12/07/19 16:13:08 INFO mapreduce.ExportJobBase: Beginning export of award
12/07/19 16:13:09 WARN mapreduce.ExportJobBase: Input path hdfs://Hadoop46:9200/user/hive/warehouse/hive_award contains no files
12/07/19 16:13:11 INFO input.FileInputFormat: Total input paths to process : 0
12/07/19 16:13:11 INFO input.FileInputFormat: Total input paths to process : 0
12/07/19 16:13:13 INFO mapred.JobClient: Running job: job_201207191159_0059
12/07/19 16:13:14 INFO mapred.JobClient: map 0% reduce 0%
12/07/19 16:13:26 INFO mapred.JobClient: Job complete: job_201207191159_0059
12/07/19 16:13:26 INFO mapred.JobClient: Counters: 4
12/07/19 16:13:26 INFO mapred.JobClient: Job Counters
12/07/19 16:13:26 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=7993
12/07/19 16:13:26 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/19 16:13:26 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/19 16:13:26 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/07/19 16:13:26 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 16.9678 seconds (0 bytes/sec)
12/07/19 16:13:26 INFO mapreduce.ExportJobBase: Exported 0 records.
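Exporting the external table directly fails with: Input path hdfs://Hadoop46:9200/user/hive/warehouse/hive_award contains no files. The table's rows live in HBase, so the warehouse directory Sqoop reads from is empty and 0 records are exported.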
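Sqoop export reads the files under --export-dir, not the rows behind a Hive table, so it is worth confirming the directory actually holds data before launching the job. Below is a hypothetical pre-flight check in Python. The function name is made up, and a local directory stands in for the HDFS path. It skips names beginning with _ or ., which Hadoop's FileInputFormat ignores as hidden files.

```python
import os
import tempfile

def has_data_files(export_dir):
    """Return True if export_dir directly contains a visible, non-empty regular file.

    Hypothetical check; a local directory stands in for the HDFS --export-dir.
    """
    for name in os.listdir(export_dir):
        if name.startswith(("_", ".")):
            continue  # Hadoop's FileInputFormat treats these as hidden and skips them
        path = os.path.join(export_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
    return False

# Usage: an empty directory, like the warehouse directory of an HBase-backed table.
with tempfile.TemporaryDirectory() as empty_dir:
    print(has_data_files(empty_dir))  # False
```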

3. Create a Hive table linked to HBase (inserts into it from Hive change the data in HBase):

CREATE TABLE hive_award_data(key string,productid int,matchid string,rank string,
tourneyid string,userid bigint,gameid int,
gold int,loginid string,nick string,plat string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:MPID,info:MatchID,info:Rank,info:TourneyID,info:UserId,info:gameID,info:gold,info:loginId,info:nickName,info:platform") TBLPROPERTIES("hbase.table.name" = "award_test");
hive> insert overwrite table hive_award_data select * from hive_award limit 2;
hbase(main):014:0> scan 'award_test'
ROW COLUMN+CELL
2012-04-27 06:55:00:402713629 column=info:MPID, timestamp=1342754799918, value=5947
2012-04-27 06:55:00:402713629 column=info:MatchID, timestamp=1342754799918, value=433203828
2012-04-27 06:55:00:402713629 column=info:Rank, timestamp=1342754799918, value=2
2012-04-27 06:55:00:402713629 column=info:TourneyID, timestamp=1342754799918, value=4027102
2012-04-27 06:55:00:402713629 column=info:UserId, timestamp=1342754799918, value=402713629
2012-04-27 06:55:00:402713629 column=info:gameID, timestamp=1342754799918, value=1001
2012-04-27 06:55:00:402713629 column=info:loginId, timestamp=1342754799918, value=715878221
2012-04-27 06:55:00:402713629 column=info:nickName, timestamp=1342754799918, value=xxx
2012-04-27 06:55:00:402713629 column=info:platform, timestamp=1342754799918, value=ios
2012-04-27 06:55:00:402713629 column=info:userid, timestamp=1342754445451, value=402713629
2012-04-27 06:55:00:406788559 column=info:MPID, timestamp=1342754799918, value=778
2012-04-27 06:55:00:406788559 column=info:MatchID, timestamp=1342754799918, value=433203930
2012-04-27 06:55:00:406788559 column=info:Rank, timestamp=1342754799918, value=19
2012-04-27 06:55:00:406788559 column=info:TourneyID, timestamp=1342754799918, value=4017780
2012-04-27 06:55:00:406788559 column=info:UserId, timestamp=1342754799918, value=406788559
2012-04-27 06:55:00:406788559 column=info:gameID, timestamp=1342754799918, value=1001
2012-04-27 06:55:00:406788559 column=info:gold, timestamp=1342754799918, value=1
2012-04-27 06:55:00:406788559 column=info:loginId, timestamp=1342754799918, value=13835155880
2012-04-27 06:55:00:406788559 column=info:nickName, timestamp=1342754799918, value=xxx
2012-04-27 06:55:00:406788559 column=info:platform, timestamp=1342754799918, value=android
2 row(s) in 0.0280 seconds
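In the scan above, row 2012-04-27 06:55:00:402713629 has no info:gold cell. Its gold value is NULL in Hive, and NULL columns appear to be written as no cell at all. (The info:userid cell, with an older timestamp, predates this insert.) Below is a minimal Python sketch of that row-to-cells mapping. It assumes exactly that NULL behavior, which is inferred from the scan output rather than documented here, and uses the column mapping of this table.

```python
HIVE_COLUMNS = ["key", "productid", "matchid", "rank", "tourneyid", "userid",
                "gameid", "gold", "loginid", "nick", "plat"]
TARGETS = [":key", "info:MPID", "info:MatchID", "info:Rank", "info:TourneyID",
           "info:UserId", "info:gameID", "info:gold", "info:loginId",
           "info:nickName", "info:platform"]

def to_hbase_cells(row):
    """Split one Hive row (a dict) into its HBase row key and {column: value} cells."""
    row_key, cells = None, {}
    for column, target in zip(HIVE_COLUMNS, TARGETS):
        value = row[column]
        if target == ":key":
            row_key = value
        elif value is not None:          # assumed: a NULL column produces no cell
            cells[target] = str(value)   # values appear as text in the scan output
    return row_key, cells

first_row = {"key": "2012-04-27 06:55:00:402713629", "productid": 5947,
             "matchid": "433203828", "rank": "2", "tourneyid": "4027102",
             "userid": 402713629, "gameid": 1001, "gold": None,
             "loginid": "715878221", "nick": "xxx", "plat": "ios"}

key, cells = to_hbase_cells(first_row)
print(key, len(cells), "info:gold" in cells)
```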
[zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_award_data --input-fields-terminated-by '\0001'
12/07/20 11:32:01 WARN mapreduce.ExportJobBase: Input path hdfs://Hadoop46:9200/user/hive/warehouse/hive_award_data contains no files

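A Hive table linked to HBase still cannot be exported: its rows are also stored in HBase, so /user/hive/warehouse/hive_award_data is likewise empty.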

4. Create a native Hive table and load the data from the HBase external table into it

hive> CREATE TABLE hive_myaward(key string,productid int,matchid string,rank string,tourneyid string,userid bigint,gameid int,gold int,loginid string,nick string,plat string);
hive> insert overwrite table hive_myaward select * from hive_award limit 2;
hive> select * from hive_myaward;
OK
2012-04-27 06:55:00:402713629 5947 433203828 2 4027102 402713629 1001 NULL 715878221 杀破天A ios
2012-04-27 06:55:00:406788559 778 433203930 19 4017780 406788559 1001 1 13835155880 亲牛牛旦旦 android
Time taken: 2.257 seconds
[zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward --input-fields-terminated-by '\0001'
java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Access denied for user ''@'Hadoop48' to database 'toplists'

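This is a permissions problem. No --username was given, so Sqoop connected as the anonymous MySQL user ''. Grant it privileges: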

mysql> GRANT ALL PRIVILEGES ON *.* TO ''@'Hadoop48';
Query OK, 0 rows affected (0.03 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO ''@'localhost';
Query OK, 0 rows affected (0.00 sec)

5. Handling the NULL values coming from Hive:

[zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward --input-fields-terminated-by '\0001'
...
12/07/20 11:49:25 INFO mapred.JobClient: map 0% reduce 0%
12/07/20 11:49:37 INFO mapred.JobClient: Task Id : attempt_201207191159_0227_m_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: "\N"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

What is \N?

[zhouhh@Hadoop46 ~]$ hadoop fs -cat /user/hive/warehouse/hive_myaward/000000_0
2012-04-27 06:55:00:4027136295947433203828240271024027136291001\N715878221杀破天Aios
2012-04-27 06:55:00:4067885597784332039301940177804067885591001113835155880亲牛牛旦旦android
hive> select * from hive_myaward;
OK
2012-04-27 06:55:00:402713629 5947 433203828 2 4027102 402713629 1001 NULL 715878221 杀破天A ios
2012-04-27 06:55:00:406788559 778 433203930 19 4017780 406788559 1001 1 13835155880 亲牛牛旦旦 android
Time taken: 2.257 seconds
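The raw file above shows the gold field of the first row as the two characters \N. That is the string Sqoop's generated code tried to parse as an integer, hence the NumberFormatException. Below is a minimal Python sketch of reading one line of Hive's default text format. It assumes \x01 as the field separator and \N as the NULL marker. The helper parse_row and the TYPES list are illustrative only, not part of Sqoop.

```python
FIELD_SEP = "\x01"    # Hive's default field delimiter, Ctrl-A (octal \001)
NULL_MARKER = "\\N"   # Hive's default NULL marker: a backslash followed by N
TYPES = ["string", "int", "string", "string", "string", "bigint",
         "int", "int", "string", "string", "string"]

def parse_row(line, types=TYPES):
    """Split one Hive text line into typed values, mapping the NULL marker to None."""
    fields = line.rstrip("\n").split(FIELD_SEP)
    if len(fields) != len(types):
        raise ValueError(f"expected {len(types)} fields, got {len(fields)}")
    values = []
    for text, typ in zip(fields, types):
        if text == NULL_MARKER:
            values.append(None)       # roughly what the --input-null-* options enable
        elif typ in ("int", "bigint"):
            values.append(int(text))
        else:
            values.append(text)
    return values

raw = FIELD_SEP.join(["2012-04-27 06:55:00:402713629", "5947", "433203828", "2",
                      "4027102", "402713629", "1001", NULL_MARKER, "715878221",
                      "杀破天A", "ios"]) + "\n"

print(parse_row(raw)[7])  # None

try:
    int(NULL_MARKER)          # naive conversion, analogous to the Java failure above
except ValueError as err:
    print("ValueError:", err)
```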

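Hive's default text format writes NULL as \N, separates fields with \01 (Ctrl-A), and ends lines with \n. Sqoop must be told about each of these, and the backslash \ has to be escaped:
See: https://issues.cloudera.org/browse/SQOOP-188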
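The sqoop export command that follows involves two layers of escaping: the shell's and Sqoop's. The short Python sketch below uses shlex.split, which applies POSIX shell quoting rules, to show what Sqoop actually receives for each quoted value. According to the Sqoop user guide, the null strings are pasted into generated Java code. That is why Hive's \N has to arrive as \\N. Single-quoted '\\N' would reach Sqoop as the same \\N.

```python
import shlex

# The quoted option values from the sqoop export command, typed as on the command line.
typed = (r'--input-null-string "\\\\N" --input-null-non-string "\\\\N" '
         r'--input-fields-terminated-by "\\01" --input-lines-terminated-by "\\n"')

# shlex.split follows POSIX shell quoting: inside double quotes, "\\" becomes "\".
received = shlex.split(typed)
for token in received:
    print(token)
```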

[zhouhh@Hadoop46 ~]$ sqoop export --connect jdbc:mysql://Hadoop48/toplists -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward/000000_0 --input-null-string "\\\\N" --input-null-non-string "\\\\N" --input-fields-terminated-by "\\01" --input-lines-terminated-by "\\n"
12/07/20 12:53:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/07/20 12:53:56 INFO tool.CodeGenTool: Beginning code generation
12/07/20 12:53:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `award` AS t LIMIT 1
12/07/20 12:53:56 INFO orm.CompilationManager: HADOOP_HOME is /home/zhouhh/hadoop-1.0.0/libexec/..
Note: /tmp/sqoop-zhouhh/compile/4427d3db678bb145c995073e0924dc0b/award.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/07/20 12:53:57 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-zhouhh/compile/4427d3db678bb145c995073e0924dc0b/award.java to /home/zhouhh/./award.java
12/07/20 12:53:57 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zhouhh/compile/4427d3db678bb145c995073e0924dc0b/award.jar
12/07/20 12:53:57 INFO mapreduce.ExportJobBase: Beginning export of award
12/07/20 12:53:58 INFO input.FileInputFormat: Total input paths to process : 1
12/07/20 12:53:58 INFO input.FileInputFormat: Total input paths to process : 1
12/07/20 12:53:58 INFO mapred.JobClient: Running job: job_201207191159_0232
12/07/20 12:53:59 INFO mapred.JobClient: map 0% reduce 0%
12/07/20 12:54:12 INFO mapred.JobClient: map 100% reduce 0%
12/07/20 12:54:17 INFO mapred.JobClient: Job complete: job_201207191159_0232
12/07/20 12:54:17 INFO mapred.JobClient: Counters: 18
12/07/20 12:54:17 INFO mapred.JobClient: Job Counters
12/07/20 12:54:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=12114
12/07/20 12:54:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/20 12:54:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/20 12:54:17 INFO mapred.JobClient: Rack-local map tasks=1
12/07/20 12:54:17 INFO mapred.JobClient: Launched map tasks=1
12/07/20 12:54:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/07/20 12:54:17 INFO mapred.JobClient: File Output Format Counters
12/07/20 12:54:17 INFO mapred.JobClient: Bytes Written=0
12/07/20 12:54:17 INFO mapred.JobClient: FileSystemCounters
12/07/20 12:54:17 INFO mapred.JobClient: HDFS_BYTES_READ=335
12/07/20 12:54:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=30172
12/07/20 12:54:17 INFO mapred.JobClient: File Input Format Counters
12/07/20 12:54:17 INFO mapred.JobClient: Bytes Read=0
12/07/20 12:54:17 INFO mapred.JobClient: Map-Reduce Framework
12/07/20 12:54:17 INFO mapred.JobClient: Map input records=2
12/07/20 12:54:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=78696448
12/07/20 12:54:17 INFO mapred.JobClient: Spilled Records=0
12/07/20 12:54:17 INFO mapred.JobClient: CPU time spent (ms)=390
12/07/20 12:54:17 INFO mapred.JobClient: Total committed heap usage (bytes)=56623104
12/07/20 12:54:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=891781120
12/07/20 12:54:17 INFO mapred.JobClient: Map output records=2
12/07/20 12:54:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=123
12/07/20 12:54:17 INFO mapreduce.ExportJobBase: Transferred 335 bytes in 19.6631 seconds (17.037 bytes/sec)
12/07/20 12:54:17 INFO mapreduce.ExportJobBase: Exported 2 records.

The export to MySQL succeeded.

mysql> use toplists;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from award;
+-------------------------------+-----------+-----------+------+-----------+-----------+--------+------+-------------+-------+---------+
| rowkey | productid | matchid | rank | tourneyid | userid | gameid | gold | loginid | nick | plat |
+-------------------------------+-----------+-----------+------+-----------+-----------+--------+------+-------------+-------+---------+
| 2012-04-27 06:55:00:402713629 | 5947 | 433203828 | 2 | 4027102 | 402713629 | 1001 | NULL | 715878221 | ???A | ios |
| 2012-04-27 06:55:00:406788559 | 778 | 433203930 | 19 | 4017780 | 406788559 | 1001 | 1 | 13835155880 | ????? | android |
+-------------------------------+-----------+-----------+------+-----------+-----------+--------+------+-------------+-------+---------+
2 rows in set (0.00 sec)

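Although the data is now in MySQL, the Chinese nicknames came through garbled (shown as ???).
The follow-up article, "Garbled Chinese when exporting from Hive to MySQL" (Hive导出到Mysql中中文乱码的问题), continues with the fix.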
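The ??? pattern appears when each Chinese character is forced into a character set that cannot represent it, such as latin1. The article does not say which character set was in effect. Below is a minimal Python sketch, assuming the MySQL table or connection used latin1, that reproduces exactly the nick values in the table above. The helper name lossy_latin1 is hypothetical.

```python
def lossy_latin1(text):
    """Force text through latin1, replacing every unrepresentable character with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

for nick in ["杀破天A", "亲牛牛旦旦"]:
    print(lossy_latin1(nick))
```

ASCII text such as ios or android passes through unchanged, which matches the plat column arriving intact.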

Source: http://abloz.com/2012/07/20/export-data-to-mysql-from-the-hive.html
