kudu是cloudera开源的运行在hadoop平台上的列式存储系统,拥有Hadoop生态系统应用的常见技术特性,运行在一般的商用硬件上,支持水平扩展,高可用,集成impala后,支持标准sql语句,相对于hbase易用性强,详细介绍

  impala是Cloudera公司主导开发的新型查询系统,它提供SQL语义,能查询存储在Hadoop的HDFS和HBase中的PB级大数据。已有的Hive系统虽然也提供了SQL语义,但由于Hive底层执行使用的是MapReduce引擎,仍然是一个批处理过程,难以满足查询的交互性。相比之下,Impala的最大特点也是最大卖点就是它的快速,导入数据实测可达30+W/s,详细介绍

导入流程:准备数据--》上传hdfs--》导入impala临时表--》导入kudu表

1.准备数据

app@hadoop01:/software/develop/pujh>cat genBiData.sh
#!/usr/bash date
echo ''>data.txt
chmod data.txt for((i=;i<=20593279;i++))
do
echo "$i|aa$i|aa$i$i|aa$i$i$i" >>data.txt;
done; date app@hadoop01:/software/develop/pujh> sed 's/|/,/g' data.txt > temp.csv
app@hadoop01:/software/develop/pujh>chmod 777 tmp.csv

2.上传到hdfs

su - root
su - hdfs
hadoop dfs -mkdir /input/data/pujh
hadoop dfs -chmod -R /input/data/pujh
hadoop dfs -put /software/develop/pujh /input/data/pujh
hadoop dfs -ls /input/data/pujh
hdfs@hadoop01:>./hadoop dfs -ls /input/data/pujh
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it. Found items
-rwxrwxrwx hdfs supergroup -- : /input/data/pujh/aa.txt
-rwxrwxrwx hdfs supergroup -- : /input/data/pujh/data.txt
-rwxrwxrwx hdfs supergroup -- : /input/data/pujh/data2kw.csv
-rwxrwxrwx hdfs supergroup -- : /input/data/pujh/data_2kw.txt
-rwxrwxrwx hdfs supergroup -- : /input/data/pujh/genBiData.sh

3.导入impala临时表

创建impala临时表

employee_temp
create table employee_temp ( eid int, name String,salary String, destination String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';hdfs@hadoop02>./impala-shell 
Starting Impala Shell without Kerberos authentication
Connected to hadoop02:
Server version: impala version 2.8.-cdh5.11.2 RELEASE (build f89269c4b96da14a841e94bdf6d4d48821b0d658)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.8.0-cdh5.11.2 (f89269c) built on Fri Aug :: PDT ) The HISTORY command lists all shell commands in chronological order.
***********************************************************************************
[hadoop02:] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name | comment |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default | Default Hive database |
| td_test | |
+------------------+----------------------------------------------+
Fetched row(s) in .01s
[hadoop02:] > show tables;
Query: show tables
+----------------+
| name |
+----------------+
| employee |
| my_first_table |
+----------------+
Fetched row(s) in .00s [hadoop02:] > create table employee_temp ( eid int, name String,salary String, destination String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table employee_temp ( eid int, name String,salary String, destination String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' Fetched row(s) in .32s
[hadoop02:] > show tables;
Query: show tables
+----------------+
| name |
+----------------+
| employee |
| employee_temp |
| my_first_table |
+----------------+
Fetched row(s) in .01s

将hadoop上的文件导入impala临时表

load data inpath '/input/data/pujh/temp.csv' into table employee_temp;
[hadoop02:] > load data inpath '/input/data/pujh/temp.csv' into table employee_temp;
Query: load data inpath '/input/data/pujh/temp.csv' into table employee_temp
ERROR: AnalysisException: Unable to LOAD DATA from hdfs://hadoop01:8020/input/data/pujh/temp.csv because Impala does not have WRITE permissions on its parent directory hdfs://hadoop01:8020/input/data/pujh [hadoop02:] > load data inpath '/input/data/pujh/temp.csv' into table employee_temp;
Query: load data inpath '/input/data/pujh/temp.csv' into table employee_temp
+----------------------------------------------------------+
| summary |
+----------------------------------------------------------+
| Loaded file(s). Total files in destination location: |
+----------------------------------------------------------+
Fetched row(s) in .44s
[hadoop02:] > select * from employee_temp limit ;
Query: select * from employee_temp limit
Query submitted at: -- :: (Coordinator: http://hadoop02:25000)
Query progress can be monitored at: http://hadoop02:25000/query_plan?query_id=4246eaa38a3d8bbb:953ce4d300000000
+------+------+--------+-------------+
| eid | name | salary | destination |
+------+------+--------+-------------+
| NULL | NULL | | |
| | aa1 | aa11 | aa111 |
+------+------+--------+-------------+
Fetched row(s) in .19s
[hadoop02:] > select * from employee_temp limit ;
Query: select * from employee_temp limit
Query submitted at: -- :: (Coordinator: http://hadoop02:25000)
Query progress can be monitored at: http://hadoop02:25000/query_plan?query_id=cb4c3cf5d647c97a:75d2985f00000000
+------+------+--------+-------------+
| eid | name | salary | destination |
+------+------+--------+-------------+
| NULL | NULL | | |
| | aa1 | aa11 | aa111 |
| | aa2 | aa22 | aa222 |
| | aa3 | aa33 | aa333 |
| | aa4 | aa44 | aa444 |
| | aa5 | aa55 | aa555 |
| | aa6 | aa66 | aa666 |
| | aa7 | aa77 | aa777 |
| | aa8 | aa88 | aa888 |
| | aa9 | aa99 | aa999 |
+------+------+--------+-------------+
Fetched row(s) in .02s
[hadoop02:] > select count(*) from employee_temp;
Query: select count(*) from employee_temp
Query submitted at: -- :: (Coordinator: http://hadoop02:25000)
Query progress can be monitored at: http://hadoop02:25000/query_plan?query_id=5a4c1107de118395:bfe96a1600000000
+----------+
| count(*) |
+----------+
| |
+----------+
Fetched row(s) in .65s

3.从impala临时表employee_temp 导入kudu表employee_kudu

创建kudu表

create table employee_kudu ( eid int, name String,salary String, destination String,PRIMARY KEY(eid)) PARTITION BY HASH PARTITIONS 16 STORED AS KUDU;

[hadoop02:] > create table employee_kudu ( eid int, name String,salary String, destination String,PRIMARY KEY(eid)) PARTITION BY HASH PARTITIONS  STORED AS KUDU;
Query: create table employee_kudu ( eid int, name String,salary String, destination String,PRIMARY KEY(eid)) PARTITION BY HASH PARTITIONS STORED AS KUDU Fetched row(s) in .94s
[hadoop02:] > show tables;
Query: show tables
+----------------+
| name |
+----------------+
| employee |
| employee_kudu |
| employee_temp |
| my_first_table |

界面查看是否创建成功

从impala临时表employee_temp 导入kudu表employee_kudu

[hadoop02:] > insert into employee_kudu select * from employee_temp;
Query: insert into employee_kudu select * from employee_temp
Query submitted at: -- :: (Coordinator: http://hadoop02:25000)
Query progress can be monitored at: http://hadoop02:25000/query_plan?query_id=2e47536cc5c82392:ef4d552600000000
WARNINGS: Row with null value violates nullability constraint on table 'impala::default.employee_kudu'. Modified row(s), row error(s) in .75s
[hadoop02:] > select count(*) from employee_kudu;
Query: select count(*) from employee_kudu
Query submitted at: -- :: (Coordinator: http://hadoop02:25000)
Query progress can be monitored at: http://hadoop02:25000/query_plan?query_id=6d4bad44a980f229:fd7878d00000000
+----------+
| count(*) |
+----------+
| |
+----------+
Fetched row(s) in .18s

kudu导入文件(基于impala)的更多相关文章

  1. 基于Impala平台打造交互查询系统

    本文来自网易云社区 原创: 蒋鸿翔 DataFunTalk 本文根据网易大数据蒋鸿翔老师DataFun Talk--"大数据从底层处理到数据驱动业务"中分享的<基于Impal ...

  2. 2017年最新VOS2009/VOS3000最新手机号段导入文件(手机归属地数据)

    VOS2009.vos3000.vos5000最新手机号段归属地数据库导入文件. 基于2017年4月最新版手机号段归属地制作 共360569条记录,兼容所有版本的昆石VOS,包括VOS2009.vos ...

  3. 关于AVD不能导入文件的解决方案

    安卓虚拟机导入文件时报以下异常: [2013-01-23 16:09:18 - ddms] transfer error: Read-only file system [2013-01-23 16:0 ...

  4. iOS发展- 文件共享(使用iTunes导入文件, 并显示现有文件)

    到今天实现功能, 由iTunes导入文件的应用程序, 并在此文档进行编辑的应用. 就像我们平时经常使用 PDF阅读这样的事情, 们能够自己导入我们的电子书. 源代码下载:https://github. ...

  5. javaCV开发详解之3:收流器实现,录制流媒体服务器的rtsp/rtmp视频文件(基于javaCV-FFMPEG)

    javaCV系列文章: javacv开发详解之1:调用本机摄像头视频 javaCV开发详解之2:推流器实现,推本地摄像头视频到流媒体服务器以及摄像头录制视频功能实现(基于javaCV-FFMPEG.j ...

  6. PHP:phpMyAdmin如何解决本地导入文件(数据库)为2M的限制

    经验地址:http://jingyan.baidu.com/article/e75057f2a2288eebc91a89b7.html 当我们从别人那里导出数据库在本地导入时,因为数据库文件大于2M而 ...

  7. Mysql 导入文件提示 --secure-file-priv option 问题

    MYSQL导入数据出现:The MySQL server is running with the --secure-file-priv option so it cannot execute this ...

  8. Vue 导入文件import、路径@和.的区别

    ***import: html文件中,通过script标签引入js文件.而vue中,通过import xxx from xxx路径的方式导入文件,不光可以导入js文件. from前的:“xxx”指的是 ...

  9. 使用SQL Developer导入文件时出现的一个奇怪的问题

    SQL Developer 的版本是 17.3.1.279 当我导入文件的时候,在Data Preview 的阶段,发现无论选择还是取消选择 Header,文件中的第一行总会被当作字段名. 后来在Or ...

随机推荐

  1. 企业面试题-find结合sed查找替换

    题:把/oldboy目录及其子目录下所有以扩展名.sh结尾的文件中包含oldboy的字符串全部替换成oldgirl 解答: 建立测试数据: [root@tan data]# mkdir /oldboy ...

  2. css的性质

    css两个性质: 1.继承性 2.层叠行(选择器的一种选择能力,谁的权重大就选谁) A.选不中,走继承性,(font系列.color.text系列)权重是0 a)有多个父级都设置了这样的样式   走就 ...

  3. Shell 脚本元组+for循环

    #!/bin/bash#by:V log_dir=(/data/logs/anjubao_syncapi /data/logs/anjubao_wechat) daytime=`date -d ' - ...

  4. linux 查看当前系统版本号

    为避免现场未能完全安装系统,使用yum 安装版本需一致 第一种方法: [root@sky9896sky]# lsb_release -a bash:lsb_release: command not f ...

  5. Navicat For MySQL--外键建立与cannot add foreign key constraint分析

    hrm_job.png 参考资料: https://blog.csdn.net/ytm15732625529/article/details/53729155 https://www.cnblogs. ...

  6. Windows安装Anaconda出现failed to create menus

    当出现上述问题时,有以下的解决办法: (1)默认安装,即一直next: (2)安装路径里不要包含英文以外的语言,即安装路径全部用英文命名: (3)先不要安装python,或者将安装的python配置好 ...

  7. 编写简单的windows桌面计算器程序

    编译环境:VS2017 主文件为: #include "stdafx.h" #include "WindowsProject5.h" #include &quo ...

  8. Coprime Sequence (HDU 6025)前缀和与后缀和的应用

    题意:给出一串数列,这串数列的gcd为1,要求取出一个数使取出后的数列gcd最大. 题解:可以通过对数列进行预处理,求出从下标为1开始的数对于前面的数的gcd(数组从下标0开始),称为前缀gcd,再以 ...

  9. java-IO流-字节流-概述及分类、FileInputStream、FileOutputStream、available()方法、定义小数组、BufferedInputStream、BufferedOutputStream、flush和close方法的区别、流的标准处理异常代码

    1.IO流概述及其分类 * 1.概念      * IO流用来处理设备之间的数据传输      * Java对数据的操作是通过流的方式      * Java用于操作流的类都在IO包中      *  ...

  10. pxe+Kickstart自动装机补充知识点

    1.vmlinuzvmlinuz是可引导的.压缩的内核.“vm”代表“Virtual Memory”.Linux 支持虚拟内存,不像老的操作系统比如DOS有640KB内存的限制.Linux能够使用硬盘 ...