https://blog.csdn.net/wiborgite/article/details/78813342

Background:

The CDH QuickStart VM bundles HDFS, YARN, HBase, Hive, Impala, and other components in a single VM.

This article loads a text file from HDFS into Hive, synchronizes the metadata, and then operates on the data in Impala.

----------------------------------------------------------------- Linux Shell operations -----------------------------------------------------------------

1. Upload the data file from the local PC to the /home/data directory in the VM

    [root@quickstart data]# pwd
    /home/data
    [root@quickstart data]# ls
    p10pco2a.dat  stock_data2.csv
    [root@quickstart data]# head p10pco2a.dat
    WOCE_P10,1993,279.479,-16.442,172.219,24.9544,34.8887,1.0035,363.551,2
    WOCE_P10,1993,279.480,-16.440,172.214,24.9554,34.8873,1.0035,363.736,2
    WOCE_P10,1993,279.480,-16.439,172.213,24.9564,34.8868,1.0033,363.585,2
    WOCE_P10,1993,279.481,-16.438,172.209,24.9583,34.8859,1.0035,363.459,2
    WOCE_P10,1993,279.481,-16.437,172.207,24.9594,34.8859,1.0033,363.543,2
    WOCE_P10,1993,279.481,-16.436,172.205,24.9604,34.8858,1.0035,363.432,2
    WOCE_P10,1993,279.489,-16.417,172.164,24.9743,34.8867,1.0036,362.967,2
    WOCE_P10,1993,279.490,-16.414,172.158,24.9742,34.8859,1.0035,362.960,2
    WOCE_P10,1993,279.491,-16.412,172.153,24.9747,34.8864,1.0033,362.998,2
    WOCE_P10,1993,279.492,-16.411,172.148,24.9734,34.8868,1.0031,363.022,2

2. Upload /home/data/p10pco2a.dat to HDFS

    [root@quickstart data]# hdfs dfs -put p10pco2a.dat /tmp/
    [root@quickstart data]# hdfs dfs -ls /tmp
    -rw-r--r--   1 root supergroup     281014 2017-12-14 18:47 /tmp/p10pco2a.dat

----------------------------------------------------------------- Hive operations -----------------------------------------------------------------

1. Start the Hive CLI

# hive

2. Create a database in Hive

CREATE DATABASE weather;

3. Create a table in Hive

    create table weather.weather_everydate_detail
    (
        section string,
        year bigint,
        date double,
        latim double,
        longit double,
        sur_tmp double,
        sur_sal double,
        atm_per double,
        xco2a double,
        qf bigint
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
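Because the table reads the raw text with comma-delimited fields, each line of p10pco2a.dat maps positionally onto this schema. As a quick sanity check outside the cluster, a record can be split on commas and cast to the declared column types (a minimal Python sketch; the `parse_line` helper is illustrative, not part of the original workflow — only the column names and types mirror the DDL above):

```python
# Column names and types mirroring the Hive DDL above.
COLUMNS = [
    ("section", str), ("year", int), ("date", float), ("latim", float),
    ("longit", float), ("sur_tmp", float), ("sur_sal", float),
    ("atm_per", float), ("xco2a", float), ("qf", int),
]

def parse_line(line):
    """Split one comma-delimited record and cast each field to its column type."""
    fields = line.strip().split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {name: cast(value) for (name, cast), value in zip(COLUMNS, fields)}

row = parse_line("WOCE_P10,1993,279.479,-16.442,172.219,24.9544,34.8887,1.0035,363.551,2")
print(row["section"], row["year"], row["sur_tmp"])  # WOCE_P10 1993 24.9544
```

A line that does not have exactly ten fields would raise here; Hive itself is more lenient and fills missing columns with NULL.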

4. Load the data from HDFS into the newly created Hive table

    LOAD DATA INPATH '/tmp/p10pco2a.dat' INTO TABLE weather.weather_everydate_detail;

    hive> LOAD DATA INPATH '/tmp/p10pco2a.dat' INTO TABLE weather.weather_everydate_detail;
    Loading data to table weather.weather_everydate_detail
    Table weather.weather_everydate_detail stats: [numFiles=1, totalSize=281014]
    OK
    Time taken: 1.983 seconds

Note that LOAD DATA INPATH (without the LOCAL keyword) moves the file from /tmp into the table's warehouse directory on HDFS rather than copying it.

5. Query the Hive table to confirm the data was loaded

    use weather;
    select * from weather.weather_everydate_detail limit 10;
    select count(*) from weather.weather_everydate_detail;

    hive> select * from weather.weather_everydate_detail limit 10;
    OK
    WOCE_P10 1993 279.479 -16.442 172.219 24.9544 34.8887 1.0035 363.551 2
    WOCE_P10 1993 279.48 -16.44 172.214 24.9554 34.8873 1.0035 363.736 2
    WOCE_P10 1993 279.48 -16.439 172.213 24.9564 34.8868 1.0033 363.585 2
    WOCE_P10 1993 279.481 -16.438 172.209 24.9583 34.8859 1.0035 363.459 2
    WOCE_P10 1993 279.481 -16.437 172.207 24.9594 34.8859 1.0033 363.543 2
    WOCE_P10 1993 279.481 -16.436 172.205 24.9604 34.8858 1.0035 363.432 2
    WOCE_P10 1993 279.489 -16.417 172.164 24.9743 34.8867 1.0036 362.967 2
    WOCE_P10 1993 279.49 -16.414 172.158 24.9742 34.8859 1.0035 362.96 2
    WOCE_P10 1993 279.491 -16.412 172.153 24.9747 34.8864 1.0033 362.998 2
    WOCE_P10 1993 279.492 -16.411 172.148 24.9734 34.8868 1.0031 363.022 2
    Time taken: 0.815 seconds, Fetched: 10 row(s)
    hive> select count(*) from weather.weather_everydate_detail;
    Query ID = root_20171214185454_c783708d-ad4b-46cc-9341-885c16a286fe
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1512525269046_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1512525269046_0001/
    Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1512525269046_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2017-12-14 18:55:27,386 Stage-1 map = 0%, reduce = 0%
    2017-12-14 18:56:11,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 39.36 sec
    2017-12-14 18:56:18,711 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 41.88 sec
    MapReduce Total cumulative CPU time: 41 seconds 880 msec
    Ended Job = job_1512525269046_0001
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 41.88 sec  HDFS Read: 288541  HDFS Write: 5  SUCCESS
    Total MapReduce CPU Time Spent: 41 seconds 880 msec
    OK
    4018
    Time taken: 101.82 seconds, Fetched: 1 row(s)

6. Run an ordinary query:

    hive> select * from weather_everydate_detail where sur_sal=34.8105;
    OK
    WOCE_P10 1993 312.148 34.602 141.951 24.0804 34.8105 1.0081 361.29 2
    WOCE_P10 1993 312.155 34.602 141.954 24.0638 34.8105 1.0079 360.386 2
    Time taken: 0.138 seconds, Fetched: 2 row(s)
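The WHERE sur_sal=34.8105 predicate is an exact floating-point comparison; it matches here because the stored values and the query literal parse to the same double. The same filter can be sketched in Python over raw records (a minimal illustration; the sample rows are taken from the outputs above, and index 6 is the zero-based position of sur_sal in the schema):

```python
# Rows as stored in the comma-delimited file; field positions follow the
# Hive schema (section, year, date, latim, longit, sur_tmp, sur_sal, atm_per, xco2a, qf).
rows = [
    "WOCE_P10,1993,312.148,34.602,141.951,24.0804,34.8105,1.0081,361.29,2",
    "WOCE_P10,1993,312.155,34.602,141.954,24.0638,34.8105,1.0079,360.386,2",
    "WOCE_P10,1993,279.479,-16.442,172.219,24.9544,34.8887,1.0035,363.551,2",
]

SUR_SAL = 6  # zero-based index of the sur_sal column

# Equivalent of: WHERE sur_sal = 34.8105
matches = [r for r in rows if float(r.split(",")[SUR_SAL]) == 34.8105]
print(len(matches))  # 2
```

In general, exact equality on doubles is fragile; it works in this walkthrough only because both sides originate from the same decimal literal.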

----------------------------------------------------------------- Impala operations -----------------------------------------------------------------
1. Start the Impala CLI

# impala-shell 

2. Synchronize the metadata in Impala

    [quickstart.cloudera:21000] > INVALIDATE METADATA;
    Query: invalidate METADATA
    Query submitted at: 2017-12-14 19:01:12 (Coordinator: http://quickstart.cloudera:25000)
    Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=43460ace5d3a9971:9a50f46600000000

    Fetched 0 row(s) in 3.25s

Note: a bare INVALIDATE METADATA flushes Impala's cached metadata for every table; to pick up just the newly created table, INVALIDATE METADATA weather.weather_everydate_detail would be cheaper.

3. Inspect the Hive table's structure from Impala

    [quickstart.cloudera:21000] > use weather;
    Query: use weather
    [quickstart.cloudera:21000] > desc weather.weather_everydate_detail;
    Query: describe weather.weather_everydate_detail
    +---------+--------+---------+
    | name    | type   | comment |
    +---------+--------+---------+
    | section | string |         |
    | year    | bigint |         |
    | date    | double |         |
    | latim   | double |         |
    | longit  | double |         |
    | sur_tmp | double |         |
    | sur_sal | double |         |
    | atm_per | double |         |
    | xco2a   | double |         |
    | qf      | bigint |         |
    +---------+--------+---------+
    Fetched 10 row(s) in 3.70s

4. Count the records

    [quickstart.cloudera:21000] > select count(*) from weather.weather_everydate_detail;
    Query: select count(*) from weather.weather_everydate_detail
    Query submitted at: 2017-12-14 19:03:11 (Coordinator: http://quickstart.cloudera:25000)
    Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=5542894eeb80e509:1f9ce37f00000000
    +----------+
    | count(*) |
    +----------+
    | 4018     |
    +----------+
    Fetched 1 row(s) in 2.51s

Note: comparing the count(*) query in Impala (2.51 s) against Hive (101.82 s), Impala's advantage is substantial, roughly 40x here.
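The speedup follows directly from the two wall times reported in the transcripts above:

```python
hive_seconds = 101.82   # Hive count(*) wall time from the transcript above
impala_seconds = 2.51   # Impala count(*) wall time from the transcript above
print(round(hive_seconds / impala_seconds, 1))  # 40.6
```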

5. Run an ordinary query

    [quickstart.cloudera:21000] > select * from weather_everydate_detail where sur_sal=34.8105;
    Query: select * from weather_everydate_detail where sur_sal=34.8105
    Query submitted at: 2017-12-14 19:20:27 (Coordinator: http://quickstart.cloudera:25000)
    Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=c14660ed0bda471f:d92fcf0e00000000
    +----------+------+---------+--------+---------+---------+---------+---------+---------+----+
    | section  | year | date    | latim  | longit  | sur_tmp | sur_sal | atm_per | xco2a   | qf |
    +----------+------+---------+--------+---------+---------+---------+---------+---------+----+
    | WOCE_P10 | 1993 | 312.148 | 34.602 | 141.951 | 24.0804 | 34.8105 | 1.0081  | 361.29  | 2  |
    | WOCE_P10 | 1993 | 312.155 | 34.602 | 141.954 | 24.0638 | 34.8105 | 1.0079  | 360.386 | 2  |
    +----------+------+---------+--------+---------+---------+---------+---------+---------+----+
    Fetched 2 row(s) in 0.25s

    [quickstart.cloudera:21000] > select * from weather_everydate_detail where sur_tmp=24.0804;
    Query: select * from weather_everydate_detail where sur_tmp=24.0804
    Query submitted at: 2017-12-14 23:15:32 (Coordinator: http://quickstart.cloudera:25000)
    Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=774e2b3b81f4eed7:8952b5b400000000
    +----------+------+---------+--------+---------+---------+---------+---------+--------+----+
    | section  | year | date    | latim  | longit  | sur_tmp | sur_sal | atm_per | xco2a  | qf |
    +----------+------+---------+--------+---------+---------+---------+---------+--------+----+
    | WOCE_P10 | 1993 | 312.148 | 34.602 | 141.951 | 24.0804 | 34.8105 | 1.0081  | 361.29 | 2  |
    +----------+------+---------+--------+---------+---------+---------+---------+--------+----+
    Fetched 1 row(s) in 3.86s

6. Conclusion

For SQL that Hive must compile into MapReduce jobs, Impala has a clear speed advantage. However, not every Hive query is compiled into MapReduce — simple queries such as the filtered SELECT above run as plain fetch tasks — and for that kind of query, Impala holds no particular advantage over Hive.
