[Hadoop] Hive Summary and Overview

I. Hive Overview

II. Installing Hive

Hive only needs to be installed on a single node.

1. Upload the tar package

2. Extract it

    tar -zxvf hive-0.9..tar.gz -C /cloud/

3. Configure the MySQL metastore (switch to the root user)

    Configure the HIVE_HOME environment variable

    rpm -qa | grep mysql

    rpm -e mysql-libs-5.1.-.el6_3.i686 --nodeps

    rpm -ivh MySQL-server-5.1.-.glibc23.i386.rpm

    rpm -ivh MySQL-client-5.1.-.glibc23.i386.rpm

    Change the MySQL password

    /usr/bin/mysql_secure_installation

    (Note: remove anonymous users and allow remote connections)

    Log in to MySQL

    mysql -u root -p

4. Configure Hive

    cp hive-default.xml.template hive-site.xml

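    Edit hive-site.xml (delete all the default content, keeping only the enclosing <configuration></configuration> element)

    Add the following properties inside it: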

    <property>

      <name>javax.jdo.option.ConnectionURL</name>

      <value>jdbc:mysql://weekend01:3306/hive?createDatabaseIfNotExist=true</value>

      <description>JDBC connect string for a JDBC metastore</description>

    </property>

    <property>

      <name>javax.jdo.option.ConnectionDriverName</name>

      <value>com.mysql.jdbc.Driver</value>

      <description>Driver class name for a JDBC metastore</description>

    </property>

    <property>

      <name>javax.jdo.option.ConnectionUserName</name>

      <value>root</value>

      <description>username to use against metastore database</description>

    </property>

    <property>

      <name>javax.jdo.option.ConnectionPassword</name>

      <value>root</value>

      <description>password to use against metastore database</description>

    </property>
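The four properties above have to sit inside a single `<configuration>` root element for Hive to read them. As a minimal sketch (the helper name `build_hive_site` is hypothetical, and the values are simply the ones from these notes), the file content could be generated like this:

```python
import xml.etree.ElementTree as ET


def build_hive_site(settings):
    """Return hive-site.xml text with one <property> per (name, value) pair."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Values copied from the notes above; adjust host, user and password as needed.
METASTORE_SETTINGS = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://weekend01:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "root",
}

xml_text = build_hive_site(METASTORE_SETTINGS)
```

Writing `xml_text` to `$HIVE_HOME/conf/hive-site.xml` is left out on purpose, since the exact install path depends on your setup.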

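5. After installing Hive and MySQL, copy the MySQL connector JAR into the $HIVE_HOME/lib directory

    If you run into a permission error, grant privileges in MySQL (run this on the machine where MySQL is installed)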

    mysql -uroot -p

    #(Run the statements below.  *.*: all tables in all databases   %: any IP address or host may connect)

    GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '' WITH GRANT OPTION;

    FLUSH PRIVILEGES;

6. Create a table (a managed, i.e. internal, table by default)

    create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

    Create a partitioned table

    create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';

    Create an external table

    create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';

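7. Create a partitioned table

    Regular vs. partitioned tables: create a partitioned table when large volumes of data keep being added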

    create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t'; 

    Load data into a partitioned table

    load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');

    load data local inpath '/root/data.am' into table beauty partition (nation="USA");

    select nation, avg(size) from beauties group by nation order by avg(size);
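Each partition of a partitioned table is stored as its own subdirectory named `column=value`, which is why a query filtered on the partition column only needs to read that directory. A small sketch of the layout, assuming the default warehouse directory `/user/hive/warehouse` and the default database (the helper `partition_path` is hypothetical, not a Hive API):

```python
import posixpath


def partition_path(warehouse_dir, table, partition_spec):
    """Build the HDFS directory for one partition of a managed table.

    partition_spec is an ordered list of (column, value) pairs, in the
    same order as the PARTITIONED BY clause.
    """
    parts = [f"{column}={value}" for column, value in partition_spec]
    return posixpath.join(warehouse_dir, table, *parts)


# The 'book' partition loaded above with pubdate='2010-08-22'.
book_path = partition_path(
    "/user/hive/warehouse", "book", [("pubdate", "2010-08-22")]
)
print(book_path)  # /user/hive/warehouse/book/pubdate=2010-08-22
```

A table partitioned by several columns simply nests one directory level per column.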

III. Hive Basics

1. Upload the Hive installation package

2. Extract it

3. Configure

    3.1 Install MySQL

        Find previously installed MySQL packages

        rpm -qa | grep mysql

        Force-remove this package

        rpm -e mysql-libs-5.1.-.el6_3.i686 --nodeps

        rpm -ivh MySQL-server-5.1.-.glibc23.i386.rpm

        rpm -ivh MySQL-client-5.1.-.glibc23.i386.rpm

        Run the following command to set up MySQL

        /usr/bin/mysql_secure_installation

        Add Hive to the environment variables, then grant privileges in MySQL

        GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%' IDENTIFIED BY '' WITH GRANT OPTION;

        FLUSH PRIVILEGES;

        Create two tables in Hive

        create table trade_detail (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

        create table user_info (id bigint, account string, name  string, age int) row format delimited fields terminated by '\t';

        Import the data from MySQL directly into Hive

        sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table trade_detail --hive-import --hive-overwrite --hive-table trade_detail --fields-terminated-by '\t'

        sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table user_info --hive-import --hive-overwrite --hive-table user_info --fields-terminated-by '\t'

        Create a result table that stores the output of a query

        create table result row format delimited fields terminated by '\t' as select t2.account, t2.name, t1.income, t1.expenses, t1.surplus from user_info t2 join (select account, sum(income) as income, sum(expenses) as expenses, sum(income-expenses) as surplus from trade_detail group by account) t1 on (t1.account = t2.account);
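To make clear what the CTAS query above computes, here is a plain-Python sketch of the same logic: total income and expenses per account, then an inner join against `user_info`. The sample rows are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (id, account, income, expenses, time)
trade_detail = [
    (1, "a@example.com", 100.0, 40.0, "2014-01-01"),
    (2, "a@example.com", 50.0, 10.0, "2014-01-02"),
    (3, "b@example.com", 30.0, 45.0, "2014-01-02"),
]
# Hypothetical sample rows: (id, account, name, age)
user_info = [
    (1, "a@example.com", "alice", 30),
    (2, "b@example.com", "bob", 25),
]


def summarize(trades, users):
    """Mirror the query: GROUP BY account, then JOIN on account."""
    totals = defaultdict(lambda: [0.0, 0.0])  # account -> [income, expenses]
    for _id, account, income, expenses, _time in trades:
        totals[account][0] += income
        totals[account][1] += expenses
    rows = []
    for _id, account, name, _age in users:
        if account in totals:  # inner join: skip users without trades
            income, expenses = totals[account]
            rows.append((account, name, income, expenses, income - expenses))
    return rows


result = summarize(trade_detail, user_info)
```

In Hive the grouping and the join each run as distributed jobs; the sketch only shows the row-level semantics.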

        create table user (id int, name string) row format delimited fields terminated by '\t';

        Load data from the local file system into Hive

        load data local inpath '/root/user.txt' into table user;

        Create an external table

        create external table stubak (id int, name string) row format delimited fields terminated by '\t' location '/stubak';

        Create a partitioned table

        Regular vs. partitioned tables: create a partitioned table when large volumes of data keep being added

        create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t'; 

        Load data into a partitioned table

        load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');

IV. Hive SQL

set hive.cli.print.header=true;

CREATE TABLE page_view(viewTime INT, userid BIGINT,

     page_url STRING, referrer_url STRING,

     ip STRING COMMENT 'IP Address of the User')

 COMMENT 'This is the page view table'

 PARTITIONED BY(dt STRING, country STRING)

 ROW FORMAT DELIMITED

   FIELDS TERMINATED BY '\001'

STORED AS SEQUENCEFILE;   -- alternatively: STORED AS TEXTFILE

//sequencefile

create table tab_ip_seq(id int,name string,ip string,country string)

    row format delimited

    fields terminated by ','

    stored as sequencefile;

insert overwrite table tab_ip_seq select * from tab_ext;

//create & load

create table tab_ip(id int,name string,ip string,country string)

    row format delimited

    fields terminated by ','

    stored as textfile;

load data local inpath '/home/hadoop/ip.txt' into table tab_ip;

//external

CREATE EXTERNAL TABLE tab_ip_ext(id int, name string,

     ip STRING,

     country STRING)

 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

 STORED AS TEXTFILE

 LOCATION '/external/hive';

// CTAS: used to create temporary tables that hold intermediate results

CREATE TABLE tab_ip_ctas

   AS

SELECT id new_id, name new_name, ip new_ip,country new_country

FROM tab_ip_ext

SORT BY new_id;

//insert from select: used to load intermediate results into a temporary table

create table tab_ip_like like tab_ip;

insert overwrite table tab_ip_like

    select * from tab_ip;

//CLUSTER (somewhat more advanced; study it when you have the energy to spare)

create table tab_ip_cluster(id int,name string,ip string,country string)

clustered by(id) into  buckets;

load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_cluster;

set hive.enforce.bucketing=true;

insert into table tab_ip_cluster select * from tab_ip;

select * from tab_ip_cluster tablesample(bucket  out of  on id); 
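Bucketing distributes rows across a fixed number of files by hashing the `clustered by` column. The sketch below assumes the classic Hive scheme, where an INT column hashes to its own value and the bucket is `(hash & Integer.MAX_VALUE) % numBuckets`; newer Hive releases can use a different hash function. The bucket count of 4 is a placeholder, since the count in the statement above is not given.

```python
def bucket_for(id_value, num_buckets):
    """Bucket index for an INT column under the assumed classic scheme."""
    # Mask off the sign bit (Java's Integer.MAX_VALUE) before the modulo,
    # so negative ids still land in a valid bucket.
    return (id_value & 0x7FFFFFFF) % num_buckets


def assign_buckets(ids, num_buckets):
    """Group ids by the bucket file they would be written to."""
    buckets = {index: [] for index in range(num_buckets)}
    for id_value in ids:
        buckets[bucket_for(id_value, num_buckets)].append(id_value)
    return buckets


# Placeholder bucket count; ids 1..8 spread evenly over 4 buckets.
layout = assign_buckets(range(1, 9), 4)
```

This is also why a plain `load data` into a bucketed table does not bucket anything: the file is copied as-is, whereas `insert ... select` with `hive.enforce.bucketing=true` routes each row through the hash.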

//PARTITION

create table tab_ip_part(id int,name string,ip string,country string)

    partitioned by (part_flag string)

    row format delimited fields terminated by ',';

load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_part

     partition(part_flag='part1');

load data local inpath '/home/hadoop/ip_part2.txt' overwrite into table tab_ip_part

     partition(part_flag='part2');

select * from tab_ip_part;

select * from tab_ip_part  where part_flag='part2';

select count(*) from tab_ip_part  where part_flag='part2';

alter table tab_ip change id id_alter string;

ALTER TABLE tab_cts ADD PARTITION (partCol = 'dt') location '/external/hive/dt';

show partitions tab_ip_part;

//write to hdfs

insert overwrite local directory '/home/hadoop/hivetemp/test.txt' select * from tab_ip_part where part_flag='part1';

insert overwrite directory '/hiveout.txt' select * from tab_ip_part where part_flag='part1';
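Note that the target of `insert overwrite [local] directory` is treated as a directory even when its name looks like a file: Hive writes one or more data files (typically named like `000000_0`) into it, and by default separates fields with the `\001` character. A minimal reader for a local export, with a hypothetical helper name and an invented demo row:

```python
import os
import tempfile


def read_hive_export(directory, field_sep="\x01"):
    """Read every data file in a local Hive export directory into rows."""
    rows = []
    for file_name in sorted(os.listdir(directory)):
        path = os.path.join(directory, file_name)
        # Skip checksum/marker files such as .000000_0.crc or _SUCCESS.
        if file_name.startswith((".", "_")) or not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                rows.append(line.rstrip("\n").split(field_sep))
    return rows


# Demo with a fake export; select * on a partitioned table also returns
# the partition column (part_flag) as the last field.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "000000_0"), "w", encoding="utf-8") as f:
        f.write("1\x01zhangsan\x01192.168.1.1\x01china\x01part1\n")
    demo_rows = read_hive_export(demo_dir)
```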

//array

create table tab_array(a array<int>,b array<string>)

row format delimited

fields terminated by '\t'

collection items terminated by ',';

Sample data

tobenbrone,laihama,woshishui     ,

abc,iloveyou,itcast     ,

select a[] from tab_array;

select * from tab_array where array_contains(b,'word');

insert into table tab_array select array(),array(name,ip) from tab_ext t; 
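How a text line maps onto `tab_array` follows from the two delimiters in the DDL: a tab separates the columns, and a comma separates the items inside each array. A rough Python equivalent of that parsing, using an invented input line (the helper name is hypothetical):

```python
def parse_tab_array_row(line):
    """Split one text row into (a array<int>, b array<string>)."""
    a_text, b_text = line.rstrip("\n").split("\t")  # fields: '\t'
    a = [int(item) for item in a_text.split(",")]   # collection items: ','
    b = b_text.split(",")
    return a, b


# Hypothetical row: ints first (column a), strings second (column b).
a, b = parse_tab_array_row("1,2,3\thello,word\n")

# Rough equivalent of array_contains(b, 'word')
has_word = "word" in b
```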

//map

create table tab_map(name string,info map<string,string>)

row format delimited

fields terminated by '\t'

collection items terminated by ';'

map keys terminated by ':';

Sample data:

fengjie            age:;size:36A;addr:usa

furong        age:;size:39C;addr:beijing;weight:180KG

load data local inpath '/home/hadoop/hivetemp/tab_map.txt' overwrite into table tab_map;

insert into table tab_map select name,map('name',name,'ip',ip) from tab_ext; 
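The three delimiters in the `tab_map` DDL nest in a fixed order: a tab splits the columns, `;` splits the map entries, and `:` splits each key from its value. A short sketch of the same parsing in Python, on an invented row modeled on the sample data (the helper name is hypothetical):

```python
def parse_tab_map_row(line):
    """Split one text row into (name, info map<string,string>)."""
    name, info_text = line.rstrip("\n").split("\t")   # fields: '\t'
    info = dict(
        entry.split(":", 1)                           # map keys: ':'
        for entry in info_text.split(";")             # collection items: ';'
    )
    return name, info


# Hypothetical row in the same shape as the sample data.
name, info = parse_tab_map_row("fengjie\tsize:36A;addr:usa\n")
```

In HiveQL the corresponding lookup would be `info['addr']`.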

//struct

create table tab_struct(name string,info struct<age:int,tel:string,addr:string>)

row format delimited

fields terminated by '\t'

collection items terminated by ',';

load data local inpath '/home/hadoop/hivetemp/tab_st.txt' overwrite into table tab_struct;

insert into table tab_struct select name,named_struct('age',id,'tel',name,'addr',country) from tab_ext;

//cli shell

hive -S -e 'select country,count(*) from tab_ext group by country' > /home/hadoop/hivetemp/e.txt

This execution mechanism lets us run HQL statements in batches from a scripting language (bash shell, Python).
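As one possible way to batch statements from Python, the sketch below wraps the `hive -S -e` invocation in a small helper. It assumes the `hive` executable is on the PATH; the function names and the injectable `runner` parameter are illustrative choices, not part of Hive.

```python
import subprocess


def hive_command(query):
    """Argument list for running one statement in silent mode."""
    return ["hive", "-S", "-e", query]


def run_queries(queries, runner=subprocess.run):
    """Run each HQL statement in turn and collect its standard output.

    'runner' defaults to subprocess.run; it can be swapped out so the
    batching logic is testable without a Hive installation.
    """
    outputs = []
    for query in queries:
        completed = runner(
            hive_command(query), capture_output=True, text=True, check=True
        )
        outputs.append(completed.stdout)
    return outputs


if __name__ == "__main__":
    # Requires a working Hive installation.
    for text in run_queries(
        ["select country, count(*) from tab_ext group by country"]
    ):
        print(text)
```

Passing the query as a list element rather than building a shell string avoids quoting problems when the HQL itself contains quotes.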

select * from tab_ext sort by id desc limit ;

select a.ip,b.book from tab_ext a join tab_ip_book b on(a.name=b.name);

//UDF

select if(id=,'first','no-first'),name from tab_ext;

hive>add jar /home/hadoop/myudf.jar;

hive>CREATE TEMPORARY FUNCTION my_lower AS 'org.dht.Lower';

select my_lower(name) from tab_ext;

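V. Hive User-Defined Functions

1. Extend the org.apache.hadoop.hive.ql.exec.UDF class and implement an evaluate method

How a custom function is invoked:

1. Add the JAR (run this in the Hive command line)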

hive> add jar /root/NUDF.jar;

2. Create a temporary function

hive> create temporary function getNation as 'cn.itcast.hive.udf.NationUDF';

3. Call it

hive> select id, name, getNation(nation) from beauty;

4. Save the query results to HDFS

hive> create table result row format delimited fields terminated by '\t' as select * from beauty order by id desc;

hive> select id, getAreaName(id) as name from tel_rec;

create table result row format delimited fields terminated by '\t' as select id, getNation(nation) from beauties;
