Hadoop: Advanced Hive Operations

When a query returns a lot of output that needs to be saved to a file, you can run it from outside the Hive CLI: hive -e "sql" > output.txt

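When the SQL is too long or there are too many statements, this approach becomes inconvenient. Consider saving the statements in a file such as sql.hql and then running: hive -f sql.hql > output.txt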
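A minimal sketch of the hive -f approach, assuming the hive client supports the --hivevar option for passing values into the file. The query reuses a table from the script in this post; the date values are only illustrative.

```shell
# Write the query to sql.hql. The quoted 'EOF' stops the shell from
# expanding ${hivevar:...}; Hive substitutes those placeholders itself.
cat > sql.hql <<'EOF'
use ycappdata;
select ctl_dt, count(distinct dvid)
from sa_daydau_detail
where ctl_dt between '${hivevar:start_date}' and '${hivevar:end_date}'
group by ctl_dt;
EOF

# Run the file only if a hive client is installed on this machine.
# The two dates are example values, not taken from the original post.
if command -v hive >/dev/null 2>&1; then
    hive --hivevar start_date=2017-01-01 --hivevar end_date=2017-01-31 \
         -f sql.hql > output.txt
fi
```

Keeping the SQL in its own file avoids the quoting problems that come with embedding long statements in a shell string.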

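If there are multiple statements whose output has to go to multiple files, the practical option is to write the SQL in a shell script. An example follows.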

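# Start of the range as YYYYMMDD; assumed to be the first script argument.
start_day=$1

# End of the range as YYYYMMDD; assumed to be the second script argument.
end_day=$2

# Convert both days to the YYYY-MM-DD form used in the tables (GNU date).
start_date=$(date +"%Y-%m-%d" -d "${start_day}")

end_date=$(date +"%Y-%m-%d" -d "${end_day}")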

active="

use ycappdata;

select ctl_dt,'active' ,count(distinct dvid) from sa_daydau_detail

where ctl_dt between '${start_date}' and '${end_date}'

group by ctl_dt,'active' ;"

loss="

use ycappdata;

select date_add(from_unixtime(unix_timestamp(lastactivedate,'yyyy/MM/dd hh:mm:ss'),'yyyy-MM-dd'),),'loss' ,count(distinct deviceid) from ext_db_apploginstats

where from_unixtime(unix_timestamp(lastactivedate,'yyyy/MM/dd hh:mm:ss'),'yyyy-MM-dd') between date_sub('${start_date}',) and date_sub('${end_date}',)

group by date_add(from_unixtime(unix_timestamp(lastactivedate,'yyyy/MM/dd hh:mm:ss'),'yyyy-MM-dd'),),'loss';"

active_month_distribute="

use ycappdata;

select a.ctl_dt,'active_month_distribute',concat('m',month(start_dt)),count(distinct b.dvid) from

(select ctl_dt,dvid from sa_daydau_detail where ctl_dt between '${start_date}' and '${end_date}')a

left outer join

(select start_dt,dvid from sa_firststartdate_dvid where start_dt between '2017-01-01' and '${end_date}')b

on lower(a.dvid)=lower(b.dvid)

group by a.ctl_dt,'active_month_distribute',concat('m',month(start_dt)) ;"

active_date_distribute="

use ycappdata;

select a.ctl_dt,'active_date_distribute',

case when datediff(a.ctl_dt,b.start_dt)=0 then 'd0' when datediff(a.ctl_dt,b.start_dt)<=30 then 'd30'

when datediff(a.ctl_dt,b.start_dt)<=60 then 'd60'  when datediff(a.ctl_dt,b.start_dt)<=90 then 'd90'

when datediff(a.ctl_dt,b.start_dt)<=120 then 'd120' when datediff(a.ctl_dt,b.start_dt)<=150 then 'd150'

when datediff(a.ctl_dt,b.start_dt)<=180 then 'd180' else 'd181' end,count(distinct b.dvid) from

(select ctl_dt,dvid from sa_daydau_detail where ctl_dt between '${start_date}' and '${end_date}')a

left outer join

(select start_dt,dvid from sa_firststartdate_dvid where start_dt between '2017-01-01' and '${end_date}')b

on lower(a.dvid)=lower(b.dvid)

group by a.ctl_dt,'active_date_distribute',case when datediff(a.ctl_dt,b.start_dt)=0 then 'd0'  when datediff(a.ctl_dt,b.start_dt)<=30 then 'd30'

when datediff(a.ctl_dt,b.start_dt)<=60 then 'd60'   when datediff(a.ctl_dt,b.start_dt)<=90 then 'd90'

when datediff(a.ctl_dt,b.start_dt)<=120 then 'd120' when datediff(a.ctl_dt,b.start_dt)<=150 then 'd150'

when datediff(a.ctl_dt,b.start_dt)<=180 then 'd180' else 'd181' end ;"

hive -e "${active}"                  >> app_operate.txt

hive -e "${loss}"                    >> app_operate.txt

hive -e "${active_month_distribute}" >> app_operate.txt

hive -e "${active_date_distribute}"  >> app_operate.txt

while [ ${start_day} -le ${end_day} ]

do

current_date=`date  +"%Y-%m-%d" -d  "${start_day}"`

week_active="

use ycappdata;

select '${current_date}','week_active',count(distinct dvid)  from sa_daydau_detail

where ctl_dt between date_sub('${current_date}',pmod(datediff('${current_date}', '2017-01-02'), 7)) and '${current_date}'

group by '${current_date}','week_active';  "

month_active="

use ycappdata;

select '${current_date}','month_active',count(distinct dvid)  from sa_daydau_detail

where ctl_dt between trunc('${current_date}','MM') and '${current_date}'

group by '${current_date}','month_active';  "

active_active_distribute="

use ycappdata;

select '${current_date}','active_active_distribute',concat('d',days),count(distinct ab.dvid) from

(select b.dvid,count(distinct b.ctl_dt) as days from

(select ctl_dt,dvid from sa_daydau_detail

where ctl_dt='${current_date}')a

join

(select ctl_dt,dvid from sa_daydau_detail

where ctl_dt between date_sub('${current_date}',) and '${current_date}')b

on  lower(a.dvid)=lower(b.dvid)

group by b.dvid )ab

group by '${current_date}','active_active_distribute',concat('d',days);"

newuser_retain="

use ycappdata;

select a.start_dt,'newuser_retain',concat('d',datediff(b.ctl_dt,a.start_dt)),count(distinct b.dvid) from

(select start_dt,dvid from sa_firststartdate_dvid

where start_dt between date_sub('${current_date}',) and '${current_date}')a

left outer join

(select ctl_dt,dvid from sa_daydau_detail

where ctl_dt between date_sub('${current_date}',) and '${current_date}')b

on lower(a.dvid)=lower(b.dvid)

group by a.start_dt,'newuser_retain',concat('d',datediff(b.ctl_dt,a.start_dt)); "

active_retain="

use ycappdata;

select a.ctl_dt,'active_retain',concat('d',datediff(b.ctl_dt,a.ctl_dt)),count(distinct b.dvid) from

(select ctl_dt,dvid from sa_daydau_detail

where ctl_dt between date_sub('${current_date}',) and '${current_date}')a

left outer join

(select ctl_dt,dvid from sa_daydau_detail

where ctl_dt between date_sub('${current_date}',) and '${current_date}')b

on lower(a.dvid)=lower(b.dvid)

where a.ctl_dt<=b.ctl_dt

group by a.ctl_dt,'active_retain',concat('d',datediff(b.ctl_dt,a.ctl_dt)); "

echo "${week_active}"

echo "${month_active}"

echo "${active_active_distribute}"

echo "${newuser_retain}"

echo "${active_retain}" 

hive -e "${week_active}"                 >> app_operate.txt

hive -e "${month_active}"                >> app_operate.txt

hive -e "${active_active_distribute}"    >> app_operate.txt

hive -e "${newuser_retain}"              >> app_operate.txt

hive -e "${active_retain}"               >> app_operate.txt

start_day=`date  +"%Y%m%d" -d  "${start_day} 1 days" `

done
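The script above repeats the same hive -e call for every query and appends all results to one file. As a rough sketch of how the repetition could be factored out, the two helper functions below write each query's result to its own file and walk the day range. The names run_query, each_day, HIVE_CMD and OUT_DIR are hypothetical and not part of the original script, and the date arithmetic assumes GNU date, as the original does.

```shell
# Hypothetical settings: HIVE_CMD lets the script be dry-run with a stub,
# OUT_DIR is where the per-query result files are written.
HIVE_CMD="${HIVE_CMD:-hive}"
OUT_DIR="${OUT_DIR:-.}"

# Run one named query and write its result to <OUT_DIR>/<name>.txt.
# Hive prints query results on stdout and log messages on stderr,
# so the log is kept in a separate file.
run_query() {
    local name="$1" sql="$2"
    "${HIVE_CMD}" -e "${sql}" > "${OUT_DIR}/${name}.txt" 2>> "${OUT_DIR}/hive.log"
}

# Print every day from $1 to $2 (both YYYYMMDD, inclusive) as YYYY-MM-DD,
# one per line. Relies on GNU date's -d option.
each_day() {
    local day="$1" end="$2"
    while [ "${day}" -le "${end}" ]; do
        date +"%Y-%m-%d" -d "${day}"
        day=$(date +"%Y%m%d" -d "${day} 1 days")
    done
}
```

With these in place, a call such as run_query active "${active}" would write active.txt, and the daily loop could be written as each_day "${start_day}" "${end_day}" | while read -r current_date; do ...; done. Note that writing with > overwrites the file on each call, so the per-day queries would need the date in the file name or >> instead.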
