Hadoop: Advanced Hive Operations
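When a query returns a lot of output that needs to be saved to a file, run hive -e "sql" > output.txt from outside the Hive CLI.
When the SQL is too long or there are too many statements, this approach becomes awkward. In that case, save the statements to a file such as sql.hql and run hive -f sql.hql > output.txt instead.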
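The two invocation styles can be wrapped in small shell helpers. This is only a minimal sketch and not part of the original script: the function names run_sql and run_file and the HIVE_BIN variable are hypothetical, introduced so the command can be replaced by a stub on a machine without Hive. Only the hive -e and hive -f options described above are used.

```shell
#!/bin/bash
# HIVE_BIN is a hypothetical override; it defaults to the real hive command.
HIVE_BIN="${HIVE_BIN:-hive}"

# Run an inline SQL string and write the result to a file (overwrites the file).
run_sql() {
    local sql="$1" out="$2"
    "${HIVE_BIN}" -e "${sql}" > "${out}"
}

# Run the statements stored in an .hql file and write the result to a file.
run_file() {
    local hql="$1" out="$2"
    "${HIVE_BIN}" -f "${hql}" > "${out}"
}
```

For example, run_sql "select 1;" output.txt corresponds to hive -e "select 1;" > output.txt, and run_file sql.hql output.txt corresponds to hive -f sql.hql > output.txt.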
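If there are several statements and their results must go to several output files, the only option is to write the SQL into a shell script. An example follows.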
# Assumed: the start and end days are passed as the first two arguments, in YYYYMMDD form.
start_day=$1
end_day=$2
start_date=`date +"%Y-%m-%d" -d "${start_day}"`
end_date=`date +"%Y-%m-%d" -d "${end_day}"`
active="
use ycappdata;
select ctl_dt,'active' ,count(distinct dvid) from sa_daydau_detail
where ctl_dt between '${start_date}' and '${end_date}'
group by ctl_dt,'active' ;"
loss="
use ycappdata;
select date_add(from_unixtime(unix_timestamp(lastactivedate,'yyyy/MM/dd hh:mm:ss'),'yyyy-MM-dd'),),'loss' ,count(distinct deviceid) from ext_db_apploginstats
where from_unixtime(unix_timestamp(lastactivedate,'yyyy/MM/dd hh:mm:ss'),'yyyy-MM-dd') between date_sub('${start_date}',) and date_sub('${end_date}',)
group by date_add(from_unixtime(unix_timestamp(lastactivedate,'yyyy/MM/dd hh:mm:ss'),'yyyy-MM-dd'),),'loss';"
active_month_distribute="
use ycappdata;
select a.ctl_dt,'active_month_distribute',concat('m',month(start_dt)),count(distinct b.dvid) from
(select ctl_dt,dvid from sa_daydau_detail where ctl_dt between '${start_date}' and '${end_date}')a
left outer join
(select start_dt,dvid from sa_firststartdate_dvid where start_dt between '2017-01-01' and '${end_date}')b
on lower(a.dvid)=lower(b.dvid)
group by a.ctl_dt,'active_month_distribute',concat('m',month(start_dt)) ;"
active_date_distribute="
use ycappdata;
select a.ctl_dt,'active_date_distribute',
case when datediff(a.ctl_dt,b.start_dt)=0 then 'd0' when datediff(a.ctl_dt,b.start_dt)<=30 then 'd30'
when datediff(a.ctl_dt,b.start_dt)<=60 then 'd60' when datediff(a.ctl_dt,b.start_dt)<=90 then 'd90'
when datediff(a.ctl_dt,b.start_dt)<=120 then 'd120' when datediff(a.ctl_dt,b.start_dt)<=150 then 'd150'
when datediff(a.ctl_dt,b.start_dt)<=180 then 'd180' else 'd181' end,count(distinct b.dvid) from
(select ctl_dt,dvid from sa_daydau_detail where ctl_dt between '${start_date}' and '${end_date}')a
left outer join
(select start_dt,dvid from sa_firststartdate_dvid where start_dt between '2017-01-01' and '${end_date}')b
on lower(a.dvid)=lower(b.dvid)
group by a.ctl_dt,'active_date_distribute',case when datediff(a.ctl_dt,b.start_dt)=0 then 'd0' when datediff(a.ctl_dt,b.start_dt)<=30 then 'd30'
when datediff(a.ctl_dt,b.start_dt)<=60 then 'd60' when datediff(a.ctl_dt,b.start_dt)<=90 then 'd90'
when datediff(a.ctl_dt,b.start_dt)<=120 then 'd120' when datediff(a.ctl_dt,b.start_dt)<=150 then 'd150'
when datediff(a.ctl_dt,b.start_dt)<=180 then 'd180' else 'd181' end ;"
hive -e "${active}" >> app_operate.txt
hive -e "${loss}" >> app_operate.txt
hive -e "${active_month_distribute}" >> app_operate.txt
hive -e "${active_date_distribute}" >> app_operate.txt
while [ ${start_day} -le ${end_day} ]
do
current_date=`date +"%Y-%m-%d" -d "${start_day}"`
week_active="
use ycappdata;
select '${current_date}','week_active',count(distinct dvid) from sa_daydau_detail
where ctl_dt between date_sub('${current_date}',pmod(datediff('${current_date}', '2017-01-02'), 7)) and '${current_date}'
group by '${current_date}','week_active'; "
month_active="
use ycappdata;
select '${current_date}','month_active',count(distinct dvid) from sa_daydau_detail
where ctl_dt between trunc('${current_date}','MM') and '${current_date}'
group by '${current_date}','month_active'; "
active_active_distribute="
use ycappdata;
select '${current_date}','active_active_distribute',concat('d',days),count(distinct ab.dvid) from
(select b.dvid,count(distinct b.ctl_dt) as days from
(select ctl_dt,dvid from sa_daydau_detail
where ctl_dt='${current_date}')a
join
(select ctl_dt,dvid from sa_daydau_detail
where ctl_dt between date_sub('${current_date}',) and '${current_date}')b
on lower(a.dvid)=lower(b.dvid)
group by b.dvid )ab
group by '${current_date}','active_active_distribute',concat('d',days);"
newuser_retain="
use ycappdata;
select a.start_dt,'newuser_retain',concat('d',datediff(b.ctl_dt,a.start_dt)),count(distinct b.dvid) from
(select start_dt,dvid from sa_firststartdate_dvid
where start_dt between date_sub('${current_date}',) and '${current_date}')a
left outer join
(select ctl_dt,dvid from sa_daydau_detail
where ctl_dt between date_sub('${current_date}',) and '${current_date}')b
on lower(a.dvid)=lower(b.dvid)
group by a.start_dt,'newuser_retain',concat('d',datediff(b.ctl_dt,a.start_dt)); "
active_retain="
use ycappdata;
select a.ctl_dt,'active_retain',concat('d',datediff(b.ctl_dt,a.ctl_dt)),count(distinct b.dvid) from
(select ctl_dt,dvid from sa_daydau_detail
where ctl_dt between date_sub('${current_date}',) and '${current_date}')a
left outer join
(select ctl_dt,dvid from sa_daydau_detail
where ctl_dt between date_sub('${current_date}',) and '${current_date}')b
on lower(a.dvid)=lower(b.dvid)
where a.ctl_dt<=b.ctl_dt
group by a.ctl_dt,'active_retain',concat('d',datediff(b.ctl_dt,a.ctl_dt)); "
echo "${week_active}"
echo "${month_active}"
echo "${active_active_distribute}"
echo "${newuser_retain}"
echo "${active_retain}"
hive -e "${week_active}" >> app_operate.txt
hive -e "${month_active}" >> app_operate.txt
hive -e "${active_active_distribute}" >> app_operate.txt
hive -e "${newuser_retain}" >> app_operate.txt
hive -e "${active_retain}" >> app_operate.txt
start_day=`date +"%Y%m%d" -d "${start_day} 1 days" `
done
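The day-by-day loop in the script can also be factored into a small helper, which makes the date arithmetic easier to check on its own. This is a minimal sketch, assuming GNU date (for the -d option) and YYYYMMDD arguments as in the script above. The function name each_day is hypothetical and does not appear in the original.

```shell
#!/bin/bash
# Print every day from start to end (inclusive), one per line, as YYYY-MM-DD.
# Both arguments are expected in YYYYMMDD form, so they compare as integers.
each_day() {
    local day="$1" end="$2"
    while [ "${day}" -le "${end}" ]; do
        date +"%Y-%m-%d" -d "${day}"
        # Advance by one calendar day; month and year rollover is handled by date.
        day=$(date +"%Y%m%d" -d "${day} 1 day")
    done
}
```

A per-day query could then be driven with each_day "${start_day}" "${end_day}" | while read -r current_date; do ...; done, which leaves start_day unmodified after the loop finishes.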