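Common Date Conversion Operations in a Hive Data Warehouse
(1) Common conversions between dt and dates in a Hive data warehouse
The following summarizes the date conversions I use most often at work. They are mainly used to control the time granularity and statistical period of reports. Here dt is a date string in yyyyMMdd format (for example 20190301), and a date is in yyyy-MM-dd format.
Date conversions:
(1) dt to date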
to_date(from_unixtime(unix_timestamp('${dt}','yyyyMMdd')))
(2) date to dt
regexp_replace('${date}','-','')
(3) dt to the first day of its month
to_date(from_unixtime(unix_timestamp(concat(substr('${dt}',1,6),'01'),'yyyyMMdd')))
trunc(to_date(from_unixtime(unix_timestamp('${dt}','yyyyMMdd'))),'MM')
-- first day of the next month
trunc(add_months(to_date(from_unixtime(unix_timestamp('${dt}','yyyyMMdd'))),1),'MM')
(4) dt to the Monday of its week
next_day(date_add(to_date(from_unixtime(unix_timestamp('${dt}','yyyyMMdd'))), -7), 'Mo')
date_sub(next_day(to_date(from_unixtime(unix_timestamp('${dt}','yyyyMMdd'))),'MO'),7)
-- Monday of the next week
next_day(to_date(from_unixtime(unix_timestamp('${dt}','yyyyMMdd'))),'MO')
(5) Date six days before dt (when dt is a Sunday, this is the Monday of that week)
date_add(to_date(from_unixtime(unix_timestamp('${dt}','yyyyMMdd'))), -6)
(6) dt to the first day of its quarter
if(length(floor(substr('${dt}',5,2)/3.1)*3+1)=1,concat(substr('${dt}',1,4),'-0',floor(substr('${dt}',5,2)/3.1)*3+1,'-01'),concat(substr('${dt}',1,4),'-',floor(substr('${dt}',5,2)/3.1)*3+1,'-01'))
(7) dt to the first day of its half-year
if(length(floor(substr('${dt}',5,2)/6.1)*6+1)=1,concat(substr('${dt}',1,4),'-0',floor(substr('${dt}',5,2)/6.1)*6+1,'-01'),concat(substr('${dt}',1,4),'-',floor(substr('${dt}',5,2)/6.1)*6+1,'-01'))
(8) dt to January 1 of its year
concat(substr('${dt}',1,4),'-01-01')
(9) When day, week and month granularities are used together, pay attention to the time range of the data: the first calendar week of a month can straddle two months. For example, the first week of March 2019 runs from 20190225 to 20190303.
where agent_business_date between date_add_day('${dt}',-31) and to_date(from_unixtime(unix_timestamp('${dt}','yyyyMMdd')))
where dt between regexp_replace(date_add_day('${dt}',-31),'-','') and '${dt}'
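To sanity-check the expressions above outside Hive, the same arithmetic can be sketched in Python. This is only an illustrative cross-check, not part of the warehouse code: the function names (dt_to_date, date_to_dt, month_start, week_monday, period_start) are made up for this sketch, and it assumes dt is always an 8-character yyyyMMdd string.

```python
from datetime import date, datetime, timedelta


def dt_to_date(dt: str) -> date:
    """Parse a yyyyMMdd string, like unix_timestamp(dt, 'yyyyMMdd') in Hive."""
    return datetime.strptime(dt, "%Y%m%d").date()


def date_to_dt(d: date) -> str:
    """Format a date back to yyyyMMdd, like regexp_replace(date, '-', '')."""
    return d.strftime("%Y%m%d")


def month_start(d: date) -> date:
    """First day of the month, like trunc(date, 'MM')."""
    return d.replace(day=1)


def week_monday(d: date) -> date:
    """Monday of the week containing d (weekday() is 0 for Monday)."""
    return d - timedelta(days=d.weekday())


def period_start(d: date, months_per_period: int) -> date:
    """First day of the quarter (3) or half-year (6) containing d."""
    first_month = (d.month - 1) // months_per_period * months_per_period + 1
    return date(d.year, first_month, 1)


if __name__ == "__main__":
    d = dt_to_date("20190301")
    print(date_to_dt(d))   # 20190301
    print(month_start(d))  # 2019-03-01
    print(week_monday(d))  # 2019-02-25, the week straddles February and March
    print(period_start(dt_to_date("20190815"), 3))  # 2019-07-01
    print(period_start(dt_to_date("20190815"), 6))  # 2019-07-01
```

The integer division (month - 1) // 3 plays the same role as floor(month / 3.1) in the Hive expression: both map months 1 to 12 onto the quarter start months 1, 4, 7 and 10.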
------------------------------------------------------------------------------------------
-- Structure of the date dimension table edw_public.dim_esf_edw_pub_date
------------------------------------------------------------------------------------------
col_name                   data_type  comment
------------------------------------------------------------------------
calendar_date              string     Date, in "YYYY-MM-DD" format
week_english_name          string     English name of the weekday
week_chinese_name          string     Chinese name of the weekday
day_of_week_number         int        Day number within the week
calendar_month_code        string     Month the date belongs to, in "YYYY-MM" format
calendar_month_number      int        Month number
month_english_name         string     English name of the month
month_chinese_name         string     Chinese name of the month
day_of_month_number        int        Day number within the month
calendar_quater_code       string     Quarter the date belongs to, in "YYYY-QT" format
calendar_quater_number     int        Quarter number
day_of_quater_number       int        Day number within the quarter
calendar_half_year_code    string     Half-year the date belongs to, in "YYYY-HY" format
calendar_half_year_number  int        Half-year number: 1 = first half, 2 = second half
calendar_year_code         string     Year the date belongs to, in "YYYY" format
day_of_year_number         int        Day number within the year
work_day_flag              string     Working-day flag: Y = yes / N = no
holiday_flag               string     Holiday flag: Y = yes / N = no
-- Using the date dimension table
-- The current day's date
SELECT
calendar_date
FROM
edw_public.dim_esf_edw_pub_date
WHERE
calendar_date = regexp_replace('${dt}','(\\d{4})(\\d{2})(\\d{2})','$1-$2-$3')
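The regexp_replace call above reshapes yyyyMMdd into yyyy-MM-dd with three capture groups. As a quick illustration of what the pattern does, here is the same substitution in Python. The helper name dt_to_dashed is made up for this sketch, and Python writes backreferences as \1 rather than $1.

```python
import re


def dt_to_dashed(dt: str) -> str:
    """Insert dashes into a yyyyMMdd string using three capture groups."""
    return re.sub(r"(\d{4})(\d{2})(\d{2})", r"\1-\2-\3", dt)


print(dt_to_dashed("20190301"))  # 2019-03-01
```

A string that does not contain eight consecutive digits is returned unchanged, so an already dashed date passes through as is.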
-- FineReport: period-end date for each granularity (粒度: 1 = day, 2 = week, 3 = month, 4 = quarter, 5 = year, 6 = half-year)
select
${if(粒度 == 1," case when date(max(calendar_date))>=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) else date(max(calendar_date)) end as period_end_date","")}
${if(粒度 == 2," distinct case when day_of_week_number = 1 and date_add('day',6,date(calendar_date)) >=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) when day_of_week_number = 7 and date(calendar_date) >=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) when day_of_week_number = 1 then date_add('day',6,date(calendar_date)) when day_of_week_number = 7 then date(calendar_date) else date(calendar_date) end as period_end_date ","")}
${if(粒度 == 3," case when date(max(calendar_date))>=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) else date(max(calendar_date)) end as period_end_date ","")}
${if(粒度 == 4," case when date(max(calendar_date))>=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) else date(max(calendar_date)) end as period_end_date ","")}
${if(粒度 == 5," case when date(max(calendar_date))>=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) else date(max(calendar_date)) end as period_end_date ","")}
${if(粒度 == 6," case when date(max(calendar_date))>=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) else date(max(calendar_date)) end as period_end_date ","")}
from
edw_public.dim_esf_edw_pub_date
where calendar_date >= '${开始时间}' and calendar_date <= '${结束时间}'
${if(粒度 == 1," group by calendar_date ","")}
${if(粒度 == 2," and day_of_week_number in (1,7) ","")}
${if(粒度 == 3," group by calendar_month_code ","")}
${if(粒度 == 4," group by calendar_quater_code ","")}
${if(粒度 == 5," group by calendar_year_code ","")}
${if(粒度 == 6," group by calendar_half_year_code ","")}
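The template above follows one pattern for most granularities: group the calendar dates by a period key, take the latest date of each group, and cap it at yesterday so that an unfinished period never reports a future end date. A rough Python sketch of that pattern follows. The function period_end_dates and the key function month_key are hypothetical, and week handling is left out because the template treats weeks differently.

```python
from datetime import date, timedelta


def period_end_dates(dates, key, today):
    """Latest date per period key, capped at the day before `today`."""
    yesterday = today - timedelta(days=1)
    latest = {}
    for d in dates:
        k = key(d)
        if k not in latest or d > latest[k]:
            latest[k] = d
    return sorted(min(end, yesterday) for end in latest.values())


def month_key(d):
    """Year-month key, comparable to calendar_month_code."""
    return (d.year, d.month)


if __name__ == "__main__":
    start = date(2019, 2, 1)
    days = [start + timedelta(days=i) for i in range(59)]  # 2019-02-01 .. 2019-03-31
    for end in period_end_dates(days, month_key, today=date(2019, 3, 30)):
        print(end)
    # 2019-02-28
    # 2019-03-29
```

Passing `today` explicitly, instead of reading the system clock inside the function, keeps the sketch easy to test against fixed dates.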
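-- FineReport: period-start and period-end dates for each granularity (caveat: with this method, if the current date is 20190330 and the input range is 2019-03-01 to 2019-03-28, the monthly period returned ends on 2019-03-29)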
select
${if(粒度 == 1,"date(calendar_date) as period_start_date, date(calendar_date) as period_end_date ","")}
${if(粒度 == 2,"case when day_of_week_number = 1 then date(calendar_date) when day_of_week_number = 7 then date_add('day',-6, date(calendar_date)) end as period_start_date, case when day_of_week_number = 1 and date_add('day',6, date(calendar_date)) >=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) when day_of_week_number = 7 and date(calendar_date)>=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) when day_of_week_number = 1 then date_add('day',6, date(calendar_date)) when day_of_week_number = 7 then date(calendar_date) end as period_end_date ","")}
${if(粒度 == 3,"date(calendar_date) as period_start_date, case when date_add('day',-day(date(calendar_date)),date_add('month',1,(date(calendar_date))))>=date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) else date_add('day',-day(date(calendar_date)),date_add('month',1,(date(calendar_date)))) end as period_end_date ","")}
${if(粒度 == 4,"calendar_date as period_start_date,date_add('day',-1,date_add('month',1,date(substr(calendar_date,1,4)||'-'||cast(cast(floor(cast(substr(calendar_date,6,2) as int)/3.1)*3+3 as int) as varchar)||'-01'))) as period_end_date ","")}
${if(粒度 == 5,"date(concat(substr(calendar_date,1,4),'-01','-01')) as period_start_date,case when date(concat(substr(calendar_date,1,4),'-12','-31'))>= date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) else date(concat(substr(calendar_date,1,4),'-12','-31')) end as period_end_date","")}
${if(粒度 == 6,"date(min(calendar_date)) as period_start_date,case when date(max(calendar_date))>= date(date_add('day',-1,current_date)) then date(date_add('day',-1,current_date)) else date(max(calendar_date)) end as period_end_date","")}
from
edw_public.dim_esf_edw_pub_date
where calendar_date >= '${开始时间}' and calendar_date <= '${结束时间}'
${if(粒度 == 1," and 1 = 1 ","")}
${if(粒度 == 2," and day_of_week_number in (1,7) ","")}
${if(粒度 == 3," and day_of_month_number = 1","")}
${if(粒度 == 4," and day_of_quater_number = 1","")}
${if(粒度 == 5," and day_of_year_number = 1","")}
${if(粒度 == 6," group by calendar_half_year_code ","")}
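For the month granularity, the template starts from the first day of the month, computes the last day of that month, and caps it at yesterday. The template finds the last day by adding one month and subtracting the day-of-month. The Python sketch below uses calendar.monthrange for the same result and reproduces the case mentioned in the comment above the query. month_period is a hypothetical helper, and the optional range_end cap is a suggested variation, not something the original template does.

```python
import calendar
from datetime import date, timedelta


def month_period(first_day, today, range_end=None):
    """Return (start, end) for the month beginning at `first_day`.

    The end is capped at yesterday, as in the template. Passing `range_end`
    additionally caps it at the requested end date (an assumed refinement).
    """
    last_dom = calendar.monthrange(first_day.year, first_day.month)[1]
    end = min(first_day.replace(day=last_dom), today - timedelta(days=1))
    if range_end is not None:
        end = min(end, range_end)
    return first_day, end


if __name__ == "__main__":
    today = date(2019, 3, 30)
    print(month_period(date(2019, 3, 1), today))
    # (datetime.date(2019, 3, 1), datetime.date(2019, 3, 29))
    print(month_period(date(2019, 3, 1), today, range_end=date(2019, 3, 28)))
    # (datetime.date(2019, 3, 1), datetime.date(2019, 3, 28))
```

Whether the period end should follow yesterday or the requested end date depends on what the report is meant to show, so the extra cap is left optional.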
------------------------------------------------------------------------------------------------
-- Compute the period-end dates from the input time range
------------------------------------------------------------------------------------------------
select t1.*
from
-- Statistics for each granularity (day, week, month, quarter, year, half-year) are stored in separate tables
edw_reports.adm_xf_edw_house_sub_project_report_00${dtype}ly_di t1 -- daily report
join
(
-- day
SELECT
calendar_date
FROM
edw_public.dim_esf_edw_pub_date
WHERE
calendar_date BETWEEN '${bdt}' AND '${edt}'
AND '${dtype}' = '1_dai'
UNION
-- month
SELECT
MAX(calendar_date) AS calendar_date
FROM
edw_public.dim_esf_edw_pub_date
WHERE
calendar_date BETWEEN '${bdt}' AND '${edt}'
AND '${dtype}' = '2_dai'
GROUP BY
calendar_month_code -- group by year-month so that ranges spanning several years keep one row per month
UNION
-- week
SELECT
calendar_date
FROM
edw_public.dim_esf_edw_pub_date
WHERE
calendar_date BETWEEN '${bdt}' AND '${edt}'
AND day_of_week_number = 7
AND '${dtype}' = '3_dai'
UNION
-- quarter
SELECT
MAX(calendar_date) AS calendar_date
FROM
edw_public.dim_esf_edw_pub_date
WHERE
calendar_date BETWEEN '${bdt}' AND '${edt}'
AND '${dtype}' = '4_dai'
GROUP BY
calendar_quater_code
UNION
-- year
SELECT
MAX(calendar_date) AS calendar_date
FROM
edw_public.dim_esf_edw_pub_date
WHERE
calendar_date BETWEEN '${bdt}' AND '${edt}'
AND '${dtype}' = '5_dai'
GROUP BY
calendar_year_code
UNION
-- half-year
SELECT
MAX(calendar_date) AS calendar_date
FROM
edw_public.dim_esf_edw_pub_date
WHERE
calendar_date BETWEEN '${bdt}' AND '${edt}'
AND '${dtype}' = '6_dai'
GROUP BY
calendar_half_year_code
UNION
-- last date of the input range
SELECT
MAX(calendar_date) AS calendar_date
FROM
edw_public.dim_esf_edw_pub_date
WHERE
calendar_date BETWEEN '${bdt}' AND '${edt}'
ORDER BY
calendar_date
) t2
on t1.statistic_date = t2.calendar_date
where
statistic_date between '${bdt}' and '${edt}'
${if(len(tenant_name) == 0,"","and house_sub_project_organization_short_name = '" + tenant_name + "'")}
${if(len(status) == 0,"","and house_sub_project_cooperation_status_code = " + status)}
${if(len(tenant_type) == 0,"","and house_sub_project_organization_business_type_code= " + tenant_type)}
${if(len(project_type) == 0,"","and house_sub_project_cooperation_type_code= " + project_type)}
order by statistic_date
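The join above keeps only the rows whose statistic_date is a period-end date inside the requested range, plus the last date of the range, so that a trailing partial period still shows up. A small Python sketch of that selection follows (the function name report_dates is invented here). It also shows why the grouping key should include the year: grouping by month number alone merges the same month of different years.

```python
from datetime import date, timedelta


def report_dates(start, end, key):
    """Last date of each period in [start, end], plus `end` itself."""
    latest = {}
    d = start
    while d <= end:
        latest[key(d)] = d  # dates are visited in ascending order
        d += timedelta(days=1)
    return sorted(set(latest.values()) | {end})


if __name__ == "__main__":
    start, end = date(2018, 12, 1), date(2020, 1, 15)
    by_code = report_dates(start, end, key=lambda d: (d.year, d.month))
    by_number = report_dates(start, end, key=lambda d: d.month)
    print(len(by_code))    # 14 period ends: 2018-12 .. 2020-01
    print(len(by_number))  # 12, December 2018 and January 2019 are lost
```

For ranges that stay within a single year both keys give the same result, which is why the difference is easy to miss.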
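(2) Hive: computing a date's day-of-week number and the date of a given weekday in the same week
First decide whether the week starts on Monday or on Sunday. The dayofweek function treats Sunday as the first day of the week (Sunday = 1, Saturday = 7). Also, dayofweek is only available from Hive 2.2.0 onward. Earlier Hive versions do not support it and need another approach, which is covered in my blog post "Hive和sparksql中的dayofweek" (dayofweek in Hive and Spark SQL).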
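One common way to get a day-of-week number without dayofweek is to count the days since a date known to be a Sunday and take the remainder modulo 7. The sketch below shows that arithmetic in Python. It is an assumption about how such a workaround can be built, not necessarily the method used in the referenced post, and the helper names are invented. 1900-01-07 serves as the reference Sunday.

```python
from datetime import date

REFERENCE_SUNDAY = date(1900, 1, 7)  # a date known to fall on a Sunday


def day_of_week_sunday_first(d):
    """1 = Sunday ... 7 = Saturday, the convention used by Hive's dayofweek."""
    return (d - REFERENCE_SUNDAY).days % 7 + 1


def day_of_week_monday_first(d):
    """1 = Monday ... 7 = Sunday, derived from the Sunday-first number."""
    n = day_of_week_sunday_first(d)
    return 7 if n == 1 else n - 1


if __name__ == "__main__":
    d = date(2018, 11, 4)  # a Sunday
    print(day_of_week_sunday_first(d))  # 1
    print(day_of_week_monday_first(d))  # 7
```

In Hive the same idea would typically be expressed with datediff and pmod. Treat that as a hint to verify on your own Hive version rather than a tested recipe.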
-- First and last day of the week containing the given date
select
day
, dayofweek(day) as dw1 -- day number with Sunday = 1 ... Saturday = 7
, date_add(day,1 - dayofweek(day)) as Su_s -- week start (Sunday)
, date_add(day,7 - dayofweek(day)) as Sa_e -- week end (Saturday)
, case when dayofweek(day) = 1 then 7 else dayofweek(day) - 1 end as dw2 -- day number with Monday = 1 ... Sunday = 7
, date_add(day,1 - case when dayofweek(day) = 1 then 7 else dayofweek(day) - 1 end) as Mo_s -- week start (Monday)
, date_add(day,7 - case when dayofweek(day) = 1 then 7 else dayofweek(day) - 1 end) as Su_e -- week end (Sunday)
, trunc(day,'YY') as yearly_first_day -- first day of the year
, trunc(day,'MM') as monthly_first_day -- first day of the month
, last_day(day) as monthly_last_day -- last day of the month
, date_add(next_day(day,'MO'), -7) as weekly_first_day -- Monday of this week
, next_day(date_add(day, -7),'MO') as weekly_first_day_alt -- Monday of this week (alternative form)
, case when (7 - datediff(next_day(day,'SU'),day)) <> 0 then next_day(day,'SU') else day end as weekly_end_day -- Sunday of this week
from (
select '2018-11-01' as day union all
select '2018-11-02' as day union all
select '2018-11-03' as day union all
select '2018-11-04' as day union all
select '2018-11-05' as day union all
select '2018-11-06' as day union all
select '2018-11-07' as day union all
select '2018-11-08' as day union all
select '2018-11-09' as day union all
select '2018-11-10' as day union all
select '2018-11-11' as day union all
select '2018-11-12' as day union all
select '2018-11-13' as day union all
select '2018-11-14' as day union all
select '2018-11-15' as day union all
select '2018-11-16' as day union all
select '2018-11-17' as day union all
select '2018-11-18' as day union all
select '2018-11-19' as day union all
select '2018-11-20' as day union all
select '2018-11-21' as day union all
select '2018-11-22' as day union all
select '2018-11-23' as day union all
select '2018-11-24' as day union all
select '2018-11-25' as day union all
select '2018-11-26' as day union all
select '2018-11-27' as day union all
select '2018-11-28' as day union all
select '2018-11-29' as day union all
select '2018-11-30' as day
) t1
;
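The week-boundary columns in the query can be cross-checked with a few lines of Python. The sketch assumes the Sunday-first numbering that dayofweek returns and uses made-up helper names (sunday_first_number, week_bounds). It is meant for verifying results, not as a replacement for the SQL.

```python
from datetime import date, timedelta


def sunday_first_number(d):
    """1 = Sunday ... 7 = Saturday, matching Hive's dayofweek."""
    return d.isoweekday() % 7 + 1


def week_bounds(d, monday_first=True):
    """Return (first_day, last_day) of the week containing d."""
    if monday_first:
        offset = d.isoweekday() - 1          # Monday -> 0 ... Sunday -> 6
    else:
        offset = sunday_first_number(d) - 1  # Sunday -> 0 ... Saturday -> 6
    first = d - timedelta(days=offset)
    return first, first + timedelta(days=6)


if __name__ == "__main__":
    d = date(2018, 11, 4)  # a Sunday
    print(week_bounds(d, monday_first=True))
    # (datetime.date(2018, 10, 29), datetime.date(2018, 11, 4))
    print(week_bounds(d, monday_first=False))
    # (datetime.date(2018, 11, 4), datetime.date(2018, 11, 10))
```

The same Sunday lands at the end of a Monday-first week and at the start of a Sunday-first week, which is why the week convention has to be fixed before comparing weekly figures.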