Hive Practice: Handling Simple Tasks

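1. User count, order count, sales volume, and GMV for April 2018 (you are not limited to these metrics; feel free to come up with others)

-- Number of users in April 2018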

select
	count(a.user_id) as user_nums
from
	(
		select
			user_id
		from
			app_jypt_m04_ord_det_di
		where
			dt >= '2018-04-01'
			and sale_ord_dt <= '2018-04-30'
			and sale_ord_dt >= '2018-04-01'
		group by
			user_id
	)
	a;

-- Number of orders in April 2018

select
	count(a.sale_ord_id) as sale_nums
from
	(
		select
			sale_ord_id
		from
			app_jypt_m04_ord_det_di
		where
			dt >= '2018-04-01'
			and sale_ord_dt <= '2018-04-30'
			and sale_ord_dt >= '2018-04-01'
		group by
			sale_ord_id
	)
	a;

-- Sales volume (units sold) in April 2018

select
	sum(COALESCE(sale_qtty, 0)) as xiaoliang
from
	app_jypt_m04_ord_det_di
where
	dt >= '2018-04-01'
	and sale_ord_dt <= '2018-04-30'
	and sale_ord_dt >= '2018-04-01';

-- Sales amount (GMV) in April 2018

-- user_payable_pay_amount: the amount payable by the user

select
	sum(user_payable_pay_amount) as xiaoshoujine
from
	app_jypt_m04_ord_det_di
where
	dt >= '2018-04-01'
	and sale_ord_dt <= '2018-04-30'
	and sale_ord_dt >= '2018-04-01';

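PS:

Order count is the number of orders sold.
Sales volume is the number of items sold; a single order may contain one item or several.
GMV (Gross Merchandise Volume) is the total transaction amount over a given period.
In e-commerce it is defined as the site's transaction amount. In practice this means the amount of all orders placed, covering both paid and unpaid orders.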
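
The four metrics above can also be collected in a single scan of the table. The following is a minimal sketch, not part of the original exercise; it assumes the same table and columns as the queries above, and the aliases sale_qtty_total and gmv are illustrative names introduced here.

```sql
-- All four April 2018 metrics in one pass (illustrative sketch).
select
	count(distinct user_id)                   as user_nums,       -- distinct buyers
	count(distinct sale_ord_id)               as sale_nums,       -- distinct orders
	sum(coalesce(sale_qtty, 0))               as sale_qtty_total, -- units sold (hypothetical alias)
	sum(coalesce(user_payable_pay_amount, 0)) as gmv              -- amount payable (hypothetical alias)
from
	app_jypt_m04_ord_det_di
where
	dt >= '2018-04-01'
	and sale_ord_dt >= '2018-04-01'
	and sale_ord_dt <= '2018-04-30';
```

One trade-off to keep in mind: a count(distinct ...) without a group by is typically funneled through a single reducer in Hive, so on a very large table the group by subquery form used in the separate queries may run faster.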

2. Change in the metrics above relative to March
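
The original post gives no query for this task. One possible sketch follows. It assumes sale_ord_dt is stored as a 'yyyy-MM-dd' string (the string comparisons in task 1 suggest this, but it is not confirmed), and that the Hive version supports window functions such as lag(). The names ord_month and the *_diff aliases are introduced here for illustration only.

```sql
-- Per-month metrics for March and April 2018, plus the change from the previous month.
select
	m.ord_month,
	m.user_nums,
	m.user_nums - lag(m.user_nums) over (order by m.ord_month) as user_nums_diff,
	m.sale_nums,
	m.sale_nums - lag(m.sale_nums) over (order by m.ord_month) as sale_nums_diff,
	m.sale_qtty_total,
	m.sale_qtty_total - lag(m.sale_qtty_total) over (order by m.ord_month) as sale_qtty_diff,
	m.gmv,
	m.gmv - lag(m.gmv) over (order by m.ord_month) as gmv_diff
from
	(
		select
			substr(sale_ord_dt, 1, 7)                 as ord_month, -- 'yyyy-MM', assumes a string date
			count(distinct user_id)                   as user_nums,
			count(distinct sale_ord_id)               as sale_nums,
			sum(coalesce(sale_qtty, 0))               as sale_qtty_total,
			sum(coalesce(user_payable_pay_amount, 0)) as gmv
		from
			app_jypt_m04_ord_det_di
		where
			dt >= '2018-03-01'
			and sale_ord_dt >= '2018-03-01'
			and sale_ord_dt <= '2018-04-30'
		group by
			substr(sale_ord_dt, 1, 7)
	)
	m;
```

The result has one row per month. The 2018-03 row has NULL in every *_diff column because it has no earlier month to compare against; the 2018-04 row carries the April minus March differences.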

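3. Count the new users on April 1, 2018 (a user with no purchase in the preceding six months counts as new)

-- First find the users who placed an order on April 1 (set xxx), then find the users with a purchase record in the preceding six months (set yyy).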

-- select distinct user_id as xxx from gdm_m04_ord_det_sum where dt>='2018-04-01' and sale_ord_dt='2018-04-01';

-- select distinct user_id as yyy from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01';

-- Take xxx minus yyy, then use count() to get the number.

-- Two approaches: one with not in, one with not exists

-- not in approach

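-- The outer query needs the April 1 filter; without it, every user outside the six-month window is returned.

-- The subquery excludes NULL user_id values, because a single NULL makes not in return no rows.

select distinct user_id from gdm_m04_ord_det_sum

where dt>='2018-04-01' and sale_ord_dt='2018-04-01' and user_id not in (select distinct user_id from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01' and user_id is not null);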

-- not exists approach (the two copies of the table need aliases so the correlation compares the outer row with the inner row)

select distinct a.user_id from gdm_m04_ord_det_sum a where a.dt>='2018-04-01' and a.sale_ord_dt='2018-04-01' and not exists (select b.user_id from gdm_m04_ord_det_sum b where b.dt>='2017-10-01' and b.sale_ord_dt<='2018-03-31' and b.sale_ord_dt>='2017-10-01' and b.user_id=a.user_id);

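-- Another option: left outer join, which is usually more efficient. Each side of the join must be a parenthesized subquery with its alias placed after the closing parenthesis.

select distinct a.user_id from (select distinct user_id from gdm_m04_ord_det_sum where dt>='2018-04-01' and sale_ord_dt='2018-04-01') a left outer join (select distinct user_id from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01') b on a.user_id=b.user_id where b.user_id is null;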

Correct approach:

select
	count(a.id1) as user_new_nums
from
	(
		select distinct
			user_id as id1
		from
			app_jypt_m04_ord_det_di
		where
			dt >= '2018-04-01'
			and sale_ord_dt = '2018-04-01'
	)
	a
left outer join
	(
		select distinct
			user_id as id2
		from
			app_jypt_m04_ord_det_di
		where
			dt >= '2017-10-01'
			and sale_ord_dt <= '2018-03-31'
			and sale_ord_dt >= '2017-10-01'
	)
	b
on
	a.id1 = b.id2
where
	b.id2 is null;
