〇、概述

1、所需资料

窗口函数实现组内百分比、累计值、累计百分比:https://blog.csdn.net/weixin_39751959/article/details/88828922

2、背景

需求:不同场景不同规则下各区间内基线值的计算和MQ发送

计算位于场景列表内的各场景组合(scene),满足不同规则(rule)某区间dataRange(如20%-80%)的基线平均值

其他场景,计算平均数作为基线值

一、概述

1、输入信息

传入参数:

{"rules":
[{"dataRange":[20,80],"ruleTypeName":"标准基线","duration":30,"ruleType":"1","ruleId":"123"},
{"dataRange":[0,20],"ruleTypeName":"管理基线","duration":60,"ruleType":"2","ruleId":"234"},
{"dataRange":[80,100],"ruleTypeName":"异常基线","duration":90,"ruleType":"3","ruleId":"123"}],
"modules":
[{"moduleNumber":"ltc_contract_basic_info","moduleName":"合同基本信息","calField":"create_time","nextFields":["ltc_contract_basic_info_id"]},
{"moduleNumber":"ltc_contract_assess_basic_info","moduleName":"合同评审信息","preFields":["ltc_contract_basic_info_id"],"nextFields":["ltc_contract_assess_basic_info_id"]},
{"moduleNumber":"ltc_contract_assess_record","moduleName":"合同评审记录","preFields":["ltc_contract_assess_basic_info_id"],"calField":"contract_assess_end_date"}],"
scenes":
{"sceneKeys":
[{"values":["010101","010102","0102","0103","0103"],"key":"sales_scenario"},
{"values":["01","02","03","04","05"],"key":"contract_register_type_code"}],
"sceneGroups":
[{"contract_register_type_code":"03","sales_scenario":"0101"},
{"contract_register_type_code":"03","sales_scenario":"020102"}]},
"definition":
{"definitionName":"从注册到评审的时间基线","version":1,"definitionId":"123"}}

2、SQL查询结果

二、实现过程

1、初始信息

根据json可以按照模板生成下列SQL

SELECT
sales_scenario,
contract_register_type_code,
-- ltc_contract_basic_info.create_time,
-- ltc_contract_assess_record.contract_assess_end_date,
DATE_PART('MINUTE', ltc_contract_assess_record.contract_assess_end_date - ltc_contract_basic_info.create_time) subtime,
1 num_every
FROM
ltc_contract_basic_info
LEFT JOIN ltc_contract_assess_basic_info
ON ltc_contract_basic_info.ltc_contract_basic_info_id = ltc_contract_assess_basic_info.ltc_contract_basic_info_id
LEFT JOIN ltc_contract_assess_record
ON ltc_contract_assess_basic_info.ltc_contract_assess_basic_info_id = ltc_contract_assess_record.ltc_contract_assess_basic_info_id
WHERE
ltc_contract_assess_record.contract_assess_end_date > now( ) - INTERVAL '90 days'
and
(sales_scenario,contract_register_type_code) in (('0101','03'),('0102','01'),('020102','03'))

设置根据时间差排序,得到如下结果

下一步思路:计算subtime值的百分比

2、计算组内累计值

sum(num_every) over(partition by sales_scenario,contract_register_type_code order by subtime) rk_combine

整体:

select
sales_scenario,
contract_register_type_code,
subtime,
sum(num_every) over(partition by sales_scenario,contract_register_type_code order by subtime) rk_combine
from (
SELECT
sales_scenario,
contract_register_type_code,
-- ltc_contract_basic_info.create_time,
-- ltc_contract_assess_record.contract_assess_end_date,
DATE_PART('MINUTE', ltc_contract_assess_record.contract_assess_end_date - ltc_contract_basic_info.create_time) subtime,
1 num_every
FROM
ltc_contract_basic_info
LEFT JOIN ltc_contract_assess_basic_info
ON ltc_contract_basic_info.ltc_contract_basic_info_id = ltc_contract_assess_basic_info.ltc_contract_basic_info_id
LEFT JOIN ltc_contract_assess_record
ON ltc_contract_assess_basic_info.ltc_contract_assess_basic_info_id = ltc_contract_assess_record.ltc_contract_assess_basic_info_id
WHERE
ltc_contract_assess_record.contract_assess_end_date > now( ) - INTERVAL '90 days'
and
(sales_scenario,contract_register_type_code) in (('0101','03'),('0102','01'),('020102','03'))
) res_start

3、获得组内最大值

max(rk_combine) over(partition by sales_scenario,contract_register_type_code) max_rk_combine

整体:

select
sales_scenario,
contract_register_type_code,
subtime,
rk_combine,
max(rk_combine) over(partition by sales_scenario,contract_register_type_code) max_rk_combine
from
( select
sales_scenario,
contract_register_type_code,
subtime,
sum(num_every) over(partition by sales_scenario,contract_register_type_code order by subtime) rk_combine
from
(
SELECT
sales_scenario,
contract_register_type_code,
-- ltc_contract_basic_info.create_time,
-- ltc_contract_assess_record.contract_assess_end_date,
DATE_PART('MINUTE', ltc_contract_assess_record.contract_assess_end_date - ltc_contract_basic_info.create_time) subtime,
1 num_every
FROM
ltc_contract_basic_info
LEFT JOIN ltc_contract_assess_basic_info
ON ltc_contract_basic_info.ltc_contract_basic_info_id = ltc_contract_assess_basic_info.ltc_contract_basic_info_id
LEFT JOIN ltc_contract_assess_record
ON ltc_contract_assess_basic_info.ltc_contract_assess_basic_info_id = ltc_contract_assess_record.ltc_contract_assess_basic_info_id
WHERE
ltc_contract_assess_record.contract_assess_end_date > now( ) - INTERVAL '90 days'
and
(sales_scenario,contract_register_type_code) in (('0101','03'),('0102','01'),('020102','03'))
) res_start
) res_middle

4、获得百分比

round(rk_combine/max_rk_combine,2)*100 percent

整体:

(select
sales_scenario,
contract_register_type_code,
subtime,
round(rk_combine/max_rk_combine,2)*100 percent,
null default_value
from
(
select
sales_scenario,
contract_register_type_code,
subtime,
rk_combine,
max(rk_combine) over(partition by sales_scenario,contract_register_type_code) max_rk_combine
from
(
select
sales_scenario,
contract_register_type_code,
subtime,
sum(num_every) over(partition by sales_scenario,contract_register_type_code order by subtime) rk_combine
from
(
SELECT
sales_scenario,
contract_register_type_code,
-- ltc_contract_basic_info.create_time,
-- ltc_contract_assess_record.contract_assess_end_date,
DATE_PART('MINUTE', ltc_contract_assess_record.contract_assess_end_date - ltc_contract_basic_info.create_time) subtime,
1 num_every
FROM
ltc_contract_basic_info
LEFT JOIN ltc_contract_assess_basic_info
ON ltc_contract_basic_info.ltc_contract_basic_info_id = ltc_contract_assess_basic_info.ltc_contract_basic_info_id
LEFT JOIN ltc_contract_assess_record
ON ltc_contract_assess_basic_info.ltc_contract_assess_basic_info_id = ltc_contract_assess_record.ltc_contract_assess_basic_info_id
WHERE
ltc_contract_assess_record.contract_assess_end_date > now( ) - INTERVAL '90 days'
and
(sales_scenario,contract_register_type_code) in (('0101','03'),('0102','01'),('020102','03'))
) res_start
) res_middle
) res_end)

获得位于场景组合的基线均值结果

5、获得默认基线值

(
SELECT
sales_scenario,
contract_register_type_code,
DATE_PART('MINUTE', ltc_contract_assess_record.contract_assess_end_date - ltc_contract_basic_info.create_time) subtime,
0 percent,
avg(DATE_PART('MINUTE', ltc_contract_assess_record.contract_assess_end_date - ltc_contract_basic_info.create_time)) over() default_value
FROM
ltc_contract_basic_info
LEFT JOIN ltc_contract_assess_basic_info
ON ltc_contract_basic_info.ltc_contract_basic_info_id = ltc_contract_assess_basic_info.ltc_contract_basic_info_id
LEFT JOIN ltc_contract_assess_record
ON ltc_contract_assess_basic_info.ltc_contract_assess_basic_info_id = ltc_contract_assess_record.ltc_contract_assess_basic_info_id
WHERE
ltc_contract_assess_record.contract_assess_end_date > now( ) - INTERVAL '90 days'
and
(
sales_scenario is null
or
contract_register_type_code is null
or
(sales_scenario,contract_register_type_code) not in (('0101','03'),('0102','01'),('020102','03')
)
)
)

6、结果组合

(select
sales_scenario,
contract_register_type_code,
subtime,
round(rk_combine/max_rk_combine,2)*100 percent,
null default_value
from
(
select
sales_scenario,
contract_register_type_code,
subtime,
rk_combine,
max(rk_combine) over(partition by sales_scenario,contract_register_type_code) max_rk_combine
from
(
select
sales_scenario,
contract_register_type_code,
subtime,
sum(num_every) over(partition by sales_scenario,contract_register_type_code order by subtime) rk_combine
from
(
SELECT
sales_scenario,
contract_register_type_code,
-- ltc_contract_basic_info.create_time,
-- ltc_contract_assess_record.contract_assess_end_date,
DATE_PART('MINUTE', ltc_contract_assess_record.contract_assess_end_date - ltc_contract_basic_info.create_time) subtime,
1 num_every
FROM
ltc_contract_basic_info
LEFT JOIN ltc_contract_assess_basic_info
ON ltc_contract_basic_info.ltc_contract_basic_info_id = ltc_contract_assess_basic_info.ltc_contract_basic_info_id
LEFT JOIN ltc_contract_assess_record
ON ltc_contract_assess_basic_info.ltc_contract_assess_basic_info_id = ltc_contract_assess_record.ltc_contract_assess_basic_info_id
WHERE
ltc_contract_assess_record.contract_assess_end_date > now( ) - INTERVAL '90 days'
and
(sales_scenario,contract_register_type_code) in (('0101','03'),('0102','01'),('020102','03'))
) res_start
) res_middle
) res_end)
union all
(
SELECT
sales_scenario,
contract_register_type_code,
DATE_PART('MINUTE', ltc_contract_assess_record.contract_assess_end_date - ltc_contract_basic_info.create_time) subtime,
0 percent,
avg(DATE_PART('MINUTE', ltc_contract_assess_record.contract_assess_end_date - ltc_contract_basic_info.create_time)) over() default_value
FROM
ltc_contract_basic_info
LEFT JOIN ltc_contract_assess_basic_info
ON ltc_contract_basic_info.ltc_contract_basic_info_id = ltc_contract_assess_basic_info.ltc_contract_basic_info_id
LEFT JOIN ltc_contract_assess_record
ON ltc_contract_assess_basic_info.ltc_contract_assess_basic_info_id = ltc_contract_assess_record.ltc_contract_assess_basic_info_id
WHERE
ltc_contract_assess_record.contract_assess_end_date > now( ) - INTERVAL '90 days'
and
(
sales_scenario is null
or
contract_register_type_code is null
or
(sales_scenario,contract_register_type_code) not in (('0101','03'),('0102','01'),('020102','03')
)
)
)

三、步骤总结

1、计算组内累计值

sum(saleroom) over(partition by area order by date) ---求组内累计值

2、计算组内总计值/最大值

sum(saleroom) over(partition by area order by area)    ---求组内总计值

3、累计值/总计值

组内百分比= saleroom / total_value

累计百分比 = aggregate_value/total_value

四、总结

1、过程

用1表示每项的值

分组计算,按照subtime排序,得到累计值

求出最大的累计值,作为和

用各项累计值除以每一项的和,得到百分比

2、结果计算与返回

package com.boulderai.baseline.cal.service.impl;

import com.boulderai.baseline.cal.mq.MessageProducer;
import com.boulderai.baseline.cal.service.BaseLineCalService;
import cn.hutool.json.JSONUtil;
import com.boulderai.timeline.api.bigdata.BaseLineCalRequest;
import com.boulderai.timeline.api.bigdata.BaseLineMessage;
import org.apache.commons.lang3.StringUtils;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.util.CollectionUtils;
import lombok.extern.slf4j.Slf4j;
import javax.annotation.Resource;
import java.math.BigDecimal;
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors; /**
* @Title:BaseLineCalServiceImpl
* @Descript:
* @author: yanwei (yanwei@yanxxcloud.cn)
* @date:2022/8/3
**/
@Service
@Slf4j
public class BaseLineCalServiceImpl implements BaseLineCalService { @Resource
private MessageProducer messageProducer; @Resource
private JdbcTemplate jdbcTemplate; /**
* @author 刘金辉
* @param request
* @return 发送MQ是否成功
*/
@Override
public Boolean calculate(BaseLineCalRequest request) {
List<BaseLineMessage> messageList = new ArrayList<>();
request.getRules().forEach(rule -> {
String sql = request.computeSql(rule.getRuleType());
String[] union_sql_list = sql.split("union all");
String sql_in_scene_list = union_sql_list[0];
Integer[] dataRange = rule.getDataRange();
List<Map<String, Object>> resListMiddle = jdbcTemplate.queryForList(sql_in_scene_list);
//sceneKeysList为场景的集合,如[delivery_way_code,contract_register_type_code]
List<String> sceneKeysList = request.getScenes()
.getSceneKeys().stream()
.map(x -> x.getKey()).collect(Collectors.toList());
List<Map<String, Object>> resListUltimate = new ArrayList<>();
//inSceneListRes为查询到的拆分场景的结果集
List<Map<String, Object>> inSceneListRes = jdbcTemplate.queryForList(sql_in_scene_list).stream()
.filter(x -> Math.round(Double.valueOf(x.get("percent").toString())) >= dataRange[0]
&& Math.round(Double.valueOf(x.get("percent").toString())) <= dataRange[1])
.collect(Collectors.toList());
request.getScenes().getSceneGroups().forEach(sceneValueMap -> {
//sceneValueMap为每一个场景组合,如[delivery_way_code -> 01,contract_register_type_code -> 01,02]
Map<String, Object> resInMap = new HashMap<String, Object>(); //构建每个要插入的map
sceneValueMap.entrySet().forEach(everySceneCombine->{
resInMap.put(everySceneCombine.getKey(),everySceneCombine.getValue());
});
Double baselineValue = computeResult2New(sceneValueMap, sceneKeysList, inSceneListRes);
resInMap.put("value", baselineValue);
resListUltimate.add(resInMap);
});
int default_value = 0;
String sql_not_in_scene_list = union_sql_list[1];
List<Map<String, Object>> result = jdbcTemplate.queryForList(sql_not_in_scene_list);
if (!CollectionUtils.isEmpty(result)) {
default_value = Math.round(Float.parseFloat(result.get(0).get("default_value").toString()));
}
BaseLineMessage ansMsg = new BaseLineMessage();
ansMsg.setDefinition(request.getDefinition());
ansMsg.setRule(rule);
ansMsg.setValues(resListUltimate);
ansMsg.setDefaultValue(default_value);// 其他场景的平均值,如何确定
messageList.add(ansMsg);
});
log.info("cal:{}", JSONUtil.toJsonStr(request));
log.info("cal.return:{}", JSONUtil.toJsonStr(messageList));
messageProducer.SendCalMessageList(messageList);
return true;
} /**
* 场景组合拆分为子场景,去数据库查询
* sceneValueMap为每一个场景组合,如[delivery_way_code -> 01,contract_register_type_code -> 01,02, value=10]
* sceneKeysList为场景的集合,如[delivery_way_code,contract_register_type_code]
* inSceneListRes为查询到的拆分场景的结果集
* @param group
* @param sceneKeys
* @param result
* @return
*/
private Double computeResult2New(Map<String, String> sceneValueMap, List<String> sceneKeysList, List<Map<String, Object>> inSceneListResList) {
Double avgOfCombineSceneVal = 0.0;
List<String> splitSceneValList = Arrays.asList("");
String[] sceneKeysArray = String.join(",", sceneKeysList).split(",");
for (String sceneKey : sceneKeysArray) {
String assignKeyValue = String.valueOf(sceneValueMap.get(sceneKey));
String[] splitValueArray = assignKeyValue.split(","); //assignKeyValue=01,02 | splitValueArray=[01][02]
splitSceneValList = calgroup(splitValueArray, splitSceneValList); //splitSceneValList=[01][02]=>splitSceneValList=[01,01][01,02]
}
for (String everySceneCombineStr : splitSceneValList) { //splitSceneValList=[“01,01”,"01,02"] everySceneCombineStr="01,02"
List<Map<String, Object>> filterEverySceneCombineResList = inSceneListResList.stream().filter(qryResMap -> {
boolean fetched = true;
String[] everySceneStr = everySceneCombineStr.split(","); //["01", "02"]
for (int i = 0; i < everySceneStr.length; i++) {
fetched = everySceneStr[i].equals(String.valueOf(qryResMap.get(sceneKeysArray[i]))); //找到满足条件的子数据
if (!fetched) {
return false;
}
}
return fetched;
}).collect(Collectors.toList());//过滤得到满足每个场景组合的数据
Double avgRes = filterEverySceneCombineResList.stream().map(p -> Math.round(Double.valueOf(p.get("subtime").toString())))
.collect(Collectors.averagingLong(Long::longValue));
avgOfCombineSceneVal += avgRes;
}
return avgOfCombineSceneVal/splitSceneValList.size();
} /**
* @param group
* @param sceneKeys
* @param result
* @return
*/
private BigDecimal calResultBefore(Map<String, String> group, List<String> sceneKeys, List<Map<String, Object>> result) {
List<String> rarry = Arrays.asList("");
String[] arrKeys = String.join(",", sceneKeys).split(",");
for (String key : arrKeys) {
String v = String.valueOf(group.get(key));
String[] varr = v.split(",");
rarry = calgroup(varr, rarry);
}
//fetch data
List<Map<String, Object>> results = new ArrayList<>();
for (String r : rarry) { //,020102,03
List<Map<String, Object>> subList = result.stream().filter(rmap -> {
boolean fetched = true;
//String[] var = (String[]) Arrays.stream(r.split(",")).skip(0).collect(Collectors.toList()).toArray();
String[] var = r.split(","); //["", "020102", "03"]
for (int i = 0; i < var.length; i++) {
fetched = var[i].equals(String.valueOf(rmap.get(arrKeys[i])));
if (!fetched) {
return false;
}
}
return fetched;
}).collect(Collectors.toList());
results.addAll(subList);
}
//计算结果
return BigDecimal.ZERO;
} /**
* @param vs
* @param sub
* @return
*/
private List<String> calgroup(String[] vs, List<String> sub) {
List<String> ans = new ArrayList<>(); for (String v : vs) {
sub.stream().forEach(s -> {
if (StringUtils.isBlank(s)) {
ans.add(v);
} else {
ans.add(s + "," + v);
}
});
}
return ans;
} }

【SQL】窗口函数:求数据组内累计值和累计百分比的更多相关文章

  1. SQL ROW_NUMBER()实现取组内最新(最大)的数据

    SELECT * FROM(select ROW_NUMBER() over(partition BY sid order by cscore desc) as tid,sid,cname,cscor ...

  2. SQL分组求每组最大值问题的解决方法收集 (转载)

    例如有一个表student,其结构如下: id      name     sort      score 1        张三      语文      82 2        李四      数 ...

  3. sql server迁移数据(文件组之间的互相迁移与 文件组内文件的互相迁移)

    转自:https://www.cnblogs.com/lyhabc/p/3504380.html?utm_source=tuicool SQLSERVER将数据移到另一个文件组之后清空文件组并删除文件 ...

  4. [SQL]用于提取组内最新数据,左连接,内连接,not exist三种方案中,到底谁最快?

    本作代码下载:https://files.cnblogs.com/files/xiandedanteng/LeftInnerNotExist20191222.rar 人们总是喜欢给出或是得到一个简单明 ...

  5. 【HIVE高级笔试必备题型】(组内topN、相邻行的值比较问题)求语文大于数学_/_求文科大于理科成绩的学生

    Hive SQL练习之成绩分析 数据:[id, 学号,班级,科目,成绩] 1,1,1,yuwen,80 2,1,1,shuxue,85 3,2,1,yuwen,75 4,2,1,shuxue,70 5 ...

  6. sql查询技巧,按时间分段进行分组,每半小时一组统计组内记录数量

    今天拿到一个查询需求,需要统计某一天各个时间段内的记录数量. 具体是统计某天9:00至22:00时间段,每半小时内订单的数量,最后形成的数据形式如下: 时间段          订单数 9:00~9: ...

  7. 如何用SQL实现组内前几名的输出

    关于问题 如何查询组内最大的,最小的,大家或许都知道,无非是min.max的函数使用.可是如何在MySQL中查找组内最好的前两个,或者前三个? 什么是相关子查询 在提出对于这个问题的对应方法之前,首先 ...

  8. mssql sqlserver 使用sql脚本获取群组后,按时间排序(asc)第一条数据的方法分享

    摘要: 下文讲述使用sql脚本,获取群组后记录的第一条数据业务场景说明: 学校教务处要求统计: 每次作业,最早提交的学生名单下文通过举例的方式,记录此次脚本编写方法,方便以后备查,如下所示: 实现思路 ...

  9. 模拟QQ分组(具有伸缩功能) (添加开源框架的光闪烁效果)SimpleExpandableListAdapter 适配器的用法,并且可添加组及其组内数据。

    package com.lixu.qqfenzu; import java.util.ArrayList; import java.util.HashMap; import java.util.Lis ...

  10. sql 分组取每组的前n条或每组的n%(百分之n)的数据

    sql 分组取每组的前n条或每组的n%(百分之n)的数据 sql keyword: SELECT * ,ROW_NUMBER() OVER(partition by b.UserID order by ...

随机推荐

  1. Elastic:应用程序性能监控/管理(APM)实践

    在今天的文章里,我们将介绍Elastic的一个重要的应用:应用程序性能管理(Application Performance Monitoring/Management),简称APM.那么到底什么是AP ...

  2. 使用k8s部署springcloud解决三大问题

    1.正式环境使用的话启动时需要指定使用正式的配置文件,这个要咋处理? 解决办法 文章地址:https://www.cnblogs.com/sanduzxcvbnm/p/13262411.html 分析 ...

  3. Ubuntu20.04本地安装Redash中文版

    一.安装基础环境: # 1.更换APT国内源 sudo sed -i s@/cn.archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources ...

  4. 打印 Logger 日志时,需不需要再封装一下工具类?

    在开发过程中,打印日志是必不可少的,因为日志关乎于应用的问题排查.应用监控等.现在打印日志一般都是使用 slf4j,因为使用日志门面,有助于打印方式统一,即使后面更换日志框架,也非常方便.在 < ...

  5. esp-idf 安装(Windows )

    esp32的开发有两种环境,分别是 Arduino 和 esp32-idf. Arduino 是在 esp32-idf 基础上进行封装的,虽然使用起来比较方便,但是能自由更改的就变少了,适合新手使用. ...

  6. C++运算符重载(简单易懂)

    转载:https://www.cnblogs.com/liuchenxu123/p/12538623.html 运算符重载,就是对已有的运算符重新进行定义,赋予其另一种功能,以适应不同的数据类型. 你 ...

  7. 【强烈推荐】用glob库的一行命令显著加速批量读取处理数据

    在我们气象领域,对数据进行批处理随处可见,尤其是在处理模式数据的时候.为了能让这个过程加速,很多大佬们提出了不同的方法,比如使用numba库进行计算.使用dask库进行并行等等,都是非常好的加速手段. ...

  8. 【linux】 第1回 linux运维基础

    目录 1. 运维的本质 2. 电脑与服务器 2.1 电脑的种类 2.2 服务器种类 2.3 服务器品牌 2.4 服务器尺寸 2.5 服务器内部组成 3. 磁盘阵列 4. 系统简介 5. 虚拟化 6. ...

  9. Aspose.Words 操作 Word 画 EChart 图

    使用 Aspose.Words 插件在 Word 画 EChart 图 使用此插件可以画出丰富的 EChart 图,API 参考 https://reference.aspose.com/words/ ...

  10. 9.异步redis

    在使用Python代码操作redis时候,连接.操作.断开都是网络IO #安装支持异步redis的模块 pip3 install aioredis async def execute(address, ...