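A PostgreSQL optimization case (turning a nested loop into a HASH JOIN)
Today a colleague handed me a SQL statement that takes about 5 seconds and asked whether it could be optimized.
Table row counts: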
select count(1) from AAAA
union all
select count(1) from XXXXX;
  count
---------
1000001
998000
(2 rows)
Original SQL:
SELECT A1.PK_DEPT, A1.ENABLESTATE
FROM AAAA A1
JOIN AAAA A2 ON A1.PK_DEPT = A2.PK_DEPT
WHERE ((A1.PK_GROUP = 'Group9' AND A1.PK_ORG IN ('Org9')))
AND (A1.PK_DEPT IN (SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT (CASE WHEN ORGID3 IS NULL THEN ORGID2 ELSE ORGID3 END) ORGID
FROM XXXXX
WHERE ORGID = 'Org108') T2
ON (T1.ORGID2 = T2.ORGID OR T1.ORGID3 = T2.ORGID)))
AND (A1.ENABLESTATE IN (2))
ORDER BY A1.PK_DEPT, A1.ENABLESTATE;
Execution plan:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=16098.39..16098.40 rows=1 width=13) (actual time=5435.964..5454.953 rows=1000000 loops=1)
  Sort Key: a1.pk_dept
  Sort Method: quicksort Memory: 79264kB
  -> Nested Loop Semi Join (cost=1039.46..16098.38 rows=1 width=13) (actual time=0.389..5338.781 rows=1000000 loops=1)
        Join Filter: ((a1.pk_dept)::text = (t1.orgid)::text)
        -> Gather (cost=1038.61..16089.43 rows=1 width=22) (actual time=0.368..55.998 rows=1000000 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              -> Hash Join (cost=38.61..15089.33 rows=1 width=22) (actual time=0.246..49.481 rows=333333 loops=3)
                    Hash Cond: ((a2.pk_dept)::text = (a1.pk_dept)::text)
                    -> Parallel Seq Scan on aaaa a2 (cost=0.00..13491.33 rows=415833 width=9) (actual time=0.009..14.206 rows=332667 loops=3)
                    -> Hash (cost=38.60..38.60 rows=1 width=13) (actual time=0.193..0.195 rows=1000 loops=3)
                          Buckets: 1024 Batches: 1 Memory Usage: 51kB
                          -> Bitmap Heap Scan on aaaa a1 (cost=34.58..38.60 rows=1 width=13) (actual time=0.068..0.142 rows=1000 loops=3)
                                Recheck Cond: (((pk_org)::text = 'Org9'::text) AND ((pk_group)::text = 'Group9'::text))
                                Filter: (enablestate = 2)
                                Heap Blocks: exact=9
                                -> BitmapAnd (cost=34.58..34.58 rows=1 width=0) (actual time=0.062..0.063 rows=0 loops=3)
                                      -> Bitmap Index Scan on idx_aaaa_pkorg (cost=0.00..17.17 rows=632 width=0) (actual time=0.031..0.031 rows=1000 loops=3)
                                            Index Cond: ((pk_org)::text = 'Org9'::text)
                                      -> Bitmap Index Scan on idx_aaaa_pkgroup (cost=0.00..17.17 rows=632 width=0) (actual time=0.030..0.030 rows=1000 loops=3)
                                            Index Cond: ((pk_group)::text = 'Group9'::text)
        -> Nested Loop (cost=0.85..8.94 rows=1 width=9) (actual time=0.005..0.005 rows=1 loops=1000000)
              Join Filter: (((t1.orgid2)::text = (CASE WHEN (xxxxx.orgid3 IS NULL) THEN xxxxx.orgid2 ELSE xxxxx.orgid3 END)::text) OR ((t1.orgid3)::text = (CASE WHEN (xxxxx.orgid3 IS NULL) THEN xxxxx.orgid2 ELSE xxxxx.orgid3 END)::text))
              -> Index Scan using idx_xxxxx_orgid on xxxxx t1 (cost=0.42..0.49 rows=1 width=27) (actual time=0.003..0.003 rows=1 loops=1000000)
                    Index Cond: ((orgid)::text = (a2.pk_dept)::text)
              -> Index Scan using idx_3_4 on xxxxx (cost=0.42..8.44 rows=1 width=18) (actual time=0.002..0.002 rows=1 loops=1000000)
                    Index Cond: ((orgid)::text = 'Org108'::text)
Planning Time: 0.326 ms
Execution Time: 5478.431 ms
(30 rows)
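A quick way to see where the time goes in a plan like this: in EXPLAIN ANALYZE output, "actual time" is an average per loop, so a node's total cost is roughly its per-loop time multiplied by "loops". The sketch below is only an illustration of that arithmetic. The four node lines are copied from the plan above; the regular expression and the variable names are my own, and for nodes under parallel workers the product is time summed across processes, not wall-clock time.

```python
import re

# Node lines copied from the original plan above (times are in milliseconds).
plan_lines = [
    "Gather (cost=1038.61..16089.43 rows=1 width=22) (actual time=0.368..55.998 rows=1000000 loops=1)",
    "Nested Loop (cost=0.85..8.94 rows=1 width=9) (actual time=0.005..0.005 rows=1 loops=1000000)",
    "Index Scan using idx_xxxxx_orgid on xxxxx t1 (cost=0.42..0.49 rows=1 width=27) (actual time=0.003..0.003 rows=1 loops=1000000)",
    "Index Scan using idx_3_4 on xxxxx (cost=0.42..8.44 rows=1 width=18) (actual time=0.002..0.002 rows=1 loops=1000000)",
]

# Capture the node label, the per-loop completion time and the loop count.
node_re = re.compile(
    r"^(?P<node>.+?) \(cost=[^)]*\) "
    r"\(actual time=\d+\.\d+\.\.(?P<per_loop>\d+\.\d+) rows=\d+ loops=(?P<loops>\d+)\)$"
)

totals = {}
for line in plan_lines:
    m = node_re.match(line)
    if m is None:
        continue  # not a node line with timing information
    totals[m.group("node")] = float(m.group("per_loop")) * int(m.group("loops"))

# Print the nodes from most to least expensive.
for node, total_ms in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{total_ms:10.1f} ms  {node}")
```

The inner Nested Loop comes out at about 5000 ms, which accounts for almost all of the 5478 ms execution time, while the Gather over A1 and A2 costs only about 56 ms.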
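Anyone who tunes SQL regularly can usually locate the slow part of a simple statement like this just by staring at the plan.
AAAA and XXXXX are not small tables; each holds about a million rows. In the plan, every filter predicate is served by an index,
but the join between the two tables is a Nested Loop, so the inline view that joins t1 and t2 becomes the inner (driven) side and is executed 1,000,000 times (loops=1000000). At about 0.005 ms per loop that adds up to roughly 5 s, which is nearly the whole runtime. The planner also estimated rows=1 for the outer side, while it actually returned 1,000,000 rows. This execution plan is clearly wrong.
The main cause is that the join condition is an OR of two predicates.
We can apply an equivalent rewrite to this SQL, replacing the OR with two plain equality joins, with the aim of turning the Nested Loop into a hash join.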
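The idea behind the rewrite can be tried out on a toy data set first. The sketch below is not the production schema: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the tables `a` and `x` and their five rows each are made up for illustration. It shows that an `IN` subquery whose join condition is an OR returns the same rows as a join against the UNION of two equality joins, and that UNION ALL would not be safe because it lets a key that matches both branches appear twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (pk_dept TEXT);
    CREATE TABLE x (orgid TEXT, orgid2 TEXT, orgid3 TEXT);
    INSERT INTO a VALUES ('d1'), ('d1'), ('d2'), ('d3'), ('d4');
    INSERT INTO x VALUES
        ('Org108', 'p1', NULL),  -- anchor row: COALESCE(orgid3, orgid2) = 'p1'
        ('d1',     'p1', NULL),  -- matches through orgid2 only
        ('d2',     'zz', 'p1'),  -- matches through orgid3 only
        ('d3',     'p1', 'p1'),  -- matches through both columns
        ('d4',     'zz', NULL);  -- matches through neither
""")

# Inline view shared by every variant (T2 in the article's SQL).
T2 = "(SELECT COALESCE(orgid3, orgid2) AS orgid FROM x WHERE orgid = 'Org108')"

# Original shape: IN subquery with an OR in the join condition.
or_sql = f"""
    SELECT a.pk_dept FROM a
    WHERE a.pk_dept IN (
        SELECT t1.orgid FROM x t1
        JOIN {T2} t2 ON (t1.orgid2 = t2.orgid OR t1.orgid3 = t2.orgid))
    ORDER BY a.pk_dept
"""

def rewritten_sql(set_op):
    """Rewritten shape: one equality join per OR branch, combined with set_op."""
    return f"""
    SELECT a.pk_dept FROM a
    JOIN (SELECT t1.orgid AS orgid FROM x t1 JOIN {T2} t2 ON t1.orgid2 = t2.orgid
          {set_op}
          SELECT t1.orgid AS orgid FROM x t1 JOIN {T2} t2 ON t1.orgid3 = t2.orgid) u
      ON a.pk_dept = u.orgid
    ORDER BY a.pk_dept
    """

or_rows = conn.execute(or_sql).fetchall()
union_rows = conn.execute(rewritten_sql("UNION")).fetchall()
union_all_rows = conn.execute(rewritten_sql("UNION ALL")).fetchall()

print("OR form:       ", or_rows)
print("UNION form:    ", union_rows)
print("UNION ALL form:", union_all_rows)
```

With UNION the derived table holds each ORGID at most once, so joining to it behaves like the original `IN` test. With UNION ALL the row 'd3' is returned twice, because it matches through both ORGID2 and ORGID3.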
Equivalent rewritten SQL:
SELECT A1.PK_DEPT, A1.ENABLESTATE
FROM AAAA A1
JOIN AAAA A2 ON A1.PK_DEPT = A2.PK_DEPT
JOIN (SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT COALESCE(ORGID3, ORGID2) ORGID FROM XXXXX WHERE ORGID = 'Org108') T2
ON T1.ORGID2 = T2.ORGID
UNION
SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT COALESCE(ORGID3, ORGID2) ORGID FROM XXXXX WHERE ORGID = 'Org108') T2
ON T1.ORGID3 = T2.ORGID) X ON A1.PK_DEPT = X.ORGID
WHERE ((A1.PK_GROUP = 'Group9' AND A1.PK_ORG IN ('Org9')))
AND (A1.ENABLESTATE IN (2))
ORDER BY A1.PK_DEPT, A1.ENABLESTATE;
Execution plan after the rewrite:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1072.44..16123.32 rows=1 width=13) (actual time=190.512..312.537 rows=1000000 loops=1)
  Join Filter: ((a1.pk_dept)::text = (t1.orgid)::text)
  Rows Removed by Join Filter: 3000000
  -> Unique (cost=33.83..33.84 rows=2 width=516) (actual time=0.073..0.086 rows=4 loops=1)
        -> Sort (cost=33.83..33.84 rows=2 width=516) (actual time=0.072..0.079 rows=5 loops=1)
              Sort Key: t1.orgid
              Sort Method: quicksort Memory: 25kB
              -> Append (cost=0.85..33.82 rows=2 width=516) (actual time=0.037..0.068 rows=5 loops=1)
                    -> Nested Loop (cost=0.85..16.90 rows=1 width=9) (actual time=0.037..0.045 rows=2 loops=1)
                          -> Index Scan using idx_3_4 on xxxxx (cost=0.42..8.44 rows=1 width=18) (actual time=0.022..0.023 rows=2 loops=1)
                                Index Cond: ((orgid)::text = 'Org108'::text)
                          -> Index Scan using idx_xxxxx_orgid2 on xxxxx t1 (cost=0.42..8.44 rows=1 width=18) (actual time=0.009..0.009 rows=1 loops=2)
                                Index Cond: ((orgid2)::text = (COALESCE(xxxxx.orgid3, xxxxx.orgid2))::text)
                    -> Nested Loop (cost=0.85..16.90 rows=1 width=9) (actual time=0.014..0.021 rows=3 loops=1)
                          -> Index Scan using idx_3_4 on xxxxx xxxxx_1 (cost=0.42..8.44 rows=1 width=18) (actual time=0.003..0.003 rows=2 loops=1)
                                Index Cond: ((orgid)::text = 'Org108'::text)
                          -> Index Scan using idx_xxxxx_orgid3 on xxxxx t1_1 (cost=0.42..8.44 rows=1 width=18) (actual time=0.008..0.008 rows=2 loops=2)
                                Index Cond: ((orgid3)::text = (COALESCE(xxxxx_1.orgid3, xxxxx_1.orgid2))::text)
  -> Materialize (cost=1038.61..16089.43 rows=1 width=22) (actual time=0.096..43.254 rows=1000000 loops=4)
        -> Gather (cost=1038.61..16089.43 rows=1 width=22) (actual time=0.384..44.877 rows=1000000 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              -> Hash Join (cost=38.61..15089.33 rows=1 width=22) (actual time=0.257..48.484 rows=333333 loops=3)
                    Hash Cond: ((a2.pk_dept)::text = (a1.pk_dept)::text)
                    -> Parallel Seq Scan on aaaa a2 (cost=0.00..13491.33 rows=415833 width=9) (actual time=0.009..14.053 rows=332667 loops=3)
                    -> Hash (cost=38.60..38.60 rows=1 width=13) (actual time=0.217..0.219 rows=1000 loops=3)
                          Buckets: 1024 Batches: 1 Memory Usage: 51kB
                          -> Bitmap Heap Scan on aaaa a1 (cost=34.58..38.60 rows=1 width=13) (actual time=0.085..0.160 rows=1000 loops=3)
                                Recheck Cond: (((pk_org)::text = 'Org9'::text) AND ((pk_group)::text = 'Group9'::text))
                                Filter: (enablestate = 2)
                                Heap Blocks: exact=9
                                -> BitmapAnd (cost=34.58..34.58 rows=1 width=0) (actual time=0.077..0.078 rows=0 loops=3)
                                      -> Bitmap Index Scan on idx_aaaa_pkorg (cost=0.00..17.17 rows=632 width=0) (actual time=0.039..0.039 rows=1000 loops=3)
                                            Index Cond: ((pk_org)::text = 'Org9'::text)
                                      -> Bitmap Index Scan on idx_aaaa_pkgroup (cost=0.00..17.17 rows=632 width=0) (actual time=0.035..0.036 rows=1000 loops=3)
                                            Index Cond: ((pk_group)::text = 'Group9'::text)
Planning Time: 0.236 ms
Execution Time: 337.656 ms
(38 rows)
Set-difference check:
SELECT A1.PK_DEPT, A1.ENABLESTATE
FROM AAAA A1
JOIN AAAA A2 ON A1.PK_DEPT = A2.PK_DEPT
JOIN (SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT COALESCE(ORGID3, ORGID2) ORGID FROM XXXXX WHERE ORGID = 'Org108') T2
ON T1.ORGID2 = T2.ORGID
UNION
SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT COALESCE(ORGID3, ORGID2) ORGID FROM XXXXX WHERE ORGID = 'Org108') T2
ON T1.ORGID3 = T2.ORGID) X ON A1.PK_DEPT = X.ORGID
WHERE ((A1.PK_GROUP = 'Group9' AND A1.PK_ORG IN ('Org9')))
AND (A1.ENABLESTATE IN (2))
EXCEPT
SELECT A1.PK_DEPT, A1.ENABLESTATE
FROM AAAA A1
JOIN AAAA A2 ON A1.PK_DEPT = A2.PK_DEPT
WHERE ((A1.PK_GROUP = 'Group9' AND A1.PK_ORG IN ('Org9')))
AND (A1.PK_DEPT IN (SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT (CASE WHEN ORGID3 IS NULL THEN ORGID2 ELSE ORGID3 END) ORGID
FROM XXXXX
WHERE ORGID = 'Org108') T2
ON (T1.ORGID2 = T2.ORGID OR T1.ORGID3 = T2.ORGID))
)
AND (A1.ENABLESTATE IN (2));
 pk_dept | enablestate
---------+-------------
(0 rows)
Time: 5740.419 ms (00:05.740)
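After the rewrite, the join result of A1 and A2 is materialized, and the inline view X (the UNION over t1) is evaluated once as a whole, returns only 4 rows, and is then joined to A1 and A2. The top-level node in this plan is still a Nested Loop, but it now scans the materialized rows 4 times instead of probing XXXXX 1,000,000 times, and execution time drops from about 5 s to 337 ms.
The set difference is empty (0 rows), so no row of the rewritten SQL is missing from the original; strictly speaking, the reverse EXCEPT should be empty as well before calling the two statements equivalent. With that, this SQL optimization case is complete.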
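A single EXCEPT only proves one direction, and EXCEPT also ignores duplicate rows, so a slightly stricter check runs the difference both ways and compares row counts. The sketch below illustrates that on made-up data, using Python's built-in sqlite3 as a stand-in for PostgreSQL; the helper name `compare` and the table `t` are hypothetical and not part of the original case.

```python
import sqlite3

def compare(conn, q1, q2):
    """Return (rows only in q1, rows only in q2, row count of q1, row count of q2)."""
    only_q1 = conn.execute(f"SELECT * FROM ({q1}) EXCEPT SELECT * FROM ({q2})").fetchall()
    only_q2 = conn.execute(f"SELECT * FROM ({q2}) EXCEPT SELECT * FROM ({q1})").fetchall()
    n1 = conn.execute(f"SELECT count(*) FROM ({q1})").fetchone()[0]
    n2 = conn.execute(f"SELECT count(*) FROM ({q2})").fetchone()[0]
    return only_q1, only_q2, n1, n2

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (v INTEGER);
    INSERT INTO t VALUES (1), (2), (2), (3);
""")

full = "SELECT v FROM t"
subset = "SELECT v FROM t WHERE v < 3"   # loses the row 3
distinct = "SELECT DISTINCT v FROM t"    # loses one duplicate of 2

# "subset EXCEPT full" alone is empty, yet the queries differ: the reverse shows 3.
subset_check = compare(conn, subset, full)
# Both differences are empty here, but the row counts (3 vs 4) still differ.
distinct_check = compare(conn, distinct, full)

print(subset_check)
print(distinct_check)
```

Two statements can be treated as returning the same result only when both difference lists are empty and the two counts match.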