今天同事给我一条5秒的SQL看看能不能优化。

表数据量:

select count(1) from AAAA
union all
select count(1) from XXXXX; count
---------
1000001
998000
(2 rows)

原始SQL:

SELECT A1.PK_DEPT, A1.ENABLESTATE
FROM AAAA A1
JOIN AAAA A2 ON A1.PK_DEPT = A2.PK_DEPT
WHERE ((A1.PK_GROUP = 'Group9' AND A1.PK_ORG IN ('Org9')))
AND (A1.PK_DEPT IN (SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT (CASE WHEN ORGID3 IS NULL THEN ORGID2 ELSE ORGID3 END) ORGID
FROM XXXXX
WHERE ORGID = 'Org108') T2
ON (T1.ORGID2 = T2.ORGID OR T1.ORGID3 = T2.ORGID)))
AND (A1.ENABLESTATE IN (2))
ORDER BY A1.PK_DEPT, A1.ENABLESTATE;

执行计划:

                                                                                                                  QUERY PLAN                                                           

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------
Sort (cost=16098.39..16098.40 rows=1 width=13) (actual time=5435.964..5454.953 rows=1000000 loops=1)
Sort Key: a1.pk_dept
Sort Method: quicksort Memory: 79264kB
-> Nested Loop Semi Join (cost=1039.46..16098.38 rows=1 width=13) (actual time=0.389..5338.781 rows=1000000 loops=1)
Join Filter: ((a1.pk_dept)::text = (t1.orgid)::text)
-> Gather (cost=1038.61..16089.43 rows=1 width=22) (actual time=0.368..55.998 rows=1000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=38.61..15089.33 rows=1 width=22) (actual time=0.246..49.481 rows=333333 loops=3)
Hash Cond: ((a2.pk_dept)::text = (a1.pk_dept)::text)
-> Parallel Seq Scan on aaaa a2 (cost=0.00..13491.33 rows=415833 width=9) (actual time=0.009..14.206 rows=332667 loops=3)
-> Hash (cost=38.60..38.60 rows=1 width=13) (actual time=0.193..0.195 rows=1000 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 51kB
-> Bitmap Heap Scan on aaaa a1 (cost=34.58..38.60 rows=1 width=13) (actual time=0.068..0.142 rows=1000 loops=3)
Recheck Cond: (((pk_org)::text = 'Org9'::text) AND ((pk_group)::text = 'Group9'::text))
Filter: (enablestate = 2)
Heap Blocks: exact=9
-> BitmapAnd (cost=34.58..34.58 rows=1 width=0) (actual time=0.062..0.063 rows=0 loops=3)
-> Bitmap Index Scan on idx_aaaa_pkorg (cost=0.00..17.17 rows=632 width=0) (actual time=0.031..0.031 rows=1000 loops=3)
Index Cond: ((pk_org)::text = 'Org9'::text)
-> Bitmap Index Scan on idx_aaaa_pkgroup (cost=0.00..17.17 rows=632 width=0) (actual time=0.030..0.030 rows=1000 loops=3)
Index Cond: ((pk_group)::text = 'Group9'::text)
-> Nested Loop (cost=0.85..8.94 rows=1 width=9) (actual time=0.005..0.005 rows=1 loops=1000000)
Join Filter: (((t1.orgid2)::text = (CASE WHEN (xxxxx.orgid3 IS NULL) THEN xxxxx.orgid2 ELSE xxxxx.orgid3 END)::text) OR ((t1.orgid3)::text = (CASE WHEN (xxxxx.orgid3 IS
NULL) THEN xxxxx.orgid2 ELSE xxxxx.orgid3 END)::text))
-> Index Scan using idx_xxxxx_orgid on xxxxx t1 (cost=0.42..0.49 rows=1 width=27) (actual time=0.003..0.003 rows=1 loops=1000000)
Index Cond: ((orgid)::text = (a2.pk_dept)::text)
-> Index Scan using idx_3_4 on xxxxx (cost=0.42..8.44 rows=1 width=18) (actual time=0.002..0.002 rows=1 loops=1000000)
Index Cond: ((orgid)::text = 'Org108'::text)
Planning Time: 0.326 ms
Execution Time: 5478.431 ms
(30 rows)
如果经常做优化的同学对于简单的SQL,相信可以使用瞪眼大法基本定位到语句慢的位置 
AAAA、XXXXX 两张表都不算是小表,数据量在百万级别,在执行计划中,谓词都是有索引进行过滤的,
但是两张表关联以后却走了嵌套循环(Nested Loop),导致t1表和t2表关联后的内联视图作为被驱动表被干了1000000次,很明显这个执行计划是错误的。
最主要原因就是关联条件是or的逻辑条件。

可以通过等价改写来搞一下这条SQL,让
Nested Loop 改变成 hash join
等价改写SQL:
SELECT A1.PK_DEPT, A1.ENABLESTATE
FROM AAAA A1
JOIN AAAA A2 ON A1.PK_DEPT = A2.PK_DEPT
JOIN (SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT COALESCE(ORGID3, ORGID2) ORGID FROM XXXXX WHERE ORGID = 'Org108') T2
ON T1.ORGID2 = T2.ORGID
UNION
SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT COALESCE(ORGID3, ORGID2) ORGID FROM XXXXX WHERE ORGID = 'Org108') T2
ON T1.ORGID3 = T2.ORGID) X ON A1.PK_DEPT = X.ORGID
WHERE ((A1.PK_GROUP = 'Group9' AND A1.PK_ORG IN ('Org9')))
AND (A1.ENABLESTATE IN (2))
ORDER BY A1.PK_DEPT, A1.ENABLESTATE;
改写后执行计划:
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1072.44..16123.32 rows=1 width=13) (actual time=190.512..312.537 rows=1000000 loops=1)
Join Filter: ((a1.pk_dept)::text = (t1.orgid)::text)
Rows Removed by Join Filter: 3000000
-> Unique (cost=33.83..33.84 rows=2 width=516) (actual time=0.073..0.086 rows=4 loops=1)
-> Sort (cost=33.83..33.84 rows=2 width=516) (actual time=0.072..0.079 rows=5 loops=1)
Sort Key: t1.orgid
Sort Method: quicksort Memory: 25kB
-> Append (cost=0.85..33.82 rows=2 width=516) (actual time=0.037..0.068 rows=5 loops=1)
-> Nested Loop (cost=0.85..16.90 rows=1 width=9) (actual time=0.037..0.045 rows=2 loops=1)
-> Index Scan using idx_3_4 on xxxxx (cost=0.42..8.44 rows=1 width=18) (actual time=0.022..0.023 rows=2 loops=1)
Index Cond: ((orgid)::text = 'Org108'::text)
-> Index Scan using idx_xxxxx_orgid2 on xxxxx t1 (cost=0.42..8.44 rows=1 width=18) (actual time=0.009..0.009 rows=1 loops=2)
Index Cond: ((orgid2)::text = (COALESCE(xxxxx.orgid3, xxxxx.orgid2))::text)
-> Nested Loop (cost=0.85..16.90 rows=1 width=9) (actual time=0.014..0.021 rows=3 loops=1)
-> Index Scan using idx_3_4 on xxxxx xxxxx_1 (cost=0.42..8.44 rows=1 width=18) (actual time=0.003..0.003 rows=2 loops=1)
Index Cond: ((orgid)::text = 'Org108'::text)
-> Index Scan using idx_xxxxx_orgid3 on xxxxx t1_1 (cost=0.42..8.44 rows=1 width=18) (actual time=0.008..0.008 rows=2 loops=2)
Index Cond: ((orgid3)::text = (COALESCE(xxxxx_1.orgid3, xxxxx_1.orgid2))::text)
-> Materialize (cost=1038.61..16089.43 rows=1 width=22) (actual time=0.096..43.254 rows=1000000 loops=4)
-> Gather (cost=1038.61..16089.43 rows=1 width=22) (actual time=0.384..44.877 rows=1000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=38.61..15089.33 rows=1 width=22) (actual time=0.257..48.484 rows=333333 loops=3)
Hash Cond: ((a2.pk_dept)::text = (a1.pk_dept)::text)
-> Parallel Seq Scan on aaaa a2 (cost=0.00..13491.33 rows=415833 width=9) (actual time=0.009..14.053 rows=332667 loops=3)
-> Hash (cost=38.60..38.60 rows=1 width=13) (actual time=0.217..0.219 rows=1000 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 51kB
-> Bitmap Heap Scan on aaaa a1 (cost=34.58..38.60 rows=1 width=13) (actual time=0.085..0.160 rows=1000 loops=3)
Recheck Cond: (((pk_org)::text = 'Org9'::text) AND ((pk_group)::text = 'Group9'::text))
Filter: (enablestate = 2)
Heap Blocks: exact=9
-> BitmapAnd (cost=34.58..34.58 rows=1 width=0) (actual time=0.077..0.078 rows=0 loops=3)
-> Bitmap Index Scan on idx_aaaa_pkorg (cost=0.00..17.17 rows=632 width=0) (actual time=0.039..0.039 rows=1000 loops=3)
Index Cond: ((pk_org)::text = 'Org9'::text)
-> Bitmap Index Scan on idx_aaaa_pkgroup (cost=0.00..17.17 rows=632 width=0) (actual time=0.035..0.036 rows=1000 loops=3)
Index Cond: ((pk_group)::text = 'Group9'::text)
Planning Time: 0.236 ms
Execution Time: 337.656 ms
(38 rows)

差集比对

SELECT A1.PK_DEPT, A1.ENABLESTATE
FROM AAAA A1
JOIN AAAA A2 ON A1.PK_DEPT = A2.PK_DEPT
JOIN (SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT COALESCE(ORGID3, ORGID2) ORGID FROM XXXXX WHERE ORGID = 'Org108') T2
ON T1.ORGID2 = T2.ORGID
UNION
SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT COALESCE(ORGID3, ORGID2) ORGID FROM XXXXX WHERE ORGID = 'Org108') T2
ON T1.ORGID3 = T2.ORGID) X ON A1.PK_DEPT = X.ORGID
WHERE ((A1.PK_GROUP = 'Group9' AND A1.PK_ORG IN ('Org9')))
AND (A1.ENABLESTATE IN (2))
EXCEPT
SELECT A1.PK_DEPT, A1.ENABLESTATE
FROM AAAA A1
JOIN AAAA A2 ON A1.PK_DEPT = A2.PK_DEPT
WHERE ((A1.PK_GROUP = 'Group9' AND A1.PK_ORG IN ('Org9')))
AND (A1.PK_DEPT IN (SELECT T1.ORGID
FROM XXXXX T1
INNER JOIN (SELECT (CASE WHEN ORGID3 IS NULL THEN ORGID2 ELSE ORGID3 END) ORGID
FROM XXXXX
WHERE ORGID = 'Org108') T2
ON (T1.ORGID2 = T2.ORGID OR T1.ORGID3 = T2.ORGID))
)
AND (A1.ENABLESTATE IN (2)); pk_dept | enablestate
---------+-------------
(0 rows) Time: 5740.419 ms (00:05.740)

可以看到改写完以后,A1和A2表已经被物化,t1 内联视图作为一个整体和A1和A2进行关联,SQL执行时间也从5S降到337ms就能出结果。

通过差集比对,两条SQL是等价的,本次案例的SQL优化已完成

 
 

记录一次 postgresql 优化案例( 嵌套循环改HASH JOIN )的更多相关文章

  1. PostgreSQL 涉及复杂视图查询的优化案例

    一.前言 对于含有union , group by 等的视图,我们称之为复杂视图. 这类的视图会影响优化器对于视图的提升,也就是视图无法与父查询进行合并,从而影响访问路径.连接方法.连接顺序等.本文通 ...

  2. mysql优化案例

    MySQL优化案例 Mysql5.1大表分区效率测试 Mysql5.1大表分区效率测试MySQL | add at 2009-03-27 12:29:31 by PConline | view:60, ...

  3. Hive优化案例

    1.Hadoop计算框架的特点 数据量大不是问题,数据倾斜是个问题. jobs数比较多的作业效率相对比较低,比如即使有几百万的表,如果多次关联多次汇总,产生十几个jobs,耗时很长.原因是map re ...

  4. 记一次mysql多表查询(left jion)优化案例

    一次mysql多表查询(left jion)优化案例 在新上线的供需模块中,发现某一个查询按钮点击后,出不来结果,找到该按钮对应sql手动执行,发现需要20-30秒才能出结果,所以服务端程序判断超时, ...

  5. 数据库优化案例——————某市中心医院HIS系统

    记得在自己学习数据库知识的时候特别喜欢看案例,因为优化的手段是容易掌握的,但是整体的优化思想是很难学会的.这也是为什么自己特别喜欢看案例,今天也开始分享自己做的优化案例. 最近一直很忙,博客产出也少的 ...

  6. SQL 优化案例 1

    create or replace procedure SP_GET_NEWEST_CAPTCHA( v_ACCOUNT_ID in VARCHAR2, --接收短信的手机号 v_Tail_num i ...

  7. 数据库优化案例——————某知名零售企业ERP系统

    写在前面 记得在自己学习数据库知识的时候特别喜欢看案例,因为优化的手段是容易掌握的,但是整体的优化思想是很难学会的.这也是为什么自己特别喜欢看案例,今天也分享自己做的优化案例. 之前分享过OA系统.H ...

  8. MySQL参数优化案例

    环境介绍 优化层级与指导思想 优化过程 最小化安装情况下的性能表现 优化innodb_buffer_pool_size 优化innodb_log_files_in_group&innodb_l ...

  9. SQL 优化案例

    create or replace procedure SP_GET_NEWEST_CAPTCHA( v_ACCOUNT_ID in VARCHAR2, --接收短信的手机号 v_Tail_num i ...

  10. 老李案例分享:Weblogic性能优化案例

    老李案例分享:Weblogic性能优化案例 POPTEST的测试技术交流qq群:450192312 网站应用首页大小在130K左右,在之前的测试过程中,其百用户并发的平均响应能力在6.5秒,性能优化后 ...

随机推荐

  1. Vue Cli起别名

    vue cli 3的写法 module.exports = { configureWebpack: { resolve:{ extensions:[], alias:{ 'assets':'@/ass ...

  2. 利用选项卡提高Visual Studio 2022开发效率

    设计器作为软件开发的必要工具,其效率的提高显得尤为重要.Visual Studio 2022作为一款功能强大的设计器,通过选项卡提高了工作效率,让开发者在使用过程中更加便捷. 在Visual Stud ...

  3. 使用supervisor守护Prometheus进程

    使用supervisor守护Prometheus进程 目录 使用supervisor守护Prometheus进程 安装supervisor 安装Prometheus监控系统 配置supervisor ...

  4. Linux校验文件MD5和SHA值的方法

    1.需求背景 下载或传输文件后,需要计算文件的MD5.SHA256等校验值,以确保下载或传输后的文件和源文件一致 2.校验方法 如上图所示,可以使用Linux自带的校验命令来计算一个文件的校验值 Li ...

  5. Xshell7 / Xftp7 永久免费,官网直连下载地址

    主要目的是让大家随时随地从官网下载Xshell和Xftp免费版(个人/家庭/学校免费) 最新变动:官方目前仅提供最新版以及上一个版本的软件下载!其他版本不提供下载 免费版5版本(最后一个版本,无任何限 ...

  6. Python脚本批量造数据、跑定时任务协助测试

    批量造数据 连接Mysql的信息 1 import pymysql 2 # 数据库连接信息 3 # 多个库要有多个conn 4 conn = pymysql.connect( 5 host=" ...

  7. 图解Spark排序算子sortBy的核心源码

    原创/朱季谦 一.案例说明 以前刚开始学习Spark的时候,在练习排序算子sortBy的时候,曾发现一个有趣的现象是,在使用排序算子sortBy后直接打印的话,发现打印的结果是乱序的,并没有出现完整排 ...

  8. Solution Set -「CF 1539」

    我是傻逼. 「CF 1539A」Contest Start Link. 答案是 \(\sum_{i=1}^{n-1}\min\{i,\lfloor\frac{t}{x}\rfloor\}\),等差数列 ...

  9. 对某个接口进行限流 以 Aop 注解的形式绑定接口 用redis实现

    简单的针对某个接口进行限流,如果需要整体限流的话还是建议在网关上面或者服务器上面动手Controller: @LimitRequest(count = 1,time = 60 * 1000 * 2) ...

  10. 如何选择适合你的HTAP数据库?

    最近,在数据库行业对HTAP(混合事务/分析处理,Hybrid Transactional/Analytical Processing)这个概念宣传的非常火爆,也衍生出 Real-Time HTAP的 ...