今天遇到一个性能问题,再调优过程中发现耗时最久的计划是exist 部分涉及的三个表。

然后计划用left join 来替换exist,然后查询了很多资料,大部分都说exist和left join 性能差不多。 为了验证这一结论进行了如下实验

步骤如下

1、创建测试表

drop table app_family;

CREATE TABLE app_family (

"family_id" character varying(32 char) NOT NULL,

"application_id" character varying(32 char) NULL,

"family_number" character varying(50 char) ,

"household_register_number" character varying(50 char),

"poverty_reason" character varying(32 char),

CONSTRAINT "pk_app_family_idpk" PRIMARY KEY (family_id));

insert into app_family select generate_series(1,1000000),generate_series(1,1000000),'aaaa','aaa','bbb' from dual ;

create table app_family2 as select * from app_family;

create table app_memeber as select * from app_family;

2、验证两张表join和exist 性能对比

语句1、两张表exist

explain analyze select a1.application_id,a1.family_id from app_family a1 where

a1.family_id >1000 and

EXISTS(

SELECT

1

FROM

app_family2 a2

WHERE

a2.application_id=a1.application_id

and a2.family_id > 500000

)

总计用时646.203 ms

 ----------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=16927.11..44466.84 rows=111111 width=12) (actual time=354.314..621.714 rows=500000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Semi Join (cost=15927.11..32355.74 rows=46296 width=12) (actual time=355.657..512.049 rows=166667 loops=3)
Hash Cond: ((a1.application_id)::text = (a2.application_id)::text)
-> Parallel Seq Scan on app_family a1 (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.222..111.618 rows=333000 loops=3)
Filter: ((family_id)::integer > 1000)
Rows Removed by Filter: 333
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=6) (actual time=149.203..149.204 rows=166667 loops=3)
Buckets: 131072 Batches: 8 Memory Usage: 3520kB
-> Parallel Seq Scan on app_family2 a2 (cost=0.00..13648.00 rows=138889 width=6) (actual time=48.576..109.251 rows=166667 loops=3)
Filter: ((family_id)::integer > 500000)
Rows Removed by Filter: 166667
Planning Time: 0.145 ms
Execution Time: 645.095 ms
(15 rows) Time: 646.203 ms
kingbase=#

语句2 两张表join

explain analyze select a1.application_id,a1.family_id from app_family a1 LEFT JOIN app_family2 a2 ON a2.application_id=a1.application_id

WHERE a1.family_id >1000 AND a2.family_id > 500000

总计执行时间624.211 ms

---------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=16927.11..44300.95 rows=111111 width=12) (actual time=349.752..601.304 rows=500000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=15927.11..32189.85 rows=46296 width=12) (actual time=337.548..508.139 rows=166667 loops=3)
Hash Cond: ((a1.application_id)::text = (a2.application_id)::text)
-> Parallel Seq Scan on app_family a1 (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.087..111.949 rows=333000 loops=3)
Filter: ((family_id)::integer > 1000)
Rows Removed by Filter: 333
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=6) (actual time=131.718..131.719 rows=166667 loops=3)
Buckets: 131072 Batches: 8 Memory Usage: 3488kB
-> Parallel Seq Scan on app_family2 a2 (cost=0.00..13648.00 rows=138889 width=6) (actual time=31.730..90.917 rows=166667 loops=3)
Filter: ((family_id)::integer > 500000)
Rows Removed by Filter: 166667
Planning Time: 0.093 ms
Execution Time: 623.465 ms
(15 rows) Time: 624.211 ms

两张表场景总结

针对两张表的对比可以发现join还相对满了10几ms但是总的来说两边 差异不大。所以再两张表的关联情况下 join和exist 性能相近。

3、验证3张表join和exist 性能对比

语句1 三张表exist

本场景最开始执行时 exit 用户6 s多,原因时用到了内存排序,后来调整了work_mem 排除了内存排序的影响,最终执行时间

2911.146 ms

explain analyze select a1.application_id,a1.family_id from app_family a1 ,app_family2 a2 where

a1.family_id >1000 and a2.family_id < 900000 and

EXISTS(

SELECT

1

FROM

app_memeber m

WHERE

m.application_id=a1.application_id

and m.family_id=a2.family_id

)

------------------------------------------------------------------------------------------------------------------------------------------------------
--
Gather (cost=61282.11..88664.67 rows=111111 width=12) (actual time=2112.079..2847.233 rows=898999 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=60282.11..76553.57 rows=46296 width=12) (actual time=2119.345..2705.935 rows=299666 loops=3)
Hash Cond: ((m.family_id)::text = (a2.family_id)::text)
-> Hash Join (cost=44898.00..60455.72 rows=138889 width=18) (actual time=1885.923..2264.850 rows=333000 loops=3)
Hash Cond: ((a1.application_id)::text = (m.application_id)::text)
-> Parallel Seq Scan on app_family a1 (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.091..109.196 rows=333000 loops=3)
Filter: ((family_id)::integer > 1000)
Rows Removed by Filter: 333
-> Hash (cost=32398.00..32398.00 rows=1000000 width=12) (actual time=1880.027..1880.028 rows=1000000 loops=3)
Buckets: 1048576 Batches: 1 Memory Usage: 52897kB
-> HashAggregate (cost=22398.00..32398.00 rows=1000000 width=12) (actual time=957.973..1382.683 rows=1000000 loops=3)
Group Key: (m.application_id)::text, (m.family_id)::text
-> Seq Scan on app_memeber m (cost=0.00..17398.00 rows=1000000 width=12) (actual time=0.047..247.902 rows=1000000 loops=3
)
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=6) (actual time=231.705..231.706 rows=300000 loops=3)
Buckets: 1048576 (originally 524288) Batches: 1 (originally 1) Memory Usage: 47552kB
-> Parallel Seq Scan on app_family2 a2 (cost=0.00..13648.00 rows=138889 width=6) (actual time=0.039..100.756 rows=300000 loops=3)
Filter: ((family_id)::integer < 900000)
Rows Removed by Filter: 33334
Planning Time: 0.359 ms
Execution Time: 2911.146 ms
(22 rows)

语句2 三张表join

为了保证语句的一致性,三张表的join顺序保持和语句1的执行计划中的顺序一致,join总计用时1476.651 ms

explain analyze select a1.application_id,a1.family_id from app_family a1

left join app_memeber m on a1.application_id = m.application_id LEFT JOIN app_family2 a2 ON m.family_id = a2.family_id

WHERE a1.family_id >1000 AND a2.family_id < 900000

 Gather  (cost=32990.22..64898.93 rows=111111 width=12) (actual time=993.681..1436.895 rows=898999 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=31990.22..52787.83 rows=46296 width=12) (actual time=982.512..1241.385 rows=299666 loops=3)
Hash Cond: ((m.application_id)::text = (a1.application_id)::text)
-> Parallel Hash Join (cost=15927.11..34245.98 rows=138889 width=6) (actual time=377.411..635.945 rows=300000 loops=3)
Hash Cond: ((m.family_id)::text = (a2.family_id)::text)
-> Parallel Seq Scan on app_memeber m (cost=0.00..11564.67 rows=416667 width=12) (actual time=0.034..59.470 rows=333333 loops=3)
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=6) (actual time=232.286..232.287 rows=300000 loops=3)
Buckets: 131072 (originally 131072) Batches: 16 (originally 8) Memory Usage: 3296kB
-> Parallel Seq Scan on app_family2 a2 (cost=0.00..13648.00 rows=138889 width=6) (actual time=0.030..104.370 rows=300000 loops=
3)
Filter: ((family_id)::integer < 900000)
Rows Removed by Filter: 33334
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=12) (actual time=271.185..271.185 rows=333000 loops=3)
Buckets: 131072 (originally 131072) Batches: 16 (originally 8) Memory Usage: 4032kB
-> Parallel Seq Scan on app_family a1 (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.091..129.188 rows=333000 loops=3)
Filter: ((family_id)::integer > 1000)
Rows Removed by Filter: 333
Planning Time: 0.140 ms
Execution Time: 1475.305 ms
(20 rows) Time: 1476.651 ms (00:01.477)

总结三张表场景

在三张表的场景下exist用时2911.146 ms ,join用时1476.651 ms 可见 join的顺序明显优于exist。

在三张表的场景下可以看到,针对中间表appmember扫描时, exist语句用到HashAggregate 并做了 Group Key,所以导致exist 执行时间增加。如果work_mem 配置不合适时间会更长。

exist和left join 性能对比的更多相关文章

  1. Go 字符串连接+=与strings.Join性能对比

    Go字符串连接 对于字符串的连接大致有两种方式: 1.通过+号连接 func StrPlus1(a []string) string { var s, sep string for i := 0; i ...

  2. 自己写的轻量级PHP框架trig与laravel5.1,yii2性能对比

    看了下当前最热门的php开发框架,想对比一下自己写的框架与这些框架的性能对比.先看下当前流行框架的投票情况. 看结果对比,每个测试脚本做了一个数据库的联表查询并进行print_r输出,查询的sql语句 ...

  3. SQL点滴10—使用with语句来写一个稍微复杂sql语句,附加和子查询的性能对比

    原文:SQL点滴10-使用with语句来写一个稍微复杂sql语句,附加和子查询的性能对比 今天偶尔看到sql中也有with关键字,好歹也写了几年的sql语句,居然第一次接触,无知啊.看了一位博主的文章 ...

  4. python3下multiprocessing、threading和gevent性能对比----暨进程池、线程池和协程池性能对比

    python3下multiprocessing.threading和gevent性能对比----暨进程池.线程池和协程池性能对比   标签: python3 / 线程池 / multiprocessi ...

  5. SQL Server-聚焦IN VS EXISTS VS JOIN性能分析(十九)

    前言 本节我们开始讲讲这一系列性能比较的终极篇IN VS EXISTS VS JOIN的性能分析,前面系列有人一直在说场景不够,这里我们结合查询索引列.非索引列.查询小表.查询大表来综合分析,简短的内 ...

  6. [原] KVM 环境下MySQL性能对比

    KVM 环境下MySQL性能对比 标签(空格分隔): Cloud2.0 [TOC] 测试目的 对比MySQL在物理机和KVM环境下性能情况 压测标准 压测遵循单一变量原则,所有的对比都是只改变一个变量 ...

  7. 浅谈C++之冒泡排序、希尔排序、快速排序、插入排序、堆排序、基数排序性能对比分析之后续补充说明(有图有真相)

    如果你觉得我的有些话有点唐突,你不理解可以想看看前一篇<C++之冒泡排序.希尔排序.快速排序.插入排序.堆排序.基数排序性能对比分析>. 这几天闲着没事就写了一篇<C++之冒泡排序. ...

  8. Java--Stream,NIO ByteBuffer,NIO MappedByteBuffer性能对比

    目前Java中最IO有多种文件读取的方法,本文章对比Stream,NIO ByteBuffer,NIO MappedByteBuffer的性能,让我们知道到底怎么能写出性能高的文件读取代码. pack ...

  9. C正则库做DNS域名验证时的性能对比

    C正则库做DNS域名验证时的性能对比   本文对C的正则库regex和pcre在做域名验证的场景下做评测. 验证DNS域名的正则表达式为: "^[0-9a-zA-Z_-]+(\\.[0-9a ...

  10. 开发语言性能对比,C++、Java、Python、LUA、TCC

    一直想做开发语言性能对比,刚好有时间都做了给大家参考一下, 编译类:C++和Java表现还不错 脚本类:TCC脚本动态运行C语言,性能比其他脚本快好多... 想玩TCC的同学下载测试包,TCC目录下修 ...

随机推荐

  1. Laravel入坑指南(8)——控制台程序

    我们知道,php代码不仅可以用web的形式对外提供服务,同时也可以在命令行下执行. 对于原生的php来说,假设我们有一个php文件,名为Command.php,如果想要在控制台下执行这个文件,那么我们 ...

  2. Swift —— 一、架构解析

    一.简介 OpenStack 对象存储 (swift) 用于冗余.可扩展的数据 使用标准化服务器集群存储PB的存储 可访问的数据.它是一种长期存储系统,可存储大量 可以检索和更新的静态数据.对象存储使 ...

  3. virtualbox中linux设置NAT和Host-Only上网(实现双机互通同时可上外网)

    关于虚拟机中几种网络连接方式请参考其他教程. 平常,我们安装好虚机,用桥接方式也就够了.毕竟它能上内网和外网. 但是有个问题,如果你的网络环境发生变化,虚机的Ip也会随之改变(桥接的Ip和主机ip必须 ...

  4. iptables临时控制某ip访问权限

    iptables -A INPUT -p tcp -s {src_ip} --dport 80 -j ACCEPT iptables -A INPUT -p tcp -s {src_ip} --dpo ...

  5. leetcode - 中序遍历

    给定一个二叉树的根节点 root ,返回 它的 中序 遍历 . 示例 1: 输入:root = [1,null,2,3] 输出:[1,3,2] 示例 2: 输入:root = [] 输出:[] 示例 ...

  6. Hi3516开发笔记(八):Hi3516虚拟机交叉开发环境搭建之配置QtCreator开发交叉编译环境

    海思开发专栏 上一篇:<Hi3516开发笔记(七):Hi3516虚拟机交叉开发环境搭建之交叉编译Qt>下一篇:<Hi3516开发笔记(九):在QtCreator开发环境中引入海思sd ...

  7. 在MATPool矩池云完成Pytorch训练MNIST数据集

    本文为矩池云入门手册的补充:Pytorch训练MNIST数据集代码运行过程. 案例代码和对应数据集,以及在矩池云上的详细操作可以在矩池云入门手册中查看,本文基于矩池云入门手册,默认用户已经完成了机器租 ...

  8. 02、etcd单机部署和集群部署

    上一章我们认识了etcd,简单的介绍了 etcd 的基础概念,但是理解起来还是比较抽象的.这一章我们就一起来部署下 etcd .这样可以让我们对 etcd 有更加确切的认识. 1.etcd单实例部署 ...

  9. ABP的版本升级,从7.2.2升级到7.2.3

    1.升级ABP CLI 见前面的文章:ABP开发需要用到的命令 更新最新版本: ~~~ dotnet tool update -g Volo.Abp.Cli ~~~ 2.升级ABP Suite 见前面 ...

  10. 别再低效筛选数据了!试试pandas query函数

    数据过滤在数据分析过程中具有极其重要的地位,因为在真实世界的数据集中,往往存在重复.缺失或异常的数据.pandas提供的数据过滤功能可以帮助我们轻松地识别和处理这些问题数据,从而确保数据的质量和准确性 ...