A PostgreSQL optimization case: a volatile user-defined function that could not run in a parallel query

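A colleague working on a financial-sector migration project recently asked me to look at a SQL statement. I was told it runs in 1 minute on Oracle but takes 30 minutes on PG (it turned out not to be quite that bad). Without further ado, here is the slow SQL.
The slow SQL (identifiers have been obfuscated):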
explain analyze
SELECT
c_qxxxxaode,
'2023-03-22 00:00:00' AS d_cdate,
SUM(CASE WHEN l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60) THEN 1 ELSE 0 END) AS bt5ycusts,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36) THEN 1 ELSE 0 END) AS bt3to5ycusts,
SUM(CASE WHEN l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36) THEN 1 ELSE 0 END) AS bt3ycusts,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12) THEN 1 ELSE 0 END) AS bt1yto3ycusts,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6) THEN 1 ELSE 0 END) AS bt6to12mcusts,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3) THEN 1 ELSE 0 END) AS bt3to6mcusts,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3) THEN 1 ELSE 0 END) AS btlose3mcusts,
SUM(CASE WHEN l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60) THEN f_qwwvddvvzz ELSE 0 END) AS bt5yshares,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36) THEN f_qwwvddvvzz ELSE 0 END) AS bt3to5yshares,
SUM(CASE WHEN l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36) THEN f_qwwvddvvzz ELSE 0 END) AS bt3yshares,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12) THEN f_qwwvddvvzz ELSE 0 END) AS bt1to3yshares,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6) THEN f_qwwvddvvzz ELSE 0 END) AS bt6to12mshares,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3) THEN f_qwwvddvvzz ELSE 0 END) AS bt3to6mshares,
SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3) THEN f_qwwvddvvzz ELSE 0 END) AS btlose3mshares,
round(AVG(months_between('2023-03-22 00:00:00', l.f_qdqdqdq)), 2) AS avgmonth,
round(AVG(nvl(f_qwwvddvvzz, 0)), 2) AS avgshares,
COUNT(a_daccoxxz) AS custsum,
SUM(f_qwwvddvvzz) AS sharessum
FROM
stcdlbxxxxx l
WHERE
nvl(l.f_qwwvddvvzz, 0) > 0 AND
l.f_qdqdqdq <= '2023-03-22 00:00:00' AND
l.a_daccoxxz <> '996000000000' AND
c_qxxxxaode IN (SELECT c_qxxxxaode FROM tfundinfo WHERE c_raisetype = '1')
GROUP BY
c_qxxxxaode;
Execution plan:
HashAggregate (cost=1043326.49..1043332.91 rows=151 width=424) (actual time=381246.429..381246.640 rows=150 loops=1)
  Group Key: l.c_qxxxxaode
  -> Hash Semi Join (cost=8.78..936347.95 rows=301348 width=38) (actual time=0.057..30237.230 rows=30056793 loops=1)
        Hash Cond: (l.c_qxxxxaode = tfundinfo.c_qxxxxaode)
        -> Seq Scan on stcdlbxxxxx l (cost=0.00..906618.70 rows=10044941 width=38) (actual time=0.008..25908.814 rows=30157190 loops=1)
              Filter: ((NVL(f_qwwvddvvzz, '0'::numeric) > '0'::numeric) AND (f_qdqdqdq <= '2023-03-22 00:00:00'::timestamp without time zone) AND (a_daccoxxz <> '996000000000'::text))
              Rows Removed by Filter: 4842810
        -> Hash (cost=6.91..6.91 rows=150 width=8) (actual time=0.046..0.047 rows=150 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 14kB
              -> Index Only Scan using idx_tfundinfo_fundcode on tfundinfo (cost=0.28..6.91 rows=150 width=8) (actual time=0.021..0.037 rows=150 loops=1)
                    Index Cond: (c_raisetype = '1'::text)
                    Heap Fetches: 0
Planning Time: 0.512 ms
Execution Time: 381246.699 ms
select count(1) from stcdlbxxxxx;
count
----------
35000000
(1 row)
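The stcdlbxxxxx table holds 35 million rows, which is not that large. The statement is essentially a reporting query over a fact table with a pile of aggregate functions. (A fact table is a data-warehouse concept; look it up if the term is unfamiliar.)
The time goes into the HashAggregate node: the Hash Semi Join finishes at 30237.230 ms (about 30 seconds), but once the rows are grouped by l.c_qxxxxaode above it, the HashAggregate does not finish until 381246.640 ms (about 6.3 minutes).
Ideally this statement would run as a parallel aggregate, but the optimizer did not choose that. I suspected the way the SQL was written was keeping the optimizer away from a parallel plan, so without thinking much further I rewrote it and asked my colleague to try the new version.
Rewritten SQL (the aggregates use the FILTER clause in place of CASE WHEN; FILTER is part of the SQL standard rather than PG-only, but not every database implements it):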
explain analyze
SELECT c_qxxxxaode,
'2023-03-22 00:00:00' AS d_cdate,
COUNT(*) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60)) AS bt5ycusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3to5ycusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3ycusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12)) AS bt1yto3ycusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6)) AS bt6to12mcusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3)) AS bt3to6mcusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3)) AS btlose3mcusts,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60)) AS bt5yshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3to5yshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3yshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12)) AS bt1to3yshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6)) AS bt6to12mshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3)) AS bt3to6mshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3)) AS btlose3mshares,
ROUND(AVG(months_between('2023-03-22 00:00:00', l.f_qdqdqdq)), 2) AS avgmonth,
ROUND(AVG(NVL(f_qwwvddvvzz, 0)), 2) AS avgshares,
COUNT(a_daccoxxz) AS custsum,
SUM(f_qwwvddvvzz) AS sharessum
FROM stcdlbxxxxx l
WHERE NVL(l.f_qwwvddvvzz, 0) > 0
AND l.f_qdqdqdq <= '2023-03-22 00:00:00'
AND l.a_daccoxxz <> '996000000000'
AND c_qxxxxaode IN (SELECT c_qxxxxaode FROM tfundinfo WHERE c_raisetype = '1')
GROUP BY c_qxxxxaode;
Execution plan after the rewrite:
HashAggregate (cost=1043326.49..1043332.91 rows=151 width=424) (actual time=380246.621..380246.849 rows=150 loops=1)
  Group Key: l.c_qxxxxaode
  -> Hash Semi Join (cost=8.78..936347.95 rows=301348 width=38) (actual time=0.055..29983.463 rows=30056793 loops=1)
        Hash Cond: (l.c_qxxxxaode = tfundinfo.c_qxxxxaode)
        -> Seq Scan on stcdlbxxxxx l (cost=0.00..906618.70 rows=10044941 width=38) (actual time=0.008..25415.490 rows=30157190 loops=1)
              Filter: ((NVL(f_qwwvddvvzz, '0'::numeric) > '0'::numeric) AND (f_qdqdqdq <= '2023-03-22 00:00:00'::timestamp without time zone) AND (a_daccoxxz <> '996000000000'::text))
              Rows Removed by Filter: 4842810
        -> Hash (cost=6.91..6.91 rows=150 width=8) (actual time=0.043..0.044 rows=150 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 14kB
              -> Index Only Scan using idx_tfundinfo_fundcode on tfundinfo (cost=0.28..6.91 rows=150 width=8) (actual time=0.018..0.035 rows=150 loops=1)
                    Index Cond: (c_raisetype = '1'::text)
                    Heap Fetches: 0
Planning Time: 0.533 ms
Execution Time: 380246.909 ms
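I was speechless when I saw this plan: it was identical to the plan of the original SQL. Baffled, I wondered whether the parallel parameters had simply not been enabled on site, but my colleague confirmed they were already set.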
qtbg=> show max_parallel_workers_per_gather
qtbg-> ;
max_parallel_workers_per_gather
---------------------------------
16
(1 row)
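max_parallel_workers_per_gather is only one of several settings that influence whether a parallel plan is considered. As a rough sketch, assuming stock PostgreSQL 10 or later (a derivative database may expose different names), the related settings can be listed in one query:

```sql
-- List the settings that commonly decide whether a parallel plan is considered.
-- Setting names are those of stock PostgreSQL; they are an assumption for other forks.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN (
    'max_parallel_workers_per_gather',  -- workers allowed under one Gather node
    'max_parallel_workers',             -- workers allowed for parallel queries overall
    'max_worker_processes',             -- total background worker slots
    'parallel_setup_cost',              -- planner cost charged for starting workers
    'parallel_tuple_cost',              -- planner cost per tuple passed to the leader
    'min_parallel_table_scan_size'      -- tables smaller than this are not scanned in parallel
)
ORDER BY name;
```

If all of these look reasonable and the plan is still serial, the cause is more likely in the query itself, for example in the functions it calls.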
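With that ruled out, I went back to the SQL and noticed that it uses two user-defined functions, add_months and months_between (written by our R&D team for Oracle compatibility), so I looked at their definitions.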
qtbg=> \df+ add_months
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Source code | Description
--------+------------+------------------+-----------------------------------+------+------------+----------+--------+----------+-------------------+----------+----------------------------------------------------------------------------------+-------------
sys | add_months | pg_catalog.date | pg_catalog.date, boolean | func | immutable | safe | system | invoker | | c | add_months_bool |
sys | add_months | pg_catalog.date | pg_catalog.date, integer | func | immutable | safe | system | invoker | | c | add_months |
sys | add_months | date | timestamp with time zone, integer | func | immutable | safe | system | invoker | | sql | select (add_months($1::pg_catalog.date, $2) + $1::pg_catalog.time)::sys.date; |
(3 rows)
qtbg=> \df+ months_between
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Source code | Description
--------+----------------+------------------+----------------------------------------------------+------+------------+----------+--------+----------+-------------------+----------+-------------------------------------------------------------------------------------------------------------------------------------+-------------
sys | months_between | double precision | date, date | func | volatile | unsafe | system | invoker | | plsql | +|
| | | | | | | | | | | begin +|
| | | | | | | | | | | case when +|
| | | | | | | | | | | (last_day($1) = $1 and last_day($2) = $2) +|
| | | | | | | | | | | or +|
| | | | | | | | | | | (extract(day from $1) = extract(day from $2)) +|
| | | | | | | | | | | then +|
| | | | | | | | | | | return (select (extract(years from $1)::int * 12 - extract(years from $2)::int * 12)::float + +|
| | | | | | | | | | | (extract(month from $1)::int - extract(month from $2)::int)::float); +|
| | | | | | | | | | | else +|
| | | | | | | | | | | return (select (extract(years from $1)::int * 12 - extract(years from $2)::int * 12)::float + +|
| | | | | | | | | | | (extract(month from $1)::int - extract(month from $2)::int)::float + +|
| | | | | | | | | | | (extract(day from $1)::int - extract(day from $2)::int)/31::float + +|
| | | | | | | | | | | (extract(hour from $1)::int * 3600 + extract(minutes from $1)::int * 60 + extract(seconds from $1)::int)/(3600*24*31)::float -+|
| | | | | | | | | | | (extract(hour from $2)::int * 3600 + extract(minutes from $2)::int * 60 + extract(seconds from $2)::int)/(3600*24*31)::float);+|
| | | | | | | | | | | end case; +|
| | | | | | | | | | | end; +|
| | | | | | | | | | | |
sys | months_between | double precision | timestamp with time zone, timestamp with time zone | func | volatile | unsafe | system | invoker | | plsql | +|
| | | | | | | | | | | begin +|
| | | | | | | | | | | case when +|
| | | | | | | | | | | (last_day($1) = $1 and last_day($2) = $2) +|
| | | | | | | | | | | or +|
| | | | | | | | | | | (extract(day from $1) = extract(day from $2)) +|
| | | | | | | | | | | then +|
| | | | | | | | | | | return (select (extract(years from $1)::int * 12 - extract(years from $2)::int * 12)::float + +|
| | | | | | | | | | | (extract(month from $1)::int - extract(month from $2)::int)::float); +|
| | | | | | | | | | | else +|
| | | | | | | | | | | return (select (extract(years from $1)::int * 12 - extract(years from $2)::int * 12)::float + +|
| | | | | | | | | | | (extract(month from $1)::int - extract(month from $2)::int)::float + +|
| | | | | | | | | | | (extract(day from $1)::int - extract(day from $2)::int)/31::float + +|
| | | | | | | | | | | (extract(hour from $1)::int * 3600 + extract(minutes from $1)::int * 60 + extract(seconds from $1)::int)/(3600*24*31)::float -+|
| | | | | | | | | | | (extract(hour from $2)::int * 3600 + extract(minutes from $2)::int * 60 + extract(seconds from $2)::int)/(3600*24*31)::float);+|
| | | | | | | | | | | end case; +|
| | | | | | | | | | | end; +|
| | | | | | | | | | | |
(2 rows)
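That nearly broke me. Our R&D team really set a trap here.
months_between turned out to be volatile and, according to the Parallel column, parallel unsafe. I had always assumed the Oracle-compatibility functions shipped by R&D were immutable or stable, so I did not look in this direction at first.
A PG function has one of three volatility categories: volatile, stable, or immutable. It also carries a separate parallel-safety label (safe, restricted, or unsafe), and a function created without a PARALLEL clause defaults to unsafe.
A query that calls any parallel-unsafe function cannot use a parallel plan at all, and that is what happened here: months_between was left at the defaults, volatile and unsafe. I will not go through the three volatility categories in detail; the PostgreSQL documentation on function volatility explains them better than I could.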
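Volatility and parallel safety are stored as two independent columns in the pg_proc catalog, so both can be checked for every overload at once without reading the wide \df+ output. A minimal sketch, assuming PostgreSQL 9.6 or later (where proparallel exists):

```sql
-- Show volatility and parallel-safety labels for every overload of the two functions.
-- provolatile: i = immutable, s = stable, v = volatile
-- proparallel: s = safe, r = restricted, u = unsafe
SELECT p.oid::regprocedure AS signature,
       CASE p.provolatile
            WHEN 'i' THEN 'immutable'
            WHEN 's' THEN 'stable'
            WHEN 'v' THEN 'volatile'
       END AS volatility,
       CASE p.proparallel
            WHEN 's' THEN 'safe'
            WHEN 'r' THEN 'restricted'
            WHEN 'u' THEN 'unsafe'
       END AS parallel
FROM pg_proc AS p
WHERE p.proname IN ('add_months', 'months_between')
ORDER BY p.proname, p.oid;
```

Any row reporting unsafe in the last column is enough to keep a query that calls it on a serial plan.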
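After discussing it with the customer, I wrote a months_between1 function to replace the original months_between. The fractional part of its result may differ slightly from Oracle's, but the customer said that was acceptable.
The months_between1 function: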
CREATE OR REPLACE FUNCTION months_between1(date1 DATE, date2 DATE)
RETURNS FLOAT AS $$
DECLARE
years_diff INT;
months_diff INT;
days_diff FLOAT;
BEGIN
years_diff := EXTRACT(YEAR FROM date1) - EXTRACT(YEAR FROM date2);
months_diff := EXTRACT(MONTH FROM date1) - EXTRACT(MONTH FROM date2);
days_diff := EXTRACT(DAY FROM date1) - EXTRACT(DAY FROM date2);
days_diff := days_diff / 30.0;
RETURN (years_diff * 12) + months_diff + days_diff;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
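To give a feel for the size of the precision difference: Oracle documents a 31-day month for the fractional part of MONTHS_BETWEEN, and the original compatibility function above divides the day difference by 31, while months_between1 divides by 30.0. The worked arithmetic below uses a hypothetical pair of dates, 2023-03-22 and 2023-01-10 (year difference 0, month difference 2, day difference 12), and only plain arithmetic, so it runs without either function installed:

```sql
-- Worked arithmetic for the hypothetical pair 2023-03-22 and 2023-01-10:
-- year difference 0, month difference 2, day difference 12.
SELECT 0 * 12 + 2 + 12 / 30.0 AS months_30_day_basis,  -- formula used by months_between1
       0 * 12 + 2 + 12 / 31.0 AS months_31_day_basis;  -- 31-day basis of the original function
```

The first column comes out at 2.4 and the second at roughly 2.387, so the two bases diverge in the second decimal place, which is visible in a value rounded to two decimals such as avgmonth.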
Mark months_between1 as parallel safe:
ALTER FUNCTION scott.months_between1(date, date) PARALLEL SAFE;
Function details for months_between1:
\df+ months_between1
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Source code | Description
--------+-----------------+------------------+------------------------+------+------------+----------+-------+----------+-------------------+----------+---------------------------------------------------------------------------+-------------
scott | months_between1 | double precision | date1 date, date2 date | func | immutable | safe | qtbg | invoker | | plpgsql | +|
| | | | | | | | | | | DECLARE +|
| | | | | | | | | | | years_diff INT; +|
| | | | | | | | | | | months_diff INT; +|
| | | | | | | | | | | days_diff FLOAT; +|
| | | | | | | | | | | BEGIN +|
| | | | | | | | | | | years_diff := EXTRACT(YEAR FROM date1) - EXTRACT(YEAR FROM date2); +|
| | | | | | | | | | | months_diff := EXTRACT(MONTH FROM date1) - EXTRACT(MONTH FROM date2);+|
| | | | | | | | | | | days_diff := EXTRACT(DAY FROM date1) - EXTRACT(DAY FROM date2); +|
| | | | | | | | | | | days_diff := days_diff / 30.0; +|
| | | | | | | | | | | RETURN (years_diff * 12) + months_diff + days_diff; +|
| | | | | | | | | | | END; +|
| | | | | | | | | | | |
(1 row)
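The effect of the parallel-safety label can be reproduced on a scratch database. The sketch below assumes stock PostgreSQL; demo_fact and demo_months are hypothetical names made up for the demonstration and have nothing to do with the customer's schema:

```sql
-- Hypothetical demo objects; none of these names come from the case above.
CREATE TABLE demo_fact AS
SELECT g AS id, DATE '2020-01-01' + (g % 1000) AS d
FROM generate_series(1, 2000000) AS g;
ANALYZE demo_fact;

-- No PARALLEL clause, so the function gets the default label: PARALLEL UNSAFE.
CREATE FUNCTION demo_months(d1 date, d2 date) RETURNS double precision
LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    RETURN (EXTRACT(YEAR FROM d1) - EXTRACT(YEAR FROM d2)) * 12
         + (EXTRACT(MONTH FROM d1) - EXTRACT(MONTH FROM d2));
END;
$$;

SET max_parallel_workers_per_gather = 4;

-- Expect a serial plan: a parallel-unsafe function rules out parallelism for the whole query.
EXPLAIN (COSTS OFF)
SELECT avg(demo_months(DATE '2023-03-22', d)) FROM demo_fact;

ALTER FUNCTION demo_months(date, date) PARALLEL SAFE;

-- The planner may now choose Gather with a partial aggregate, depending on costs and table size.
EXPLAIN (COSTS OFF)
SELECT avg(demo_months(DATE '2023-03-22', d)) FROM demo_fact;
```

Note that the function is IMMUTABLE in both runs; only the parallel label changes between the two EXPLAIN calls. A function should be marked PARALLEL SAFE only when its body really has no side effects and calls nothing that is itself unsafe.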
Finally, run the SQL to test performance:
explain analyze
SELECT c_qxxxxaode,
'2023-03-22 00:00:00' AS d_cdate,
COUNT(*) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60)) AS bt5ycusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3to5ycusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3ycusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12)) AS bt1yto3ycusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6)) AS bt6to12mcusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3)) AS bt3to6mcusts,
COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3)) AS btlose3mcusts,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60)) AS bt5yshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3to5yshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3yshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12)) AS bt1to3yshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6)) AS bt6to12mshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3)) AS bt3to6mshares,
SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3)) AS btlose3mshares,
ROUND(AVG(months_between1('2023-03-22 00:00:00', l.f_qdqdqdq)), 2) AS avgmonth,
ROUND(AVG(NVL(f_qwwvddvvzz, 0)), 2) AS avgshares,
COUNT(a_daccoxxz) AS custsum,
SUM(f_qwwvddvvzz) AS sharessum
FROM stcdlbxxxxx l
WHERE NVL(l.f_qwwvddvvzz, 0) > 0
AND l.f_qdqdqdq <= '2023-03-22 00:00:00'
AND l.a_daccoxxz <> '996000000000'
AND c_qxxxxaode IN (SELECT c_qxxxxaode FROM tfundinfo WHERE c_raisetype = '1')
GROUP BY c_qxxxxaode;
Execution plan:
Finalize GroupAggregate (cost=406085.34..424225.64 rows=151 width=424) (actual time=9533.754..16818.165 rows=150 loops=1)
  Group Key: l.c_qxxxxaode
  -> Gather Merge (cost=406085.34..424155.80 rows=906 width=392) (actual time=9484.778..16812.958 rows=1050 loops=1)
        Workers Planned: 6
        Workers Launched: 6
        -> Partial GroupAggregate (cost=405085.25..423045.59 rows=151 width=392) (actual time=9383.774..16482.619 rows=150 loops=7)
              Group Key: l.c_qxxxxaode
              -> Sort (cost=405085.25..405210.81 rows=50225 width=38) (actual time=9358.407..10328.135 rows=4293828 loops=7)
                    Sort Key: l.c_qxxxxaode
                    Sort Method: quicksort Memory: 461463kB
                    Worker 0: Sort Method: quicksort Memory: 452754kB
                    Worker 1: Sort Method: quicksort Memory: 452822kB
                    Worker 2: Sort Method: quicksort Memory: 451891kB
                    Worker 3: Sort Method: quicksort Memory: 448327kB
                    Worker 4: Sort Method: quicksort Memory: 454387kB
                    Worker 5: Sort Method: quicksort Memory: 453623kB
                    -> Hash Semi Join (cost=8.78..401163.65 rows=50225 width=38) (actual time=0.219..4369.485 rows=4293828 loops=7)
                          Hash Cond: (l.c_qxxxxaode = tfundinfo.c_qxxxxaode)
                          -> Parallel Seq Scan on stcdlbxxxxx l (cost=0.00..396201.45 rows=1674157 width=38) (actual time=0.017..3948.943 rows=4308170 loops=7)
                                Filter: ((NVL(f_qwwvddvvzz, '0'::numeric) > '0'::numeric) AND (f_qdqdqdq <= '2023-03-22 00:00:00'::timestamp without time zone) AND (a_daccoxxz <> '996000000000'::text))
                                Rows Removed by Filter: 691830
                          -> Hash (cost=6.91..6.91 rows=150 width=8) (actual time=0.164..0.173 rows=150 loops=7)
                                Buckets: 1024 Batches: 1 Memory Usage: 14kB
                                -> Index Only Scan using idx_tfundinfo_fundcode on tfundinfo (cost=0.28..6.91 rows=150 width=8) (actual time=0.072..0.151 rows=150 loops=7)
                                      Index Cond: (c_raisetype = '1'::text)
                                      Heap Fetches: 0
Planning Time: 0.588 ms
Execution Time: 16851.196 ms
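The SQL that used to take more than 6 minutes now returns in 16.8 seconds, roughly 22 times faster. R&D really set a trap with that function. Parallelism is probably not the whole story: seven processes at about 16.8 seconds each still add up to far less than 381 seconds of work, so months_between1 is presumably also cheaper per call than the original function.
In fact the FILTER rewrite was not needed at all: replacing months_between with months_between1 in the original SQL is enough to get a parallel plan.
The FILTER form reads better, though, so that is the version I submitted to the customer.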
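One caveat when swapping CASE WHEN for FILTER: the two forms are not always interchangeable. COUNT(*) FILTER matches SUM(CASE ... THEN 1 ELSE 0 END), but SUM(x) FILTER returns NULL for a group in which no row passes the filter, whereas SUM(CASE ... THEN x ELSE 0 END) returns 0. A small self-contained sketch on made-up values (the table t and its columns are hypothetical):

```sql
-- Group 'a' has one matching row; group 'b' has none.
WITH t(grp, amount, flag) AS (
    VALUES ('a', 10, true),
           ('a', 20, false),
           ('b', 30, false)
)
SELECT grp,
       SUM(CASE WHEN flag THEN 1 ELSE 0 END)        AS cnt_case,
       COUNT(*) FILTER (WHERE flag)                 AS cnt_filter,
       SUM(CASE WHEN flag THEN amount ELSE 0 END)   AS sum_case,
       SUM(amount) FILTER (WHERE flag)              AS sum_filter,
       -- COALESCE restores the 0 that the CASE form would have produced.
       COALESCE(SUM(amount) FILTER (WHERE flag), 0) AS sum_filter_zero
FROM t
GROUP BY grp
ORDER BY grp;
```

For group 'b' the two counts are both 0, but sum_case is 0 while sum_filter is NULL. If downstream consumers expect 0 in the share columns, wrapping the filtered sums in COALESCE keeps the rewrite equivalent to the original.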