KingbaseES 函数稳定性与SQL性能

背景：客户现场的一次艰苦的调优过程（https://www.cnblogs.com/kingbase/p/16015834.html），让我觉得非常有必要让数据库用户了解函数的不同稳定性属性，及其对于SQL性能可能带来的影响。很多DBA、开发人员根本就不在意，或者根本就没意识到函数稳定性对于性能的影响，这就导致出现性能问题时没有头绪。以下的例子都是实际客户现场问题的提炼，为了让用户有直观的了解。

一、函数的三种稳定态

函数的稳定性状态，简单地说就是相同的输入参数情况下，函数返回值是否相同。稳定性使得优化器可以判断不同函数的行为。为了得到最佳的优化结果，在创建函数时我们应该指定与函数功能相对应稳定性级别。

如果本应该是 volatile ，而设置成 immutable，会导致结果错误。如：currtid 函数，不同时刻执行的结果可能是不同的，必须是volatile 。
如果本应该是 immutable，而设置成 volatile ，就会导致性能问题。如：trunc 函数，不同时刻执行结果应该是相同的，只需执行一次即可。

PostgreSQL 或 KingbaseES 函数在定义时有三种稳定态级别：volatile、stable 和 immutable。默认情况下，创建函数的稳定性为volatile。以下是这三种函数的区别：

Volatile 函数可以做任何事情，包括修改数据库。在调用中，输入同样的参数可能会返回不同的结果，比如：currtid 。在一个Query中，对于每一行都会重新计算该函数。
Stable 函数不能修改数据库，单个Query中所有行给定同样的参数确保返回相同的结果。这种稳定级别允许优化器将多次函数调用转换为一次。在索引扫描的条件中使用这种函数是可行的，因为索引扫描只计算一次比较值(comparison value),而不是每行都计算一次。
Immutable 函数不能修改数据库，在任何情况下，只要输入参数相同，返回结果就相同。这种级别的函数，优化器可以提前进行计算，在查询过程中作为常量参数。比如：SELECT...WHERE x=2+2 可以简化为SELECT...WHERE x=4。

KingbaseES 为了兼容Oracle，增加了 Deterministic稳定态，等价于 immutable 。

以下例子同时在PostgreSQL 与 KingbaseES 进行过验证。

二、等式左边函数稳定性与性能

结论：等式左边的 volatile 函数可能导致函数不必要的被重复执行。

1、构建例子：创建三种稳定性的函数。

create or replace function test_volatile(id integer)

    returns bigint

    volatile

    language sql

as

$$ select count(*) from t1 $$ ;

/

create or replace function test_stable(id integer)

    returns bigint

    stable

    language sql

as

$$ select count(*) from t1 $$ ;

/

create or replace function test_immutable(id integer)

    returns bigint

    immutable

    language sql

as

$$ select count(*) from t1 $$ ;

/

例子数据如下：

create table test(id integer);

insert into test select 1 from generate_series(1,10);

2、以列为参数调用函数

可以看到三个函数调用时间基本没有差别，因为，传入的参数是id 变量值（虽然实际值是相同的）。

test=# select count(*) from test where test_volatile(id)=1;

 count

-------

     0

(1 row)

Time: 39652.495 ms (00:39.652)


test=# select count(*) from test where test_stable(id)=1;

 count

-------

     0

(1 row)

Time: 38789.952 ms (00:38.790)


test=# select count(*) from test where test_immutable(id)=1;

 count

-------

     0

(1 row)

Time: 38591.957 ms (00:38.592)

3、函数在等式右边的情况

test=# select count(*) from test where id=test_volatile(1);

 count

-------

     0

(1 row)

Time: 40353.777 ms (00:40.354)

test=# select count(*) from test where id=test_stable(1);

 count

-------

     0

(1 row)

Time: 40500.253 ms (00:40.500)

test=# select count(*) from test where id=test_immutable(1);

 count

-------

     0

(1 row)

Time: 374.300 ms

4、传入常量值

可以看到，对于常量值，stable 和 immutable 类型的函数实际只需调用一次。

test=# select count(*) from test where test_volatile(1)=1;

 count

-------

     0

(1 row)

Time: 41647.551 ms (00:41.648)

test=# select count(*) from test where test_stable(1)=1;

 count

-------

     0

(1 row)

Time: 399.161 ms

test=# select count(*) from test where test_immutable(1)=1;

 count

-------

     0

(1 row)

Time: 389.367 ms

5、for循环

可以看到，对于相同的输入参数， test_immutable 在同一query 只执行一次。

对于for 循环，实际结果相同。

test=# begin

test-#   for i in 1..100 loop

test-#     perform test_immutable(1);

test-#   end loop;

test-# end;

test-# /

ANONYMOUS BLOCK

Time: 1001.184 ms (00:01.001)

6、等式右边的函数属性也会影响索引的使用

来看以下例子：

test=# create table t2(id integer,name char(999));

CREATE TABLE                                                                    ^

test=# insert into t2 select generate_series(1,100),repeat('a',999);

INSERT 0 100

test=# create index ind_t2 on t2(id);

CREATE INDEX

test=#

test=# analyze t2;

ANALYZE

test=# explain analyze select count(*) from t2 where id=test_volatile(1);

                                                QUERY PLAN

-----------------------------------------------------------------------------------------------------------

 Aggregate  (cost=41.25..41.26 rows=1 width=8) (actual time=38735.427..38735.428 rows=1 loops=1)

   ->  Seq Scan on t2  (cost=0.00..41.25 rows=1 width=0) (actual time=38735.424..38735.425 rows=0 loops=1)

         Filter: (id = test_volatile(1))

         Rows Removed by Filter: 100

 Planning Time: 0.191 ms

 Execution Time: 38735.447 ms

(6 rows)

test=# explain analyze select count(*) from t2 where id=test_stable(1);

                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=8.41..8.42 rows=1 width=8) (actual time=385.449..385.450 rows=1 loops=1)

   ->  Index Only Scan using ind_t2 on t2  (cost=0.39..8.41 rows=1 width=0) (actual time=385.446..385.446 rows=0 loops=1)

         Index Cond: (id = test_stable(1))

         Heap Fetches: 0

 Planning Time: 398.499 ms

 Execution Time: 385.487 ms

(6 rows)

test=# explain analyze select count(*) from t2 where id=test_immutable(1);

                                                      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=8.16..8.17 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)

   ->  Index Only Scan using ind_t2 on t2  (cost=0.14..8.16 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=1)

         Index Cond: (id = '10000000'::bigint)

         Heap Fetches: 0

 Planning Time: 383.902 ms

 Execution Time: 0.032 ms

(6 rows)

可以看到volatile 情况下，无法使用索引

三、Volatile 函数影响子查询提升

1、构建例子

create table tab1(id1 integer,name1 varchar(9),addr1 text);

create table tab2(id2 integer,name2 varchar(9),addr2 text);

insert into tab1 select generate_series(1,1000000),generate_series(1,1000000),'abc';

insert into tab2 select generate_series(1,1000000),generate_series(1,1000000),'abc';

create index ind_tab1 on tab1(id1);

create index ind_tab2 on tab2(id2);

2、volatile 函数影响子查询提升

对于 tab2 表的访问无法使用索引。

test=# \df+ replace

                                                                                      List of functions

 Schema |  Name   | Result data type  | Argument data types | Type | Volatility | Parallel | Owner  | Security | Access privileges | Language |           Source code           | Description

--------+---------+-------------------+---------------------+------+------------+----------+--------+----------+-------------------+----------+---------------------------------+-------------

 sys    | replace | character varying | text, text, text    | func | volatile   | safe     | system | invoker  |                   | c        | ora_replace_text                | 

test=# explain select id1,name1 from tab1 a ,(select id2,replace(id2,'b','B') name2 from tab2 ) b where a.id1=b.id2 and name1='123';

                             QUERY PLAN

---------------------------------------------------------------------

 Hash Join  (cost=17935.01..52120.02 rows=1 width=10)

   Hash Cond: (tab2.id2 = a.id1)

   ->  Seq Scan on tab2  (cost=0.00..20435.00 rows=1000000 width=36)

   ->  Hash  (cost=17935.00..17935.00 rows=1 width=10)

         ->  Seq Scan on tab1 a  (cost=0.00..17935.00 rows=1 width=10)

               Filter: ((name1)::text = '123'::text)

(6 rows)

不使用子查询情况下的，可以使用索引：

test=# explain select  id1,name1,id2,replace(id2,'b','B') name2 from tab1 a,tab2 b where a.id1=b.id2 and name1='123';

                                  QUERY PLAN

------------------------------------------------------------------------------

 Nested Loop  (cost=0.42..17943.46 rows=1 width=46)

   ->  Seq Scan on tab1 a  (cost=0.00..17935.00 rows=1 width=10)

         Filter: ((name1)::text = '123'::text)

   ->  Index Only Scan using ind_tab2 on tab2 b  (cost=0.42..8.44 rows=1 width=4)

         Index Cond: (id2 = a.id1)

(5 rows)

3、immutable 函数与执行计划

改成immutable 函数后，子查询可以提升，从而能够使用索引。

test=# \df+ replace

                                                                                      List of functions

 Schema |  Name   | Result data type  | Argument data types | Type | Volatility | Parallel | Owner  | Security | Access privileges | Language |           Source code           | Description

--------+---------+-------------------+---------------------+------+------------+----------+--------+----------+-------------------+----------+---------------------------------+-------------

 sys    | replace | character varying | text, text, text    | func | immutable  | safe     | system | invoker  |                   | c        | ora_replace_text                | 

test=# explain select id1,name1 from tab1 a ,(select id2,replace(id2,'b','B') name2 from tab2 ) b where a.id1=b.id2 and name1='123';

                                 QUERY PLAN

----------------------------------------------------------------------------

 Nested Loop  (cost=0.42..17943.45 rows=1 width=10)

   ->  Seq Scan on tab1 a  (cost=0.00..17935.00 rows=1 width=10)

         Filter: ((name1)::text = '123'::text)

   ->  Index Only Scan using ind_tab2 on tab2  (cost=0.42..8.44 rows=1 width=4)

         Index Cond: (id2 = a.id1)

(5 rows)

五、结论

函数的稳定性特性不是PostgreSQL 独有的，Oracle 也有同样的机制（默认volatile ，可以加 deterministic），不同的稳定态会影响性能及数据的准确性。用户在创建函数时，必须对了解自己所创建的函数应归属的属性，错误的属性设置可能导致函数返回结果错误，以及函数执行的性能问题。