PostgreSQL 涉及复杂视图查询的优化案例

一、前言

对于含有union , group by 等的视图，我们称之为复杂视图。这类的视图会影响优化器对于视图的提升，也就是视图无法与父查询进行合并，从而影响访问路径、连接方法、连接顺序等。本文通过例子，给大家展示PostgreSQL这类问题及针对该问题的优化方法。

二、Union 视图的优化

1、构建例子：

create table t1(id1 integer);

insert into t1 select generate_series(1,10);

create table t2(id2 integer,name char(500));

insert into t2 select generate_series(1,1000000),repeat('a',400);

create index ind_t2 on t2(id2);

create table t3(id3 integer,name char(500));

insert into t3 select generate_series(1,1000000),repeat('a',400);

create index ind_t3 on t3(id3);

create or replace view v_t2_t3 as

select id2 as id from t2

union

select id3 as id from t3;

2、分析执行计划

执行计划如下：

testdb=# explain analyze select * from t1,v_t2_t3 where id1=id;

                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------

 Merge Join  (cost=447340.31..483340.14 rows=99999 width=8) (actual time=1313.700..1313.711 rows=10 loops=1)

   Merge Cond: (t1.id1 = t2.id2)

   ->  Sort  (cost=1.27..1.29 rows=10 width=4) (actual time=0.019..0.021 rows=10 loops=1)

         Sort Key: t1.id1

         Sort Method: quicksort  Memory: 25kB

         ->  Seq Scan on t1  (cost=0.00..1.10 rows=10 width=4) (actual time=0.009..0.011 rows=10 loops=1)

   ->  Unique  (cost=447339.04..457338.98 rows=1999988 width=4) (actual time=1313.674..1313.681 rows=10 loops=1)

         ->  Sort  (cost=447339.04..452339.01 rows=1999988 width=4) (actual time=1313.673..1313.676 rows=19 loops=1)

               Sort Key: t2.id2

               Sort Method: external merge  Disk: 27488kB

               ->  Append  (cost=0.00..183333.70 rows=1999988 width=4) (actual time=0.017..923.420 rows=2000000 loops=1)

                     ->  Seq Scan on t2  (cost=0.00..76666.94 rows=999994 width=4) (actual time=0.016..547.533 rows=1000000 loops=1)

                     ->  Seq Scan on t3  (cost=0.00..76666.94 rows=999994 width=4) (actual time=0.014..261.595 rows=1000000 loops=1)

 Planning Time: 3.124 ms

 Execution Time: 1316.691 ms

(15 rows)

问题分析：视图 v_t2_t3 并没有与 t1进行合并（Unique部分），而是 t1 与结果进行连接。这个执行计划的问题在于 t1 表的数据量很少，如果能把 t1.id1 传入到视图，视图内部访问 t2 , t3 时就可以走索引，效率上要更高。

3、修改SQL

testdb=# explain analyze select * from t1,v_t2_t3 where id1=id and id=any(array(select id1 from t1));

                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------

 Hash Join  (cost=98.95..100.08 rows=10 width=8) (actual time=0.126..0.129 rows=10 loops=1)

   Hash Cond: (t1.id1 = t2.id2)

   InitPlan 1 (returns $0)

     ->  Seq Scan on t1 t1_1  (cost=0.00..1.10 rows=10 width=4) (actual time=0.001..0.002 rows=10 loops=1)

   ->  Seq Scan on t1  (cost=0.00..1.10 rows=10 width=4) (actual time=0.005..0.006 rows=10 loops=1)

   ->  Hash  (cost=97.60..97.60 rows=20 width=4) (actual time=0.115..0.116 rows=10 loops=1)

         Buckets: 1024  Batches: 1  Memory Usage: 9kB

         ->  HashAggregate  (cost=97.20..97.40 rows=20 width=4) (actual time=0.112..0.114 rows=10 loops=1)

               Group Key: t2.id2

               ->  Append  (cost=0.42..97.15 rows=20 width=4) (actual time=0.060..0.105 rows=20 loops=1)

                     ->  Index Only Scan using ind_t2 on t2  (cost=0.42..48.43 rows=10 width=4) (actual time=0.060..0.072 rows=10 loops=1)

                           Index Cond: (id2 = ANY ($0))

                           Heap Fetches: 10

                     ->  Index Only Scan using ind_t3 on t3  (cost=0.42..48.43 rows=10 width=4) (actual time=0.022..0.032 rows=10 loops=1)

                           Index Cond: (id3 = ANY ($0))

                           Heap Fetches: 10

 Planning Time: 0.171 ms

 Execution Time: 0.163 ms

(18 rows)

分析：通过增加条件 id=any(array(select id1 from t1)) ，可以看到该条件可以传入到视图内部。视图内部对于 t2 , t3 的访问是走索引的。

4、问题分析结论

对于类似 v_t2_t3 这种含有 union 的复杂视图，除非是指定明确的值，如 v_t2_t3.id=xxx ，才可以传入的视图内部。而对于连接条件，如: id1=id，则无法将 id1 传入到视图内部，这时视图只能走全表访问。

三、KingbaseES 提升了优化器能力

KingbaseES 通过修改优化器算法，实现了在RBO 层面对于该类SQL 的改写，避免了该问题。