小结:

1、覆盖索引 回表

2、

All indexes in PostgreSQL are secondary indexes, meaning that each index is stored separately from the table's main data area (which is called the table's heap in PostgreSQL terminology). This means that in an ordinary index scan, each row retrieval requires fetching data from both the index and the heap.

3、

To solve this performance problem, PostgreSQL supports index-only scans, which can answer queries from an index alone without any heap access. The basic idea is to return values directly out of each index entry instead of consulting the associated heap entry. There are two fundamental restrictions on when this method can be used:

  1. The index type must support index-only scans. B-tree indexes always do. GiST and SP-GiST indexes support index-only scans for some operator classes but not others. Other index types have no support. The underlying requirement is that the index must physically store, or else be able to reconstruct, the original data value for each index entry. As a counterexample, GIN indexes cannot support index-only scans because each index entry typically holds only part of the original data value.

  2. The query must reference only columns stored in the index. For example, given an index on columns x and y of a table that also has a column z, these queries could use index-only scans:

    SELECT x, y FROM tab WHERE x = 'key';
    SELECT x FROM tab WHERE x = 'key' AND y < 42;

    but these queries could not:

    SELECT x, z FROM tab WHERE x = 'key';
    SELECT x FROM tab WHERE x = 'key' AND z < 42;

    (Expression indexes and partial indexes complicate this rule, as discussed below.)

PostgreSQL: Documentation: 12: 11.9. Index-Only Scans and Covering Indexes https://www.postgresql.org/docs/12/indexes-index-only-scans.html

Index-Only Scans and Covering Indexes

11.9. Index-Only Scans and Covering Indexes

All indexes in PostgreSQL are secondary indexes, meaning that each index is stored separately from the table's main data area (which is called the table's heap in PostgreSQL terminology). This means that in an ordinary index scan, each row retrieval requires fetching data from both the index and the heap. Furthermore, while the index entries that match a given indexable WHERE condition are usually close together in the index, the table rows they reference might be anywhere in the heap. The heap-access portion of an index scan thus involves a lot of random access into the heap, which can be slow, particularly on traditional rotating media. (As described in Section 11.5, bitmap scans try to alleviate this cost by doing the heap accesses in sorted order, but that only goes so far.)

To solve this performance problem, PostgreSQL supports index-only scans, which can answer queries from an index alone without any heap access. The basic idea is to return values directly out of each index entry instead of consulting the associated heap entry. There are two fundamental restrictions on when this method can be used:

  1. The index type must support index-only scans. B-tree indexes always do. GiST and SP-GiST indexes support index-only scans for some operator classes but not others. Other index types have no support. The underlying requirement is that the index must physically store, or else be able to reconstruct, the original data value for each index entry. As a counterexample, GIN indexes cannot support index-only scans because each index entry typically holds only part of the original data value.

  2. The query must reference only columns stored in the index. For example, given an index on columns x and y of a table that also has a column z, these queries could use index-only scans:

    SELECT x, y FROM tab WHERE x = 'key';
    SELECT x FROM tab WHERE x = 'key' AND y < 42;

    but these queries could not:

    SELECT x, z FROM tab WHERE x = 'key';
    SELECT x FROM tab WHERE x = 'key' AND z < 42;

    (Expression indexes and partial indexes complicate this rule, as discussed below.)

If these two fundamental requirements are met, then all the data values required by the query are available from the index, so an index-only scan is physically possible. But there is an additional requirement for any table scan in PostgreSQL: it must verify that each retrieved row be “visible” to the query's MVCC snapshot, as discussed in Chapter 13. Visibility information is not stored in index entries, only in heap entries; so at first glance it would seem that every row retrieval would require a heap access anyway. And this is indeed the case, if the table row has been modified recently. However, for seldom-changing data there is a way around this problem. PostgreSQL tracks, for each page in a table's heap, whether all rows stored in that page are old enough to be visible to all current and future transactions. This information is stored in a bit in the table's visibility map. An index-only scan, after finding a candidate index entry, checks the visibility map bit for the corresponding heap page. If it's set, the row is known visible and so the data can be returned with no further work. If it's not set, the heap entry must be visited to find out whether it's visible, so no performance advantage is gained over a standard index scan. Even in the successful case, this approach trades visibility map accesses for heap accesses; but since the visibility map is four orders of magnitude smaller than the heap it describes, far less physical I/O is needed to access it. In most situations the visibility map remains cached in memory all the time.

In short, while an index-only scan is possible given the two fundamental requirements, it will be a win only if a significant fraction of the table's heap pages have their all-visible map bits set. But tables in which a large fraction of the rows are unchanging are common enough to make this type of scan very useful in practice.

To make effective use of the index-only scan feature, you might choose to create a covering index, which is an index specifically designed to include the columns needed by a particular type of query that you run frequently. Since queries typically need to retrieve more columns than just the ones they search on, PostgreSQL allows you to create an index in which some columns are just “payload” and are not part of the search key. This is done by adding an INCLUDE clause listing the extra columns. For example, if you commonly run queries like

SELECT y FROM tab WHERE x = 'key';

the traditional approach to speeding up such queries would be to create an index on x only. However, an index defined as

CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);

could handle these queries as index-only scans, because y can be obtained from the index without visiting the heap.

Because column y is not part of the index's search key, it does not have to be of a data type that the index can handle; it's merely stored in the index and is not interpreted by the index machinery. Also, if the index is a unique index, that is

CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);

the uniqueness condition applies to just column x, not to the combination of x and y. (An INCLUDE clause can also be written in UNIQUE and PRIMARY KEY constraints, providing alternative syntax for setting up an index like this.)

It's wise to be conservative about adding non-key payload columns to an index, especially wide columns. If an index tuple exceeds the maximum size allowed for the index type, data insertion will fail. In any case, non-key columns duplicate data from the index's table and bloat the size of the index, thus potentially slowing searches. And remember that there is little point in including payload columns in an index unless the table changes slowly enough that an index-only scan is likely to not need to access the heap. If the heap tuple must be visited anyway, it costs nothing more to get the column's value from there. Other restrictions are that expressions are not currently supported as included columns, and that only B-tree indexes currently support included columns.

Before PostgreSQL had the INCLUDE feature, people sometimes made covering indexes by writing the payload columns as ordinary index columns, that is writing

CREATE INDEX tab_x_y ON tab(x, y);

even though they had no intention of ever using y as part of a WHERE clause. This works fine as long as the extra columns are trailing columns; making them be leading columns is unwise for the reasons explained in Section 11.3. However, this method doesn't support the case where you want the index to enforce uniqueness on the key column(s). Also, explicitly marking non-searchable columns as INCLUDE columns makes the index slightly smaller, because such columns need not be stored in upper B-tree levels.

In principle, index-only scans can be used with expression indexes. For example, given an index on f(x) where x is a table column, it should be possible to execute

SELECT f(x) FROM tab WHERE f(x) < 1;

as an index-only scan; and this is very attractive if f() is an expensive-to-compute function. However, PostgreSQL's planner is currently not very smart about such cases. It considers a query to be potentially executable by index-only scan only when all columns needed by the query are available from the index. In this example, x is not needed except in the context f(x), but the planner does not notice that and concludes that an index-only scan is not possible. If an index-only scan seems sufficiently worthwhile, this can be worked around by adding x as an included column, for example

CREATE INDEX tab_f_x ON tab (f(x)) INCLUDE (x);

An additional caveat, if the goal is to avoid recalculating f(x), is that the planner won't necessarily match uses of f(x) that aren't in indexable WHERE clauses to the index column. It will usually get this right in simple queries such as shown above, but not in queries that involve joins. These deficiencies may be remedied in future versions of PostgreSQL.

Partial indexes also have interesting interactions with index-only scans. Consider the partial index shown in Example 11.3:

CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
WHERE success;

In principle, we could do an index-only scan on this index to satisfy a query like

SELECT target FROM tests WHERE subject = 'some-subject' AND success;

But there's a problem: the WHERE clause refers to success which is not available as a result column of the index. Nonetheless, an index-only scan is possible because the plan does not need to recheck that part of the WHERE clause at run time: all entries found in the index necessarily have success = true so this need not be explicitly checked in the plan. PostgreSQL versions 9.6 and later will recognize such cases and allow index-only scans to be generated, but older versions will not.


Prev  Up  Next
11.8. Partial Indexes  Home  11.10. Operator Classes and Operator Families

Index-Only Scans and Covering Indexes的更多相关文章

  1. Covering Indexes in MySQL, PostgreSQL, and MongoDB

    Covering Indexes in MySQL, PostgreSQL, and MongoDB - Orange Matter https://orangematter.solarwinds.c ...

  2. 【黑魔法】Covering Indexes、STRAIGHT_JOIN

    今天给大家介绍两个黑魔法,这都是压箱底的法宝.大家在使用时,一定要弄清他们的适用场景及用法,用好了,就是一把开天斧,用不好那就是画蛇添足.自从看过耗子哥(左耳朵耗子)的博客,都会给对相应专题有兴趣的小 ...

  3. 【性能提升神器】Covering Indexes

    可能有小伙伴会问,Covering Indexes到底是什么神器呢?它又是如何来提升性能的呢?接下来我会用最通俗易懂的语言来进行介绍,毕竟不是每个程序猿都要像DBA那样深刻理解数据库,知道如何用以及如 ...

  4. 高性能MySQL笔记-第5章Indexing for High Performance-005聚集索引

    一.聚集索引介绍 1.什么是聚集索引? InnoDB’s clustered indexes actually store a B-Tree index and the rows together i ...

  5. oracle parallel_index hint在非分区表的生效

    之前没特别注意,在有些场景下希望使用并行索引扫描的时候,发现parallel_index hint并没有生效,于是抽空看了下文档:The PARALLEL_INDEX hint instructs t ...

  6. Oracle 常见hint

    Hints 应该慎用,收集相关表的统计信息,根据执行计划,来改变查询方式 只能在SELECT, UPDATE, INSERT, MERGE, or DELETE 关键字后面,只有insert可以用2个 ...

  7. PostgreSQL 11 新特性之覆盖索引(Covering Index)(转载)

    通常来说,索引可以用于提高查询的速度.通过索引,可以快速访问表中的指定数据,避免了表上的扫描.有时候,索引不仅仅能够用于定位表中的数据.某些查询可能只需要访问索引的数据,就能够获取所需要的结果,而不需 ...

  8. MySQL: Building the best INDEX for a given SELECT

    Table of Contents The ProblemAlgorithmDigressionFirst, some examplesAlgorithm, Step 1 (WHERE "c ...

  9. Differences between INDEX, PRIMARY, UNIQUE, FULLTEXT in MySQL?

    487down vote Differences KEY or INDEX refers to a normal non-unique index.  Non-distinct values for ...

随机推荐

  1. 为什么游戏公司的server不愿意微服务化?

    背景介绍 笔者最近去面试了家游戏公司(有上市).我问他,公司有没有做微服务架构的打算及考量?他很惊讶的,我没听说过微服务耶,你可以解释一下吗? 我大概说了,方便测试,方便维护,方便升级,服务之间松耦合 ...

  2. Turtlebot3新手教程:仿真

    本文章针对如何利用turtlebot3实现仿真功能进行讲解 测试环境:Ubuntu 16.04 和 ROS Kinetic Kame. 注意:TurtleBot3 Simulation 依赖 turt ...

  3. python实现域名注册查询

    author:摘繁华-蓝白社区 联合出品 域名生成与查询 文件说明: [x] .py源文件 [x] .exe可执行文件 [x] .config.json配置文件 ps: .exe和config.jso ...

  4. 解决黑群晖"抱歉,您所指定的页面不存在"-记一次黑群晖修复案例

    起因 搞了一个usb外接硬盘准备备份数,刚好看到群晖有个工具软件"USB Copy". 安装后设置拷贝docker文件夹,然后就悲剧了,nas主页抛出提示 一开始也是直接网上搜索标 ...

  5. uni-app 页面跳转的两种方法

    1.navigator  标签 <navigator url="../component/classdetails/classdetails"> <view cl ...

  6. cookie和session会话技术

    因为http协议是无状态的,也就是说每个客户端访问服务器端资源时,服务器并不知道该客户端是谁,所以需要会话技术识别客户端状态.会话技术是帮助服务器记住客户端状态的. 一次会话的开始是通过浏览器访问某个 ...

  7. Redis-4.X 版本 Redis Cluster集群 (一)

    一 创建redis cluster 集群前提条件: 1 ) 每个redis node 节点采用相同的硬件配置,相同的密码. 2 ) 每个节点必须开启的参数: cluster-enabled yes # ...

  8. 【JS学习】var let const声明变量的异同点

    [JS学习]var let const声明变量的异同点 前言: 本博客系列为学习后盾人js教程过程中的记录与产出,如果对你有帮助,欢迎关注,点赞,分享.不足之处也欢迎指正,作者会积极思考与改正. 总述 ...

  9. 【分布式锁的演化】终章!手撸ZK分布式锁!

    前言 这应该是分布式锁演化的最后一个章节了,相信很多小伙伴们看完这个章节之后在应对高并发的情况下,如何保证线程安全心里肯定也会有谱了.在实际的项目中也可以参考一下老猫的github上的例子,当然代码没 ...

  10. appium识别工具介绍