Indexing GROUP BY

SQL databases use two entirely different group by algorithms. The first one, the hash algorithm, aggregates the input records in a temporary hash table. Once all input records are processed, the hash table is returned as the result. The second algorithm, the sort/group algorithm, first sorts the input data by the grouping key so that the rows of each group follow each other in immediate succession. Afterwards, the database just needs to aggregate them. In general, both algorithms need to materialize an intermediate state, so they are not executed in a pipelined manner. Nevertheless, the sort/group algorithm can use an index to avoid the sort operation, thus enabling a pipelined group by.
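The two strategies can be sketched in a few lines of Python. This is a minimal illustration, assuming each input row is a hypothetical `(product_id, eur_value)` tuple; it shows the idea behind each algorithm, not how any particular database implements it.

```python
from itertools import groupby
from operator import itemgetter


def hash_group_by(rows):
    """Hash algorithm: aggregate every row into a hash table (a dict)."""
    totals = {}
    for product_id, eur_value in rows:
        totals[product_id] = totals.get(product_id, 0) + eur_value
    # The table is only returned after all input rows are processed.
    return totals


def sort_group_by(rows):
    """Sort/group algorithm: sort by the grouping key, then sum each run."""
    ordered = sorted(rows, key=itemgetter(0))  # materializes the whole input
    return {
        product_id: sum(value for _, value in group)
        for product_id, group in groupby(ordered, key=itemgetter(0))
    }


# Hypothetical sample rows: (product_id, eur_value)
sales = [(2, 10), (1, 5), (2, 7), (3, 1), (1, 4)]
print(hash_group_by(sales))  # {2: 17, 1: 9, 3: 1}
print(sort_group_by(sales))  # {1: 9, 2: 17, 3: 1}
```

Both functions compute the same totals; the hash version returns groups in first-seen order, the sort version in key order.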

Consider the following query. It delivers yesterday's revenue grouped by PRODUCT_ID:

SELECT product_id, sum(eur_value)
  FROM sales
 WHERE sale_date = TRUNC(sysdate) - INTERVAL '1' DAY
 GROUP BY product_id

Given the index on SALE_DATE and PRODUCT_ID from the previous section, the sort/group algorithm is more appropriate. Because the WHERE clause fixes SALE_DATE to a single value, an INDEX RANGE SCAN automatically delivers the rows ordered by PRODUCT_ID, which is exactly the order the grouping needs. The database therefore needs no explicit sort operation and avoids materialization: the group by is executed in a pipelined manner.
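The following Python sketch models why the equality filter enables a pipelined group by. It assumes the index SALES_DT_PR can be pictured as a sorted list of hypothetical `(sale_date, product_id, eur_value)` entries; this is a simplification, since a real index entry points to the table row via ROWID rather than storing EUR_VALUE. Because the scan for a single date returns entries in PRODUCT_ID order, the generator can emit each group as soon as the key changes, without any sort step.

```python
from bisect import bisect_left
from datetime import date
from itertools import groupby
from operator import itemgetter

# Hypothetical index content: (sale_date, product_id, eur_value) entries,
# kept sorted by (sale_date, product_id) like the index SALES_DT_PR.
SALES_DT_PR = sorted([
    (date(2024, 5, 1), 3, 8),
    (date(2024, 5, 2), 2, 10),
    (date(2024, 5, 2), 1, 5),
    (date(2024, 5, 2), 1, 4),
    (date(2024, 5, 3), 1, 6),
])


def range_scan_equal(index, sale_date):
    """Yield the entries for one sale_date, in index order."""
    start = bisect_left(index, (sale_date,))
    for i in range(start, len(index)):
        if index[i][0] != sale_date:
            break  # past the last matching entry: the scan stops here
        yield index[i]


def pipelined_group_by(entries):
    """Sum eur_value per product_id over adjacent entries; no sort step."""
    for product_id, group in groupby(entries, key=itemgetter(1)):
        yield product_id, sum(entry[2] for entry in group)


scan = range_scan_equal(SALES_DT_PR, date(2024, 5, 2))
print(list(pipelined_group_by(scan)))  # [(1, 9), (2, 10)]
```

Both functions are generators, so each group flows to the caller as soon as it is complete, mirroring the SORT GROUP BY NOSORT step sitting directly on top of the range scan.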

Oracle:

---------------------------------------------------------------
|Id |Operation                    | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT             |             |   17 |  192 |
| 1 | SORT GROUP BY NOSORT        |             |   17 |  192 |
| 2 |  TABLE ACCESS BY INDEX ROWID| SALES       |  321 |  192 |
|*3 |   INDEX RANGE SCAN          | SALES_DT_PR |  321 |    3 |
---------------------------------------------------------------

The Oracle database's execution plan marks a pipelined SORT GROUP BY operation with the NOSORT addendum. The execution plans of other databases do not mention any sort operation at all.

The pipelined group by has the same prerequisites as the pipelined order by, except that there are no ASC and DESC modifiers. That means defining an index with ASC/DESC modifiers should not affect pipelined group by execution. The same is true for NULLS FIRST/LAST. Nevertheless, there are databases that cannot properly use an ASC/DESC index for a pipelined group by.

For PostgreSQL, you must add an order by clause to make an index with NULLS LAST sorting usable for a pipelined group by. The Oracle database cannot read an index backwards in order to execute a pipelined group by that is followed by an order by. More details are available in the respective appendices: PostgreSQL, Oracle.
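In principle, grouping only requires rows with equal keys to be adjacent; the direction of the order is irrelevant. A small Python sketch, assuming hypothetical `(product_id, eur_value)` rows, shows that aggregating runs of equal keys yields the same groups whether the input is sorted ascending or descending. Only the output order differs.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (product_id, eur_value)
rows = [(2, 10), (1, 5), (2, 7), (3, 1), (1, 4)]


def group_sorted(ordered_rows):
    """Sum eur_value over each run of equal product_id values."""
    return [(key, sum(value for _, value in grp))
            for key, grp in groupby(ordered_rows, key=itemgetter(0))]


ascending = group_sorted(sorted(rows, key=itemgetter(0)))
descending = group_sorted(sorted(rows, key=itemgetter(0), reverse=True))
print(ascending)   # [(1, 9), (2, 17), (3, 1)]
print(descending)  # [(3, 1), (2, 17), (1, 9)]
```

The quirks described above are therefore limitations of specific optimizers, not of the sort/group algorithm itself.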

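If we extend the query to consider all sales since yesterday, as we did in the example for the pipelined order by, the pipelined group by is no longer possible, for the same reason as before: the INDEX RANGE SCAN does not deliver the rows ordered by the grouping key. With a range condition on SALE_DATE, the index returns the rows ordered by SALE_DATE first; PRODUCT_ID is sorted only within each date.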
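To see why, picture the index SALES_DT_PR as a sorted list of hypothetical `(sale_date, product_id, eur_value)` entries (a simplification: a real index stores a ROWID, not EUR_VALUE). The Python sketch below scans from a start date to the end of the index, as a `>=` condition does, and shows that PRODUCT_ID restarts with every new date. Aggregating adjacent runs of that output would split each product into several partial groups.

```python
from bisect import bisect_left
from datetime import date
from itertools import groupby
from operator import itemgetter

# Hypothetical index content: (sale_date, product_id, eur_value) entries,
# kept sorted by (sale_date, product_id) like the index SALES_DT_PR.
SALES_DT_PR = sorted([
    (date(2024, 5, 1), 3, 8),
    (date(2024, 5, 2), 1, 5),
    (date(2024, 5, 2), 2, 10),
    (date(2024, 5, 3), 1, 6),
    (date(2024, 5, 3), 2, 3),
])


def range_scan_from(index, min_date):
    """Yield the entries with sale_date >= min_date, in index order."""
    start = bisect_left(index, (min_date,))
    for i in range(start, len(index)):
        yield index[i]


scanned = list(range_scan_from(SALES_DT_PR, date(2024, 5, 2)))
print([entry[1] for entry in scanned])  # [1, 2, 1, 2]

# Aggregating adjacent runs, as a pipelined group by would, splits each
# product into partial groups instead of producing one total per product.
naive = [(key, sum(entry[2] for entry in grp))
         for key, grp in groupby(scanned, key=itemgetter(1))]
print(naive)  # [(1, 5), (2, 10), (1, 6), (2, 3)]
```

The correct totals (11 for product 1, 13 for product 2) require either sorting the scanned rows by PRODUCT_ID or aggregating them in a hash table.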

SELECT product_id, sum(eur_value)
  FROM sales
 WHERE sale_date >= TRUNC(sysdate) - INTERVAL '1' DAY
 GROUP BY product_id

Oracle:

---------------------------------------------------------------
|Id |Operation                    | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT             |             |   24 |  356 |
| 1 | HASH GROUP BY               |             |   24 |  356 |
| 2 |  TABLE ACCESS BY INDEX ROWID| SALES       |  596 |  355 |
|*3 |   INDEX RANGE SCAN          | SALES_DT_PR |  596 |    4 |
---------------------------------------------------------------

Instead, the Oracle database uses the hash algorithm. The advantage of the hash algorithm is that it only needs to buffer the aggregated result, whereas the sort/group algorithm materializes the complete input set. In other words: the hash algorithm needs less memory.
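The memory argument can be made concrete with a small Python sketch that counts what each algorithm must hold at once. It assumes hypothetical `(product_id, eur_value)` rows and treats the number of buffered items as a stand-in for memory: the hash table holds one entry per group, while the sort holds every input row. The row and group counts mirror the optimizer estimates in the plan above (596 rows, 24 groups), but the data itself is synthetic.

```python
from itertools import groupby
from operator import itemgetter


def hash_group_by(rows):
    """Return (totals, number of buffered items)."""
    totals = {}
    for product_id, eur_value in rows:
        totals[product_id] = totals.get(product_id, 0) + eur_value
    return totals, len(totals)  # one buffered entry per group


def sort_group_by(rows):
    """Return (totals, number of buffered items)."""
    buffered = sorted(rows, key=itemgetter(0))  # the complete input set
    totals = {key: sum(value for _, value in grp)
              for key, grp in groupby(buffered, key=itemgetter(0))}
    return totals, len(buffered)


# Synthetic input: 596 rows spread over 24 products.
rows = [(i % 24, 1) for i in range(596)]
print(hash_group_by(rows)[1])  # 24
print(sort_group_by(rows)[1])  # 596
```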

As with the pipelined order by, fast execution is not the most important aspect of the pipelined group by. It is more important that the database executes it in a pipelined manner and delivers the first result before reading the entire input.
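The following Python sketch illustrates that property under simplifying assumptions: the input is an iterator of hypothetical `(product_id, eur_value)` rows that already arrive ordered by product_id (as an index range scan would deliver them), and a counter records how many rows have been read when the first group is emitted.

```python
from itertools import groupby
from operator import itemgetter


class CountingRows:
    """Iterator wrapper that counts how many rows have been read."""

    def __init__(self, rows):
        self._it = iter(rows)
        self.read = 0

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self._it)
        self.read += 1
        return row


def pipelined_group_by(ordered_rows):
    """Yield (product_id, total) as soon as each group is complete."""
    for key, grp in groupby(ordered_rows, key=itemgetter(0)):
        yield key, sum(value for _, value in grp)


source = CountingRows([(1, 5), (1, 4), (2, 10), (3, 1), (3, 2)])
results = pipelined_group_by(source)
first = next(results)
print(first, source.read)  # (1, 9) 3
```

The first group is complete after reading three of the five rows: the third row is needed only to detect that the key has changed.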

Reference:

http://use-the-index-luke.com/sql/sorting-grouping/indexed-group-by
