SQL databases use two entirely different group by algorithms. The first one, the hash algorithm, aggregates the input records in a temporary hash table. Once all input records are processed, the hash table is returned as the result. The second algorithm, the sort/group algorithm, first sorts the input data by the grouping key so that the rows of each group follow each other in immediate succession. Afterwards, the database just needs to aggregate them. In general, both algorithms need to materialize an intermediate state, so they are not executed in a pipelined manner. Nevertheless the sort/group algorithm can use an index to avoid the sort operation, thus enabling a pipelined group by.

 
 
Consider the following query. It delivers yesterday's revenue grouped by PRODUCT_ID:
SELECT product_id, sum(eur_value)
FROM sales
WHERE sale_date = TRUNC(sysdate) - INTERVAL '' DAY
GROUP BY product_id

Knowing the index on SALE_DATE and PRODUCT_ID from the previous section, the sort/group algorithm is more appropriate because an INDEX RANGE SCAN automatically delivers the rows in the required order. That means the database avoids materialization because it does not need an explicit sort operation—the group by is executed in a pipelined manner.

oracle:
---------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT | | 17 | 192 |
| 1 | SORT GROUP BY NOSORT | | 17 | 192 |
| 2 | TABLE ACCESS BY INDEX ROWID| SALES | 321 | 192 |
|*3 | INDEX RANGE SCAN | SALES_DT_PR | 321 | 3 |
---------------------------------------------------------------
The Oracle database's execution plan marks a pipelined SORT GROUP BY operation with the NOSORT addendum. The execution plan of other databases does not mention any sort operation at all. 
 
The pipelined group by has the same prerequisites as the pipelined order by, except there are no ASC and DESC modifiers. That means that defining an index with ASC/DESC modifiers should not affect pipelined group by execution. The same is true for NULLS FIRST/LAST. Nevertheless there are databases that cannot properly use an ASC/DESC index for a pipelined group by.
 
For PostgreSQL, you must add an order by clause to make an index with NULLS LAST sorting usable for a pipelined group by. The Oracle database cannot read an index backwards in order to execute a pipelined group by that is followed by an order by. More details are available in the respective appendices: PostgreSQLOracle.

If we extend the query to consider all sales since yesterday, as we did in the example for the pipelined order by, it prevents the pipelined group by for the same reason as before: the INDEX RANGE SCAN does not deliver the rows ordered by the grouping key.

SELECT product_id, sum(eur_value)
FROM sales
WHERE sale_date >= TRUNC(sysdate) - INTERVAL '' DAY
GROUP BY product_id
Oracle:
---------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT | | 24 | 356 |
| 1 | HASH GROUP BY | | 24 | 356 |
| 2 | TABLE ACCESS BY INDEX ROWID| SALES | 596 | 355 |
|*3 | INDEX RANGE SCAN | SALES_DT_PR | 596 | 4 |
---------------------------------------------------------------

Instead, the Oracle database uses the hash algorithm. The advantage of the hash algorithm is that it only needs to buffer the aggregated result, whereas the sort/group algorithm materializes the complete input set. In other words: the hash algorithm needs less memory.

As with pipelined order by, a fast execution is not the most important aspect of the pipelined group by execution. It is more important that the database executes it in a pipelined manner and delivers the first result before reading the entire input.

参考:

http://use-the-index-luke.com/sql/sorting-grouping/indexed-group-by

Indexing GROUP BY的更多相关文章

  1. Elasticsearch: Indexing SQL databases. The easy way

    Elasticsearchis a great search engine, flexible, fast and fun. So how can I get started with it? Thi ...

  2. Indexing Sensor Data

    In particular embodiments, a method includes, from an indexer in a sensor network, accessing a set o ...

  3. pandas 之 group by 过程

    import numpy as np import pandas as pd Categorizing a dataset and applying a function to each group ...

  4. LINQ Group By操作

    在上篇文章 .NET应用程序与数据库交互的若干问题 这篇文章中,讨论了一个计算热门商圈的问题,现在在这里扩展一下,假设我们需要从两张表中统计出热门商圈,这两张表内容如下: 上表是所有政区,商圈中的餐饮 ...

  5. Kafka消费组(consumer group)

    一直以来都想写一点关于kafka consumer的东西,特别是关于新版consumer的中文资料很少.最近Kafka社区邮件组已经在讨论是否应该正式使用新版本consumer替换老版本,笔者也觉得时 ...

  6. LINQ to SQL语句(6)之Group By/Having

    适用场景:分组数据,为我们查找数据缩小范围. 说明:分配并返回对传入参数进行分组操作后的可枚举对象.分组:延迟 1.简单形式: var q = from p in db.Products group ...

  7. 学习笔记 MYSQL报错注入(count()、rand()、group by)

    首先看下常见的攻击载荷,如下: select count(*),(floor(rand(0)*2))x from table group by x; 然后对于攻击载荷进行解释, floor(rand( ...

  8. [备查]使用 SPQuery 查询 "Person or Group" 字段

    原文地址:http://www.stum.de/2008/02/06/querying-the-person-or-group-field-using-spquery/ Querying the “P ...

  9. order by 与 group by 区别

    order by 排序查询.asc升序.desc降序 示例: select * from 学生表 order by 年龄 ---查询学生表信息.按年龄的升序(默认.可缺省.从低到高)排列显示 也可以多 ...

随机推荐

  1. Swift使用AVAudioPlayer来调节游戏的背景音乐大小

    音乐文件的声音大小有时在做为游戏背景音乐时会过大,而如果我们只是简单应用SKAudioNode来加载音乐的话,是无法进行声音大小的调节的,因此我们必须使用更强大的AVAudioPlayer来进行声音大 ...

  2. 基于 CPython 解释器,为你深度解析为什么Python中整型不会溢出

    前言 本次分析基于 CPython 解释器,python3.x版本 在python2时代,整型有 int 类型和 long 长整型,长整型不存在溢出问题,即可以存放任意大小的整数.在python3后, ...

  3. JavaWeb--------JSP语法基础学习(特别适合入门)

    准备工作: 需要Tomcat8.0,MyEclipse,JDK JSP是一种运行在服务器端的脚本语言,JSP页面又是基于HTML网页的程序,它是Java Web 开发技术的基础. 基本内容: JSP页 ...

  4. JS变量定义时连续赋值的坑!

    在定义变量时,可以将值相同的变量采用连续赋值的方式,如下代码: var a = b = c = ''; 其实这里面有一个很大很大的坑,以代码说明问题: <script language=&quo ...

  5. 【rich-text】 富文本组件说明

    [rich-text] 富文本组件可以显示HTML代码样式. 1)支持事件:tap.touchstart.touchmove.touchcancel.touchend和longtap 2)信任的HTM ...

  6. Spring Boot - Filter实现简单的Http Basic认证

    Copy自http://blog.csdn.net/sun_t89/article/details/51916834 @SpringBootApplicationpublic class Spring ...

  7. 使用DataTables导出html表格

    去年与同事一起做一个小任务,需要把HTML表格中的数据导出到Excel.用原生js想要实现,只有IE浏览器提供导出到微软的Excel的接口,这就要求你电脑上必须安装IE浏览器.Excel,而且必须修改 ...

  8. Thunder——基于NABCD评价“欢迎来怼”团队作品

    基于NABCD N——need需求 对于开设了软件工程课并且正在进行教学活动的老师和同学,除了在写作业时会打开电脑进行操作,平时我们更希望可以通过一些简单方便的方法来查看有关作业的内容,比如查看一下老 ...

  9. 20162328蔡文琛week09

    学号 2016-2017-2 <程序设计与数据结构>第X周学习总结 教材学习内容总结 数据库是为了其他程序提供数据的应用软件. 关系书就哭通过唯一的标识符在不同表的记录见建立了关系. JD ...

  10. c#程序的config文件问题

    1.vshost.exe.config和app.config两个文件可不要,但exe.config文件不可少. 2.但是app.config最好也要修改了,每次重新生成程序的时候.exe.cmonfi ...