Indexing GROUP BY
SQL databases use two entirely different group by algorithms. The first one, the hash algorithm, aggregates the input records in a temporary hash table. Once all input records are processed, the hash table is returned as the result. The second algorithm, the sort/group algorithm, first sorts the input data by the grouping key so that the rows of each group follow each other in immediate succession. Afterwards, the database just needs to aggregate them. In general, both algorithms need to materialize an intermediate state, so they are not executed in a pipelined manner. Nevertheless the sort/group algorithm can use an index to avoid the sort operation, thus enabling a pipelined group by.
PRODUCT_ID:SELECT product_id, sum(eur_value)
FROM sales
WHERE sale_date = TRUNC(sysdate) - INTERVAL '' DAY
GROUP BY product_id
Knowing the index on SALE_DATE and PRODUCT_ID from the previous section, the sort/group algorithm is more appropriate because an INDEX RANGE SCAN automatically delivers the rows in the required order. That means the database avoids materialization because it does not need an explicit sort operation—the group by is executed in a pipelined manner.
oracle:
---------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT | | 17 | 192 |
| 1 | SORT GROUP BY NOSORT | | 17 | 192 |
| 2 | TABLE ACCESS BY INDEX ROWID| SALES | 321 | 192 |
|*3 | INDEX RANGE SCAN | SALES_DT_PR | 321 | 3 |
---------------------------------------------------------------
SORT GROUP BY operation with the NOSORT addendum. The execution plan of other databases does not mention any sort operation at all. group by has the same prerequisites as the pipelined order by, except there are no ASC and DESC modifiers. That means that defining an index with ASC/DESC modifiers should not affect pipelined group by execution. The same is true for NULLS FIRST/LAST. Nevertheless there are databases that cannot properly use an ASC/DESC index for a pipelined group by.order by clause to make an index with NULLS LAST sorting usable for a pipelined group by. The Oracle database cannot read an index backwards in order to execute a pipelined group by that is followed by an order by. More details are available in the respective appendices: PostgreSQL, Oracle.If we extend the query to consider all sales since yesterday, as we did in the example for the pipelined order by, it prevents the pipelined group by for the same reason as before: the INDEX RANGE SCAN does not deliver the rows ordered by the grouping key.
SELECT product_id, sum(eur_value)
FROM sales
WHERE sale_date >= TRUNC(sysdate) - INTERVAL '' DAY
GROUP BY product_id
Oracle:
---------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT | | 24 | 356 |
| 1 | HASH GROUP BY | | 24 | 356 |
| 2 | TABLE ACCESS BY INDEX ROWID| SALES | 596 | 355 |
|*3 | INDEX RANGE SCAN | SALES_DT_PR | 596 | 4 |
---------------------------------------------------------------
Instead, the Oracle database uses the hash algorithm. The advantage of the hash algorithm is that it only needs to buffer the aggregated result, whereas the sort/group algorithm materializes the complete input set. In other words: the hash algorithm needs less memory.
As with pipelined order by, a fast execution is not the most important aspect of the pipelined group by execution. It is more important that the database executes it in a pipelined manner and delivers the first result before reading the entire input.
参考:
http://use-the-index-luke.com/sql/sorting-grouping/indexed-group-by
Indexing GROUP BY的更多相关文章
- Elasticsearch: Indexing SQL databases. The easy way
Elasticsearchis a great search engine, flexible, fast and fun. So how can I get started with it? Thi ...
- Indexing Sensor Data
In particular embodiments, a method includes, from an indexer in a sensor network, accessing a set o ...
- pandas 之 group by 过程
import numpy as np import pandas as pd Categorizing a dataset and applying a function to each group ...
- LINQ Group By操作
在上篇文章 .NET应用程序与数据库交互的若干问题 这篇文章中,讨论了一个计算热门商圈的问题,现在在这里扩展一下,假设我们需要从两张表中统计出热门商圈,这两张表内容如下: 上表是所有政区,商圈中的餐饮 ...
- Kafka消费组(consumer group)
一直以来都想写一点关于kafka consumer的东西,特别是关于新版consumer的中文资料很少.最近Kafka社区邮件组已经在讨论是否应该正式使用新版本consumer替换老版本,笔者也觉得时 ...
- LINQ to SQL语句(6)之Group By/Having
适用场景:分组数据,为我们查找数据缩小范围. 说明:分配并返回对传入参数进行分组操作后的可枚举对象.分组:延迟 1.简单形式: var q = from p in db.Products group ...
- 学习笔记 MYSQL报错注入(count()、rand()、group by)
首先看下常见的攻击载荷,如下: select count(*),(floor(rand(0)*2))x from table group by x; 然后对于攻击载荷进行解释, floor(rand( ...
- [备查]使用 SPQuery 查询 "Person or Group" 字段
原文地址:http://www.stum.de/2008/02/06/querying-the-person-or-group-field-using-spquery/ Querying the “P ...
- order by 与 group by 区别
order by 排序查询.asc升序.desc降序 示例: select * from 学生表 order by 年龄 ---查询学生表信息.按年龄的升序(默认.可缺省.从低到高)排列显示 也可以多 ...
随机推荐
- java后台接收微信服务号/订阅号消息
1.申请订阅号(适合个人)或者服务号(适合企业) 微信公众平台 2.填写配置 服务器地址: 需要接收消息 的服务端接口地址 令牌:通话识别码,随便写,后端接收时,使用一样的就可以了. 消息加密秘钥 : ...
- 211. String Permutation【LintCode by java】
Description Given two strings, write a method to decide if one is a permutation of the other. Exampl ...
- (一)Spring Boot修改内置Tomcat端口号--解决tomcat端口被占用的问题
Spring Boot 内置Tomcat默认端口号为8080,在开发多个应用调试时很不方便,本文介绍了修改 Spring Boot内置Tomcat端口号的方法. 一.EmbeddedServletCo ...
- Linux查看物理CPU个数,核数,逻辑CPU个数;内存信息
# 总核数 = 物理CPU个数 X 每颗物理CPU的核数 # 总逻辑CPU数 = 物理CPU个数 X 每颗物理CPU的核数 X 超线程数 # 查看物理CPU个数 cat /proc/cpuinfo| ...
- Thunder团队第一次Scrum会议
Scrum会议1 小组名称:Thunder 项目名称:待定 参会成员: 王航(Master):http://www.cnblogs.com/wangh013/ 李传康:http://www.cnblo ...
- DAY7敏捷冲刺
站立式会议 工作安排 (1)服务器配置 服务器端项目结构调整 (2)数据库配置 单词学习记录+用户信息 (3)客户端 客户端项目结构调整,代码功能分离 燃尽图 燃尽图有误,已重新修改,先贴卡片的界面, ...
- XDA-University: Getting Started
XDA-University: Getting Started A while back, we introduced XDA-University to the world, an ongoing ...
- 【OpenGL】无法启动此程序,因为计算机中丢失 glut32.dll。尝试重新安装该程序以解决此问题。
运行OpenGL程序的时候报错,如图: 解决方法:把glut32.dll复制到C:\Windows\SysWOW64目录下,而不是像网上教程那样复制到C:\Windows\System32目录下. 原 ...
- TreeView的使用
用于显示多级层次关系 每一项是一个节点,也就是一个Node,是一个TreeNode节点,Nodes是该控件节点的集合. selectedNode用户选中的节点,如果没有选中则为null 1. 当选中后 ...
- sql sever 数据表
对视图进行操作,要在第三块区域进行添加记录操作,回车,然后会同步到所有相关数据表中. 记录不是列,而是行,不要混淆. 第二块区域是各个属性,就是说明: 第一块区域是要进行显示的字段,选中什么 显示什么 ...