oracle 12c AUTO_SAMPLE_SIZE动态采用工作机制

The ESTIMATE_PERCENT parameter in DBMS_STATS.GATHER_*_STATS procedures controls the percentage of rows to sample when gathering optimizer statistics. What percentage of rows should you sample to achieve accurate statistics? 100% will ensure that statistics are accurate, but it could take a long time. A 1% sample will finish much more quickly but it could result in poor statistics. It’s not an easy question to answer, which is why it is best practice to use the default: AUTO_SAMPLE_SIZE.

In this post, I’ll cover how the AUTO_SAMPLE_SIZE algorithm works in Oracle Database 12c and how it affects the accuracy of the statistics being gathered. If you want to learn more of the history prior to Oracle Database 12c, then this post on Oracle Database 11g is a good place to look. I will indicate below where there are differences between Oracle Database 11g and Oracle Database 12c.

It’s not always appreciated that (in general) a large proportion of the time and resource cost required to gather statistics is associated with evaluating the number of distinct values (NDVs) for each column. Calculating NDV using an exact algorithm can be expensive because the database needs to record and sort column values while statistics are being gathered. If the NDV is high, retaining and sorting column values can become resource-intensive, especially if the sort spills to TEMP. Auto sample size instead uses an approximate (but accurate) algorithm to calculate NDV that avoids the need to sort column data or spill to TEMP. In return for this saving, the database can afford to use a full table scan to ensure that the other basic column statistics are accurate.

Similarly, it can be resource-intensive to generate histograms but the Oracle Database mitigates this cost as follows:

Frequency and top frequency histograms are created as the database gathers basic column statistics (such as NDV, MIN, MAX) from the full table scan mentioned above. This is new to Oracle Database 12c.
If a frequency or top frequency histogram is not feasible, then the database will collect hybrid histograms using a sample of the column data. Top frequency is only feasible when the top 254 values constitute more than 99% of the entire non null column values and frequency is only feasible if NDV is 254 or less.
When the user has specified 'SIZE AUTO' in the METHOD_OPT clause for automatic histogram creation, the Oracle Database chooses which columns to consider for histogram creation based column usage data that’s gathered by the optimizer. Columns that are not used in WHERE-clause predicates or joins are not considered for histograms.

Both Oracle Database 11g and Oracle Database 12c use the following query to gather basic column statistics (it is a simplified here for illustrative purposes).

SELECT COUNT(c1), MIN(c1), MAX(c1)

FROM  t;

The query reads the table (T) and scans all rows (rather than using a sample). The database also needs to calculate the number of distinct values (NDV) for each column but the query does not use COUNT(DISTINCT c1) and so on, but instead, during execution, a special statistics gathering row source is injected into the query. The statistics gathering row source uses a one-pass, hash-based distinct algorithm to gather NDV. The algorithm requires a full scan of the data, uses a bounded amount of memory and yields a highly accurate NDV that is nearly identical to a 100 percent sampling (a fact that can be proven mathematically). The statistics gathering row source also gathers the number of rows, number of nulls and average column length. Since a full scan is used, the number of rows, average column length, minimum and maximum values are 100% accurate.

Effect of auto sample size on histogram gathering

Hybrid histogram gathering is decoupled from basic column statistics gathering and uses a sampleof column values. This technique was used in Oracle Database 11g to build height-balanced histograms. More information on this can be found in this blog post. Oracle Database 12c replaced height-balanced histograms with hybrid histograms.

Effect of auto sample size on index stats gathering

AUTO_SAMPLE_SIZE affects how index statistics are gathered. Index statistics gathering is sample-based and it can potentially go through several iterations if the sample contains too few blocks or the sample size was too small to properly gather number of distinct keys (NDKs). The algorithm has not changed since Oracle Database 11g, so I’ve left it to the previous blog to go more detail. There one other thing to note:

At the time of writing, there are some cases where index sampling can lead to NDV mis-estimates for composite indexes. The best work-around is to create a column group on the relevant columns and use gather_table_stats. Alternatively, there is a one-off fix - 27268249. This patch changes the way NDV is calculated for indexes on large tables (and no column group is required). It is available for 12.2.0.1 at the moment, but note that it cannot be backported. As you might guess, it's significantly slower than index block sampling, but it's still very fast. At the time of writing, if you find a case where index NDV is causing an issue with a query plan, then the recommended approach is to add a column group rather than attempting to apply this patch.

Summary:

Note that top frequency and hybrid histograms are new to Oracle Database 12c. Oracle Database 11g had frequency and height-balanced histograms only. Hybrid histograms replaced height-balanced histograms.

The auto sample size algorithm uses a full table scan (a 100% sample) to gather basic column statistics.
The cost of a full table scan (verses row sampling) is mitigated by the approximate NDV algorithm, which eliminates the need to sort column data.
The approximate NDV gathered by AUTO_SAMPLE_SIZE is close to the accuracy of a 100% sample.
Other basic column statistics, such as the number of nulls, average column length, minimal and maximal values have an accuracy equivalent to 100% sampling.
Frequency and top frequency histograms are created using a 100%* sample of column values and are created when basic column statistics are gathered. This is different to Oracle Database 11g, which decoupled frequency histogram creation from basic column statistics gathering (and used a sample of column values).
Hybrid histograms are created using a sample of column values. Internally, this step is decoupled from basic column statistics gathering.
Index statistics are gathered using a sample of column values. The sample size is determined automatically.

*There is an exception to case 5, above. Frequency histograms are created using a sample if OPTIONS=>'GATHER AUTO' is used after a bulk load where statistics have been gathered using online statistics gathering.

oracle 12c AUTO_SAMPLE_SIZE动态采用工作机制的更多相关文章

oracle 11g AUTO_SAMPLE_SIZE动态采用工作机制
Note that if you're interested in learning about Oracle Database 12c, there's an updated version of ...
2014年2月5日 Oracle ORACLE的工作机制[转]
网上看到一篇描写ORACLE工作机制的文章,觉得很不错!特摘录了下来. ORACLE的工作机制-1 (by xyf_tck) 我们从一个用户请求开始讲,ORACLE的简要的工作机制是怎样的,首 ...
java开发连接Oracle 12c采用PDB遇到问题记录
今天初次使用java连接Oracle 12c,遇到各种问题,为方便后续查询,在汇总了问题记录及解决方案如下. ORA-28040: No matching authentication protoco ...
从一个简单的main方法执行谈谈JVM工作机制
本来JVM的工作原理浅到可以泛泛而谈,但如果真的想把JVM工作机制弄清楚,实在是很难,涉及到的知识领域太多.所以,本文通过简单的mian方法执行,浅谈JVM工作原理,看看JVM里面都发生了什么. 先上 ...
JavaScript工作机制：V8 引擎内部机制及如何编写优化代码的5个诀窍
概述 JavaScript引擎是一个执行JavaScript代码的程序或解释器.JavaScript引擎可以被实现为标准解释器,或者实现为以某种形式将JavaScript编译为字节码的即时编译器. 下 ...
oracle 12c 列式存储 ( In Memory 理论)
随着Oracle 12c推出了in memory组件,使得Oracle数据库具有了双模式数据存放方式,从而能够实现对混合类型应用的支持:传统的以行形式保存的数据满足OLTP应用:列形式保存的数据满足以 ...
Httpd服务入门知识-http协议版本,工作机制及http服务器应用扫盲篇
Httpd服务入门知识-http协议版本,工作机制及http服务器应用扫盲篇作者:尹正杰版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.Internet与中国 Internet最早来源于美 ...
malloc 函数工作机制(转)
malloc()工作机制 malloc函数的实质体现在,它有一个将可用的内存块连接为一个长长的列表的所谓空闲链表.调用malloc函数时,它沿连接表寻找一个大到足以满足用户请求所需要的内存块.然后,将 ...
Oracle 12C RAC的optimizer_adaptive_features造成数据插入超时
问题分析使用10046事件追踪方式,直接生成上传时的数据库事件日志进行分析,发现主要区别在于以下两条sql语句在每次长时间上传时都有出现,并且执行用户不是上传用户,而是数据库SYS用户. ***** ...

随机推荐

git cherry-pick 报错is a merge but no -m option was given
gerrit上提示代码冲突的时候,我们首先会想到rebase下,不行的话就只能解决冲突了,最简单的做法是我的另一篇博客https://www.cnblogs.com/zndxall/p/9140813 ...
npm发布包--所遇到的问题
npm发布包: 解决方案--npm adduser的坑:http://www.tuicool.com/articles/FZbYve npm ERR publish 403,nodejs发布包流程 : ...
python list成员函数extend与append的区别
extend 原文解释,是以list中元素形式加入到列表中 extend list by appending elements from the iterable append(obj) 是将整个ob ...
在window下搭建Vue.Js开发环境
nodejs官网http://nodejs.cn/下载安装包,无特殊要求可本地傻瓜式安装,这里选择2017-5-2发布的 v6.10.3,也可选择阿里云镜像https://npm.taobao.org ...
Nodejs中原生遍历文件夹
最近在听老师讲的node课程,有个关于把异步变为同步读取文件夹的知识点做一些笔记, 让迭代器逐个自执行.
==与Equals的作用
string str1 = "Blackteeth"; string str2 = str1; string str3 = "Blackteeth"; Cons ...
Java多线程-----创建线程的几种方式
1.继承Thread类创建线程定义Thread类的子类,并重写该类的run()方法,该方法的方法体就是线程需要完成的任务,run()方法也称为线程执行体创建Thread子类的实例,也就是创建 ...
webservice 生成客户端代码
使用 jdk 自带工具 wsimport wsimport -keep http://webservice/url?wsdl
python SQLite说一点点, python使用数据库需要注意的几点
SQLite是一种嵌入式数据库,它的数据库就是一个文件.由于SQLite本身是C写的,而且体积很小,所以,经常被集成到各种应用程序中,甚至在iOS和Android的App中都可以集成. Python就 ...
JAVA基础3---运算符大全
Java中的运算符有以下种类:算术运算符.关系运算符.位运算符.逻辑运算符.赋值运算符.其他的运算符现在假设定义 int A = 10,B = 5: 一.算术运算符运算符描述案例 + 等同于数 ...

oracle 12c AUTO_SAMPLE_SIZE动态采用工作机制

oracle 12c AUTO_SAMPLE_SIZE动态采用工作机制的更多相关文章

随机推荐

热门专题