oracle 12c AUTO_SAMPLE_SIZE动态采用工作机制
The ESTIMATE_PERCENT parameter in DBMS_STATS.GATHER_*_STATS procedures controls the percentage of rows to sample when gathering optimizer statistics. What percentage of rows should you sample to achieve accurate statistics? 100% will ensure that statistics are accurate, but it could take a long time. A 1% sample will finish much more quickly but it could result in poor statistics. It’s not an easy question to answer, which is why it is best practice to use the default: AUTO_SAMPLE_SIZE.
In this post, I’ll cover how the AUTO_SAMPLE_SIZE algorithm works in Oracle Database 12c and how it affects the accuracy of the statistics being gathered. If you want to learn more of the history prior to Oracle Database 12c, then this post on Oracle Database 11g is a good place to look. I will indicate below where there are differences between Oracle Database 11g and Oracle Database 12c.
It’s not always appreciated that (in general) a large proportion of the time and resource cost required to gather statistics is associated with evaluating the number of distinct values (NDVs) for each column. Calculating NDV using an exact algorithm can be expensive because the database needs to record and sort column values while statistics are being gathered. If the NDV is high, retaining and sorting column values can become resource-intensive, especially if the sort spills to TEMP. Auto sample size instead uses an approximate (but accurate) algorithm to calculate NDV that avoids the need to sort column data or spill to TEMP. In return for this saving, the database can afford to use a full table scan to ensure that the other basic column statistics are accurate.
Similarly, it can be resource-intensive to generate histograms but the Oracle Database mitigates this cost as follows:
- Frequency and top frequency histograms are created as the database gathers basic column statistics (such as NDV, MIN, MAX) from the full table scan mentioned above. This is new to Oracle Database 12c.
- If a frequency or top frequency histogram is not feasible, then the database will collect hybrid histograms using a sample of the column data. Top frequency is only feasible when the top 254 values constitute more than 99% of the entire non null column values and frequency is only feasible if NDV is 254 or less.
- When the user has specified 'SIZE AUTO' in the METHOD_OPT clause for automatic histogram creation, the Oracle Database chooses which columns to consider for histogram creation based column usage data that’s gathered by the optimizer. Columns that are not used in WHERE-clause predicates or joins are not considered for histograms.
Both Oracle Database 11g and Oracle Database 12c use the following query to gather basic column statistics (it is a simplified here for illustrative purposes).
SELECT COUNT(c1), MIN(c1), MAX(c1)
FROM t;
The query reads the table (T) and scans all rows (rather than using a sample). The database also needs to calculate the number of distinct values (NDV) for each column but the query does not use COUNT(DISTINCT c1) and so on, but instead, during execution, a special statistics gathering row source is injected into the query. The statistics gathering row source uses a one-pass, hash-based distinct algorithm to gather NDV. The algorithm requires a full scan of the data, uses a bounded amount of memory and yields a highly accurate NDV that is nearly identical to a 100 percent sampling (a fact that can be proven mathematically). The statistics gathering row source also gathers the number of rows, number of nulls and average column length. Since a full scan is used, the number of rows, average column length, minimum and maximum values are 100% accurate.
Effect of auto sample size on histogram gathering
Hybrid histogram gathering is decoupled from basic column statistics gathering and uses a sampleof column values. This technique was used in Oracle Database 11g to build height-balanced histograms. More information on this can be found in this blog post. Oracle Database 12c replaced height-balanced histograms with hybrid histograms.
Effect of auto sample size on index stats gathering
AUTO_SAMPLE_SIZE affects how index statistics are gathered. Index statistics gathering is sample-based and it can potentially go through several iterations if the sample contains too few blocks or the sample size was too small to properly gather number of distinct keys (NDKs). The algorithm has not changed since Oracle Database 11g, so I’ve left it to the previous blog to go more detail. There one other thing to note:
At the time of writing, there are some cases where index sampling can lead to NDV mis-estimates for composite indexes. The best work-around is to create a column group on the relevant columns and use gather_table_stats. Alternatively, there is a one-off fix - 27268249. This patch changes the way NDV is calculated for indexes on large tables (and no column group is required). It is available for 12.2.0.1 at the moment, but note that it cannot be backported. As you might guess, it's significantly slower than index block sampling, but it's still very fast. At the time of writing, if you find a case where index NDV is causing an issue with a query plan, then the recommended approach is to add a column group rather than attempting to apply this patch.
Summary:
Note that top frequency and hybrid histograms are new to Oracle Database 12c. Oracle Database 11g had frequency and height-balanced histograms only. Hybrid histograms replaced height-balanced histograms.
- The auto sample size algorithm uses a full table scan (a 100% sample) to gather basic column statistics.
- The cost of a full table scan (verses row sampling) is mitigated by the approximate NDV algorithm, which eliminates the need to sort column data.
- The approximate NDV gathered by AUTO_SAMPLE_SIZE is close to the accuracy of a 100% sample.
- Other basic column statistics, such as the number of nulls, average column length, minimal and maximal values have an accuracy equivalent to 100% sampling.
- Frequency and top frequency histograms are created using a 100%* sample of column values and are created when basic column statistics are gathered. This is different to Oracle Database 11g, which decoupled frequency histogram creation from basic column statistics gathering (and used a sample of column values).
- Hybrid histograms are created using a sample of column values. Internally, this step is decoupled from basic column statistics gathering.
- Index statistics are gathered using a sample of column values. The sample size is determined automatically.
*There is an exception to case 5, above. Frequency histograms are created using a sample if OPTIONS=>'GATHER AUTO' is used after a bulk load where statistics have been gathered using online statistics gathering.
oracle 12c AUTO_SAMPLE_SIZE动态采用工作机制的更多相关文章
- oracle 11g AUTO_SAMPLE_SIZE动态采用工作机制
Note that if you're interested in learning about Oracle Database 12c, there's an updated version of ...
- 2014年2月5日 Oracle ORACLE的工作机制[转]
网上看到一篇描写ORACLE工作机制的文章,觉得很不错!特摘录了下来. ORACLE的工作机制-1 (by xyf_tck) 我们从一个用户请求开始讲,ORACLE的简要的工作机制是怎样的,首 ...
- java开发连接Oracle 12c采用PDB遇到问题记录
今天初次使用java连接Oracle 12c,遇到各种问题,为方便后续查询,在汇总了问题记录及解决方案如下. ORA-28040: No matching authentication protoco ...
- 从一个简单的main方法执行谈谈JVM工作机制
本来JVM的工作原理浅到可以泛泛而谈,但如果真的想把JVM工作机制弄清楚,实在是很难,涉及到的知识领域太多.所以,本文通过简单的mian方法执行,浅谈JVM工作原理,看看JVM里面都发生了什么. 先上 ...
- JavaScript工作机制:V8 引擎内部机制及如何编写优化代码的5个诀窍
概述 JavaScript引擎是一个执行JavaScript代码的程序或解释器.JavaScript引擎可以被实现为标准解释器,或者实现为以某种形式将JavaScript编译为字节码的即时编译器. 下 ...
- oracle 12c 列式存储 ( In Memory 理论)
随着Oracle 12c推出了in memory组件,使得Oracle数据库具有了双模式数据存放方式,从而能够实现对混合类型应用的支持:传统的以行形式保存的数据满足OLTP应用:列形式保存的数据满足以 ...
- Httpd服务入门知识-http协议版本,工作机制及http服务器应用扫盲篇
Httpd服务入门知识-http协议版本,工作机制及http服务器应用扫盲篇 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.Internet与中国 Internet最早来源于美 ...
- malloc 函数工作机制(转)
malloc()工作机制 malloc函数的实质体现在,它有一个将可用的内存块连接为一个长长的列表的所谓空闲链表.调用malloc函数时,它沿连接表寻找一个大到足以满足用户请求所需要的内存块.然后,将 ...
- Oracle 12C RAC的optimizer_adaptive_features造成数据插入超时
问题分析 使用10046事件追踪方式,直接生成上传时的数据库事件日志进行分析,发现主要区别在于以下两条sql语句在每次长时间上传时都有出现,并且执行用户不是上传用户,而是数据库SYS用户. ***** ...
随机推荐
- [LeetCode] 830. Positions of Large Groups_Easy tag: Two Pointers
In a string S of lowercase letters, these letters form consecutive groups of the same character. For ...
- 原生js---ajax---get方法传数据
<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title> ...
- 输出调试技巧 PRINTF()
#define PRINTF(...) \ do { \ printf( "%d:%s::",__LINE__, __FUNCTION__);\ printf(__VA_ARGS_ ...
- LeetCode69.x的平方根
实现 int sqrt(int x) 函数. 计算并返回 x 的平方根,其中 x 是非负整数. 由于返回类型是整数,结果只保留整数的部分,小数部分将被舍去. 示例 1: 输入: 4 输出: 2 示例 ...
- clear/reset select2,重置select2,恢复默认
4.0 version //方法一$('#yourButton').on('click', function() { $('#yourfirstSelect2').val(null).trigger( ...
- Metasploit渗透技巧:后渗透Meterpreter代理
Metasploit是一个免费的.可下载的渗透测试框架,通过它可以很容易地获取.开发并对计算机软件漏洞实施攻击测试.它本身附带数百个已知软件漏洞的专业级漏洞攻击测试工具. 当H.D. Moore在20 ...
- mysql运用now(3)存储时间到毫秒
) from DUAL;
- 【2017-2-23】C#switch case分支语句,for循环语句
switch case分支语句 switch(一个变量值) { case 值:要执行的代码段;break; case 值:要执行的代码段;break; … default:代码段;break;(def ...
- AmiGO2:在线浏览和查询GO信息的利器
GO数据库的信息是非常庞大的,为了有效的检索和浏览GO数据库的信息,官方提供了AmiGO, 可以方便的浏览,查询和下载对应信息,官网如下 http://amigo.geneontology.org/a ...
- [转]Hive开发经验问答式总结
本文转载自:http://www.crazyant.net/1625.html 本文是自己开发Hive经验的总结,希望对大家有所帮助,有问题请留言交流. Hive开发经验思维导图 Hive开发经验总结 ...