[20181130] How to guess which values have hash collisions.txt
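--// Back in June this year, a post by kerrycode described how scalar subquery results are cached in a hash table.
--// Link: http://www.cnblogs.com/kerrycode/p/9099507.html. Excerpt:
Put simply, when a scalar subquery is used, Oracle caches the subquery results in a hash table. If the same value shows up
in a later row, the optimizer checks the cached values and, for a repeated value, reuses the previous result instead of
calling the function again. This reduces the number of function calls and so improves performance. In Oracle 10 and 11 the
hash table has only 255 buckets, which means it can hold 255 distinct values. Beyond that, hash collisions occur, and the
colliding values cause the function to be called repeatedly. Even so, the performance gain is still substantial.
--// At the time I very much wanted to learn from the author where the claim "the hash table has only 255 buckets" came from. kerrycode gave me a link: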
https://blogs.oracle.com/oraclemagazine/on-caching-and-evangelizing-sql
Oracle Database will use this hash table to remember the scalar subquery and the inputs to it—just :DEPTNO in this case
—and the output from it. At the beginning of every query execution, this cache is empty, but suppose you run the query
and the first PROJECTS row you retrieve has a DEPTNO value of 10. Oracle Database will assign the number 10 to a hash
value between 1 and 255 (the size of the hash table cache in Oracle Database 10g and Oracle Database 11g currently) and
will look in that hash table slot to see if the answer exists. In this case, it will not, so Oracle Database must run
the scalar subquery with the input of 10 to get the answer. If that answer (count) is 42, the hash table may look
something like this:
Select count(*) from emp where emp.deptno = :deptno
:deptno Count(*)
You'll have saved the DEPTNO value of 10 and the answer (count) of 42 in some slot—probably not the first or last slot,
but whatever slot the hash value 10 is assigned to. Now suppose the second row you get back from the PROJECTS table
includes a DEPTNO value of 20. Oracle Database will again look in the hash table after assigning the value 20, and it
will discover "no result in the cache yet." So it will run the scalar subquery, get the result, and put it into the hash
table cache. Now the cache may look like this:
Select count(*) from emp where emp.deptno = :deptno
:deptno Count(*)
… …
10 42
Now suppose the query returns a third row and it again includes a DEPTNO value of 10. This time, Oracle Database will
see DEPTNO = 10, find that it already has that value in the hash table cache, and will simply return 42 from the cache
instead of executing the scalar subquery. In fact, it will never have to run that scalar subquery for the DEPTNO values
of 10 or 20 again for that query—it will already have the answer.
What happens if the number of unique DEPTNO values exceeds the size of the hash table? What if there are more than 255
values? Or, more generally, if more than one DEPTNO value is assigned to the same slot in the hash table, what happens
in a hash collision?
The answer is the same for all these questions and is rather simple: Oracle Database will not be able to cache the
second or nth value to that slot in the hash table. For example, what if the third row returned by the query contains
the DEPTNO = 30 value? Further, suppose that DEPTNO = 30 is to be assigned to exactly the same hash table slot as DEPTNO
= 10. The database won't be able to effectively cache DEPTNO = 30 in this case—the value will never make it into the
hash table. It will, however, be "partially cached." Oracle Database still has the hash table with all the previous
executions, but it also keeps the last scalar subquery result it had "next to" the hash table. That is, if the fourth
row also includes a DEPTNO = 30 value, Oracle Database will discover that the result is not in the hash table but is
"next to" the hash table, because the last time it ran the scalar subquery, it was run with an input of 30. On the other
hand, if the fourth row includes a DEPTNO = 40 value, Oracle Database will run the scalar subquery with the DEPTNO = 40
value (because it hasn't seen that value yet during this query execution) and overwrite the DEPTNO = 30 result. The next
time Oracle Database sees DEPTNO = 30 in the result set, it'll have to run that scalar subquery again.
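--// The behaviour described in the quote can be traced with a small Python sketch. This is a toy model of the
--// description only, not Oracle's implementation: the class name, toy_bucket (a made-up hash function in which 30
--// lands in the same slot as 10) and the placeholder subquery are all assumptions for illustration.

```python
class ScalarSubqueryCache:
    """Toy model of the cache as described in the quoted article."""

    def __init__(self, bucket_of, subquery):
        self.bucket_of = bucket_of   # hypothetical hash function: input -> slot
        self.subquery = subquery     # stands in for the scalar subquery
        self.slots = {}              # slot -> (input, result), one entry per slot
        self.last = None             # the result kept "next to" the hash table
        self.executions = 0

    def lookup(self, value):
        slot = self.bucket_of(value)
        entry = self.slots.get(slot)
        if entry is not None and entry[0] == value:
            return entry[1]                      # found in the hash table
        if self.last is not None and self.last[0] == value:
            return self.last[1]                  # found "next to" the hash table
        result = self.subquery(value)            # miss: run the subquery
        self.executions += 1
        self.last = (value, result)              # every execution overwrites this
        if entry is None:
            self.slots[slot] = (value, result)   # the slot was free, so cache it
        return result


# Assumption: DEPTNO 30 hashes to the same slot as DEPTNO 10.
def toy_bucket(deptno):
    return 10 if deptno == 30 else deptno


# Placeholder subquery; the real one would be the COUNT(*) against EMP.
cache = ScalarSubqueryCache(toy_bucket, lambda deptno: f"count for {deptno}")
for deptno in [10, 20, 10, 30, 30, 40, 30]:
    cache.lookup(deptno)
print(cache.executions)
```

--// In this row order the second 30 is served from the "next to" slot, the 40 overwrites that slot, and the final 30
--// has to run the subquery again, which is the sequence of events the quote walks through.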
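--// I started out blindly trying all sorts of ways to verify whether the number of hash buckets really is 255. I had a
--// preconception that it was 255 (or 256) and went through a lot of confusion, until kerrycode finally gave me
--// a test method. Links:
http://blog.itpub.net/267265/viewspace-2156702/
http://www.cnblogs.com/kerrycode/p/9223093.html
--// With this method it is easy to verify the number of hash buckets: 1024 on 11.2.0.4, 512 on 10.2.0.4, 1024 on 12.1.0.1.
--// I remembered that early in my testing 75 and 48 turned out to collide. I had not expected values this small to
--// collide, and to confirm it I tried values almost one at a time,
--// because you simply do not know Oracle's algorithm.
--// Yesterday I read https://jonathanlewis.wordpress.com/2018/11/26/shrink-space-2/ and set out to verify why 4 and 432 collide.
1. Environment: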
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
create or replace function f( x in varchar2 ) return number
as
begin
dbms_application_info.set_client_info(userenv('client_info')+1 );
return length(x);
end;
/
SCOTT@book> create table t as select rownum id1,mod(rownum-1,10000)+1 id2 from dual connect by level<=20000;
Table created.
SCOTT@book> create table t1 ( a number ,b number);
Table created.
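--// Column a records the number of function calls; column b records the value being tested.
2. Create the test scripts:
--// Create the script cz.txt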
exec dbms_application_info.set_client_info(0);
set term off
exec :x := &&1;
select count(distinct f_id2) from (select id2,(select f(id2) from dual) as f_id2 from t where id2 in (&&2,:x ));
set term on
insert into t1 values (userenv('client_info') ,:x) ;
commit ;
--// Create the shell script cz.sh:
#! /bin/bash
sqlplus -s -l scott/book <<EOF >> hz.txt
variable x number;
$(seq 500 | xargs -I{} echo @cz.txt {} $1)
quit
EOF
3. Test:
--// Run the script cz.sh:
$ . cz.sh 4
SCOTT@book> select * from t1 where a<>2;
A B
---------- ----------
1 4
3 432
--// This shows that 4 and 432 collide: the function was called 3 times.
SCOTT@book> delete t1;
500 rows deleted.
SCOTT@book> commit ;
Commit complete.
--// Find out which value collides with 1.
$ . cz.sh 1
SCOTT@book> select * from t1 where a<>2;
A B
---------- ----------
3 484
1 1
--// This confirms that 1 and 484 have a hash collision.
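--// Why a call count of 3 marks a collision can be illustrated with a simplified Python model. Assumptions: one cached
--// input per bucket, no "next to the hash table" slot (a plain table is enough to reproduce the counts seen in t1;
--// how Oracle really treats that slot is not something this sketch claims to know), and toy_bucket is a made-up
--// hash function that merely encodes the observed 4/432 collision. Table t holds every id2 twice, so the IN list
--// returns the two values in the order small, large, small, large.

```python
def count_calls(rows, bucket_of):
    """Count calls to f() when each bucket can cache a single input (toy model)."""
    slots = {}                    # bucket -> the one input cached there
    calls = 0
    for value in rows:
        bucket = bucket_of(value)
        if slots.get(bucket) == value:
            continue              # cached: f() is not called
        calls += 1                # not cached: f() runs
        slots.setdefault(bucket, value)   # only a free bucket accepts the value
    return calls


# Assumption: 432 shares a bucket with 4, as the test above suggests.
# This is NOT Oracle's hash function.
def toy_bucket(value):
    return 4 if value == 432 else value


def rows_for(probe, candidate):
    # Each id2 appears twice in t; equal values collapse to two rows.
    return sorted({probe, candidate}) * 2


for candidate in (4, 100, 432):
    print(candidate, count_calls(rows_for(4, candidate), toy_bucket))

# Same filter as "select * from t1 where a<>2", restricted to a = 3.
collisions = [x for x in range(1, 501)
              if count_calls(rows_for(4, x), toy_bucket) == 3]
print(collisions)
```

--// In this model the count is 1 when the candidate equals the probe, 2 when both values get their own bucket, and 3
--// when the second value finds its bucket already taken, which matches the three cases visible in t1.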
4. Test again with the example from the link:
--// Link: https://jonathanlewis.wordpress.com/2018/11/26/shrink-space-2/
SCOTT@book> update emp set dept_no=484 where dept_no=432;
1 row updated.
SCOTT@book> commit ;
Commit complete.
SCOTT@book> alter session set statistics_level = all;
Session altered.
select
/*+ gather_plan_statistics post-shrink */
count(*)
from (
select /*+ no_merge */
outer.*
from emp outer
where outer.sal >
(
select /*+ no_unnest */ avg(inner.sal)
from emp inner
where inner.dept_no = outer.dept_no
)
)
;
COUNT(*)
----------
9498
SCOTT@book> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID gx7xb7rhfd2zf, child number 0
-------------------------------------
select /*+ gather_plan_statistics post-shrink */
count(*) from ( select /*+ no_merge */
outer.* from emp outer where outer.sal >
( select /*+ no_unnest */ avg(inner.sal)
from emp inner where
inner.dept_no = outer.dept_no ) )
Plan hash value: 322796046
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 569 (100)| | 1 |00:00:03.43 | 783K|
| 1 | SORT AGGREGATE | | 1 | 1 | | | | 1 |00:00:03.43 | 783K|
| 2 | VIEW | | 1 | 143 | | 569 (1)| 00:00:07 | 9498 |00:00:03.42 | 783K|
|* 3 | FILTER | | 1 | | | | | 9498 |00:00:03.42 | 783K|
| 4 | TABLE ACCESS FULL | EMP | 1 | 20001 | 156K| 71 (0)| 00:00:01 | 19001 |00:00:00.01 | 247 |
| 5 | SORT AGGREGATE | | 3173 | 1 | 8 | | | 3173 |00:00:03.41 | 783K|
|* 6 | TABLE ACCESS FULL| EMP | 3173 | 2857 | 22856 | 71 (0)| 00:00:01 | 10M|00:00:02.71 | 783K|
------------------------------------------------------------------------------------------------------------------------
--// The subquery was executed 3173 times (Starts column for Id 5 and 6).
SCOTT@book> select dept_no,count(*) from emp group by dept_no order by 1;
DEPT_NO COUNT(*)
---------- ----------
0 3167
1 3167
2 3167
3 3166
4 3166
5 3167
484 1
7 rows selected.
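--// dept_no=1 runs into a hash collision.
--// dept_no=484: subquery executed 1 time
--// dept_no=0: subquery executed 1 time
--// dept_no=1: subquery executed 3167 times
--// dept_no=2: subquery executed 1 time
--// dept_no=3: subquery executed 1 time
--// dept_no=4: subquery executed 1 time
--// dept_no=5: subquery executed 1 time
--// Adding these up: 1+1+3167+1+1+1+1 = 3173, so the two results corroborate each other.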
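--// The same sum can be checked against the GROUP BY output with a few lines of Python. The assumption, taken from
--// the notes above, is that dept_no=1 is the only value that never gets cached and therefore runs the subquery
--// once per row, while every other value runs it exactly once.

```python
# Row counts per dept_no, copied from the GROUP BY output above.
rows_per_dept = {0: 3167, 1: 3167, 2: 3167, 3: 3166, 4: 3166, 5: 3167, 484: 1}

# Assumption: dept_no=1 lost its bucket to 484 and is re-evaluated for every row.
uncached = {1}

total_rows = sum(rows_per_dept.values())
starts = sum(n if dept in uncached else 1 for dept, n in rows_per_dept.items())

print(total_rows)   # compare with A-Rows of the full scan at Id 4
print(starts)       # compare with Starts at Id 5
```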
5. My test above is pure brute force. Let me rewrite it as a PL/SQL script; I am honestly not very fluent in PL/SQL....
SCOTT@book> create table t2 ( a number ,b number,c number);
Table created.
--// Column a records the number of function calls.
--// Script cy.txt
declare
x number;
begin
for i in 1..10000 loop
dbms_application_info.set_client_info(0);
select count(distinct f_id2) into x from (select id2,(select f(id2) from dual) as f_id2 from t where id2 in (i, &&1 ) );
if ( userenv('client_info') =3 ) then
insert into t2 values (userenv('client_info') ,i,&&1) ;
commit ;
exit;
END IF;
end loop;
end;
/
--// I added an exit as soon as a collision is found. You can comment it out or remove it to list every value between 1 and 10000 that collides.
--// Run it as follows:
@ cy.txt 4
@ cy.txt 1
@ cy.txt 3
@ cy.txt 18
@ cy.txt 48
@ cy.txt 75
SCOTT@book> select * from t2;
A B C
---------- ---------- ----------
3 432 4
3 484 1
3 735 3
3 2071 18
3 75 48
3 48 75
6 rows selected.
--// This way you quickly find out which values produce hash collisions.
--// I wonder whether anyone has a better method...