Hash table lengths and prime numbers
Website:http://srinvis.blogspot.ca/2006/07/hash-table-lengths-and-prime-numbers.html
This has been bugging me for some time now...
The first thing you do when inserting/retreiving from hash table is to calculate the hashCode for the given key and then find the correct bucket by trimming the hashCode to the size of the hashTable by doing hashCode % table_length. Here are 2 statements that
you most probably have read somewhere
- If you use a power of 2 for table_length, finding (hashCode(key) % 2^n ) is as simple and quick as (hashCode(key) & (2^n -1)). But if your function to calculate hashCode for a given key isn't good,
you will definitely suffer from clustering of many keys in a few hash buckets. - But if you use prime numbers for table_length, hashCodes calculated could map into the different hash buckets even if you have a slightly stupid hashCode function.
Ok, but now can someone tell me why it is so and give me the proof for the above statements...I couldn't find much on google. Most reasons given are again just statements and I cannot accept statements (And there are some stupid proofs also on the net, so beware).
And don't even try telling me that the proof is experimental.
Today i have finally been able to get rid of this thought from my head...below is the proof I came up with (Hope it doesn't fall into the stupid category, it doesn't seem atleast for now, if u think otherwise put in a comment, atleast for the sake of others).
I suggest you think about the solution on your own before reading further...
If suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial
to verify/derive this). Now you can do one of the following to avoid clustering
- Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions
of entries. - Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length
is a prime number.
If you need some more kick from hash maps take a look at
Doug lea's highly performant ConcurrentHashMap http://www-128.ibm.com/developerworks/java/library/j-jtp08223/
Google's space/time efficient hashmap implementations http://goog-sparsehash.sourceforge.net/doc/implementation.html
Hash table lengths and prime numbers的更多相关文章
- Hash table: why size should be prime?
Question: Possible Duplicate:Why should hash functions use a prime number modulus? Why is it necessa ...
- Hash Map (Hash Table)
Reference: Wiki PrincetonAlgorithm What is Hash Table Hash table (hash map) is a data structure use ...
- [转载] 散列表(Hash Table)从理论到实用(上)
转载自:白话算法(6) 散列表(Hash Table)从理论到实用(上) 处理实际问题的一般数学方法是,首先提炼出问题的本质元素,然后把它看作一个比现实无限宽广的可能性系统,这个系统中的实质关系可以通 ...
- [转载] 散列表(Hash Table)从理论到实用(中)
转载自:白话算法(6) 散列表(Hash Table)从理论到实用(中) 不用链接法,还有别的方法能处理碰撞吗?扪心自问,我不敢问这个问题.链接法如此的自然.直接,以至于我不敢相信还有别的(甚至是更好 ...
- [转载] 散列表(Hash Table) 从理论到实用(下)
转载自: 白话算法(6) 散列表(Hash Table) 从理论到实用(下) [澈丹,我想要个钻戒.][小北,等等吧,等我再修行两年,你把我烧了,舍利子比钻戒值钱.] ——自扯自蛋 无论开发一个程序还 ...
- algorithm@ Sieve of Eratosthenes (素数筛选算法) & Related Problem (Return two prime numbers )
Sieve of Eratosthenes (素数筛选算法) Given a number n, print all primes smaller than or equal to n. It is ...
- POJ2739 Sum of Consecutive Prime Numbers(尺取法)
POJ2739 Sum of Consecutive Prime Numbers 题目大意:给出一个整数,如果有一段连续的素数之和等于该数,即满足要求,求出这种连续的素数的个数 水题:艾氏筛法打表+尺 ...
- [LeetCode] 1. Two Sum_Easy tag: Hash Table
Given an array of integers, return indices of the two numbers such that they add up to a specific ta ...
- [译]C语言实现一个简易的Hash table(1)
说明 Hash table翻译过来就是Hash表,是一种提供了类似于关联数组的数据结构,可以通过key执行搜索.插入和删除操作.Hash表由一些列桶(buckets)组成,而每一个bucket都是由k ...
随机推荐
- 再谈控制 cxGrid 的行列颜色
1. [转]CxGrid 改变某行或单元格的颜色 (2016-01-19 09:37:19) 转载▼ 标签: it delphi 分类: Delphi 一个表(T)的结构结构如下. ID Test 1 ...
- 可在广域网部署运行的即时通讯系统 -- GGTalk总览(附源码下载)
(最新版本:V6.2,2019.01.03 .Xamarin移动端版本已经推出,包括 Android 和 iOS) GGTalk开源即时通讯系统(简称GG)是QQ的高仿版,同时支持局域网和广域网, ...
- ASP.NET MVC下使用AngularJs语言(一):Hello your name
新春节后,分享第一个教程. 是教一位新朋友全新学习ASP.NET MVC下使用AngularJs语言. 一,新建一个空的Web项目.使用NuGet下载AngularJs和jQuery.二,配置Bund ...
- zookeeper日志级别
查看源代码得知zookeeper(版本3.4.13)内部的日志用的slf4j,项目启动zk连接了之后一直在打debug日志(如下所示),甚是讨厌,logback日志级别调成info没用. 17:24: ...
- javascript知识整理之this
js中的this是一个头疼的问题,尤其对于笔者这种初级的菜鸟来讲,下面梳理下this的知识,可以当做是初级进阶也好入门也罢,总归输出的才是自己掌握的: Js中this不是由词法作用域决定的 而是调用时 ...
- JavaScript对象编程-第3章
目录 Date对象 Math对象 数组对象 字符串对象 正则表达式对象 什么是对象 对象拥有属性和方法,属性各种数据类型,方法对属性中的数据进行操作. JavaScript的对象 内置对象 Date. ...
- Linux - 查看命令所属的软件包
这里以查看netstat命令所属的软件包为例. CentOS:利用yum provides命令 netstat命令所属的软件包为net-tools [root@CentOS7 ~]# yum prov ...
- Ubuntu 18.0.4安装docker
第一步:如果之前安装过docker,执行下面命令删除 apt-get remove docker docker-engine docker.io 删除后执行sudo apt-get update更新软 ...
- 算法手记(2)Dijkstra双栈算术表达式求值算法
这两天看到的内容是关于栈和队列,在栈的模块发现了Dijkstra双栈算术表达式求值算法,可以用来实现计算器类型的app. 编程语言系统一般都内置了对算术表达式的处理,但是他们是如何在内部实现的呢?为了 ...
- iOS-实现后台长时间运行
前言 一般APP在按下Home键被挂起后,这时APP的 backgroundTimeRemaining 也就是后台运行时间大约只有3分钟,如果在退出APP后,过十几二十二分钟或者更长时间再回到APP, ...