Hash table lengths and prime numbers

Website: http://srinvis.blogspot.ca/2006/07/hash-table-lengths-and-prime-numbers.html

This has been bugging me for some time now...

The first thing you do when inserting into or retrieving from a hash table is to calculate the hashCode for the given key, and then find the correct bucket by reducing the hashCode to the size of the hash table with hashCode % table_length. Here are two statements that you have most probably read somewhere:

If you use a power of 2 for table_length, finding (hashCode(key) % 2^n) is as simple and quick as (hashCode(key) & (2^n - 1)). But if your function to calculate the hashCode for a given key isn't good, you will suffer from clustering of many keys in a few hash buckets.
But if you use a prime number for table_length, the hashCodes calculated can still spread across different hash buckets even if you have a slightly stupid hashCode function.
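To make the two statements concrete, here is a minimal sketch in Python. The function names (bucket_by_modulo, bucket_by_mask, buckets_used) and the "poor" hash function that only produces multiples of 8 are made up for illustration; they are not taken from any library.

```python
def bucket_by_modulo(hash_code: int, table_length: int) -> int:
    """Bucket index via the generic modulo reduction."""
    return hash_code % table_length


def bucket_by_mask(hash_code: int, table_length: int) -> int:
    """Bucket index via a bit mask; only valid for power-of-two lengths."""
    if table_length <= 0 or table_length & (table_length - 1) != 0:
        raise ValueError("table_length must be a power of two")
    return hash_code & (table_length - 1)


def buckets_used(hash_codes, table_length: int) -> int:
    """Number of distinct buckets the given hashCodes fall into."""
    return len({h % table_length for h in hash_codes})


# Hypothetical poor hash function: every hashCode is a multiple of 8.
poor_hash_codes = [8 * i for i in range(1, 1001)]

# For a power-of-two length, the mask and the modulo agree.
print(all(bucket_by_mask(h, 16) == bucket_by_modulo(h, 16)
          for h in poor_hash_codes))      # True

print(buckets_used(poor_hash_codes, 16))  # 2  (only buckets 0 and 8)
print(buckets_used(poor_hash_codes, 17))  # 17 (every bucket is used)
```

With 16 buckets, a thousand keys pile up in two of them; with 17 buckets, the same keys reach all of them. That is only an observation, of course, not the proof.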

OK, but can someone tell me why this is so and give me a proof of the above statements? I couldn't find much on Google. Most reasons given are again just statements, and I cannot accept statements (there are also some stupid proofs on the net, so beware).
And don't even try telling me that the proof is experimental.

Today I have finally been able to get this thought out of my head. Below is the proof I came up with (I hope it doesn't fall into the stupid category; it doesn't seem to, at least for now. If you think otherwise, leave a comment, at least for the sake of others).
I suggest you think about the solution on your own before reading further...

Suppose your hashCode function produces, among others, the hashCodes {x, 2x, 3x, 4x, 5x, 6x...}. Then all of these are going to be clustered in just m buckets, where m = table_length / GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this: with g = GreatestCommonFactor(table_length, x), every k*x % table_length is a multiple of g, and there are only table_length/g multiples of g below table_length.) Now you can do one of the following to avoid clustering:

Make sure that you don't generate too many hashCodes that are multiples of another hashCode, as in {x, 2x, 3x, 4x, 5x, 6x...}. But this may be rather difficult if your hash table is supposed to have millions of entries.
Or simply make m equal to table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e. by making table_length coprime with x. And if x can be just about any number, then make sure that table_length is a prime number, since a prime shares a factor with x only when x is itself a multiple of table_length.
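The formula for m is easy to check by brute force. The sketch below is only an illustration, with made-up helper names (distinct_buckets, predicted_buckets): it counts the buckets actually hit by x, 2x, 3x, ... and compares that with table_length / GreatestCommonFactor(table_length, x), using Python's math.gcd.

```python
from math import gcd


def distinct_buckets(x: int, table_length: int) -> int:
    """Count the buckets hit by x, 2x, ..., table_length * x.

    The sequence k*x % table_length repeats after at most table_length
    terms, so one pass over k = 1..table_length sees every bucket.
    """
    return len({(k * x) % table_length for k in range(1, table_length + 1)})


def predicted_buckets(x: int, table_length: int) -> int:
    """m = table_length / GreatestCommonFactor(table_length, x)."""
    return table_length // gcd(table_length, x)


# x = 12 with a power-of-two length: gcd(16, 12) = 4, so only 4 buckets.
print(distinct_buckets(12, 16), predicted_buckets(12, 16))  # 4 4

# Same x with a prime length: gcd(17, 12) = 1, so all 17 buckets.
print(distinct_buckets(12, 17), predicted_buckets(12, 17))  # 17 17

# A prime length does not help when x is a multiple of that prime.
print(distinct_buckets(34, 17), predicted_buckets(34, 17))  # 1 1
```

The last case is the one exception for a prime table_length: if x happens to be a multiple of table_length, everything lands in bucket 0.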

If you need some more kick from hash maps, take a look at:

Doug Lea's high-performance ConcurrentHashMap: http://www-128.ibm.com/developerworks/java/library/j-jtp08223/

Google's space- and time-efficient hash map implementations: http://goog-sparsehash.sourceforge.net/doc/implementation.html
