Hash table lengths and prime numbers
Website:http://srinvis.blogspot.ca/2006/07/hash-table-lengths-and-prime-numbers.html
This has been bugging me for some time now...
The first thing you do when inserting/retreiving from hash table is to calculate the hashCode for the given key and then find the correct bucket by trimming the hashCode to the size of the hashTable by doing hashCode % table_length. Here are 2 statements that
you most probably have read somewhere
- If you use a power of 2 for table_length, finding (hashCode(key) % 2^n ) is as simple and quick as (hashCode(key) & (2^n -1)). But if your function to calculate hashCode for a given key isn't good,
you will definitely suffer from clustering of many keys in a few hash buckets. - But if you use prime numbers for table_length, hashCodes calculated could map into the different hash buckets even if you have a slightly stupid hashCode function.
Ok, but now can someone tell me why it is so and give me the proof for the above statements...I couldn't find much on google. Most reasons given are again just statements and I cannot accept statements (And there are some stupid proofs also on the net, so beware).
And don't even try telling me that the proof is experimental.
Today i have finally been able to get rid of this thought from my head...below is the proof I came up with (Hope it doesn't fall into the stupid category, it doesn't seem atleast for now, if u think otherwise put in a comment, atleast for the sake of others).
I suggest you think about the solution on your own before reading further...
If suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial
to verify/derive this). Now you can do one of the following to avoid clustering
- Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions
of entries. - Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length
is a prime number.
If you need some more kick from hash maps take a look at
Doug lea's highly performant ConcurrentHashMap http://www-128.ibm.com/developerworks/java/library/j-jtp08223/
Google's space/time efficient hashmap implementations http://goog-sparsehash.sourceforge.net/doc/implementation.html
Hash table lengths and prime numbers的更多相关文章
- Hash table: why size should be prime?
Question: Possible Duplicate:Why should hash functions use a prime number modulus? Why is it necessa ...
- Hash Map (Hash Table)
Reference: Wiki PrincetonAlgorithm What is Hash Table Hash table (hash map) is a data structure use ...
- [转载] 散列表(Hash Table)从理论到实用(上)
转载自:白话算法(6) 散列表(Hash Table)从理论到实用(上) 处理实际问题的一般数学方法是,首先提炼出问题的本质元素,然后把它看作一个比现实无限宽广的可能性系统,这个系统中的实质关系可以通 ...
- [转载] 散列表(Hash Table)从理论到实用(中)
转载自:白话算法(6) 散列表(Hash Table)从理论到实用(中) 不用链接法,还有别的方法能处理碰撞吗?扪心自问,我不敢问这个问题.链接法如此的自然.直接,以至于我不敢相信还有别的(甚至是更好 ...
- [转载] 散列表(Hash Table) 从理论到实用(下)
转载自: 白话算法(6) 散列表(Hash Table) 从理论到实用(下) [澈丹,我想要个钻戒.][小北,等等吧,等我再修行两年,你把我烧了,舍利子比钻戒值钱.] ——自扯自蛋 无论开发一个程序还 ...
- algorithm@ Sieve of Eratosthenes (素数筛选算法) & Related Problem (Return two prime numbers )
Sieve of Eratosthenes (素数筛选算法) Given a number n, print all primes smaller than or equal to n. It is ...
- POJ2739 Sum of Consecutive Prime Numbers(尺取法)
POJ2739 Sum of Consecutive Prime Numbers 题目大意:给出一个整数,如果有一段连续的素数之和等于该数,即满足要求,求出这种连续的素数的个数 水题:艾氏筛法打表+尺 ...
- [LeetCode] 1. Two Sum_Easy tag: Hash Table
Given an array of integers, return indices of the two numbers such that they add up to a specific ta ...
- [译]C语言实现一个简易的Hash table(1)
说明 Hash table翻译过来就是Hash表,是一种提供了类似于关联数组的数据结构,可以通过key执行搜索.插入和删除操作.Hash表由一些列桶(buckets)组成,而每一个bucket都是由k ...
随机推荐
- [Solution] JZOJ3470 最短路
[Solution] JZOJ3470 最短路 题面 Description 给定一个n个点m条边的有向图,有k个标记点,要求从规定的起点按任意顺序经过所有标记点到达规定的终点,问最短的距离是多少. ...
- Docker 启动不了容器的问题
今天在运行 docker 的时候,就是执行 docker exec 命令的时候,发现一直报错.具体的报错信息如下: Error response from daemon: Container XXX ...
- MFS故障测试及维护总结
一.测试环境说明: 10.2.2.230 mfsmaster VIP:10.2.2.130 10.2.2.231 mfsbackup 10.2.2.253 mfsdata01 10.2.2.2 ...
- PMP:3.项目经理角色
成员角色:整合指挥者 在团队中的职责:负终责 知识技能:综合技能&沟通 定义: 职能经理专注于对某个职能领域或业务部门的管理监督. 运营经理负责保证业务运营的高效性. 项目经理是由执行组织 ...
- Rabbit RPC 代码阅读(一)
前言 因为想对RPC内部的机制作一个了解,特作以下阅读代码日志,以备忘. RPC介绍 Rabbit RPC 原理可以用3点概括: 1.服务端启动并且向注册中心发送服务信息,注册中心收到后会定时监控服务 ...
- AndroidStudio制作欢迎界面与应用图标
前言 大家好,给大家带来AndroidStudio制作欢迎界面与应用图标的概述,希望你们喜欢 欢迎界面与应用图标 本项目使用Android Studio 3.0.1作为开发工具 activity_sp ...
- ubuntu16.04 编译出错:fatal error: SDL/SDL.h: No such file or directory
在ubuntu 16.04编译神经网络代码时候,遇到了这样一种错误? fatal error: SDL/SDL.h: No such file or directory 原因是SDL库没有安装,根据你 ...
- numpy中array和asarray的区别
array和asarray都可以将结构数据转化为ndarray,但是主要区别就是当数据源是ndarray时,array仍然会copy出一个副本,占用新的内存,但asarray不会. 举例说明: imp ...
- python的数据驱动
什么叫数据驱动? 登录用例 ->不用的用户名登录,但是自动化化脚本一样,虽然脚本相同,步骤相同,但是不同的用户名登录得出的数据是不一样的,于是就有了数据驱动,就是数据的改变驱动自动化测试的执行导 ...
- vertical-align_CSS参考手册_web前端开发参考手册系列
该属性定义行内元素的基线相对于该元素所在行的基线的垂直对齐.允许指定负长度值和百分比值.这会使元素降低而不是升高.在表单元格中,这个属性会设置单元格框中的单元格内容的对齐方式. <!DOCTYP ...