Hash table lengths and prime numbers
Website:http://srinvis.blogspot.ca/2006/07/hash-table-lengths-and-prime-numbers.html
This has been bugging me for some time now...
The first thing you do when inserting/retreiving from hash table is to calculate the hashCode for the given key and then find the correct bucket by trimming the hashCode to the size of the hashTable by doing hashCode % table_length. Here are 2 statements that
you most probably have read somewhere
- If you use a power of 2 for table_length, finding (hashCode(key) % 2^n ) is as simple and quick as (hashCode(key) & (2^n -1)). But if your function to calculate hashCode for a given key isn't good,
you will definitely suffer from clustering of many keys in a few hash buckets. - But if you use prime numbers for table_length, hashCodes calculated could map into the different hash buckets even if you have a slightly stupid hashCode function.
Ok, but now can someone tell me why it is so and give me the proof for the above statements...I couldn't find much on google. Most reasons given are again just statements and I cannot accept statements (And there are some stupid proofs also on the net, so beware).
And don't even try telling me that the proof is experimental.
Today i have finally been able to get rid of this thought from my head...below is the proof I came up with (Hope it doesn't fall into the stupid category, it doesn't seem atleast for now, if u think otherwise put in a comment, atleast for the sake of others).
I suggest you think about the solution on your own before reading further...
If suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial
to verify/derive this). Now you can do one of the following to avoid clustering
- Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions
of entries. - Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length
is a prime number.
If you need some more kick from hash maps take a look at
Doug lea's highly performant ConcurrentHashMap http://www-128.ibm.com/developerworks/java/library/j-jtp08223/
Google's space/time efficient hashmap implementations http://goog-sparsehash.sourceforge.net/doc/implementation.html
Hash table lengths and prime numbers的更多相关文章
- Hash table: why size should be prime?
Question: Possible Duplicate:Why should hash functions use a prime number modulus? Why is it necessa ...
- Hash Map (Hash Table)
Reference: Wiki PrincetonAlgorithm What is Hash Table Hash table (hash map) is a data structure use ...
- [转载] 散列表(Hash Table)从理论到实用(上)
转载自:白话算法(6) 散列表(Hash Table)从理论到实用(上) 处理实际问题的一般数学方法是,首先提炼出问题的本质元素,然后把它看作一个比现实无限宽广的可能性系统,这个系统中的实质关系可以通 ...
- [转载] 散列表(Hash Table)从理论到实用(中)
转载自:白话算法(6) 散列表(Hash Table)从理论到实用(中) 不用链接法,还有别的方法能处理碰撞吗?扪心自问,我不敢问这个问题.链接法如此的自然.直接,以至于我不敢相信还有别的(甚至是更好 ...
- [转载] 散列表(Hash Table) 从理论到实用(下)
转载自: 白话算法(6) 散列表(Hash Table) 从理论到实用(下) [澈丹,我想要个钻戒.][小北,等等吧,等我再修行两年,你把我烧了,舍利子比钻戒值钱.] ——自扯自蛋 无论开发一个程序还 ...
- algorithm@ Sieve of Eratosthenes (素数筛选算法) & Related Problem (Return two prime numbers )
Sieve of Eratosthenes (素数筛选算法) Given a number n, print all primes smaller than or equal to n. It is ...
- POJ2739 Sum of Consecutive Prime Numbers(尺取法)
POJ2739 Sum of Consecutive Prime Numbers 题目大意:给出一个整数,如果有一段连续的素数之和等于该数,即满足要求,求出这种连续的素数的个数 水题:艾氏筛法打表+尺 ...
- [LeetCode] 1. Two Sum_Easy tag: Hash Table
Given an array of integers, return indices of the two numbers such that they add up to a specific ta ...
- [译]C语言实现一个简易的Hash table(1)
说明 Hash table翻译过来就是Hash表,是一种提供了类似于关联数组的数据结构,可以通过key执行搜索.插入和删除操作.Hash表由一些列桶(buckets)组成,而每一个bucket都是由k ...
随机推荐
- spring boot2 集成Redis
1. 引入依赖 <parent> <groupId>org.springframework.boot</groupId> <artifactId>spr ...
- winSCP无法连接虚拟机Linux解决
刚在虚拟机上装上Linux(Centos7)后使用winSCP建立文件共享发现连接超时,经过几个小时的查找发现Linux中没有eth0文件,这说明其网卡名不是eth0,在网上查过一些解决办法有的通过修 ...
- 记一次安装VS2015后启动失败的修复过程
安装过程没有提示任何问题,然而启动vs时提示没有安装 .Net Framework 4.6,那就安装吧,但是安装 4.6 时却提示 Windows Moudle Installer 服务没有启动,于是 ...
- House of Spirit学习调试验证与实践
作家:Bug制造机 原文来自:House of Spirit学习调试验证与实践 House of Spirit和其他的堆的利用手段有所不同.它是将存在的指针改写指向我们伪造的块(这个块可以位于堆.栈. ...
- JavaScript实现单张图片上传功能
前台jsp代码 <%@ page language="java" pageEncoding="UTF-8" contentType="text/ ...
- Android开发工程师文集-layout_weight讲解
前言 大家好,给大家带来Android开发工程师文集-layout_weight讲解的概述,希望你们喜欢 Layout_weight的相关代码展示 <TextView android:layou ...
- 在notepad++中使用正则匹配功能(一-龥!-~) 中文[利刃篇]
用正则时间越久,人就越懒,就越知道正则的强大.正则,不只是在代码里用到,在字符查找是也会用到,学会适当使用正则,将会使你的工作事办功倍!但是,中文却是一个砍,不容易过. 于是在用notepad++,也 ...
- python多线程获取子线程任务返回值
今天想实现多线程更新资产信息,所以使用到了threading,但是我需要每个线程的返回值,这就需要我在threading.Thread的基础上进行封装 def auto_asset(node): re ...
- Thread-方法以及wait、notify简介
Thread.sleep()1.静态方法是定义在Thread类中.2.Thread.sleep()方法用来暂停当前执行的线程,将CPU使用权释放给线程调度器,但不释放锁(也就是说如果有synchron ...
- css回归测试工具:backstopjs
最近在看公开课,一位老师讲了一个自动化的工具,backstopjs,可以自动的对比UI出的图与前端写好的图,不一致的地方会标出,挺好用的,但是写的过程中也会遇到一些问题,现在写出来,记录一下 首先,要 ...