Hash table lengths and prime numbers
Website:http://srinvis.blogspot.ca/2006/07/hash-table-lengths-and-prime-numbers.html
This has been bugging me for some time now...
The first thing you do when inserting/retreiving from hash table is to calculate the hashCode for the given key and then find the correct bucket by trimming the hashCode to the size of the hashTable by doing hashCode % table_length. Here are 2 statements that
you most probably have read somewhere
- If you use a power of 2 for table_length, finding (hashCode(key) % 2^n ) is as simple and quick as (hashCode(key) & (2^n -1)). But if your function to calculate hashCode for a given key isn't good,
you will definitely suffer from clustering of many keys in a few hash buckets. - But if you use prime numbers for table_length, hashCodes calculated could map into the different hash buckets even if you have a slightly stupid hashCode function.
Ok, but now can someone tell me why it is so and give me the proof for the above statements...I couldn't find much on google. Most reasons given are again just statements and I cannot accept statements (And there are some stupid proofs also on the net, so beware).
And don't even try telling me that the proof is experimental.
Today i have finally been able to get rid of this thought from my head...below is the proof I came up with (Hope it doesn't fall into the stupid category, it doesn't seem atleast for now, if u think otherwise put in a comment, atleast for the sake of others).
I suggest you think about the solution on your own before reading further...
If suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial
to verify/derive this). Now you can do one of the following to avoid clustering
- Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions
of entries. - Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length
is a prime number.
If you need some more kick from hash maps take a look at
Doug lea's highly performant ConcurrentHashMap http://www-128.ibm.com/developerworks/java/library/j-jtp08223/
Google's space/time efficient hashmap implementations http://goog-sparsehash.sourceforge.net/doc/implementation.html
Hash table lengths and prime numbers的更多相关文章
- Hash table: why size should be prime?
Question: Possible Duplicate:Why should hash functions use a prime number modulus? Why is it necessa ...
- Hash Map (Hash Table)
Reference: Wiki PrincetonAlgorithm What is Hash Table Hash table (hash map) is a data structure use ...
- [转载] 散列表(Hash Table)从理论到实用(上)
转载自:白话算法(6) 散列表(Hash Table)从理论到实用(上) 处理实际问题的一般数学方法是,首先提炼出问题的本质元素,然后把它看作一个比现实无限宽广的可能性系统,这个系统中的实质关系可以通 ...
- [转载] 散列表(Hash Table)从理论到实用(中)
转载自:白话算法(6) 散列表(Hash Table)从理论到实用(中) 不用链接法,还有别的方法能处理碰撞吗?扪心自问,我不敢问这个问题.链接法如此的自然.直接,以至于我不敢相信还有别的(甚至是更好 ...
- [转载] 散列表(Hash Table) 从理论到实用(下)
转载自: 白话算法(6) 散列表(Hash Table) 从理论到实用(下) [澈丹,我想要个钻戒.][小北,等等吧,等我再修行两年,你把我烧了,舍利子比钻戒值钱.] ——自扯自蛋 无论开发一个程序还 ...
- algorithm@ Sieve of Eratosthenes (素数筛选算法) & Related Problem (Return two prime numbers )
Sieve of Eratosthenes (素数筛选算法) Given a number n, print all primes smaller than or equal to n. It is ...
- POJ2739 Sum of Consecutive Prime Numbers(尺取法)
POJ2739 Sum of Consecutive Prime Numbers 题目大意:给出一个整数,如果有一段连续的素数之和等于该数,即满足要求,求出这种连续的素数的个数 水题:艾氏筛法打表+尺 ...
- [LeetCode] 1. Two Sum_Easy tag: Hash Table
Given an array of integers, return indices of the two numbers such that they add up to a specific ta ...
- [译]C语言实现一个简易的Hash table(1)
说明 Hash table翻译过来就是Hash表,是一种提供了类似于关联数组的数据结构,可以通过key执行搜索.插入和删除操作.Hash表由一些列桶(buckets)组成,而每一个bucket都是由k ...
随机推荐
- winSocket编程(十)完成端口
//本篇为转贴 本系列里完成端口的代码在两年前就已经写好了,但是由于许久没有写东西了,不知该如何提笔,所以这篇文档总是在酝酿之中……酝酿了两年之后,终于决定开始动笔了,但愿还不算晚….. 这篇文档我非 ...
- ModelAndView对象
ModelAndView属性中两个最重要的属性是model和view. view即视图,保存的是视图信息. model即模型,以<K,V>形式保存模型数据,上图可以看到是MdelMap类型 ...
- Codeforces828 D. High Load
D. High Load time limit per test 2 seconds memory limit per test 512 megabytes input standard input ...
- oracle基础函数--decode
含义解释:decode(条件,值1,返回值1,值2,返回值2,...值n,返回值n,缺省值) 该函数的含义如下:IF 条件=值1 THEN RETURN(翻译值1)ELSIF 条件=值2 THEN R ...
- 【C#】 使用Gsof.Native 动态调用 C动态库
[C#] 使用Gsof.Native 动态调用 C动态库 一.背景 使用C# 开发客户端时候,我们经常会调用一些标准的动态库或是C的类库.虽然C# 提供的PInvoke的方式,但因为使用的场景的多变, ...
- Mybatis关系映射
一.一对一关系映射 使用resultType+包装类实现 1.假设问题背景是要求在某一个购物平台的后台程序中添加一个这样的功能:查询某个订单的信息和下该订单的用户信息.首先我们可以知道,一般这样的平台 ...
- CGI 和 FastCGI 协议的运行原理
目录 介绍 深入CGI协议 CGI的运行原理 CGI协议的缺陷 深入FastCGI协议 FastCGI协议运行原理 为什么是 FastCGI 而非 CGI 协议 CGI 与 FastCGI 架构 再看 ...
- AB(ApacheBench)工具 -- 压力测试
服务器负载太大而影响程序效率也是很常见的,Apache服务器自带有一个叫AB(ApacheBench)的工具,可以对服务器进行负载测试 同时美多商城的秒杀功能也会被高负载影响,从而导致超卖现象 安装x ...
- Ubuntu16.04安装Anaconda (转)
一. Anaconda下载 Anaconda 官方下载链接: https://www.continuum.io/downloads 根据自己的系统选择下载32位还是64位. 二. 进入下载目录 如 ...
- yarn依赖管理工具,和fis3构建工具 gulp详细用法
看视频所了解到的,正在进行摸索. 参考:https://www.cnblogs.com/2050/p/4198792.html这篇介绍gulp的文章非常棒,唯一有一点,页面随时刷新的目前还没实现,不知 ...