Consistent hashing —— 一致性哈希
原文地址:http://www.codeproject.com/Articles/56138/Consistent-hashing
基于BSD License
What is libconhash
libconhash is a consistent hashing library which can be compiled both on Windows and Linux platforms, with the following features:
- High performance and easy to use, libconhash uses a red-black tree to manage all nodes to achieve high performance.
- By default, it uses the MD5 algorithm, but it also supports user-defined hash functions.
- Easy to scale according to the node's processing capacity.
Consistent hashing
Why you need consistent hashing
Now we will consider the common way to do load balance. The machine number chosen to cache object o will be:
hash(o) mod n
Here, n is the total number of cache machines. While this works well until you add or remove cache machines:
- When you add a cache machine, then object o will be cached into the machine:
hash(o) mod (n+1)
- When you remove a cache machine, then object o will be cached into the machine:
hash(o) mod (n-1)
So you can see that almost all objects will hashed into a new location. This will be a disaster since the originating content servers are swamped with requests from the cache machines. And this is why you need consistent hashing.
Consistent hashing can guarantee that when a cache machine is removed, only the objects cached in it will be rehashed; when a new cache machine is added, only a fairly few objects will be rehashed.
Now we will go into consistent hashing step by step.
Hash space
Commonly, a hash function will map a value into a 32-bit key, 0~2^32-1. Now imagine mapping the range into a circle, then the key will be wrapped, and 0 will be followed by 2^32-1, as illustrated in figure 1.
Map object into hash space
Now consider four objects: object1~object4. We use a hash function to get their key values and map them into the circle, as illustrated in figure 2.
hash(object1) = key1;
.....
hash(object4) = key4;
Map the cache into hash space
The basic idea of consistent hashing is to map the cache and objects into the same hash space using the same hash function.
Now consider we have three caches, A, B and C, and then the mapping result will look like in figure 3.
hash(cache A) = key A;
....
hash(cache C) = key C;
Map objects into cache
Now all the caches and objects are hashed into the same space, so we can determine how to map objects into caches. Take object obj for example, just start from where obj is and head clockwise on the ring until you find a server. If that server is down, you go to the next one, and so forth. See figure 3 above.
According to the method, object1 will be cached into cache A; object2 and object3 will be cached into cache C, and object4 will be cached into cache B.
Add or remove cache
Now consider the two scenarios, a cache is down and removed; and a new cache is added.
If cache B is removed, then only the objects that cached in B will be rehashed and moved to C; in the example, see object4 illustrated in figure 4.
If a new cache D is added, and D is hashed between object2 and object3 in the ring, then only the objects that are between D and B will be rehashed; in the example, see object2, illustrated in figure 5.
Virtual nodes
It is possible to have a very non-uniform distribution of objects between caches if you don't deploy enough caches. The solution is to introduce the idea of "virtual nodes".
Virtual nodes are replicas of cache points in the circle, each real cache corresponds to several virtual nodes in the circle; whenever we add a cache, actually, we create a number of virtual nodes in the circle for it; and when a cache is removed, we remove all its virtual nodes from the circle.
Consider the above example. There are two caches A and C in the system, and now we introduce virtual nodes, and the replica is 2, then three will be 4 virtual nodes. Cache A1 and cache A2 represent cache A; cache C1 and cache C2 represent cache C, illustrated as in figure 6.
Then, the map from object to the virtual node will be:
objec1->cache A2; objec2->cache A1; objec3->cache C1; objec4->cache C2
When you get the virtual node, you get the cache, as in the above figure.
So object1 and object2 are cached into cache A, and object3 and object4 are cached into cache. The result is more balanced now.
So now you know what consistent hashing is.
Using the code
Interfaces of libconhash
/* initialize conhash library
* @pfhash : hash function, NULL to use default MD5 method
* return a conhash_s instance
*/
CONHASH_API struct conhash_s* conhash_init(conhash_cb_hashfunc pfhash); /* finalize lib */
CONHASH_API void conhash_fini(struct conhash_s *conhash); /* set node */
CONHASH_API void conhash_set_node(struct node_s *node,
const char *iden, u_int replica); /*
* add a new node
* @node: the node to add
*/
CONHASH_API int conhash_add_node(struct conhash_s *conhash,
struct node_s *node); /* remove a node */
CONHASH_API int conhash_del_node(struct conhash_s *conhash,
struct node_s *node);
... /*
* lookup a server which object belongs to
* @object: the input string which indicates an object
* return the server_s structure, do not modify the value,
* or it will cause a disaster
*/
CONHASH_API const struct node_s*
conhash_lookup(const struct conhash_s *conhash,
const char *object);
Libconhash is very easy to use. There is a sample in the project that shows how to use the library.
First, create a conhash instance. And then you can add or remove nodes of the instance, and look up objects.
The update node's replica function is not implemented yet.
/* init conhash instance */
struct conhash_s *conhash = conhash_init(NULL);
if(conhash)
{
/* set nodes */
conhash_set_node(&g_nodes[0], "titanic", 32);
/* ... */ /* add nodes */
conhash_add_node(conhash, &g_nodes[0]);
/* ... */
printf("virtual nodes number %d\n", conhash_get_vnodes_num(conhash));
printf("the hashing results--------------------------------------:\n"); /* lookup object */
node = conhash_lookup(conhash, "James.km");
if(node) printf("[%16s] is in node: [%16s]\n", str, node->iden);
}
Reference
License
This article, along with any associated source code and files, is licensed under The BSD License
Consistent hashing —— 一致性哈希的更多相关文章
- hash环/consistent hashing一致性哈希算法
一致性哈希算法在1997年由麻省理工学院提出的一种分布式哈希(DHT)实现算法,设计目标是为了解决因特网中的热点(Hot spot)问题,初衷和CARP十分类似.一致性哈希修正了CARP使用的 ...
- consistent hash(一致性哈希算法)
一.产生背景 今天咱不去长篇大论特别详细地讲解consistent hash,我争取用最轻松的方式告诉你consistent hash算法是什么,如果需要深入,Google一下~. 举个栗子吧: 比如 ...
- 一致性哈希算法(consistent hashing)(转)
原文链接:每天进步一点点——五分钟理解一致性哈希算法(consistent hashing) 一致性哈希算法在1997年由麻省理工学院提出的一种分布式哈希(DHT)实现算法,设计目标是为了解决因特网 ...
- 一致性哈希算法学习及JAVA代码实现分析
1,对于待存储的海量数据,如何将它们分配到各个机器中去?---数据分片与路由 当数据量很大时,通过改善单机硬件资源的纵向扩充方式来存储数据变得越来越不适用,而通过增加机器数目来获得水平横向扩展的方式则 ...
- 一致性哈希算法与Java实现
原文:http://blog.csdn.net/wuhuan_wp/article/details/7010071 一致性哈希算法是分布式系统中常用的算法.比如,一个分布式的存储系统,要将数据存储到具 ...
- Consistent Hashing算法
前几天看了一下Memcached,看到Memcached的分布式算法时,知道了一种Consistent Hashing的哈希算法,上网搜了一下,大致了解了一下这个算法,做下记录. 数据均衡分布技术在分 ...
- _00013 一致性哈希算法 Consistent Hashing 新的讨论,并出现相应的解决
笔者博文:妳那伊抹微笑 博客地址:http://blog.csdn.net/u012185296 个性签名:世界上最遥远的距离不是天涯,也不是海角,而是我站在妳的面前.妳却感觉不到我的存在 技术方向: ...
- 深入一致性哈希(Consistent Hashing)算法原理,并附100行代码实现
转自:https://my.oschina.net/yaohonv/blog/1610096 本文为实现分布式任务调度系统中用到的一些关键技术点分享——Consistent Hashing算法原理和J ...
- (转)每天进步一点点——五分钟理解一致性哈希算法(consistent hashing)
背景:在redis集群中,有关于一致性哈希的使用. 一致性哈希:桶大小0~(2^32)-1 哈希指标:平衡性.单调性.分散性.负载性 为了提高平衡性,引入“虚拟节点” 每天进步一点点——五分钟理解一致 ...
随机推荐
- PHP 文件的操作
操作文件的步骤: 1.打开文件2.做操作PS!!!3.关闭文件 打开 操作
- Java源文件编译成功但是运行时加载不到文件
最近系统重装了一些,Java等环境变量都需要重新配置,配置好以后编写了一个Java源文件编译了一下,通过Javac编译源文件,编译成功,但是再通过Java运行时没找到报出找不到加载文件或者加载文件不存 ...
- 《基于Apache Kylin构建大数据分析平台》
Kyligence联合创始人兼CEO,Apache Kylin项目管理委员会主席(PMC Chair)韩卿 武汉市云升科技发展有限公司董事长,<智慧城市-大数据.物联网和云计算之应用>作者 ...
- ngui的tween的tweenFactor属性
ngui的tween的tweenFactor属性 这个属性是用来记录动画运行的位置的.可以通过设置它来达到动画运行到一半从新设置从新开始
- CSS3图片倒影技术实现及原理
CSS3图片倒影技术实现及原理 目前为止我们已经探讨了很多CSS3中的新功能和新特征.除了上面这些,实际上还有很多CSS新属性并未包含进CSS3官方标准中,像谷歌浏览器或火狐浏览器等都会利用CSS的浏 ...
- (转)px、em、rem的区别和使用
国内的设计师大都喜欢用px,而国外的网站大都喜欢用em和rem(国外的大部分网站能够调整的原因在于其使用了em或rem作为字体单位),那么三者有什么区别,又各自有什么优劣呢? 一.px特点 1. IE ...
- MVC MVP 和 MVVM的图示
一直对于这些什么MVC MVP 和 MVVM都是云里雾里的 完全分不清楚 感觉jq上也没怎么用过,理解也很片面,画几张图也许能够大体分清他们之间的区别. 1.MVC(Model-View-Contro ...
- ORACLE中的支持正则表达式的函数
ORACLE中的支持正则表达式的函数主要有下面四个:1,REGEXP_LIKE :与LIKE的功能相似2,REGEXP_INSTR :与INSTR的功能相似3,REGEXP_SUBSTR :与SUBS ...
- Objective-C_基本数据类型详解
今天在工作群里面看到有人在发面试题求帮解答,顺便看了一眼,发现一个很侮辱程序员的面试题,但是自己也答得不是很好,所以特意上网查了一下资料,废话不说,附原题: “常见的Objective-C的数据类型有 ...
- CAST()函数
语法: CAST(expression AS data_type) 参数说明: expression:任何有效的SQServer表达式 AS:用于分割两个参数,在AS之前的是需要处理的数据,在AS之后 ...