The consistent hashing algorithm in Jedis

【http://my.oschina.net/u/866190/blog/192286】

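Jedis is a Java client for Redis, and it routes load across servers through sharding. I had long been curious about how Jedis sharding works, so I dug into the Jedis source. The sharding turns out to be a consistent hashing algorithm, and the source shows that it is fairly simple to implement. The main class is redis.clients.util.Sharded<R, S>; I have added comments at the key points: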

    public class Sharded<R, S extends ShardInfo<R>> { // S wraps the information of a Redis node, such as its name and weight

        public static final int DEFAULT_WEIGHT = 1; // the default weight is 1
        private TreeMap<Long, S> nodes; // holds the virtual nodes
        private final Hashing algo; // the hash algorithm

        ......

        public Sharded(List<S> shards, Hashing algo, Pattern tagPattern) {
            this.algo = algo;
            this.tagPattern = tagPattern;
            initialize(shards);
        }

        private void initialize(List<S> shards) {
            // TreeMap is a sorted map backed by a red-black tree and ordered by key.
            // The keys are long hash values, which the author describes as a ring of up to 2^32 integers.
            nodes = new TreeMap<Long, S>();
            for (int i = 0; i != shards.size(); ++i) {
                final S shardInfo = shards.get(i);
                if (shardInfo.getName() == null)
                    for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
                        // One real Redis node is associated with many virtual nodes. Hashing the
                        // virtual nodes spreads them fairly evenly over the 2^32 integers.
                        nodes.put(this.algo.hash("SHARD-" + i + "-NODE-" + n), shardInfo);
                    }
                else
                    for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
                        // Same as above, but the virtual node name is built from the shard name and weight.
                        nodes.put(this.algo.hash(shardInfo.getName() + "*" + shardInfo.getWeight() + n), shardInfo);
                    }
                resources.put(shardInfo, shardInfo.createResource());
            }
        }

        /**
         * Hashes the key and looks up the real node S that owns it.
         *
         * @param key
         * @return
         */
        public S getShardInfo(byte[] key) {
            SortedMap<Long, S> tail = nodes.tailMap(algo.hash(key)); // virtual nodes whose hash is >= the key's hash
            if (tail.isEmpty()) { // nothing at or after the key's hash: wrap around to the first virtual node
                return nodes.get(nodes.firstKey());
            }
            return tail.get(tail.firstKey()); // the first virtual node at or after the key's hash
        }

        ......

    }

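The whole algorithm can be summarized as follows. First, picture a ring of 2^32 integers. Each virtual node is mapped onto the ring by its hash value, which indirectly places the real nodes on the ring too, because every virtual node is associated with one real node. Then the hash of the key to be cached is located on the ring, and the search moves clockwise to the nearest virtual node at or after that hash. That completes the routing from a key to a real node.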
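
The clockwise lookup can be sketched in a few lines of Java. This is a minimal illustration and not the Jedis code: the class RingLookupSketch, its method names, and the hand-picked ring positions 100, 200, and 300 are assumptions made for the example, and it uses TreeMap.ceilingEntry where Sharded uses tailMap.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal ring lookup sketch. The ring positions are hand-picked numbers
// (an assumption for illustration), not hashes produced by Jedis.
class RingLookupSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Places one virtual node on the ring and ties it to a real node.
    void addVirtualNode(long position, String realNode) {
        ring.put(position, realNode);
    }

    // Walks clockwise: the first virtual node at or after keyHash,
    // or the first node on the ring if keyHash is past the last one.
    // Assumes the ring is not empty.
    String route(long keyHash) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyHash);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RingLookupSketch sketch = new RingLookupSketch();
        sketch.addVirtualNode(100L, "redis-A");
        sketch.addVirtualNode(200L, "redis-B");
        sketch.addVirtualNode(300L, "redis-A");
        System.out.println(sketch.route(150L)); // redis-B
        System.out.println(sketch.route(200L)); // redis-B (an exact match is included)
        System.out.println(sketch.route(301L)); // redis-A (wraps around to position 100)
    }
}
```

The wrap-around branch is what turns the sorted map into a ring: a key hashed past the last virtual node goes to the first one.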

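The core idea of consistent hashing here is to add a layer of virtual nodes, so that real nodes can change without breaking the overall consistency of the mapping: when a real node is added or removed, only the keys that fall next to its virtual nodes move. Solving a problem by adding a layer is a familiar idea: layered design in software development, the operating system layer that coordinates applications and hardware, and the Java virtual machine that provides cross-platform portability.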
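
To make the stability property concrete, here is a small sketch that builds a ring for three nodes, rebuilds it without one of them, and counts the keys that changed owner even though their old owner is still present. Everything in it is an assumption for illustration: the class and method names, the node names A, B, and C, the "#" naming of virtual nodes, and the hash, which takes the first 8 bytes of an MD5 digest as a stand-in for the Jedis Hashing implementations.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NodeRemovalSketch {
    // Stand-in hash (assumption): the first 8 bytes of an MD5 digest.
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Puts virtualPerNode virtual nodes on the ring for every real node.
    static TreeMap<Long, String> buildRing(List<String> realNodes, int virtualPerNode) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : realNodes) {
            for (int n = 0; n < virtualPerNode; n++) {
                ring.put(hash(node + "#" + n), node);
            }
        }
        return ring;
    }

    // Clockwise lookup with wrap-around.
    static String route(TreeMap<Long, String> ring, String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Counts keys that moved although their old owner (A or B) still exists.
    static int unexpectedMoves(int keyCount) {
        TreeMap<Long, String> before = buildRing(Arrays.asList("A", "B", "C"), 160);
        TreeMap<Long, String> after = buildRing(Arrays.asList("A", "B"), 160);
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i;
            String oldOwner = route(before, key);
            if (!oldOwner.equals("C") && !oldOwner.equals(route(after, key))) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(unexpectedMoves(10000)); // prints 0
    }
}
```

Only the keys that belonged to the removed node C are redistributed. With a plain "hash modulo node count" scheme, most keys would move instead.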

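Another question worth asking is how many virtual nodes one real node should have. Readers who looked closely at the code above will have noticed the value 160. It is an empirical value: too many virtual nodes hurt performance, while too few leave the distribution unbalanced. Adjusting the weight value gives real nodes different weights. This is easy to understand: the more virtual nodes a real node has, the higher the probability that a key lands on it.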
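
The effect of the weight can be tried out with a rough sketch like the one below. It mirrors the 160 * weight loop and the name + "*" + weight + n naming from the code above, but the rest is assumed for illustration: the class WeightSketch, the shard names "light" and "heavy", the generated keys, and the hash (the first 8 bytes of an MD5 digest, standing in for the Jedis Hashing implementations).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class WeightSketch {
    // Stand-in hash (assumption): the first 8 bytes of an MD5 digest.
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each shard gets 160 * weight virtual nodes, as in the article's loop.
    static TreeMap<Long, String> buildRing(Map<String, Integer> weights) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (Map.Entry<String, Integer> shard : weights.entrySet()) {
            for (int n = 0; n < 160 * shard.getValue(); n++) {
                ring.put(hash(shard.getKey() + "*" + shard.getValue() + n), shard.getKey());
            }
        }
        return ring;
    }

    // Routes keyCount generated keys and counts how many land on each shard.
    static Map<String, Integer> countKeys(Map<String, Integer> weights, int keyCount) {
        TreeMap<Long, String> ring = buildRing(weights);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < keyCount; i++) {
            Map.Entry<Long, String> e = ring.ceilingEntry(hash("key-" + i));
            String owner = (e != null ? e : ring.firstEntry()).getValue();
            counts.merge(owner, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("light", 1); // 160 virtual nodes
        weights.put("heavy", 3); // 480 virtual nodes
        System.out.println(countKeys(weights, 10000));
    }
}
```

With weights 1 and 3, the heavier shard owns three quarters of the virtual nodes, so it should receive roughly three quarters of the keys. The exact split depends on the hash and on the keys.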

References

http://blog.csdn.net/sparkliang/article/details/5279393

http://my.oschina.net/u/90679/blog/188750

【The MurmurHash2 algorithm in Redis Dict】【http://my.oschina.net/fuckphp/blog/270258】

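Redis uses hash algorithms in many places, for example when inserting a new key into the key space or when implementing the hash and set data structures. This post records the two hash algorithms used in dict: the djb2 hash function and MurmurHash2.

The djb2 algorithm: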

    unsigned long hash(unsigned char *str)
    {
        // hash seed
        unsigned long hash = 5381;
        int c;

        // walk through every character of the string
        while ((c = *str++))
            // hash << 5 is hash times 32 (2^5); adding hash once more gives hash times 33,
            // then the character's ASCII code is added, and the loop repeats
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

        return hash;
    }
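
For readers who want to experiment, here is a Java port of the function above. It is a sketch: the class name Djb2Sketch is made up, and a Java long stands in for the C unsigned long, which gives the same bit pattern where unsigned long is 64 bits wide because both wrap around on overflow.

```java
import java.nio.charset.StandardCharsets;

class Djb2Sketch {
    // Java port of djb2: start at 5381, then hash = hash * 33 + c per byte.
    static long djb2(byte[] str) {
        long hash = 5381L;
        for (byte b : str) {
            int c = b & 0xff;               // treat the byte as unsigned, like unsigned char
            hash = ((hash << 5) + hash) + c; // (hash * 32 + hash) + c
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(djb2(new byte[0]));                                // 5381
        System.out.println(djb2("a".getBytes(StandardCharsets.US_ASCII)));    // 177670
        long h = 123456789L;
        System.out.println(((h << 5) + h) == h * 33);                         // true
    }
}
```

The single-character case can be checked by hand: 5381 * 33 = 177573, and adding 97, the ASCII code of 'a', gives 177670.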

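As for why the seed is 5381, searching turned up the following; the number is regarded as a magic constant:

5381 is odd
5381 is prime
5381 is a deficient number
Its binary digits are evenly distributed: 001/010/100/000/101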
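
The listed properties are easy to check with a short program. The sketch below uses made-up names (SeedPropertiesSketch, isPrime, sumOfProperDivisors) and plain trial division. It only confirms the arithmetic and says nothing about how these properties affect hash quality.

```java
class SeedPropertiesSketch {
    // Trial division is enough for a number as small as 5381.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Sum of all divisors of n that are smaller than n.
    // A number is deficient when this sum is less than n.
    static int sumOfProperDivisors(int n) {
        int sum = 0;
        for (int d = 1; d <= n / 2; d++) {
            if (n % d == 0) sum += d;
        }
        return sum;
    }

    public static void main(String[] args) {
        int seed = 5381;
        System.out.println(seed % 2 == 1);                   // true: odd
        System.out.println(isPrime(seed));                   // true: prime
        System.out.println(sumOfProperDivisors(seed) < seed); // true: deficient
        System.out.println(Integer.toBinaryString(seed));    // 1010100000101
    }
}
```

Padding the binary string with two leading zeros and grouping it in threes gives 001/010/100/000/101, the pattern quoted in the list.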

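I know next to nothing about algorithms and really do not understand how these properties affect the hash result. I hope an expert can explain.

Redis implements the djb hash as follows (the code below is in src/dict.c):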

    // hash seed, 5381 by default
    static uint32_t dict_hash_function_seed = 5381;

    // set the hash seed
    void dictSetHashFunctionSeed(uint32_t seed) {
        dict_hash_function_seed = seed;
    }

    // get the hash seed
    uint32_t dictGetHashFunctionSeed(void) {
        return dict_hash_function_seed;
    }

    /* And a case insensitive hash function (based on djb hash) */
    unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len) {
        // start from the hash seed
        unsigned int hash = (unsigned int)dict_hash_function_seed;

        // walk through the buffer
        while (len--)
            // djb step: multiply by 33, then add the ASCII code of the lowercased character
            // (tolower is declared in <ctype.h>)
            hash = ((hash << 5) + hash) + (tolower(*buf++)); /* hash * 33 + c */
        return hash;
    }

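Redis makes one small change to the djb hash: it converts each character of the input to lowercase, so that the hash result does not depend on case.

The MurmurHash2 algorithm: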
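
The case-insensitive behaviour can be reproduced with a small Java port. This is a sketch under assumptions: the class and method names are invented, the seed is fixed at the default 5381 and not configurable as in Redis, and lowercasing covers only ASCII A to Z, which is what tolower does in the default C locale.

```java
import java.nio.charset.StandardCharsets;

class CaseInsensitiveDjbSketch {
    static final int SEED = 5381; // the default seed shown in the article

    // djb hash over lowercased bytes; a Java int wraps like a 32-bit unsigned int.
    static int caseHash(byte[] buf) {
        int hash = SEED;
        for (byte b : buf) {
            int c = b & 0xff;
            if (c >= 'A' && c <= 'Z') {
                c += 'a' - 'A'; // ASCII-only lowercase
            }
            hash = ((hash << 5) + hash) + c; // hash * 33 + c
        }
        return hash;
    }

    public static void main(String[] args) {
        int upper = caseHash("REDIS".getBytes(StandardCharsets.US_ASCII));
        int lower = caseHash("redis".getBytes(StandardCharsets.US_ASCII));
        System.out.println(upper == lower);                    // true
        System.out.println(Integer.toUnsignedString(lower));   // the value as an unsigned 32-bit number
    }
}
```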

    uint32_t MurmurHash2( const void * key, int len, uint32_t seed )
    {
        // 'm' and 'r' are mixing constants generated offline.
        // They're not really 'magic', they just happen to work well.
        const uint32_t m = 0x5bd1e995;
        const int r = 24;

        // Initialize the hash to a 'random' value
        uint32_t h = seed ^ len;

        // Mix 4 bytes at a time into the hash
        const unsigned char * data = (const unsigned char *)key;

        while(len >= 4)
        {
            // each iteration reads 4 bytes as one 32-bit integer
            uint32_t k = *(uint32_t*)data;

            k *= m;
            k ^= k >> r;
            k *= m;

            h *= m;
            h ^= k;

            data += 4;
            len -= 4;
        }

        // Handle the last few bytes of the input array
        // (fewer than 4 bytes remain; shifts fold them into the hash, and the cases fall through on purpose)
        switch(len)
        {
        case 3: h ^= data[2] << 16;
        case 2: h ^= data[1] << 8;
        case 1: h ^= data[0];
                h *= m;
        }

        // Do a few final mixes of the hash to ensure the last few
        // bytes are well-incorporated.
        h ^= h >> 13;
        h *= m;
        h ^= h >> 15;

        return h;
    }

Redis's dictGenHashFunction in src/dict.c applies the same algorithm, taking dict_hash_function_seed as the seed:

    unsigned int dictGenHashFunction(const void *key, int len) {
        /* 'm' and 'r' are mixing constants generated offline.
         They're not really 'magic', they just happen to work well. */
        uint32_t seed = dict_hash_function_seed;
        const uint32_t m = 0x5bd1e995;
        const int r = 24;

        /* Initialize the hash to a 'random' value */
        uint32_t h = seed ^ len;

        /* Mix 4 bytes at a time into the hash */
        const unsigned char *data = (const unsigned char *)key;

        while(len >= 4) {
            uint32_t k = *(uint32_t*)data;

            k *= m;
            k ^= k >> r;
            k *= m;

            h *= m;
            h ^= k;

            data += 4;
            len -= 4;
        }

        /* Handle the last few bytes of the input array */
        switch(len) {
        case 3: h ^= data[2] << 16;
        case 2: h ^= data[1] << 8;
        case 1: h ^= data[0]; h *= m;
        };

        /* Do a few final mixes of the hash to ensure the last few
         * bytes are well-incorporated. */
        h ^= h >> 13;
        h *= m;
        h ^= h >> 15;

        return (unsigned int)h;
    }
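
The same steps can be followed in Java with the port below. It is a sketch and not Redis code: the class name is invented, a Java int plays the role of uint32_t with >>> as the unsigned shift, and the 4-byte read is assembled in little-endian order on the assumption that the C version runs on a little-endian machine such as x86.

```java
import java.nio.charset.StandardCharsets;

class MurmurHash2Sketch {
    static int hash(byte[] data, int seed) {
        final int m = 0x5bd1e995; // mixing constants from the C code
        final int r = 24;
        int len = data.length;
        int h = seed ^ len;
        int i = 0;

        // Mix 4 bytes at a time into the hash.
        while (len >= 4) {
            int k = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
            len -= 4;
        }

        // Fold in the 1 to 3 trailing bytes; the cases fall through on purpose.
        switch (len) {
            case 3: h ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1: h ^= (data[i] & 0xff);
                    h *= m;
        }

        // Final mixing so the last bytes are well incorporated.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash(new byte[0], 0)); // 0: every step leaves a zero state at zero
        byte[] key = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Integer.toUnsignedString(hash(key, 5381)));
    }
}
```

An empty input with seed 0 is a handy sanity check, since h starts at 0 and every multiply, shift, and xor keeps it there.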

References:

http://lenky.info/archives/2012/12/2150

Redis 2.8.9 source code: src/dict.h, src/dict.c

Redis 设计与实现 (Redis Design and Implementation), first edition

djb hash function

http://code.google.com/p/smhasher/
