Reading the Java 8 HashMap Source Code

Preface

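Reading the Java source code is arguably a required course for every Java programmer. Only by understanding why things work the way they do can we use Java well and write more elegant programs, and reading the JDK source also lays the groundwork for reading the source of Java frameworks later. Reading source code is like reading a long detective novel: it cannot be rushed and has to be savored slowly. This series of articles records what I learned and how I approached the code. Readers are welcome to use it as a reference, but only as a reference. As the old verses put it, a pond stays clear only because fresh water keeps flowing in, and to truly understand something you have to practice it yourself. To become really proficient you need to read the source yourself rather than rely on a few blog posts that analyze it.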

Main text

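HashMap is one of the collection classes we use most often. The way it maintains its hash table, grows the table dynamically, and resolves hash collisions is well worth learning from. To use HashMap well, I recommend two things: read the Java API documentation, which explains in detail how to use HashMap properly, and work through the HashMap source code yourself.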

Summary

HashMap optimizations in Java 8

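Reading the source shows that in Java 1.8 the JDK authors further optimized HashMap lookups. Before 1.8 a HashMap was a hash table plus linked lists. In 1.8 it is a hash table plus linked lists or trees: under certain conditions the linked list of entries that collide in the same bucket is converted into a red-black tree, which speeds up lookups in that bucket. Studying this implementation teaches us how a tree can be used to speed up lookups within a crowded bucket. So when is a list treeified, and when is the table resized instead?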

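Among HashMap's class members is a constant named MIN_TREEIFY_CAPACITY. It specifies the minimum length of the hash table (64) at which buckets may be treeified; while the table is shorter than that, an overfull bucket triggers a resize instead. For the linked list in each bucket there is a second constant, TREEIFY_THRESHOLD: once a list would grow beyond this length (8), it is converted into a tree.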
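The two conditions above can be sketched as a small decision function. This is a simplified illustration rather than the JDK code: the class `TreeifyDecision`, the enum `Action`, and the method `onAppend` are hypothetical names, and only the two constants mirror the values quoted in this article.

```java
// Hypothetical sketch: what happens when a new node is appended to a bucket's list.
final class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;     // same value as in HashMap
    static final int MIN_TREEIFY_CAPACITY = 64; // same value as in HashMap

    enum Action { NONE, RESIZE, TREEIFY }

    // nodesBeforeAppend: length of the bucket's list before the new node is added
    // tableLength: current length of the hash table array
    static Action onAppend(int nodesBeforeAppend, int tableLength) {
        if (nodesBeforeAppend < TREEIFY_THRESHOLD)
            return Action.NONE;                 // list is still short
        // The list is long: small tables grow, large tables treeify the bucket.
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(onAppend(3, 16));    // short list
        System.out.println(onAppend(8, 16));    // long list, small table
        System.out.println(onAppend(8, 64));    // long list, large table
    }
}
```

The sketch ignores everything else put() does (key comparison, existing trees); it only shows how the two constants interact.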

Source code analysis

Key fields of HashMap:

    /**
     * The default initial capacity - MUST be a power of two.
     * Default initial capacity of the hash table in a HashMap.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     * Maximum capacity of the hash table.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     * Default value of the load factor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin. Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     * Treeify threshold: the list length at which a list is converted
     * into a tree.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     * Threshold for converting a tree back into a linked list.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     * This field sets the minimum table length at which a list may be
     * treeified. The design is reasonable: while the table is still small
     * it costs little space, so space can be traded for time. In that
     * situation, when a bucket holds too many nodes, the better remedy is
     * to enlarge the table rather than treeify the list.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     * The hash table itself.
     */
    transient Node<K,V>[] table;

    /**
     * The number of key-value mappings contained in this map.
     * Current number of entries in the HashMap.
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash). This field is used to make iterators on Collection-views of
     * the HashMap fail-fast. (See ConcurrentModificationException).
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    // The threshold: it decides when the HashMap is resized.
    int threshold;

    /**
     * The load factor for the hash table.
     * Load factor, used to compute the threshold:
     * threshold = capacity * loadFactor.
     * @serial
     */
    final float loadFactor;

Having introduced HashMap's important fields, we can now look at its two most important methods, put() and resize():

put():

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the hash table has not been allocated yet, initialize it with resize().
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // No hash collision: store the entry directly in the table.
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // A collision occurred.
        else {
            Node<K,V> e; K k;
            // Check whether the key matches the head node of the bucket.
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If the bucket is already a tree, delegate to putTreeVal().
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // The key is not the head node and the bucket is still a linked list.
            else {
                // Walk the list to decide between an update and an insert.
                for (int binCount = 0; ; ++binCount) {
                    // End of the list reached: this is an insert.
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // Check whether the treeify condition is met.
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash); // treeify the list
                        break;
                    }
                    // A node with the same key already exists: leave the loop and update it.
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // Update: replace the value of the existing mapping.
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                // Callback overridden by LinkedHashMap; a no-op in HashMap.
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Check whether the table must grow after this put().
        if (++size > threshold)
            resize();
        // Callback overridden by LinkedHashMap; a no-op in HashMap.
        afterNodeInsertion(evict);
        return null;
    }
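Seen from the caller's side, the update and insert branches of putVal() show up in the return value of put(). The following is a small usage sketch using only the public java.util API; the class name `PutDemo` and the method `demo` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutDemo {
    // Collects the return values of a few calls so they can be inspected together.
    static List<Integer> demo() {
        Map<String, Integer> m = new HashMap<>();
        List<Integer> results = new ArrayList<>();
        results.add(m.put("a", 1));         // insert: no previous mapping, returns null
        results.add(m.put("a", 2));         // update: returns the old value 1
        results.add(m.putIfAbsent("a", 3)); // onlyIfAbsent: value stays 2, returns 2
        results.add(m.get("a"));            // still 2
        results.add(m.size());              // one mapping in total
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

putIfAbsent() goes through the same putVal() method with onlyIfAbsent set to true, which is why the existing non-null value is left untouched.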

A few points to note here:

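1. When locating the bucket for an entry, why is the index hash(key) & (length - 1) rather than hash(key) itself?

Because hash(key) can be any int value, far outside the bounds of the table. Since the table length is always a power of two, the & operation is equivalent to taking the hash modulo the table length: it confines the index to the range 0 to length - 1 while keeping the entries spread evenly across the table.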

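2. Because of the linked lists, a HashMap's capacity has no upper bound in theory. However, once the hash table can no longer grow, lookups become progressively slower as more entries are stored.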

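3. The role of the load factor: the load factor expresses how full the HashMap is allowed to get, and it exists to balance lookup efficiency against space utilization. capacity * loadFactor = threshold, where capacity is the length of the hash table and threshold is the number of entries the map can hold before it is resized. A larger load factor lets the table fill up more, so it holds more entries, but the lists get longer and lookups slow down. Conversely, a smaller load factor keeps the lists sparse, which wastes space but makes lookups fast.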
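Points 1 and 3 can be checked with a few lines of arithmetic. The sketch below is illustrative only: `IndexAndThreshold`, `indexFor`, and `threshold` are hypothetical helper names, not methods of HashMap. It compares the bit mask against Math.floorMod and computes the threshold for the default load factor.

```java
final class IndexAndThreshold {
    // Bucket index as computed in putVal(); valid because n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    // Number of entries at which a table of the given capacity is resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {5, 21, -7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashes) {
            // The mask agrees with a non-negative modulo, even for negative hashes.
            System.out.println(h + " -> " + indexFor(h, n)
                    + " (floorMod: " + Math.floorMod(h, n) + ")");
        }
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```

With the defaults, a table of length 16 is therefore resized when the 13th entry is added.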

　　resize():

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // First determine the length and threshold of the new table.
        if (oldCap > 0) {
            // If the table has already reached its maximum length, set the
            // threshold to Integer.MAX_VALUE and return.
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Double the capacity and compute the matching threshold.
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        // With the new length known, create the new table and migrate the
        // entries from the old table.
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // If the bucket is a tree, let the tree's split method migrate it.
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // The bucket is a plain linked list.
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        // Read this part carefully. Each list is split into a "lo" list
                        // that stays at index j and a "hi" list that moves to j + oldCap.
                        // Relative order within each list is kept, but because entries
                        // move between buckets, the iteration order of the map can
                        // change after a resize.
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
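The split into a "lo" list and a "hi" list works because doubling the table exposes exactly one more bit of the hash. A minimal sketch of that arithmetic follows, assuming the table length is a power of two; `SplitDemo` and `newIndex` are hypothetical names used only for this illustration.

```java
final class SplitDemo {
    // New bucket index after doubling, derived the way resize() does it:
    // the oldCap bit of the hash decides between index j and index j + oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);               // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53};            // all in bucket 5 while the length is 16
        for (int h : hashes) {
            int direct = h & (2 * oldCap - 1);     // index recomputed from scratch
            System.out.println(h + ": " + newIndex(h, oldCap) + " == " + direct);
        }
    }
}
```

Testing a single bit avoids recomputing the index for every node, and it lets resize() build both lists in one pass over the old bucket.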

Points to note about resize:

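1. Each resize doubles the capacity until the maximum is reached. From then on the table can no longer grow: resize() only sets the threshold to Integer.MAX_VALUE, so that no further resizing is attempted.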

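2. HashMap makes no guarantee about iteration order, and because resize redistributes entries across buckets, the order can even change over time. If insertion order must be preserved, use LinkedHashMap.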
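The ordering difference in point 2 is easy to observe. The sketch below uses only the public java.util API; the class `OrderDemo`, the method `insertAndList`, and the sample keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderDemo {
    static final String[] KEYS = {"zeta", "alpha", "mike", "bravo"};

    // Inserts the same keys, in the same order, into whichever map is passed in,
    // then returns the keys in the map's iteration order.
    static List<String> insertAndList(Map<String, Integer> map) {
        for (String k : KEYS)
            map.put(k, k.length());
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order.
        System.out.println(insertAndList(new LinkedHashMap<>()));
        // HashMap iterates in bucket order, which is unspecified and may differ.
        System.out.println(insertAndList(new HashMap<>()));
    }
}
```

The first line printed lists the keys in insertion order. The second line should not be relied on, since its order depends on the hash values and the current table length.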
