Java Data Structures: Studying the java.util.BitSet Source Code

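This post follows on from the previous one, "An Interview Question, Java Bit Operations, and Using the BitSet Library", and walks through the source code of the BitSet class in the JDK.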

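A bitmap, that is, a set of bits, is a common data structure for recording a large number of 0/1 states. It shows up in many places, such as the Linux kernel (inodes, disk blocks) and Bloom filters, and its advantage is that it stores many 0/1 states with very high space efficiency. In Java, the smallest unit of data a programmer can manipulate directly is the byte; the language offers no way to address a single bit. Programmers can still build bit-level operations themselves with the bitwise operators (& | ~ << >> and so on). If you would rather not do that by hand, Java's built-in BitSet class implements the bitmap data structure and provides a useful set of methods.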
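To make "building bit operations yourself" concrete, here is a minimal sketch of a hand-rolled bitmap over a byte[]. The class TinyBitmap and its methods are hypothetical names made up for this illustration; they are not part of the JDK, and the sketch does no bounds checking or resizing.

```java
// Hypothetical helper, not part of the JDK: a fixed-size bitmap packed into a byte[].
class TinyBitmap {
    private final byte[] bytes;

    TinyBitmap(int nbits) {
        // Round up to a whole number of bytes.
        bytes = new byte[(nbits + 7) / 8];
    }

    // i >> 3 selects the byte, i & 7 selects the bit inside that byte.
    void set(int i) {
        bytes[i >> 3] |= 1 << (i & 7);
    }

    void clear(int i) {
        bytes[i >> 3] &= ~(1 << (i & 7));
    }

    boolean get(int i) {
        return (bytes[i >> 3] & (1 << (i & 7))) != 0;
    }

    public static void main(String[] args) {
        TinyBitmap bitmap = new TinyBitmap(20);
        bitmap.set(7);
        bitmap.set(13);
        System.out.println(bitmap.get(7));  // true
        System.out.println(bitmap.get(8));  // false
        bitmap.clear(7);
        System.out.println(bitmap.get(7));  // false
    }
}
```

BitSet follows the same idea, only with 64-bit words instead of bytes, plus automatic growth.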

java.util.BitSet is not a large class: fewer than 1200 lines of code, and not hard to follow. The key parts are analyzed below. (The code shown is from Oracle JDK 1.7.0_45.)

1. Some fields

     /*
      * BitSets are packed into arrays of "words."  Currently a word is
      * a long, which consists of 64 bits, requiring 6 address bits.
      * The choice of word size is determined purely by performance concerns.
      */
     private final static int ADDRESS_BITS_PER_WORD = 6;
     private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
     private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;

     /* Used to shift left or right for a partial word mask */
     private static final long WORD_MASK = 0xffffffffffffffffL;

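The comment already says it clearly: BitSet stores its data in a long[]. A long has 64 bits, so ADDRESS_BITS_PER_WORD is 6 (2^6 = 64, i.e. 6 address bits are enough to select one of 64 positions). BITS_PER_WORD is 1 shifted left by 6, i.e. 1 × 2^6 = 64, the number of bits in one "word" (a long). BIT_INDEX_MASK is 63, or 0x3f in hexadecimal: the low 6 bits all set. WORD_MASK has all 64 bits set and is used to build masks for partial words by shifting it left or right.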
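The values are easy to check by hand. The sketch below re-declares the constants outside BitSet (where they are private) purely for illustration; offsetInWord is a hypothetical helper added here, not a method of BitSet.

```java
class WordConstants {
    // Same values as the private constants in BitSet, re-declared for illustration.
    static final int ADDRESS_BITS_PER_WORD = 6;
    static final int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
    static final int BIT_INDEX_MASK = BITS_PER_WORD - 1;
    static final long WORD_MASK = 0xffffffffffffffffL;

    // Hypothetical helper: position of a bit inside its 64-bit word.
    static int offsetInWord(int bitIndex) {
        return bitIndex & BIT_INDEX_MASK;
    }

    public static void main(String[] args) {
        System.out.println(BITS_PER_WORD);                        // 64
        System.out.println(Integer.toHexString(BIT_INDEX_MASK));  // 3f
        System.out.println(WORD_MASK == -1L);                     // true
        System.out.println(Long.bitCount(WORD_MASK));             // 64
        System.out.println(offsetInWord(66));                     // 2
    }
}
```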

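As for why long was chosen, the comment says it is purely a performance decision. 64-bit CPUs are now ubiquitous and can hold a 64-bit long in a single register for computation.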

     /**
      * The internal field corresponding to the serialField "bits".
      */
     private long[] words;

The words field is where the data is actually stored.

2. Some common helper functions

     /**
      * Given a bit index, return word index containing it.
      */
     private static int wordIndex(int bitIndex) {
         return bitIndex >> ADDRESS_BITS_PER_WORD;
     }

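This static method is used by many other methods. Given a bit index bitIndex, it returns the index in the long[] of the long that holds that bit. It does this with an arithmetic right shift by 6, which for a non-negative bitIndex is the same as dividing by 64, since a long is 64 bits wide. For example, the bit at index 50 lives in word 50 / 64 = 0, the first long in words.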
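A few sample values make the mapping easier to see. The sketch below copies the one-line body of wordIndex into a standalone class (the original is private to BitSet) and hard-codes the shift of 6. The last two lines show that the shift and the division only agree for non-negative input.

```java
class WordIndexDemo {
    // Same computation as BitSet.wordIndex, with ADDRESS_BITS_PER_WORD written out as 6.
    static int wordIndex(int bitIndex) {
        return bitIndex >> 6;
    }

    public static void main(String[] args) {
        System.out.println(wordIndex(50));   // 0
        System.out.println(wordIndex(63));   // 0
        System.out.println(wordIndex(64));   // 1
        System.out.println(wordIndex(130));  // 2
        // For negative input, >> rounds toward negative infinity, / rounds toward zero.
        System.out.println(wordIndex(-1));   // -1
        System.out.println(-1 / 64);         // 0
    }
}
```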

3. Constructors

     /**
      * Creates a new bit set. All bits are initially {@code false}.
      */
     public BitSet() {
         initWords(BITS_PER_WORD);
         sizeIsSticky = false;
     }

     /**
      * Creates a bit set whose initial size is large enough to explicitly
      * represent bits with indices in the range {@code 0} through
      * {@code nbits-1}. All bits are initially {@code false}.
      *
      * @param  nbits the initial size of the bit set
      * @throws NegativeArraySizeException if the specified initial size
      *         is negative
      */
     public BitSet(int nbits) {
         // nbits can't be negative; size 0 is OK
         if (nbits < 0)
             throw new NegativeArraySizeException("nbits < 0: " + nbits);
         initWords(nbits);
         sizeIsSticky = true;
     }

     private void initWords(int nbits) {
         words = new long[wordIndex(nbits-1) + 1];
     }

The default constructor allocates a BitSet of 64 bits. BitSet(int nbits) allocates the smallest multiple of 64 bits that is greater than or equal to nbits; for example, BitSet(100) allocates 128 bits (2 longs).
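The rounding can be observed through the public size() method, which in the Oracle/OpenJDK implementation discussed here reports the number of bits actually allocated. The helper wordsFor below is a hypothetical restatement of the initWords arithmetic, not a JDK method; note how nbits = 0 yields zero words because (0 - 1) >> 6 is -1.

```java
import java.util.BitSet;

class BitSetSizeDemo {
    // Hypothetical helper: number of longs initWords(nbits) would allocate.
    static int wordsFor(int nbits) {
        return ((nbits - 1) >> 6) + 1;
    }

    public static void main(String[] args) {
        System.out.println(wordsFor(0));             // 0
        System.out.println(wordsFor(64));            // 1
        System.out.println(wordsFor(65));            // 2
        System.out.println(wordsFor(100));           // 2

        System.out.println(new BitSet().size());     // 64
        System.out.println(new BitSet(100).size());  // 128
    }
}
```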

     public static BitSet valueOf(long[] longs)
     public static BitSet valueOf(LongBuffer lb)
     public static BitSet valueOf(byte[] bytes)
     public static BitSet valueOf(ByteBuffer bb)

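BitSet also provides static factory methods for building a BitSet directly from existing data. Note that all four methods above copy the data they are given for the BitSet's own use, so later changes to the BitSet do not modify the argument.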
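The copy semantics are easy to confirm with valueOf(long[]). In this small sketch, mutateCopy is a hypothetical helper written for the demonstration: it builds a BitSet from the array, changes the BitSet, and returns the BitSet's contents so they can be compared with the untouched source array.

```java
import java.util.BitSet;

class ValueOfDemo {
    // Hypothetical helper: build a BitSet from source, set bit 1, return its words.
    static long[] mutateCopy(long[] source) {
        BitSet bits = BitSet.valueOf(source);
        bits.set(1);                 // changes only the BitSet's own copy
        return bits.toLongArray();
    }

    public static void main(String[] args) {
        long[] source = {0b101L};    // bits 0 and 2 set
        long[] result = mutateCopy(source);
        System.out.println(Long.toBinaryString(result[0]));  // 111
        System.out.println(Long.toBinaryString(source[0]));  // 101
    }
}
```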

4. Dynamic capacity growth

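As mentioned in the previous post, when an operation such as set() is given an index beyond the BitSet's current capacity, the BitSet grows automatically to the required length. This relies mainly on the following method: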

     /**
      * Ensures that the BitSet can hold enough words.
      * @param wordsRequired the minimum acceptable number of words.
      */
     private void ensureCapacity(int wordsRequired) {
         if (words.length < wordsRequired) {
             // Allocate larger of doubled size or required size
             int request = Math.max(2 * words.length, wordsRequired);
             words = Arrays.copyOf(words, request);
             sizeIsSticky = false;
         }
     }

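The wordsRequired parameter is the number of words needed. It is compared with the current length of words; if wordsRequired is larger, a new long[] is allocated whose length is the larger of twice the current length and wordsRequired, the current contents are copied into it, and words is pointed at the new array. That completes the growth step, which is similar in spirit to how ArrayList grows. It also shows that this code is not thread-safe: under concurrent access the caller must synchronize externally.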
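To see how the "larger of doubled or required" rule plays out over several calls, the policy can be restated as a pure function. The method grow below is a hypothetical rewrite for illustration; the real ensureCapacity mutates the words field and also resets sizeIsSticky.

```java
import java.util.Arrays;

class GrowthDemo {
    // Hypothetical pure-function version of the ensureCapacity growth policy.
    static long[] grow(long[] words, int wordsRequired) {
        if (words.length >= wordsRequired) {
            return words;  // already big enough, nothing to do
        }
        int request = Math.max(2 * words.length, wordsRequired);
        return Arrays.copyOf(words, request);  // old contents kept, new tail zeroed
    }

    public static void main(String[] args) {
        long[] words = new long[1];
        words = grow(words, 2);
        System.out.println(words.length);  // 2   (max(2, 2))
        words = grow(words, 3);
        System.out.println(words.length);  // 4   (max(4, 3))
        words = grow(words, 100);
        System.out.println(words.length);  // 100 (max(8, 100))
        words = grow(words, 50);
        System.out.println(words.length);  // 100 (no growth needed)
    }
}
```

Doubling keeps the amortized cost of repeated small extensions low, while the max with wordsRequired handles a single large jump in one allocation.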

5. flip: inverting a single bit

     /**
      * Sets the bit at the specified index to the complement of its
      * current value.
      *
      * @param  bitIndex the index of the bit to flip
      * @throws IndexOutOfBoundsException if the specified index is negative
      * @since  1.4
      */
     public void flip(int bitIndex) {
         if (bitIndex < 0)
             throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
         int wordIndex = wordIndex(bitIndex);
         expandTo(wordIndex);
         words[wordIndex] ^= (1L << bitIndex);
         recalculateWordsInUse();
         checkInvariants();
     }

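flip inverts a single bit. It first finds the long that contains bitIndex, then XORs that long with (1L << bitIndex). Note that bitIndex may be greater than 63. This works because Java uses only the low 6 bits of the shift distance when the left operand is a long, so the actual shift is bitIndex & 63 (equal to bitIndex % 64 for non-negative values); it is not a circular shift (rotate). Suppose the user calls flip(66): the code finds wordIndex = 1, i.e. words[1]. Then (1L << bitIndex) is (1L << (66 % 64)), i.e. (1L << 2) = 0b0100: the third bit from the low end is 1 and all others are 0. Finally words[1] is XORed with 0b0100. XORing a bit with 1 inverts it and XORing with 0 leaves it unchanged, so only the third bit of words[1] is inverted.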
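The shift behavior is worth checking directly, since it is easy to mistake for a rotate. In the sketch below, flipBit is a hypothetical helper that applies the same XOR as flip to a single long; the comparison with Long.rotateLeft shows where a masked shift distance and a true rotate differ.

```java
class FlipDemo {
    // Hypothetical helper: the XOR step of BitSet.flip applied to one word.
    static long flipBit(long word, int bitIndex) {
        return word ^ (1L << bitIndex);
    }

    public static void main(String[] args) {
        // Only the low 6 bits of the shift distance are used for a long operand.
        System.out.println(1L << 66);                 // 4
        System.out.println((1L << 66) == (1L << 2));  // true

        // That is not a rotate: bits shifted out of the top are lost.
        System.out.println(Long.MIN_VALUE << 1);                 // 0
        System.out.println(Long.rotateLeft(Long.MIN_VALUE, 1));  // 1

        long word = 0b1010L;
        word = flipBit(word, 66);                       // flips bit 2 of this word
        System.out.println(Long.toBinaryString(word));  // 1110
        word = flipBit(word, 66);                       // flipping again restores it
        System.out.println(Long.toBinaryString(word));  // 1010
    }
}
```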

6. clear: clearing a single bit

     /**
      * Sets the bit specified by the index to {@code false}.
      *
      * @param  bitIndex the index of the bit to be cleared
      * @throws IndexOutOfBoundsException if the specified index is negative
      * @since  JDK1.0
      */
     public void clear(int bitIndex) {
         if (bitIndex < 0)
             throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
         int wordIndex = wordIndex(bitIndex);
         if (wordIndex >= wordsInUse)
             return;
         words[wordIndex] &= ~(1L << bitIndex);
         recalculateWordsInUse();
         checkInvariants();
     }

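clear sets a single bit to 0. The process is similar to flip, but the bit operation differs: (1L << bitIndex) is inverted and then ANDed with words[wordIndex]. The principle is simple: ANDing a bit with 1 leaves it unchanged, and ANDing it with 0 gives 0. For example, 1100 & ~(0100) equals 1100 & 1011 = 1000. Unlike flip, clear never grows the array: if the bit lies beyond the words in use it is already 0, so the method returns early.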
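The 1100 example can be reproduced on a single long. clearBit below is a hypothetical helper that performs the same AND-with-inverted-mask step as clear. The last two lines use the real BitSet and rely on size() reporting allocated bits, as it does in the Oracle/OpenJDK implementation.

```java
import java.util.BitSet;

class ClearDemo {
    // Hypothetical helper: the AND step of BitSet.clear applied to one word.
    static long clearBit(long word, int bitIndex) {
        return word & ~(1L << bitIndex);
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(clearBit(0b1100L, 2)));  // 1000
        // Clearing a bit that is already 0 changes nothing.
        System.out.println(Long.toBinaryString(clearBit(0b1100L, 0)));  // 1100

        // Clearing far past the end returns early and does not allocate.
        BitSet bits = new BitSet(64);
        bits.clear(1000);
        System.out.println(bits.size());  // 64
    }
}
```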

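BitSet also provides get and set, as well as methods for AND/OR/XOR with another BitSet. They all rely on the same bit operations and are easy to follow, so they are not covered here; see the API documentation.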
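For a quick feel of those methods, here is a short usage sketch built only on the public API. The helper names (of, intersection, union, symmetricDifference) are made up for this example. One detail to keep in mind is that and, or and xor modify the receiver in place, which is why the helpers work on a clone.

```java
import java.util.BitSet;

class SetOpsDemo {
    // Hypothetical helper: build a BitSet with the given bit indices set.
    static BitSet of(int... indices) {
        BitSet bits = new BitSet();
        for (int i : indices) {
            bits.set(i);
        }
        return bits;
    }

    // and/or/xor mutate the receiver, so each helper operates on a clone of a.
    static BitSet intersection(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    static BitSet union(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    static BitSet symmetricDifference(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.xor(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet a = of(1, 2, 3);
        BitSet b = of(2, 3, 4);
        System.out.println(a.get(2));                   // true
        System.out.println(a.get(4));                   // false
        System.out.println(intersection(a, b));         // {2, 3}
        System.out.println(union(a, b));                // {1, 2, 3, 4}
        System.out.println(symmetricDifference(a, b));  // {1, 4}
        System.out.println(a);                          // {1, 2, 3}
    }
}
```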