JDK (4): JDK 1.8 Source Code Analysis [Sorting]: DualPivotQuicksort

This article is reposted from 于晓飞93. Original post: "DualPivotQuickSort 双轴快速排序源码笔记" (source-code notes on dual-pivot quicksort).

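DualPivotQuicksort is the algorithm the Arrays class uses to sort arrays of primitive types. It has a separate implementation for each primitive type; the details differ slightly, but the approach is the same, so only the int version is covered here.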
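As a quick orientation, the class is package-private and is normally reached through `Arrays.sort`. The sketch below only shows that public entry point; the class name `ArraysSortDemo` and the helper `sortedCopy` are illustrative names, not part of the JDK.

```java
import java.util.Arrays;

class ArraysSortDemo {
    // Returns a sorted copy so the caller's array is left untouched.
    static int[] sortedCopy(int[] values) {
        int[] copy = Arrays.copyOf(values, values.length);
        Arrays.sort(copy); // for int[], JDK 8 delegates to DualPivotQuicksort
        return copy;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7};
        System.out.println(Arrays.toString(sortedCopy(data)));

        // A sub-range can be sorted as well: fromIndex inclusive, toIndex exclusive.
        Arrays.sort(data, 1, 4);
        System.out.println(Arrays.toString(data));
    }
}
```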

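The overall flow is as follows. The array length is checked first: if it is below a threshold (QUICKSORT_THRESHOLD, 286), dual-pivot quicksort is used directly. Otherwise the array is scanned for runs, that is, maximal stretches that are already ascending or descending. The start of each run is recorded, and descending runs are reversed in place along the way, so the data ends up split into a sequence of ascending runs.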
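The run scan can be sketched in isolation. This is a simplified illustration, not the JDK code: it has no MAX_RUN_COUNT or MAX_RUN_LENGTH cut-offs, it treats equal neighbours as part of an ascending run, and the names `RunScanSketch` and `splitIntoRuns` are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunScanSketch {
    // Returns the start index of every run, followed by a.length as a final
    // boundary. Descending runs are reversed in place, so afterwards every
    // run is ascending.
    static List<Integer> splitIntoRuns(int[] a) {
        List<Integer> starts = new ArrayList<>();
        int k = 0;
        while (k < a.length) {
            starts.add(k);
            int start = k++;
            if (k == a.length) {
                break; // a single trailing element forms its own run
            }
            if (a[k - 1] <= a[k]) { // ascending (non-decreasing) run
                while (k < a.length && a[k - 1] <= a[k]) {
                    k++;
                }
            } else { // descending run: find its end, then reverse it
                while (k < a.length && a[k - 1] >= a[k]) {
                    k++;
                }
                for (int lo = start, hi = k - 1; lo < hi; lo++, hi--) {
                    int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
                }
            }
        }
        starts.add(a.length);
        return starts;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 9, 7, 5, 4, 6};
        System.out.println(splitIntoRuns(a));   // run boundaries
        System.out.println(Arrays.toString(a)); // descending run now reversed
    }
}
```

The fewer boundaries this returns, the more structure the input has, which is the property the real implementation uses to choose between merging and quicksort.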

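If the array is well structured (few, long runs), the runs are combined with a merge sort that works in the spirit of TimSort: its core idea is to exploit the order already present in the data, which can improve efficiency considerably.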
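The basic step of that merge phase is combining two adjacent ascending runs into a second array. A minimal sketch follows, assuming half-open ranges and without the offset bookkeeping (`ao`, `bo`) of the real code; `RunMergeSketch` and `mergeRuns` are illustrative names.

```java
import java.util.Arrays;

class RunMergeSketch {
    // Merges the ascending runs a[lo, mid) and a[mid, hi) into b[lo, hi).
    static void mergeRuns(int[] a, int[] b, int lo, int mid, int hi) {
        for (int i = lo, p = lo, q = mid; i < hi; i++) {
            // Take from the left run while it still has elements and its head
            // is not greater than the head of the right run.
            if (q >= hi || (p < mid && a[p] <= a[q])) {
                b[i] = a[p++];
            } else {
                b[i] = a[q++];
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 9, 2, 3, 10};
        int[] b = new int[a.length];
        mergeRuns(a, b, 0, 3, a.length);
        System.out.println(Arrays.toString(b));
    }
}
```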

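Arrays without much structure are sorted with dual-pivot quicksort plus pair insertion sort. Pair insertion sort is a refinement of insertion sort that inserts two elements at a time to improve efficiency. Dual-pivot quicksort evolved from classic single-pivot quicksort by way of 3-way quicksort. Reference: QUICKSORTING - 3-WAY AND DUAL PIVOT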
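To make the partitioning idea concrete before reading the JDK source, here is a textbook-style dual-pivot quicksort. It is a deliberately simplified sketch: the pivots are just the first and last elements (the JDK samples five elements instead), and there is no insertion-sort cut-off. `DualPivotSketch` is an illustrative name.

```java
import java.util.Arrays;

class DualPivotSketch {
    // Sorts a[left..right] (both inclusive) using two pivots p1 <= p2.
    static void sort(int[] a, int left, int right) {
        if (left >= right) {
            return;
        }
        if (a[left] > a[right]) {
            swap(a, left, right);
        }
        int p1 = a[left], p2 = a[right];
        int less = left + 1;   // first index of the center part
        int great = right - 1; // last index not yet known to be > p2
        for (int k = less; k <= great; k++) {
            if (a[k] < p1) {
                swap(a, k, less++);
            } else if (a[k] > p2) {
                while (k < great && a[great] > p2) {
                    great--;
                }
                swap(a, k, great--);
                if (a[k] < p1) {
                    swap(a, k, less++);
                }
            }
        }
        // Move the pivots into their final positions.
        swap(a, left, --less);
        swap(a, right, ++great);
        // Recurse on the three parts, excluding the pivots.
        sort(a, left, less - 1);
        sort(a, less + 1, great - 1);
        sort(a, great + 1, right);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 9, 1, 5, 7, 2, 8, 5, 0};
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
    }
}
```

With two pivots the range is split into three parts per pass (less than p1, between the pivots, greater than p2), which is the same layout the partitioning diagrams in the JDK comments describe.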

final class DualPivotQuicksort {

    /**

     * Prevents instantiation.

     */

    private DualPivotQuicksort() {}

    /**

     * The maximum number of runs in merge sort.

     */

    private static final int MAX_RUN_COUNT = 67;

    /**

     * The maximum length of run in merge sort.

     */

    private static final int MAX_RUN_LENGTH = 33;

    /**

     * If the length of an array to be sorted is less than this

     * constant, Quicksort is used in preference to merge sort.

     */

    private static final int QUICKSORT_THRESHOLD = 286;

    /**

     * If the length of an array to be sorted is less than this

     * constant, insertion sort is used in preference to Quicksort.

     */

    private static final int INSERTION_SORT_THRESHOLD = 47;

    /**

     * Sorts the specified range of the array using the given

     * workspace array slice if possible for merging

     *

     * @param a the array to be sorted

     * @param left the index of the first element, inclusive, to be sorted

     * @param right the index of the last element, inclusive, to be sorted

     * @param work a workspace array (slice)

     * @param workBase origin of usable space in work array

     * @param workLen usable size of work array

     */

    static void sort(int[] a, int left, int right,

                     int[] work, int workBase, int workLen) {

        // Use Quicksort on small arrays

        if (right - left < QUICKSORT_THRESHOLD) {

            sort(a, left, right, true);

            return;

        }

        /*

         * Index run[i] is the start of i-th run

         * (ascending or descending sequence).

         */

        int[] run = new int[MAX_RUN_COUNT + 1];

        int count = 0; run[0] = left;

        // Check if the array is nearly sorted

        for (int k = left; k < right; run[count] = k) {

            if (a[k] < a[k + 1]) { // ascending

                while (++k <= right && a[k - 1] <= a[k]);

            } else if (a[k] > a[k + 1]) { // descending

                while (++k <= right && a[k - 1] >= a[k]);

                // The run is descending: once its end k is found, reverse it in place

                for (int lo = run[count] - 1, hi = k; ++lo < --hi; ) {

                    int t = a[lo]; a[lo] = a[hi]; a[hi] = t;

                }

            } else { // equal

                for (int m = MAX_RUN_LENGTH; ++k <= right && a[k - 1] == a[k]; ) {

                    // A stretch of more than MAX_RUN_LENGTH equal elements: fall back to Quicksort

                    if (--m == 0) {

                        sort(a, left, right, true);

                        return;

                    }

                }

            }

            /*

             * The array is not highly structured (the run count reached MAX_RUN_COUNT),

             * use Quicksort instead of merge sort.

             */

            if (++count == MAX_RUN_COUNT) {

                sort(a, left, right, true);

                return;

            }

        }

        // Check special cases

        // Implementation note: variable "right" is increased by 1.

        if (run[count] == right++) { // The last run contains one element

            run[++count] = right;   // Add a sentinel run boundary just past the last element

        } else if (count == 1) { // The array is already sorted: it consists of a single run

            return;

        }

        // Determine alternation base for merge

        byte odd = 0;

        for (int n = 1; (n <<= 1) < count; odd ^= 1);

        // Use or create temporary array b for merging

        int[] b;                 // temp array; alternates with a

        int ao, bo;              // array offsets from 'left'

        int blen = right - left; // space needed for b

        if (work == null || workLen < blen || workBase + blen > work.length) {

            work = new int[blen];

            workBase = 0;

        }

        if (odd == 0) {

            System.arraycopy(a, left, work, workBase, blen);

            b = a;

            bo = 0;

            a = work;

            ao = workBase - left;

        } else {

            b = work;

            ao = 0;

            bo = workBase - left;

        }

        // Merging: the outer loop repeats until count is 1, i.e. a single run remains.

        // a is the source array and b the destination.

        for (int last; count > 1; count = last) {

            // Walk the runs, merging each adjacent pair of ascending runs

            for (int k = (last = 0) + 2; k <= count; k += 2) {

                // Merge the runs starting at run[k - 2] and run[k - 1]

                int hi = run[k], mi = run[k - 1];

                for (int i = run[k - 2], p = i, q = mi; i < hi; ++i) {

                    if (q >= hi || p < mi && a[p + ao] <= a[q + ao]) {

                        b[i + bo] = a[p++ + ao];

                    } else {

                        b[i + bo] = a[q++ + ao];

                    }

                }

                // Record the end of the merged run as the next run boundary

                run[++last] = hi;

            }

            // If the number of runs is odd, copy the last unpaired run across to b

            if ((count & 1) != 0) {

                for (int i = right, lo = run[count - 1]; --i >= lo;

                    b[i + bo] = a[i + ao]

                );

                run[++last] = right;

            }

            // Swap the roles of the arrays so that a is again the source and b the destination

            int[] t = a; a = b; b = t;

            int o = ao; ao = bo; bo = o;

        }

    }

    /**

     * Sorts the specified range of the array by Dual-Pivot Quicksort.

     *

     * @param a the array to be sorted

     * @param left the index of the first element, inclusive, to be sorted

     * @param right the index of the last element, inclusive, to be sorted

     * @param leftmost indicates if this part is the leftmost in the range

     */

    private static void sort(int[] a, int left, int right, boolean leftmost) {

        int length = right - left + 1;

        // Use insertion sort on tiny arrays

        if (length < INSERTION_SORT_THRESHOLD) {

            if (leftmost) {

                /*

                 * Traditional (without sentinel) insertion sort,

                 * optimized for server VM, is used in case of

                 * the leftmost part.

                 */

                for (int i = left, j = i; i < right; j = ++i) {

                    int ai = a[i + 1];

                    while (ai < a[j]) {

                        a[j + 1] = a[j];

                        if (j-- == left) {

                            break;

                        }

                    }

                    a[j + 1] = ai;

                }

            } else {

                /*

                 * Skip the longest ascending sequence.

                 */

                do {

                    if (left >= right) {

                        return;

                    }

                } while (a[++left] >= a[left - 1]);

                /*

                 * Every element from adjoining part plays the role

                 * of sentinel, therefore this allows us to avoid the

                 * left range check on each iteration. Moreover, we use

                 * the more optimized algorithm, so called pair insertion

                 * sort, which is faster (in the context of Quicksort)

                 * than traditional implementation of insertion sort.

                 */

                for (int k = left; ++left <= right; k = ++left) {

                    int a1 = a[k], a2 = a[left];

                    // Ensure a1 >= a2

                    if (a1 < a2) {

                        a2 = a1; a1 = a[left];

                    }

                    // First move the larger of the two values into place

                    while (a1 < a[--k]) {

                        a[k + 2] = a[k];    // shift each element two positions to the right

                    }

                    a[++k + 1] = a1;

                    // Then move the smaller of the two values into place

                    while (a2 < a[--k]) {

                        a[k + 1] = a[k];    // shift each element one position to the right

                    }

                    a[k + 1] = a2;

                }

                int last = a[right];

                while (last < a[--right]) {

                    a[right + 1] = a[right];

                }

                a[right + 1] = last;

            }

            return;

        }

        // Inexpensive approximation of length / 7 (about length * 9 / 64 + 1)

        int seventh = (length >> 3) + (length >> 6) + 1;

        /*

         * Sort five evenly spaced elements around (and including) the

         * center element in the range. These elements will be used for

         * pivot selection as described below. The choice for spacing

         * these elements was empirically determined to work well on

         * a wide variety of inputs.

         */

        int e3 = (left + right) >>> 1; // The midpoint

        int e2 = e3 - seventh;

        int e1 = e2 - seventh;

        int e4 = e3 + seventh;

        int e5 = e4 + seventh;

        // Sort these elements using insertion sort

        if (a[e2] < a[e1]) { int t = a[e2]; a[e2] = a[e1]; a[e1] = t; }

        if (a[e3] < a[e2]) { int t = a[e3]; a[e3] = a[e2]; a[e2] = t;

            if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }

        }

        if (a[e4] < a[e3]) { int t = a[e4]; a[e4] = a[e3]; a[e3] = t;

            if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;

                if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }

            }

        }

        if (a[e5] < a[e4]) { int t = a[e5]; a[e5] = a[e4]; a[e4] = t;

            if (t < a[e3]) { a[e4] = a[e3]; a[e3] = t;

                if (t < a[e2]) { a[e3] = a[e2]; a[e2] = t;

                    if (t < a[e1]) { a[e2] = a[e1]; a[e1] = t; }

                }

            }

        }

        // Pointers

        int less  = left;  // The index of the first element of center part

        int great = right; // The index before the first element of right part

        if (a[e1] != a[e2] && a[e2] != a[e3] && a[e3] != a[e4] && a[e4] != a[e5]) {

            /*

             * Use the second and fourth of the five sorted elements as pivots.

             * These values are inexpensive approximations of the first and

             * second terciles of the array. Note that pivot1 <= pivot2.

             */

            int pivot1 = a[e2];

            int pivot2 = a[e4];

            /*

             * The first and the last elements to be sorted are moved to the

             * locations formerly occupied by the pivots. When partitioning

             * is complete, the pivots are swapped back into their final

             * positions, and excluded from subsequent sorting.

             */

            a[e2] = a[left];

            a[e4] = a[right];

            /*

             * Skip elements, which are less or greater than pivot values.

             */

            while (a[++less] < pivot1);

            while (a[--great] > pivot2);

            /*

             * Partitioning:

             *

             *   left part           center part                   right part

             * +--------------------------------------------------------------+

             * |  < pivot1  |  pivot1 <= && <= pivot2  |    ?    |  > pivot2  |

             * +--------------------------------------------------------------+

             *               ^                          ^       ^

             *               |                          |       |

             *              less                        k     great

             *

             * Invariants:

             *

             *              all in (left, less)   < pivot1

             *    pivot1 <= all in [less, k)     <= pivot2

             *              all in (great, right) > pivot2

             *

             * Pointer k is the first index of ?-part.

             */

            outer:

            for (int k = less - 1; ++k <= great; ) {

                int ak = a[k];

                if (ak < pivot1) { // Move a[k] to left part

                    a[k] = a[less];

                    /*

                     * Here and below we use "a[i] = b; i++;" instead

                     * of "a[i++] = b;" due to performance issue.

                     */

                    a[less] = ak;

                    ++less;

                } else if (ak > pivot2) { // Move a[k] to right part

                    while (a[great] > pivot2) {

                        if (great-- == k) { // k has met great: this partitioning pass is done

                            break outer;

                        }

                    }

                    if (a[great] < pivot1) { // a[great] <= pivot2

                        a[k] = a[less];

                        a[less] = a[great];

                        ++less;

                    } else { // pivot1 <= a[great] <= pivot2

                        a[k] = a[great];

                    }

                    /*

                     * Here and below we use "a[i] = b; i--;" instead

                     * of "a[i--] = b;" due to performance issue.

                     */

                    a[great] = ak;

                    --great;

                }

            } // end of the partitioning loop; "break outer" also lands here

            // Swap pivots into their final positions

            a[left]  = a[less  - 1]; a[less  - 1] = pivot1;

            a[right] = a[great + 1]; a[great + 1] = pivot2;

            // Sort left and right parts recursively, excluding known pivots

            sort(a, left, less - 2, leftmost);

            sort(a, great + 2, right, false);

            /*

             * If center part is too large (comprises > 4/7 of the array),

             * swap internal pivot values to ends. Elements equal to pivot1

             * are gathered on the left and elements equal to pivot2 on the

             * right, so the center part that is sorted recursively contains

             * neither pivot value.

             */

            if (less < e1 && e5 < great) {

                /*

                 * Skip elements, which are equal to pivot values.

                 */

                while (a[less] == pivot1) {

                    ++less;

                }

                while (a[great] == pivot2) {

                    --great;

                }

                /*

                 * Partitioning:

                 *

                 *   left part         center part                  right part

                 * +----------------------------------------------------------+

                 * | == pivot1 |  pivot1 < && < pivot2  |    ?    | == pivot2 |

                 * +----------------------------------------------------------+

                 *              ^                        ^       ^

                 *              |                        |       |

                 *             less                      k     great

                 *

                 * Invariants:

                 *

                 *              all in (*,  less) == pivot1

                 *     pivot1 < all in [less,  k)  < pivot2

                 *              all in (great, *) == pivot2

                 *

                 * Pointer k is the first index of ?-part.

                 */

                outer:

                for (int k = less - 1; ++k <= great; ) {

                    int ak = a[k];

                    if (ak == pivot1) { // Move a[k] to left part

                        a[k] = a[less];

                        a[less] = ak;

                        ++less;

                    } else if (ak == pivot2) { // Move a[k] to right part

                        while (a[great] == pivot2) {

                            if (great-- == k) {

                                break outer;

                            }

                        }

                        if (a[great] == pivot1) { // a[great] < pivot2

                            a[k] = a[less];

                            /*

                             * Even though a[great] equals to pivot1, the

                             * assignment a[less] = pivot1 may be incorrect,

                             * if a[great] and pivot1 are floating-point zeros

                             * of different signs. Therefore in float and

                             * double sorting methods we have to use more

                             * accurate assignment a[less] = a[great].

                             */

                            a[less] = pivot1;

                            ++less;

                        } else { // pivot1 < a[great] < pivot2

                            a[k] = a[great];

                        }

                        a[great] = ak;

                        --great;

                    }

                } // end of the inner partitioning loop

            }

            // Sort center part recursively

            sort(a, less, great, false);

        } else { // Partitioning with one pivot: at least two adjacent samples among the five are equal, so fall back to traditional 3-way quicksort

            /*

             * Use the third of the five sorted elements as pivot.

             * This value is inexpensive approximation of the median.

             */

            int pivot = a[e3];

            /*

             * Partitioning degenerates to the traditional 3-way

             * (or "Dutch National Flag") schema:

             *

             *   left part    center part              right part

             * +-------------------------------------------------+

             * |  < pivot  |   == pivot   |     ?    |  > pivot  |

             * +-------------------------------------------------+

             *              ^              ^        ^

             *              |              |        |

             *             less            k      great

             *

             * Invariants:

             *

             *   all in (left, less)   < pivot

             *   all in [less, k)     == pivot

             *   all in (great, right) > pivot

             *

             * Pointer k is the first index of ?-part.

             */

            for (int k = less; k <= great; ++k) {

                if (a[k] == pivot) {

                    continue;

                }

                int ak = a[k];

                if (ak < pivot) { // Move a[k] to left part; the center part shifts right by one

                    a[k] = a[less];

                    a[less] = ak;

                    ++less;

                } else { // a[k] > pivot - Move a[k] to right part

                    while (a[great] > pivot) {  // find the rightmost element that is not greater than pivot

                        --great;

                    }

                    if (a[great] < pivot) { // a[great] <= pivot; move it to the left part

                        a[k] = a[less];

                        a[less] = a[great];

                        ++less;

                    } else { // a[great] == pivot; the center part simply grows

                        /*

                         * Even though a[great] equals to pivot, the

                         * assignment a[k] = pivot may be incorrect,

                         * if a[great] and pivot are floating-point

                         * zeros of different signs. Therefore in float

                         * and double sorting methods we have to use

                         * more accurate assignment a[k] = a[great].

                         * For int values a[great] == pivot exactly, so

                         * a[k] = pivot is safe here.

                         */

                        a[k] = pivot;

                    }

                    a[great] = ak;

                    --great;

                }

            }

            /*

             * Sort left and right parts recursively.

             * All elements from center part are equal

             * and, therefore, already sorted.

             */

            sort(a, left, less - 1, leftmost);

            sort(a, great + 1, right, false);

        }

    }

}
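Pair insertion sort is the part of the listing that is hardest to follow, because the JDK version relies on elements to the left of the range acting as sentinels. The sketch below shows the same two-at-a-time insertion on a whole array with explicit bounds checks instead of sentinels, so it is a simplified variant rather than the JDK code; `PairInsertionSketch` and `pairInsertionSort` are illustrative names.

```java
import java.util.Arrays;

class PairInsertionSketch {
    // Sorts the whole array by inserting two elements per pass.
    static void pairInsertionSort(int[] a) {
        int n = a.length;
        int i = 0;
        for (; i + 1 < n; i += 2) {
            int hi = Math.max(a[i], a[i + 1]);
            int lo = Math.min(a[i], a[i + 1]);
            int k = i - 1;
            // Elements greater than the larger value move two slots right,
            // leaving room for both values of the pair.
            while (k >= 0 && a[k] > hi) {
                a[k + 2] = a[k];
                k--;
            }
            a[k + 2] = hi;
            // Continue from the same position: elements greater than the
            // smaller value move one slot right.
            while (k >= 0 && a[k] > lo) {
                a[k + 1] = a[k];
                k--;
            }
            a[k + 1] = lo;
        }
        if (i < n) { // odd length: insert the last element on its own
            int last = a[i];
            int k = i - 1;
            while (k >= 0 && a[k] > last) {
                a[k + 1] = a[k];
                k--;
            }
            a[k + 1] = last;
        }
    }

    public static void main(String[] args) {
        int[] a = {4, 2, 5, 1, 3};
        pairInsertionSort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

The saving comes from the second scan resuming where the first one stopped, so the sorted prefix is traversed roughly once per pair instead of once per element.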
