一种最坏情况线性运行时间的选择算法 - The missing worst-case linear-time Select algorithm in CLRS.

选择算法也就是求一个无序数组中第K大（小）的元素的值的算法，同通常的Top K等算法密切相关。

在CLRS中提到了一种最坏情况线性运行时间的选择算法，在书中给出了如下的文字描述（没有直接给出伪代码）。

1.Divide n elements into groups of 5

2.Find median of each group (How?  How long?)

3.Use Select() recursively to find median x of the n/5 medians

4.Partition the n elements around x.  Let k = rank(x)

5.if (i == k) then return x

  if (i < k) then use Select() recursively to find ith smallest element in first partition
    else (i > k) use Select() recursively to find (i-k)th smallest element in last partition

后来在MIT公开课网站上找到了这个课件。给出了几段伪代码，但是比上面的文字描述强不了多少；

得了还是自己动手丰衣足食吧；

代码中给出了同STL标准Select算法、一般SELECT和随机化的SELECT算法的运行效率比较分析；

现以C++实现了该选择算法，以供参考。

//determines the ith smallest element of an input array.

#include <iostream>

#include <algorithm>

#include <ctime>

#include <cassert>

using namespace std;

#include "randomized_select.cpp"

size_t my_ceil(size_t m, size_t n)

{

    size_t mul, rem;

    mul = m / n;

    rem = m % n;

    if (rem > 0)

        mul++;

    return mul;

}

size_t my_floor(size_t m, size_t n)

{

    return m / n;

}

template <class T>

void exchange(T& x, T& y)

{

    T t = x; x = y; y = t;

}

template <class T>

void insertion_sort(T *a, size_t n)

{

    int i, j;

    for (j = 1; j < (int) n; ++j) {

        T key = a[j];

        // insert a[j] into the sorted sequence a[0..j-1].

        i = j - 1;

        while (i >= 0 && a[i] > key) { //>= or >

            a[i + 1] = a[i];

            i--;

        }

        a[i + 1] = key;

    }

}

template <class T>

size_t partition(T *a, size_t p, size_t r)

{

    T x = a[r];

    int i = p - 1, j; //XXX in case of negative.

    for (j = p; j < (int) r; ++j)

    if (a[j] <= x) //XXX <= or <

        exchange(a[++i], a[j]);

    exchange(a[i + 1], a[r]);

    return i + 1;

}

// find index of element x in array a of size n.

template <class T>

size_t index_of(T *a, size_t n, T x)

{

    size_t i;

    for (i = 0; i < n; ++i)

    if (a[i] == x)

        return i;

    return i;

}

// partition around pivot on index i

template <class T>

size_t partition_x(T *a, size_t n, T x)

{

    if (n == 1) //XXX zero based index.

        return 0;

    size_t p, r, i; // [p, r]

    p = 0;

    r = n - 1;

    i = index_of(a, n, x);

    if (i >= n)

        cout << "i=" << i << "; n=" << n << endl;

    assert(i < n); //XXX not found!

    exchange(a[i], a[r]);

    return partition(a, p, r);

}

#ifndef NDEBUG

#define NDEBUG

#endif

// select from [p,r]

template <class T>

T select(T *a, size_t n, size_t i)

{

    if (n < 2) // 0,1

        return a[0]; //XXX

    // 1: divide n elements into groups of 5.

    size_t ng = my_floor(n, 5); // ng group with 5 elements.

    size_t ng1 = my_ceil(n, 5); // ng1 >= ng.

    size_t rem = n - ng * 5; // elements in last group.

#ifndef NDEBUG

    cout << "i=" << i << "; n=" << n << "; ng=" << ng << "; ng1=" << ng1 << endl;

#endif

    // 2: find median of each group.

    // sort each group.

    for (size_t j = 0; j < ng; ++j)

        insertion_sort(&a[5 * j], 5);

    //   sort last group(or only group).

    if (rem > 0)

        insertion_sort(&a[5 * ng], rem);

    // picking medians

    T *medians = new T[ng1];

    for (size_t j = 0; j < ng; ++j)

        medians[j] = a[5 * j + 2];

    //   picking the last median.

    if (rem > 0) {

        size_t last_mi = ng * 5 - 1 + my_ceil(rem, 2); // (start+end)/2 floor.

        medians[ng] = a[last_mi];

    }

    // 3: use select recursively to find median x of the medians.

    // partition on median-of-medians.

    size_t mmi = my_ceil(n, 10) - 1;

    T x = select(medians, ng1, mmi);

    delete [] medians; //XXX break;

#ifndef NDEBUG

    cout << "(before partition); x=" << x.value << ";" << endl;

    for (size_t y = 0; y < 5; y++) {

        for (size_t x = 0; x < ng; x++)

        if (gp[x].e[y].is_infinite)

            cout << "*" << "\t";

        else

            cout << gp[x].e[y].value << "\t";

        cout << endl;

    }

#endif

    // 4: partition input array around pivot x.

    size_t k = partition_x(a, n, x);

#ifndef NDEBUG

    cout << "(after partition); k=" << k << endl;

    for (size_t y = 0; y < 5; y++) {

        for (size_t x = 0; x < ng; x++)

        if (gp[x].e[y].is_infinite)

            cout << "*" << "\t";

        else

            cout << gp[x].e[y].value << "\t";

        cout << endl;

    }

    cout << endl;

#endif

    if (i == k) // DONE!

        return a[k]; // value of pivot.

    else if (i < k) // L array.

        return select(a, k, i);

    else // if (i > k) // G array,

        return select(a + k + 1, n - k - 1, i - k - 1);

}

// ith from 0.

template <class T>

T try_select(T *a, size_t n, size_t ith)

{

    clock_t beg = clock();

    T x = select(a, n, ith); // [0,n-1]

    clock_t end = clock();

    cout << "Select take " << (double) (end - beg) / CLOCKS_PER_SEC << " sec." << endl;

    return x;

}

// ith start from 0.

int try_std_select(int *a, size_t n, size_t ith)

{

    clock_t beg = clock();

    nth_element(a, a + ith + 1, a + n);

    clock_t end = clock();

    cout << "STD take " << (double) (end - beg) / CLOCKS_PER_SEC << " sec." << endl;

    return a[ith];

}

// ith start from 1.

int try_rand_select(int *a, size_t n, size_t ith)

{

    clock_t beg = clock();

    int ie = randsel::randomized_select(a, 0, n - 1, ith + 1);

    clock_t end = clock();

    cout << "random select take " << (double) (end - beg) / CLOCKS_PER_SEC << " sec." << endl;

    return ie;

}

void select_test()

{

    cout << "please choose the select algorithm: " << endl

        << "1: my select(default); 2: std nth_element; 3: random select;" << endl;

    int choice;

    cin >> choice;

#define _1K (1024)

#define _1M (_1K*_1K)

#define _1G (_1K*_1M)

#define _256M (256*_1M) //2^26

    // 1k, 1m, 256m

    for (size_t n = _1K; n <= _256M; ) {

        int *a = new int[n];

        for (size_t j = 0; j < n; ++j)

            a[j] = j * 10;

        random_shuffle(a, a + n);

        int ith = n / 2;

        cout << endl << "=============== n=" << n << ", i=" << ith << " ==============" << endl;

        int ie;

        switch (choice)

        {

        case 1:

            ie = try_select(a, n, ith); // zero based index.

            break;

        case 2:

            ie = try_std_select(a, n, ith);

            break;

        case 3:

            ie = try_rand_select(a, n, ith);

            break;

        default:

            ie = try_select(a, n, ith); // zero based index.

            break;

        }

        cout << ith << "-th:" << ie << endl;

        assert(ie == ith * 10);

        delete [] a;

        // level.

        if (n == _1K)

            n = _1M;

        else if (n == _1M)

            n = _256M;

        else if (n == _256M)

            n = _1G;

        else

            n = _1G;

    }

}

标签: 算法

一种最坏情况线性运行时间的选择算法 - The missing worst-case linear-time Select algorithm in CLRS.的更多相关文章

快速排序以及第k小元素的线性选择算法
简要介绍下快速排序的思想:通过一趟排序将要排序的数据分割成独立的两部分,其中一部分的所有数据都比另外一部分的所有数据都要小,然后再按此方法对这两部分数据分别进行快速排序,整个排序过程可以递归进行,以此 ...
QString内部仍采用UTF-16存储数据且不会改变（一共10种不同情况下的编码）
出处:https://blog.qt.io/cn/2012/05/16/source-code-must-be-utf-8-and-qstring-wants-it/ 但是注意,这只是QT运行(Run ...
POJ3783Balls[DP 最坏情况最优解]
Balls Time Limit: 1000MS Memory Limit: 65536K Total Submissions: 907 Accepted: 598 Description T ...
移动Web开发图片自适应两种常见情况解决方案
本文主要说的是Web中图片根据手机屏幕大小自适应居中显示,图片自适应两种常见情况解决方案.开始吧在做配合手机客户端的Web wap页面时,发现文章对图片显示的需求有两种特别重要的情况,一是对于图集, ...
三种计算c#程序运行时间的方法
三种计算c#程序运行时间的方法第一种: 利用 System.DateTime.Now // example1: System.DateTime.Now method DateTime dt1 = S ...
react组件之间的几种通信情况
组件之间的几种通信情况父组件向子组件通信子组件向父组件通信跨级组件通信没有嵌套关系组件之间的通信 1,父组件向子组件传递 React数据流动是单向的,父组件向子组件通信也是最常见的;父组件通过 ...
移动站Web开发图片自适应两种常见情况解决方案
本文主要说的是Web中图片根据手机屏幕大小自适应居中显示,图片自适应两种常见情况解决方案.开始吧在做配合手机客户端的Web wap页面时,发现文章对图片显示的需求有两种特别重要的情况,一是对于图集, ...
coredump产生的几种可能情况
coredump产生的几种可能情况造成程序coredump的原因有很多,这里总结一些比较常用的经验吧: 1,内存访问越界 a) 由于使用错误的下标,导致数组访问越界. b) 搜索字符串时,依靠字符串 ...
k个鸡蛋从N楼层摔，如果确定刚好摔碎的那个楼层，最坏情况下最少要试验x次？
题目 k个鸡蛋从N楼层摔,如果确定刚好摔碎的那个楼层,最坏情况下最少要试验x次? 换个说法: k个鸡蛋试验x次最多可以检测N层楼.计算出N? 逆向思维和数学公式解. 分析定义N(k,x) 如果第k个 ...

随机推荐

ASP.NET MVC+EF框架+EasyUI实现权限管理系列(14)-主框架搭建
原文:ASP.NET MVC+EF框架+EasyUI实现权限管理系列(14)-主框架搭建 ASP.NET MVC+EF框架+EasyUI实现权限管系列 (开篇) (1):框架搭建 (2 ...
validateRequest="false"属性及xss攻击
validateRequest="false" 指是否要IIS验证页面提交的非法字符,比如:>,<号等,当我们需要将一定格式得html代码获得,插入数据库时候,就要 ...
jQuery遍历table中间tr td并获得td价值
jQuery遍历table中间tr td并获得td中间值 $(function(){ $("#tableId tr").find("td").each(func ...
C++ 对象模型具体评论（特别easy理解力）
c++对象模型系列转一.指针与引用一概括指针和引用,在C++的软件开发中非经常见,假设能恰当的使用它们可以极大的提高整个软件的效率,可是非常多的C++学习者对它们的各种使用情况并非都了解, ...
bootstrap标准模板
<!DOCTYPE html> <html lang="en"> <head> <meta cha ...
java中HashSet详解
HashSet 的实现对于 HashSet 而言,它是基于 HashMap 实现的,HashSet 底层采用 HashMap 来保存所有元素,因此 HashSet 的实现比较简单,查看 HashSe ...
POJ 2777 Count Color（线段树+位运算）
题目链接:http://poj.org/problem?id=2777 Description Chosen Problem Solving and Program design as an opti ...
installshield 32位打包和64位打包的注意事项
原文:installshield 32位打包和64位打包的注意事项 32/64位问题要把握几点:1. 明确你的产品是否需要区分32/64位2. 明确你的产品中是否有32/64位的服务注册3. 了解In ...
LINQ To SQL在N层应用程序中的CUD操作、批量删除、批量更新
原文:LINQ To SQL在N层应用程序中的CUD操作.批量删除.批量更新 0. 说明 Linq to Sql,以下简称L2S. 以下文中所指的两层和三层结构,分别如下图所示: 准确的说,这里 ...
ActiveReports 9实战教程（2）：准备数据源（设计时、运行时）
原文:ActiveReports 9实战教程(2): 准备数据源(设计时.运行时) 在上讲中<ActiveReports 9实战教程(1): 手把手搭建环境Visual Studio 2013 ...

一种最坏情况线性运行时间的选择算法 - The missing worst-case linear-time Select algorithm in CLRS.

一种最坏情况线性运行时间的选择算法 - The missing worst-case linear-time Select algorithm in CLRS.

一种最坏情况线性运行时间的选择算法 - The missing worst-case linear-time Select algorithm in CLRS.的更多相关文章

随机推荐

热门专题