Recently I was studying algorithms. A classic problem is to find the K largest (or smallest) numbers in an array. I mainly studied two methods. The first is the direct method, an extension of selection sort that repeatedly picks the largest (or smallest) remaining number from the array. The pseudocode below shows the smallest-first form. The complexity is O(kn).

 function select(list[1..n], k)
     for i from 1 to k
         minIndex = i
         minValue = list[i]
         for j from i+1 to n
             if list[j] < minValue
                 minIndex = j
                 minValue = list[j]
         swap list[i] and list[minIndex]
     return list[k]

The C++ implementation is:

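#include <numeric>   // std::iota
#include <utility>   // std::swap
#include <vector>

template<typename T>
std::vector<T> SelectLargestKItem(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
{
    if (K > vecInput.size())
        return vecInput;

    std::vector<T> vecLocal(vecInput);
    // vecOrigIndex[i] is the position in vecInput of the element now at vecLocal[i].
    std::vector<int> vecOrigIndex(vecLocal.size());
    std::iota(vecOrigIndex.begin(), vecOrigIndex.end(), 0);

    std::vector<T> vecResult;
    for (size_t k = 0; k < K; ++k)
    {
        // Find the largest value in the unselected tail vecLocal[k..end).
        T maxValue = vecLocal[k];
        size_t maxIndex = k;
        for (size_t i = k + 1; i < vecLocal.size(); ++i) {
            if (vecLocal[i] > maxValue) {
                maxValue = vecLocal[i];
                maxIndex = i;
            }
        }
        // Move it to position k so later passes skip it.
        if (maxIndex != k) {
            std::swap(vecLocal[maxIndex], vecLocal[k]);
            std::swap(vecOrigIndex[maxIndex], vecOrigIndex[k]);
        }
        vecResult.push_back(maxValue);
        vecIndex.push_back(vecOrigIndex[k]);  // index into vecInput, not vecLocal
    }
    return vecResult;
}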
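
The O(kn) bound for the direct method can be checked by counting comparisons. This is a rough count, assuming the inner loop of SelectLargestKItem starts one past the current position, as in the pseudocode: pass k (counting from 0) compares the running maximum against the n-1-k elements after it.

```latex
\[
\sum_{k=0}^{K-1} (n - 1 - k) \;=\; K(n-1) - \frac{K(K-1)}{2}
\]
```

When K is much smaller than n, this is close to Kn. For example, n = 200,000 and K = 20 gives 3,999,790 comparisons.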

When N is very large, such as N > 200,000, and the number of items to select, K, is larger than 20, the above algorithm becomes time-consuming. After some research, I chose another algorithm for the job. This method is an extension of heap sort. The steps are as follows:

1) Build a Min Heap MH of the first k elements (arr[0] to arr[k-1]) of the given array. O(k)

2) For each element after the kth element (arr[k] to arr[n-1]), compare it with the root of MH.
    a) If the element is greater than the root, make it the root and call heapify for MH.
    b) Otherwise, ignore it.
    Step 2 is O((n-k) log k).

3) Finally, MH holds the k largest elements, and the root of MH is the kth largest element.

Time complexity: O(k + (n-k) log k) without sorted output. If sorted output is needed, O(k + (n-k) log k + k log k).

The C++ implementation of the method is as below:

#include <algorithm> // std::sort, std::reverse (used by SelectLargestKItemHeap)
#include <utility>   // std::swap
#include <vector>

// Restore the min-heap property for the subtree rooted at node i,
// which is an index into vecInput. n is the size of the heap.
// vecIndex is permuted in step with vecInput.
template<typename T>
void heapifyMinToRoot(std::vector<T> &vecInput, const int n, const int i, std::vector<int> &vecIndex)
{
    int smallestIndex = i; // Initialize smallest as root
    int l = 2 * i + 1;     // left = 2*i + 1
    int r = 2 * i + 2;     // right = 2*i + 2

    // If left child is smaller than root
    if (l < n && vecInput[l] < vecInput[smallestIndex])
        smallestIndex = l;

    // If right child is smaller than smallest so far
    if (r < n && vecInput[r] < vecInput[smallestIndex])
        smallestIndex = r;

    // If smallest is not root
    if (smallestIndex != i)
    {
        std::swap(vecInput[i], vecInput[smallestIndex]);
        std::swap(vecIndex[i], vecIndex[smallestIndex]);

        // Recursively heapify the affected sub-tree
        heapifyMinToRoot(vecInput, n, smallestIndex, vecIndex);
    }
}

template<typename T>
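std::vector<T> SelectLargestKItemHeap(const std::vector<T> &vecInput, const size_t K, std::vector<int> &vecIndex)
{
    if (K > vecInput.size()) {
        // Fewer than K items: return all of them in descending order.
        // The indices pushed here are simply 0..n-1 and do not follow the sorted order.
        std::vector<T> vecResult(vecInput);
        std::sort(vecResult.begin(), vecResult.end());
        std::reverse(vecResult.begin(), vecResult.end());
        for (size_t i = 0; i < vecInput.size(); ++i)
            vecIndex.push_back(static_cast<int>(i));
        return vecResult;
    }

    vecIndex.clear();
    if (K == 0)
        return std::vector<T>();   // nothing to select; avoids reading an empty heap below

    const int heapSize = static_cast<int>(K);
    std::vector<T> vecResult(vecInput.begin(), vecInput.begin() + K);
    for (size_t i = 0; i < K; ++i)
        vecIndex.push_back(static_cast<int>(i));

    // Step 1: build a min-heap from the first K elements.
    for (int K1 = heapSize / 2 - 1; K1 >= 0; --K1)
        heapifyMinToRoot(vecResult, heapSize, K1, vecIndex);

    // Step 2: replace the root whenever a larger element is found.
    for (size_t i = K; i < vecInput.size(); ++i) {
        if (vecInput[i] > vecResult[0]) {
            vecResult[0] = vecInput[i];
            vecIndex[0] = static_cast<int>(i);
            // Only the root changed, so one sift-down from index 0 restores the heap in O(log K).
            heapifyMinToRoot(vecResult, heapSize, 0, vecIndex);
        }
    }

    // Step 3: heap-sort the K survivors; a min-heap yields descending order.
    for (int k = heapSize - 1; k >= 0; --k)
    {
        std::swap(vecResult[k], vecResult[0]);
        std::swap(vecIndex[k], vecIndex[0]);
        heapifyMinToRoot(vecResult, k, 0, vecIndex);
    }
    return vecResult;
}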
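
For comparison, the same three steps can be sketched with the standard library's std::priority_queue instead of a hand-written heapify. This is only an illustration, not the code benchmarked in this post: largestKWithPriorityQueue is a hypothetical name, it does not track indices, and it fills the heap by pushing one element at a time, so step 1 costs O(k log k) here rather than O(k).

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper (not from the post): returns the k largest values of
// `input` in descending order, using a min-heap of at most k elements.
template <typename T>
std::vector<T> largestKWithPriorityQueue(const std::vector<T>& input, std::size_t k)
{
    if (k == 0) return {};

    // std::greater turns the priority_queue into a min-heap: top() is the smallest.
    std::priority_queue<T, std::vector<T>, std::greater<T>> minHeap;

    for (const T& value : input) {
        if (minHeap.size() < k) {
            minHeap.push(value);            // step 1: fill the heap with the first k items
        } else if (value > minHeap.top()) {
            minHeap.pop();                  // step 2a: replace the root with the larger item
            minHeap.push(value);
        }                                   // step 2b: otherwise ignore it
    }

    // Step 3: the heap now holds the k largest values. Popping yields them in
    // ascending order, so fill the result from the back to get descending order.
    std::vector<T> result(minHeap.size());
    for (std::size_t i = result.size(); i-- > 0; ) {
        result[i] = minHeap.top();
        minHeap.pop();
    }
    return result;
}
```

Calling it on {5, 1, 9, 3, 7, 9, 2} with k = 3 returns {9, 9, 7}. If k exceeds the input size, every element is returned in descending order.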

Here is the code to test these two methods.

#include <iostream>
#include <random>
#include <vector>

// CStopWatch is the author's timer class; its definition is not shown in this post.
void SelectionAlgorithmBenchMark()
{
    int N = 200000;   // value taken from the results reported below
    std::vector<int> vecInput;

    // The seed argument is missing in the source, so the generator is default-constructed here.
    std::minstd_rand0 generator;
    for (int i = 0; i < N; ++i)
    {
        int nValue = generator();
        vecInput.push_back(nValue);
    }

    std::vector<int> vecResult, vecIndex;
    int K = 20;       // value taken from the results reported below
    CStopWatch stopWatch;
    vecResult = SelectLargestKItem<int>(vecInput, K, vecIndex);
    std::cout << "Standard algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
    std::cout << std::endl;

    stopWatch.Start();
    vecResult = SelectLargestKItemHeap<int>(vecInput, K, vecIndex);
    std::cout << "Heap algorithm SelectLargestKItemHeap takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
}

With N = 200,000 and K = 20, the first method takes 353 ms and the second takes 31 ms, a difference of more than 10 times.
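
The test code relies on a CStopWatch class that is not shown here. To reproduce the timing, a minimal stand-in could look like the sketch below. StopWatchSketch is a hypothetical name, and its behavior is an assumption inferred from how the benchmark uses the timer: it starts on construction, Start() restarts it, and Now() returns the elapsed milliseconds.

```cpp
#include <chrono>

// Hypothetical stand-in for the post's CStopWatch, whose definition is not shown.
// Assumed behavior: the timer starts on construction, Start() restarts it,
// and Now() returns the elapsed time in whole milliseconds.
class StopWatchSketch
{
    using Clock = std::chrono::steady_clock;   // monotonic, suited to interval timing

public:
    StopWatchSketch() : m_start(Clock::now()) {}

    // Restart the measurement from the current instant.
    void Start() { m_start = Clock::now(); }

    // Milliseconds elapsed since construction or the last Start().
    long long Now() const
    {
        using std::chrono::duration_cast;
        using std::chrono::milliseconds;
        return duration_cast<milliseconds>(Clock::now() - m_start).count();
    }

private:
    Clock::time_point m_start;
};
```

Absolute timings depend on the machine, compiler, and optimization level, so only the ratio between the two methods is worth comparing.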
