Find the largest K numbers from array (找出数组中最大的K个值)
Recently i was doing some study on algorithms. A classic problem is to find the K largest(smallest) numbers from an array. I mainly studyed two methods, one is directly methold. It is an extension of select sort, always select the largest number from the array. The pseudo code is as below. The algorithm complexity is O(kn).
function select(list[1..n], k)
for i from 1 to k
minIndex = i
minValue = list[i]
for j from i+1 to n
if list[j] < minValue
minIndex = j
minValue = list[j]
swap list[i] and list[minIndex]
return list[k] The C++ implementation is
template<typename T>
std::vector<T> SelectLargestKItem(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
{
if (K > vecInput.size())
return vecInput; std::vector<T> vecLocal(vecInput);
std::vector<T> vecResult;
for (size_t k = ; k < K; ++ k)
{
T maxValue = vecLocal[k];
int maxIndex = k;
for (size_t i = k + ; i < vecLocal.size(); ++i) {
if (vecLocal[i] > maxValue) {
maxValue = vecLocal[i];
maxIndex = i;
}
}
if (maxIndex != k)
std::swap(vecLocal[maxIndex], vecLocal[k]);
vecResult.push_back( maxValue );
vecIndex.push_back( maxIndex );
}
return vecResult;
}
When the total number of N is very large, such as N > 200,000. And the numbers need to select K is larger than 20, then the above algorithm will become time consuming. After do some research, i choose another algorithm to do the job. This method is a extension of heap sort. The steps work as below:
1) Build a Min Heap MH of the first k elements (arr[0] to arr[k-1]) of the given array. O(k)
2) For each element, after the kth element (arr[k] to arr[n-1]), compare it with root of MH.
……a) If the element is greater than the root then make it root and call heapifyfor MH
……b) Else ignore it.
// The step 2 is O((n-k)*logk)
3) Finally, MH has k largest elements and root of the MH is the kth largest element.
Time Complexity: O(k + (n-k)Logk) without sorted output. If sorted output is needed then O(k + (n-k)Logk + kLogk).
The C++ implementation of the method is as below:
// To heapify a subtree rooted with node i which is
// an index in arr[]. n is size of heap
template<typename T>
void heapifyMinToRoot(std::vector<T> &vecInput, const int n, const int i, std::vector<int> &vecIndex)
{
int smallestIndex = i; // Initialize largest as root
int l = * i + ; // left = 2*i + 1
int r = * i + ; // right = 2*i + 2 // If left child is larger than root
if (l < n && vecInput[l] < vecInput[smallestIndex])
smallestIndex = l; // If right child is larger than largest so far
if (r < n && vecInput[r] < vecInput[smallestIndex])
smallestIndex = r; // If largest is not root
if (smallestIndex != i)
{
std::swap(vecInput[i], vecInput[smallestIndex]);
std::swap(vecIndex[i], vecIndex[smallestIndex]); // Recursively heapify the affected sub-tree
heapifyMinToRoot(vecInput, n, smallestIndex, vecIndex);
}
} template<typename T>
std::vector<T> SelectLargestKItemHeap(const std::vector<T> &vecInput, const size_t K, std::vector<int> &vecIndex)
{
if (K > vecInput.size()) {
std::vector<T> vecResult(vecInput);
std::sort(vecResult.begin(), vecResult.end());
std::reverse(vecResult.begin(), vecResult.end());
for (size_t i = ; i < vecInput.size(); ++i)
vecIndex.push_back(i);
return vecResult;
} std::vector<T> vecLocal(vecInput);
std::vector<T> vecResult(vecInput.begin(), vecInput.begin() + K);
vecIndex.clear();
for (size_t i = ; i < K; ++ i) vecIndex.push_back(i); for (int K1 = K / - ; K1 >= ; -- K1)
heapifyMinToRoot(vecResult, K, K1, vecIndex); for (size_t i = K; i < vecLocal.size(); ++ i) {
if (vecLocal[i] > vecResult[]) {
vecResult[] = vecLocal[i];
vecIndex[] = i; for (int K1 = K / - ; K1 >= ; -- K1)
heapifyMinToRoot(vecResult, K, K1, vecIndex);
}
}
for (int k = K - ; k >= ; -- k )
{
std::swap(vecResult[k], vecResult[]);
std::swap(vecIndex[k], vecIndex[]); heapifyMinToRoot(vecResult, k, , vecIndex);
} return vecResult;
}
Here is the code to test these two methods.
void SelectionAlgorithmBenchMark()
{
int N = ;
std::vector<int> vecInput; std::minstd_rand0 generator();
for (int i = ; i < N; ++i)
{
int nValue = generator();
vecInput.push_back(nValue );
}
std::vector<int> vecResult, vecIndex;
int K = ;
CStopWatch stopWatch;
vecResult = SelectLargestKItem<int>(vecInput, K, vecIndex);
std::cout << "Standard algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
for (int k = ; k < K; ++k)
{
std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
}
std::cout << std::endl; stopWatch.Start();
vecResult = SelectLargestKItemHeap<int>(vecInput, K, vecIndex);
std::cout << "Heap algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
for (int k = ; k < K; ++k)
{
std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
}
}
When N is 200000, K is 20, the first method takes 353ms, the second method takes 31ms. The difference is more than 10 times.
Find the largest K numbers from array (找出数组中最大的K个值)的更多相关文章
- 215. Kth Largest Element in an Array找出数组中第k大的值
堆排序做的,没有全部排序,找到第k个就结束 public int findKthLargest(int[] nums, int k) { int num = 0; if (nums.length &l ...
- [LeetCode] Find All Numbers Disappeared in an Array 找出数组中所有消失的数字
Given an array of integers where 1 ≤ a[i] ≤ n (n = size of array), some elements appear twice and ot ...
- [LeetCode] Find All Duplicates in an Array 找出数组中所有重复项
Given an array of integers, 1 ≤ a[i] ≤ n (n = size of array), some elements appear twice and others ...
- 442. Find All Duplicates in an Array找出数组中所有重复了两次的元素
[抄题]: Given an array of integers, 1 ≤ a[i] ≤ n (n = size of array), some elements appear twice and o ...
- 前端算法题:找出数组中第k大的数字出现多少次
题目:给定一个一维数组,如[1,2,4,4,3,5],找出数组中第k大的数字出现多少次. 例如:第2大的数是4,出现2次,最后输出 4,2 function getNum(arr, k){ // 数组 ...
- 【Java】 剑指offer(1) 找出数组中重复的数字
本文参考自<剑指offer>一书,代码采用Java语言. 更多:<剑指Offer>Java实现合集 题目 在一个长度为n的数组里的所有数字都在0到n-1的范围内.数组中某些数字 ...
- 《剑指offer》第三_一题(找出数组中重复的数字,可改变数组)
// 面试题3(一):找出数组中重复的数字 // 题目:在一个长度为n的数组里的所有数字都在0到n-1的范围内.数组中某些数字是重复的,但不知道有几个数字重复了, // 也不知道每个数字重复了几次.请 ...
- 1. 找出数组中的单身狗OddOccurrencesInArray Find value that occurs in odd number of elements.
找出数组中的单身狗: 1. OddOccurrencesInArray Find value that occurs in odd number of elements. A non-empty ze ...
- 【Offer】[3-1] 【找出数组中重复的数字】
题目描述 思路 Java代码 代码链接 题目描述 在一个长度为n的数组里的所有数字都在0~n-1的范围内.数组中某些数字是重复的,但不知道有几个数字重复了,也不知道每个数字重复了几次. 请找出数组中任 ...
随机推荐
- Mac10.9用brew搭建Eclipse4.4+Maven3.2.3+JDK1.8运行环境
--------------------------------------- 博文作者:迦壹 博客标题:Mac10.9用brew搭建Eclipse4.4+Maven3.2.3+JDK1.8运行环境 ...
- SpringMVC 拦截器不拦截静态资源的三种处理方式
SpringMVC提供<mvc:resources>来设置静态资源,但是增加该设置如果采用通配符的方式增加拦截器的话仍然会被拦截器拦截,可采用如下方案进行解决: 方案一.拦截器中增加针对静 ...
- Shell编程菜鸟基础入门笔记
Shell编程基础入门 1.shell格式:例 shell脚本开发习惯 1.指定解释器 #!/bin/bash 2.脚本开头加版权等信息如:#DATE:时间,#author(作者)#mail: ...
- MySql 存储过程及调用方法
存储过程实例: DELIMITER $$drop procedure if exists ff $$CREATE /*[DEFINER = { user | CURRENT_USER }]*/ PRO ...
- bzoj1091: [SCOI2003]切割多边形
Description 有一个凸p边形(p<=8),我们希望通过切割得到它.一开始的时候,你有一个n*m的矩形,即它的四角的坐标分别为(0,0), (0,m), (n,0), (n,m).每次你 ...
- OA项目笔记-从建立接口 dao impl action jsp等框架实现crud
1,设计 BaseDao 与 BaseDaoImpl 1,设计接口 BaseDao 1,每个实体都应有一个对应的Dao接口,封装了对这个实体的数据库操作.例 实体 Dao接口 实现类 ======== ...
- IIS 7.0 部署MVC
开发的MVC 3.0 项目,在部署服务上还是与需要花一点功夫,这里把遇到的问题罗列出来. 本文主要介绍IIS 7.5中安装配置MVC 3.0的具体办法! 部署必备: Microsoft .net Fr ...
- java.lang.NoClassDefFoundError 解决方案
http://stackoverflow.com/questions/9870995/android-java-lang-noclassdeffounderror 像网络了上说的一般这种问题是 运行时 ...
- IOPS-百度百科
IOPS (Input/Output Operations Per Second),即每秒进行读写(I/O)操作的次数,多用于数据库等场合,衡量随机访问的性能.存储端的IOPS性能和主机端的IO是不同 ...
- 01-C#入门(分支控制语句)
说实话,<C#入门经典>这本书对入门的同学来说真的太棒了,先不说内容如何,就作者先以控制台(命令行)调试程序的方法,就能够最大限度地让你关注学习的内容,而不是花哨的界面调试. 现在学习是下 ...