Find the largest K numbers from an array
Recently I have been studying algorithms. A classic problem is to find the K largest (or smallest) numbers in an array. I mainly studied two methods. The first is the direct method: an extension of selection sort that repeatedly selects the largest remaining number from the array. The pseudocode is shown below. Its complexity is O(kn).
function select(list[1..n], k)
    for i from 1 to k
        maxIndex = i
        maxValue = list[i]
        for j from i+1 to n
            if list[j] > maxValue
                maxIndex = j
                maxValue = list[j]
        swap list[i] and list[maxIndex]
    return list[1..k]

The C++ implementation is:
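#include <utility>
#include <vector>

// Partial selection sort: returns the K largest values in descending order.
// vecIndex receives the position of each returned value in vecInput.
template<typename T>
std::vector<T> SelectLargestKItem(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
{
    // If K exceeds the input size, the input is returned as is (unsorted)
    // and vecIndex is left untouched.
    if (K > vecInput.size())
        return vecInput;

    std::vector<T> vecLocal(vecInput);
    // Original position of every element of vecLocal, kept in step with the swaps below.
    std::vector<int> vecLocalIndex(vecInput.size());
    for (size_t i = 0; i < vecLocalIndex.size(); ++i)
        vecLocalIndex[i] = static_cast<int>(i);

    std::vector<T> vecResult;
    vecIndex.clear();
    for (size_t k = 0; k < K; ++k)
    {
        // Find the largest value in vecLocal[k..n-1].
        T maxValue = vecLocal[k];
        size_t maxIndex = k;
        for (size_t i = k + 1; i < vecLocal.size(); ++i) {
            if (vecLocal[i] > maxValue) {
                maxValue = vecLocal[i];
                maxIndex = i;
            }
        }
        // Move it to position k, together with its original index.
        if (maxIndex != k) {
            std::swap(vecLocal[maxIndex], vecLocal[k]);
            std::swap(vecLocalIndex[maxIndex], vecLocalIndex[k]);
        }
        vecResult.push_back(maxValue);
        vecIndex.push_back(vecLocalIndex[k]);
    }
    return vecResult;
}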
When N is very large (for example N > 200,000) and the number of values to select, K, is larger than 20, the algorithm above becomes time-consuming. After some research, I chose another algorithm for the job. This method is an extension of heap sort. The steps are as follows:
1) Build a min heap MH from the first k elements (arr[0] to arr[k-1]) of the given array. O(k)
2) For each element after the kth element (arr[k] to arr[n-1]), compare it with the root of MH.
    a) If the element is greater than the root, make it the root and call heapify on MH.
    b) Otherwise, ignore it.
   Step 2 is O((n-k) log k).
3) Finally, MH holds the k largest elements, and the root of MH is the kth largest element.
Time complexity: O(k + (n-k) log k) without sorted output. If sorted output is needed, O(k + (n-k) log k + k log k).
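The steps above can also be sketched compactly with the standard library's std::priority_queue. This is only an illustration under stated assumptions, not the implementation used in this post: the helper name TopKWithHeap and its return type are hypothetical, and because it pushes the first K elements one at a time, step 1 costs O(k log k) here instead of O(k).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): returns the K largest
// values of `input`, each paired with its original index, largest first.
template <typename T>
std::vector<std::pair<T, std::size_t>> TopKWithHeap(const std::vector<T>& input, std::size_t K)
{
    using Item = std::pair<T, std::size_t>;  // (value, original index)
    // std::greater makes priority_queue a min heap: top() is the smallest kept item.
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> minHeap;

    for (std::size_t i = 0; i < input.size(); ++i) {
        if (minHeap.size() < K) {
            minHeap.emplace(input[i], i);   // step 1: fill the heap with the first K elements
        } else if (K > 0 && input[i] > minHeap.top().first) {
            minHeap.pop();                  // step 2a: replace the current minimum
            minHeap.emplace(input[i], i);
        }                                   // step 2b: otherwise ignore the element
    }

    // The heap pops in ascending order; reverse to get the largest first.
    std::vector<Item> result;
    result.reserve(minHeap.size());
    while (!minHeap.empty()) {
        result.push_back(minHeap.top());
        minHeap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

Calling TopKWithHeap(values, 3) on a std::vector<int> returns at most three (value, index) pairs. If K is larger than the input, every element is returned.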
The C++ implementation of the method is as below:
#include <algorithm>
#include <utility>
#include <vector>

// Restore the min-heap property for the subtree rooted at node i, which is
// an index into vecInput. n is the size of the heap. vecIndex is permuted
// in step with vecInput so that each value keeps its original index.
template<typename T>
void heapifyMinToRoot(std::vector<T> &vecInput, const int n, const int i, std::vector<int> &vecIndex)
{
    int smallestIndex = i;  // Initialize smallest as root
    int l = 2 * i + 1;      // left = 2*i + 1
    int r = 2 * i + 2;      // right = 2*i + 2

    // If left child is smaller than root
    if (l < n && vecInput[l] < vecInput[smallestIndex])
        smallestIndex = l;

    // If right child is smaller than smallest so far
    if (r < n && vecInput[r] < vecInput[smallestIndex])
        smallestIndex = r;

    // If smallest is not root
    if (smallestIndex != i)
    {
        std::swap(vecInput[i], vecInput[smallestIndex]);
        std::swap(vecIndex[i], vecIndex[smallestIndex]);

        // Recursively heapify the affected sub-tree
        heapifyMinToRoot(vecInput, n, smallestIndex, vecIndex);
    }
}

template<typename T>
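std::vector<T> SelectLargestKItemHeap(const std::vector<T> &vecInput, const size_t K, std::vector<int> &vecIndex)
{
    vecIndex.clear();
    if (K == 0)
        return std::vector<T>();

    if (K > vecInput.size()) {
        // Fewer than K elements: return all of them in descending order.
        // Sorting the indices instead of the values keeps vecIndex
        // consistent with the returned values.
        for (size_t i = 0; i < vecInput.size(); ++i)
            vecIndex.push_back(static_cast<int>(i));
        std::sort(vecIndex.begin(), vecIndex.end(),
                  [&vecInput](int a, int b) { return vecInput[a] > vecInput[b]; });
        std::vector<T> vecResult;
        for (size_t i = 0; i < vecIndex.size(); ++i)
            vecResult.push_back(vecInput[vecIndex[i]]);
        return vecResult;
    }

    const int nHeapSize = static_cast<int>(K);

    // Step 1: build a min heap from the first K elements.
    std::vector<T> vecResult(vecInput.begin(), vecInput.begin() + K);
    for (size_t i = 0; i < K; ++i)
        vecIndex.push_back(static_cast<int>(i));
    for (int K1 = nHeapSize / 2 - 1; K1 >= 0; --K1)
        heapifyMinToRoot(vecResult, nHeapSize, K1, vecIndex);

    // Step 2: each remaining element that beats the root replaces it.
    for (size_t i = K; i < vecInput.size(); ++i) {
        if (vecInput[i] > vecResult[0]) {
            vecResult[0] = vecInput[i];
            vecIndex[0] = static_cast<int>(i);
            // Only the root changed, so one sift-down from the root restores the heap.
            heapifyMinToRoot(vecResult, nHeapSize, 0, vecIndex);
        }
    }

    // Step 3: repeatedly move the minimum to the end of the shrinking heap,
    // which leaves the result in descending order.
    for (int k = nHeapSize - 1; k >= 0; --k)
    {
        std::swap(vecResult[k], vecResult[0]);
        std::swap(vecIndex[k], vecIndex[0]);
        heapifyMinToRoot(vecResult, k, 0, vecIndex);
    }
    return vecResult;
}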
Here is the code to test these two methods.
#include <iostream>
#include <random>
#include <vector>

// CStopWatch is the author's timer class; its definition is not shown in this post.
void SelectionAlgorithmBenchMark()
{
    int N = 200000;
    std::vector<int> vecInput;

    std::minstd_rand0 generator;  // default seed; the seed used for the reported run is not known
    for (int i = 0; i < N; ++i)
    {
        int nValue = generator();
        vecInput.push_back(nValue);
    }
    std::vector<int> vecResult, vecIndex;
    int K = 20;
    CStopWatch stopWatch;
    vecResult = SelectLargestKItem<int>(vecInput, K, vecIndex);
    std::cout << "Standard algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
    std::cout << std::endl;

    stopWatch.Start();
    vecResult = SelectLargestKItemHeap<int>(vecInput, K, vecIndex);
    std::cout << "Heap algorithm SelectLargestKItemHeap takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
}
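The benchmark uses a CStopWatch class that is not defined in this post. To make the test compile, a minimal stand-in could look like the sketch below. The behaviour is an assumption inferred from how the class is called: the timer starts on construction, Start() restarts it, and Now() returns the elapsed milliseconds since the last start.

```cpp
#include <chrono>

// Hypothetical stand-in for the CStopWatch class used in the benchmark.
// Assumed semantics: starts timing on construction, Start() restarts the
// timer, Now() returns elapsed milliseconds since the last start.
class CStopWatch
{
    using Clock = std::chrono::steady_clock;  // monotonic clock, unaffected by system time changes

public:
    CStopWatch() { Start(); }

    void Start() { m_start = Clock::now(); }

    long long Now() const
    {
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                   Clock::now() - m_start).count();
    }

private:
    Clock::time_point m_start;
};
```

std::chrono::steady_clock is used rather than system_clock because it never jumps backwards, which matters for interval measurements.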
With N = 200,000 and K = 20, the first method takes 353 ms and the second takes 31 ms, so the heap-based method is more than 10 times faster.
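The timings alone do not show that the two methods return the same values. One way to cross-check them is against the standard library. The sketch below uses std::partial_sort on a vector of indices; this is a different technique from both methods above, and the helper name TopKIndicesReference is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical reference helper (not from the original post): returns the
// indices of the K largest elements of `input`, largest first.
template <typename T>
std::vector<int> TopKIndicesReference(const std::vector<T>& input, std::size_t K)
{
    K = std::min(K, input.size());
    std::vector<int> order(input.size());
    std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
    // Sort only the first K positions, comparing indices by the values they refer to.
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(K),
                      order.end(),
                      [&input](int a, int b) { return input[a] > input[b]; });
    order.resize(K);
    return order;
}
```

When the input contains duplicate values, the order of tied indices is unspecified, so it is safer to compare the selected values rather than the indices.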
declare @bakfile varchar(30), @bakfilediff varchar(30),@pathfull varchar(50),@pathdiff varchar(50)se ...