Recently I was studying algorithms. A classic problem is to find the K largest (or smallest) numbers in an array. I mainly studied two methods. The first is the direct method, an extension of selection sort: on each pass, select the largest remaining number from the array, and stop after K passes. The pseudocode below shows the variant that selects the smallest values; for the largest, flip the comparison. The complexity is O(kn).

 function select(list[1..n], k)
     for i from 1 to k
         minIndex = i
         minValue = list[i]
         for j from i+1 to n
             if list[j] < minValue
                 minIndex = j
                 minValue = list[j]
         swap list[i] and list[minIndex]
     return list[k]

The C++ implementation is:
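#include <algorithm>
#include <utility>
#include <vector>

// Returns the K largest values in descending order. vecIndex receives the
// position of each returned value in vecInput.
template<typename T>
std::vector<T> SelectLargestKItem(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
{
    K = std::min(K, vecInput.size());   // cannot select more items than exist
    std::vector<T> vecLocal(vecInput);

    // Original position of each element of vecLocal, kept in step with the swaps
    std::vector<int> vecOrigIndex(vecInput.size());
    for (size_t i = 0; i < vecOrigIndex.size(); ++i)
        vecOrigIndex[i] = static_cast<int>(i);

    std::vector<T> vecResult;
    vecIndex.clear();
    for (size_t k = 0; k < K; ++k)
    {
        T maxValue = vecLocal[k];
        size_t maxIndex = k;
        for (size_t i = k + 1; i < vecLocal.size(); ++i) {
            if (vecLocal[i] > maxValue) {
                maxValue = vecLocal[i];
                maxIndex = i;
            }
        }
        if (maxIndex != k) {
            std::swap(vecLocal[maxIndex], vecLocal[k]);
            std::swap(vecOrigIndex[maxIndex], vecOrigIndex[k]);
        }
        vecResult.push_back(maxValue);
        vecIndex.push_back(vecOrigIndex[k]);
    }
    return vecResult;
}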

When N is very large (say N > 200,000) and the number of items to select, K, is larger than 20, the above algorithm becomes time-consuming. After some research, I chose another algorithm for the job. This method is an extension of heap sort. The steps are as follows:

1) Build a min-heap MH from the first k elements (arr[0] to arr[k-1]) of the given array. This takes O(k).

2) For each element after the kth (arr[k] to arr[n-1]), compare it with the root of MH.
    a) If the element is greater than the root, make it the new root and call heapify on MH.
    b) Otherwise, ignore it.
   Step 2 takes O((n-k) log k).

3) Finally, MH holds the k largest elements, and the root of MH is the kth largest element.

Time complexity: O(k + (n-k) log k) without sorted output. If sorted output is needed, it is O(k + (n-k) log k + k log k).
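The steps above can be sketched compactly with std::priority_queue from the standard library. This is only an illustrative sketch of the idea, and it returns values without their indices; the function name TopKWithPriorityQueue is a hypothetical one chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper for illustration: returns the k largest values of
// `input` in descending order, using a min-heap that never exceeds k items.
template <typename T>
std::vector<T> TopKWithPriorityQueue(const std::vector<T>& input, std::size_t k)
{
    if (k > input.size()) k = input.size();   // cannot select more than exist
    if (k == 0) return {};

    // Step 1: min-heap of the first k elements; top() is the smallest kept value.
    std::priority_queue<T, std::vector<T>, std::greater<T>> minHeap(
        input.begin(), input.begin() + static_cast<std::ptrdiff_t>(k));

    // Step 2: an element larger than the root replaces the root.
    for (std::size_t i = k; i < input.size(); ++i) {
        if (input[i] > minHeap.top()) {
            minHeap.pop();
            minHeap.push(input[i]);
        }
    }
    assert(minHeap.size() == k);

    // Step 3: the heap pops in ascending order, so fill the result from the back.
    std::vector<T> result(k);
    for (std::size_t pos = k; pos-- > 0;) {
        result[pos] = minHeap.top();
        minHeap.pop();
    }
    return result;
}
```

Calling TopKWithPriorityQueue on {5, 1, 9, 3, 7} with k = 3 keeps 5, 7 and 9 in the heap and returns them largest first. The pop followed by push costs two O(log k) heap operations per replacement, which is still within the bound stated above.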

The C++ implementation of the method is as below:

// Requires <algorithm>, <utility> and <vector>.

// Restores the min-heap property for the subtree rooted at index i.
// n is the size of the heap stored in vecInput[0..n-1]; vecIndex is
// permuted in step so each value keeps its original position.
template<typename T>
void heapifyMinToRoot(std::vector<T> &vecInput, const int n, const int i, std::vector<int> &vecIndex)
{
    int smallestIndex = i;  // Initialize smallest as root
    int l = 2 * i + 1;      // left = 2*i + 1
    int r = 2 * i + 2;      // right = 2*i + 2

    // If left child is smaller than root
    if (l < n && vecInput[l] < vecInput[smallestIndex])
        smallestIndex = l;

    // If right child is smaller than smallest so far
    if (r < n && vecInput[r] < vecInput[smallestIndex])
        smallestIndex = r;

    // If smallest is not root
    if (smallestIndex != i)
    {
        std::swap(vecInput[i], vecInput[smallestIndex]);
        std::swap(vecIndex[i], vecIndex[smallestIndex]);

        // Recursively heapify the affected sub-tree
        heapifyMinToRoot(vecInput, n, smallestIndex, vecIndex);
    }
}

template<typename T>
std::vector<T> SelectLargestKItemHeap(const std::vector<T> &vecInput, const size_t K, std::vector<int> &vecIndex)
{
    // Cannot select more items than exist
    const int count = static_cast<int>(std::min(K, vecInput.size()));

    // The heap holds the largest values seen so far; vecIndex mirrors it
    std::vector<T> vecResult(vecInput.begin(), vecInput.begin() + count);
    vecIndex.clear();
    for (int i = 0; i < count; ++i)
        vecIndex.push_back(i);
    if (count == 0)
        return vecResult;

    // Build the min-heap bottom-up
    for (int K1 = count / 2 - 1; K1 >= 0; --K1)
        heapifyMinToRoot(vecResult, count, K1, vecIndex);

    for (size_t i = count; i < vecInput.size(); ++i) {
        if (vecInput[i] > vecResult[0]) {
            // Replace the root. Only the root changed, so a single
            // sift-down from index 0 restores the heap in O(log K).
            vecResult[0] = vecInput[i];
            vecIndex[0] = static_cast<int>(i);
            heapifyMinToRoot(vecResult, count, 0, vecIndex);
        }
    }

    // Heap-sort step: move the current minimum to the end of the shrinking
    // heap, which leaves the result in descending order
    for (int k = count - 1; k >= 0; --k)
    {
        std::swap(vecResult[k], vecResult[0]);
        std::swap(vecIndex[k], vecIndex[0]);
        heapifyMinToRoot(vecResult, k, 0, vecIndex);
    }
    return vecResult;
}

Here is the code to test these two methods.

// Requires <iostream>, <random> and <vector>.
// CStopWatch is a timer class defined elsewhere (not shown).
void SelectionAlgorithmBenchMark()
{
    int N = 200000;
    std::vector<int> vecInput;

    std::minstd_rand0 generator;   // default seed
    for (int i = 0; i < N; ++i)
    {
        int nValue = generator();
        vecInput.push_back(nValue);
    }
    std::vector<int> vecResult, vecIndex;
    int K = 20;

    CStopWatch stopWatch;
    vecResult = SelectLargestKItem<int>(vecInput, K, vecIndex);
    std::cout << "Standard algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
    std::cout << std::endl;

    stopWatch.Start();
    vecResult = SelectLargestKItemHeap<int>(vecInput, K, vecIndex);
    std::cout << "Heap algorithm SelectLargestKItemHeap takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
}

With N = 200,000 and K = 20, the first method takes 353 ms and the second takes 31 ms, a difference of more than 10 times.
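One way to sanity-check the output of both functions is to compare it against the standard library. The sketch below is an assumption-laden example rather than part of the benchmark: TopKIndicesReference is a hypothetical helper name, and it uses std::partial_sort over a vector of indices instead of either method described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical reference helper: returns the original indices of the k
// largest elements of `input`, ordered by descending value.
template <typename T>
std::vector<std::size_t> TopKIndicesReference(const std::vector<T>& input, std::size_t k)
{
    k = std::min(k, input.size());   // cannot select more than exist

    // order[i] starts as i; sorting the indices leaves `input` untouched.
    std::vector<std::size_t> order(input.size());
    std::iota(order.begin(), order.end(), std::size_t{0});

    // Only the first k positions end up sorted (largest value first).
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(k),
                      order.end(),
                      [&input](std::size_t a, std::size_t b) {
                          return input[a] > input[b];
                      });
    order.resize(k);
    return order;
}
```

For the input {5, 1, 9, 3, 7} and k = 3 this yields the indices 2, 4, 0. When the input contains duplicate values, the relative order of equal elements is unspecified, so in that case it is safer to compare the selected values than the indices.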
