Recently I was studying algorithms. A classic problem is to find the K largest (or smallest) numbers in an array. I mainly studied two methods. The first is the direct method, an extension of selection sort: on each of K passes it selects the largest remaining number in the array. The pseudocode below shows the variant that selects the smallest; selecting the largest only flips the comparison. The complexity is O(kn).

 function select(list[1..n], k)
     for i from 1 to k
         minIndex = i
         minValue = list[i]
         for j from i+1 to n
             if list[j] < minValue
                 minIndex = j
                 minValue = list[j]
         swap list[i] and list[minIndex]
     return list[k]

The C++ implementation is:

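// Requires <algorithm>, <numeric>, <utility> and <vector>.
template<typename T>
std::vector<T> SelectLargestKItem(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
{
    vecIndex.clear();
    // If K exceeds the input size, return every element (sorted, largest first).
    K = std::min(K, vecInput.size());

    std::vector<T> vecLocal(vecInput);
    // vecPos[i] is the original position of the value currently at vecLocal[i],
    // so the reported indices stay valid after elements have been swapped.
    std::vector<int> vecPos(vecInput.size());
    std::iota(vecPos.begin(), vecPos.end(), 0);

    std::vector<T> vecResult;
    for (size_t k = 0; k < K; ++k)
    {
        // Find the largest value in vecLocal[k..end).
        size_t maxIndex = k;
        for (size_t i = k + 1; i < vecLocal.size(); ++i) {
            if (vecLocal[i] > vecLocal[maxIndex])
                maxIndex = i;
        }
        if (maxIndex != k) {
            std::swap(vecLocal[maxIndex], vecLocal[k]);
            std::swap(vecPos[maxIndex], vecPos[k]);
        }
        vecResult.push_back(vecLocal[k]);
        vecIndex.push_back(vecPos[k]);
    }
    return vecResult;
}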
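
The pseudocode above returns the k-th smallest element, while the C++ function returns the K largest values. For comparison, here is a minimal sketch of a direct translation of the pseudocode. The name kthSmallest, the 1-based k and the exception on an invalid k are assumptions made for this illustration and are not part of the original code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: returns the k-th smallest element (k is 1-based),
// using k passes of selection sort. The vector is taken by value so the
// caller's data is left untouched.
template <typename T>
T kthSmallest(std::vector<T> list, std::size_t k)
{
    if (k == 0 || k > list.size())
        throw std::out_of_range("k must be in [1, list.size()]");

    for (std::size_t i = 0; i < k; ++i) {
        // Find the smallest element in list[i..end).
        std::size_t minIndex = i;
        for (std::size_t j = i + 1; j < list.size(); ++j) {
            if (list[j] < list[minIndex])
                minIndex = j;
        }
        std::swap(list[i], list[minIndex]);
    }
    // After k passes, list[0..k-1] holds the k smallest values in ascending order.
    return list[k - 1];
}
```

For example, kthSmallest<int>({5, 1, 4, 2, 3}, 2) yields 2. Each pass scans the remaining elements once, which is where the O(kn) cost comes from.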

When N is very large, for example N > 200,000, and K is larger than 20, the above algorithm becomes time-consuming. After some research, I chose another algorithm for the job. This method is an extension of heap sort. The steps are as follows:

1) Build a Min Heap MH of the first k elements (arr[0] to arr[k-1]) of the given array. O(k)

2) For each element after the kth element (arr[k] to arr[n-1]), compare it with the root of MH.
   a) If the element is greater than the root, make it the root and call heapify for MH.
   b) Otherwise, ignore it.
   Step 2 takes O((n-k) log k).

3) Finally, MH has k largest elements and root of the MH is the kth largest element.

Time complexity: O(k + (n-k) log k) without sorted output. If sorted output is needed, O(k + (n-k) log k + k log k).
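
As a compact illustration of these steps, here is a sketch built on std::priority_queue from the standard library. It is not the implementation used in this post: the function name topKWithPriorityQueue is made up for the example, it does not track indices, and it fills the heap by pushing one element at a time, so step 1 costs O(k log k) here rather than O(k).

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the k largest values of data, largest first.
template <typename T>
std::vector<T> topKWithPriorityQueue(const std::vector<T>& data, std::size_t k)
{
    // With std::greater the priority queue is a min heap: top() is the smallest.
    std::priority_queue<T, std::vector<T>, std::greater<T>> minHeap;

    for (const T& value : data) {
        if (minHeap.size() < k) {
            minHeap.push(value);            // step 1: fill the heap with the first k elements
        } else if (k > 0 && value > minHeap.top()) {
            minHeap.pop();                  // step 2a: replace the root with the larger value
            minHeap.push(value);
        }                                   // step 2b: otherwise ignore the element
    }

    // Step 3: the heap now holds the k largest values. Popping yields them in
    // ascending order, so fill the result from the back to get largest first.
    std::vector<T> result(minHeap.size());
    for (std::size_t i = result.size(); i-- > 0; ) {
        result[i] = minHeap.top();
        minHeap.pop();
    }
    return result;
}
```

For example, topKWithPriorityQueue<int>({3, 9, 1, 7, 5}, 2) returns {9, 7}. After the last step of the loop the root of the heap is the kth largest element, which ends up as the last entry of the result.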

The C++ implementation of the method is as below:

// Requires <algorithm>, <utility> and <vector>.

// Sift the node at index i down a min heap stored in vecInput[0..n-1].
// vecIndex is permuted in step, so vecIndex[j] stays the original
// position of vecInput[j].
template<typename T>
void heapifyMinToRoot(std::vector<T> &vecInput, const int n, const int i, std::vector<int> &vecIndex)
{
    int smallestIndex = i;  // Initialize smallest as root
    int l = 2 * i + 1;      // left = 2*i + 1
    int r = 2 * i + 2;      // right = 2*i + 2

    // If left child is smaller than root
    if (l < n && vecInput[l] < vecInput[smallestIndex])
        smallestIndex = l;

    // If right child is smaller than smallest so far
    if (r < n && vecInput[r] < vecInput[smallestIndex])
        smallestIndex = r;

    // If smallest is not root
    if (smallestIndex != i)
    {
        std::swap(vecInput[i], vecInput[smallestIndex]);
        std::swap(vecIndex[i], vecIndex[smallestIndex]);

        // Recursively heapify the affected sub-tree
        heapifyMinToRoot(vecInput, n, smallestIndex, vecIndex);
    }
}

template<typename T>
std::vector<T> SelectLargestKItemHeap(const std::vector<T> &vecInput, const size_t K, std::vector<int> &vecIndex)
{
    vecIndex.clear();
    // Clamp K: if K exceeds the input size, every element is returned,
    // sorted largest first, with matching indices.
    const int heapSize = static_cast<int>(std::min(K, vecInput.size()));
    if (heapSize == 0)
        return std::vector<T>();

    std::vector<T> vecResult(vecInput.begin(), vecInput.begin() + heapSize);
    for (int i = 0; i < heapSize; ++i)
        vecIndex.push_back(i);

    // Step 1: build a min heap from the first K elements.
    for (int K1 = heapSize / 2 - 1; K1 >= 0; --K1)
        heapifyMinToRoot(vecResult, heapSize, K1, vecIndex);

    // Step 2: each remaining element replaces the root if it is larger.
    for (size_t i = static_cast<size_t>(heapSize); i < vecInput.size(); ++i) {
        if (vecInput[i] > vecResult[0]) {
            vecResult[0] = vecInput[i];
            vecIndex[0] = static_cast<int>(i);
            // Only the root changed, so a single sift-down, O(log K),
            // restores the heap; rebuilding the whole heap is not needed.
            heapifyMinToRoot(vecResult, heapSize, 0, vecIndex);
        }
    }

    // Step 3: heap sort on the min heap leaves the values largest first.
    for (int k = heapSize - 1; k >= 0; --k)
    {
        std::swap(vecResult[k], vecResult[0]);
        std::swap(vecIndex[k], vecIndex[0]);
        heapifyMinToRoot(vecResult, k, 0, vecIndex);
    }
    return vecResult;
}

Here is the code to test these two methods.

// Requires <iostream>, <random> and <vector>. CStopWatch is a timing helper
// whose definition is not shown in this post.
void SelectionAlgorithmBenchMark()
{
    const int N = 200000;
    std::vector<int> vecInput;
    vecInput.reserve(N);

    // The seed is missing from the listing, so the generator is default-seeded here.
    std::minstd_rand0 generator;
    for (int i = 0; i < N; ++i)
    {
        int nValue = static_cast<int>(generator());
        vecInput.push_back(nValue);
    }

    std::vector<int> vecResult, vecIndex;
    const int K = 20;

    CStopWatch stopWatch;
    vecResult = SelectLargestKItem<int>(vecInput, K, vecIndex);
    std::cout << "Standard algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
    std::cout << std::endl;

    stopWatch.Start();
    vecResult = SelectLargestKItemHeap<int>(vecInput, K, vecIndex);
    std::cout << "Heap algorithm SelectLargestKItemHeap takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
}

When N is 200,000 and K is 20, the first method takes 353 ms and the second takes 31 ms, so the heap method is more than 10 times faster.
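
One likely reason the gap is so large on random input is that most elements fail the comparison with the root and cost only a single comparison, so the O(log k) heapify runs rarely. The sketch below counts how often the root is actually replaced. The names countRootReplacements and makeRandomInput are made up for this illustration, the generator is default-seeded, and the heap is a std::priority_queue rather than the hand-written heap above.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <random>
#include <vector>

// Hypothetical helper: counts how many elements after the first k
// are larger than the heap root and therefore replace it.
std::size_t countRootReplacements(const std::vector<int>& data, std::size_t k)
{
    if (k == 0 || data.size() <= k)
        return 0;

    // Min heap of the first k elements (std::greater puts the smallest on top).
    std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap(
        data.begin(), data.begin() + static_cast<std::ptrdiff_t>(k));

    std::size_t replacements = 0;
    for (std::size_t i = k; i < data.size(); ++i) {
        if (data[i] > minHeap.top()) {
            minHeap.pop();
            minHeap.push(data[i]);
            ++replacements;
        }
    }
    return replacements;
}

// Hypothetical helper: n pseudo-random values from a default-seeded generator.
std::vector<int> makeRandomInput(std::size_t n)
{
    std::minstd_rand0 generator;
    std::vector<int> data;
    data.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        data.push_back(static_cast<int>(generator()));
    return data;
}
```

Calling countRootReplacements(makeRandomInput(200000), 20) reports the count for input of the same size as the benchmark. For distinct values in random order the expected count is about K ln(N/K), roughly 184 here, compared with nearly 200,000 elements scanned. Sorted ascending input is the worst case, since every element then replaces the root.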
