Choosing Between the map and hash_map Containers in the STL

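This article grew out of a question a friend asked me today about the relative efficiency of map and hash_map. I know a little about the subject, but I did not want to answer off the cuff in case I got it wrong, so I read through a number of posts and am summarizing them here. The material comes from alvin_lee, codeproject, codeguru, baidu, and others.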

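Let's start with the analysis by alvin_lee, which I think is quite accurate and explains the differences between the two from an algorithmic point of view.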

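In fact this question is not specific to C++. It has to be considered in the implementation and selection of standard containers in every language. For application code the effect may seem small, but when you write algorithms or core code you need to be careful. I was improving some code today and took the chance to review the fundamentals.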

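Remember Herb Sutter's very enjoyable "Conversations" series on C++ (《C++对话系列》)? One of its stories, 《产生真正的hash对象》 (roughly "Producing a Real Hash Object"), is about the choice of map. Here is a quick recap, along with my own understanding from practical use.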

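You choose a map container to get from a key to its associated object faster. Compared with a linear container such as list, a map first simplifies the lookup code, and second lets any key type serve as the index, paired with the target object, which optimizes the lookup. In the C++ STL, map uses a tree for lookup. Its efficiency is roughly that of binary search over a sorted linear sequence, O(log2 N), while a list is neither as easy to customize nor as easy to operate on as a map.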

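hash_map, by contrast, stores its pairs in a hash table, where the key is used to compute the position in the table. When the table is suitably sized and the hash function is suitable, lookup in a hash table is O(1). But that is the ideal case: if keys collide on the same table position, the worst-case complexity is O(N).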

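Given that, how should we choose? A couple of days ago I was reading a Python article in which someone claimed that Python's map is faster than C++'s map, and so on. What he did not realize is that Python's default mapping is a hash map, and that these language features are themselves written in C/C++. The issue is the algorithm and the technique, not the merits of the language. Once you are familiar with the algorithms and with the details and design ideas of each language, would you still make one-sided claims about which is better? (Looking at things in a one-sided, extreme way only shows ignorance. Everything has its value, technology included.) Clearly there were good reasons for the C++ STL to implement map as a tree by default.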

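Tree lookup cannot match a hash table on total lookup efficiency, but it is stable: its complexity does not fluctuate. For any single lookup you can be sure that the worst case does not exceed O(log2 N). A hash table is different: whether a lookup costs O(1), O(N), or something in between is out of your control. Suppose you are developing an interface for external callers that does a key lookup internally, but the interface is not called often. Would you rather the call be fast but unpredictable, or consistent and stable? Conversely, suppose your program looks up keys very frequently and you want the total time of these operations to be short. Then hash table lookup gives a shorter total time, and a shorter average time per operation, than the alternatives. This is the trade-off you have to weigh.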

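To summarize: whether to choose map or hash_map depends on how many key lookups you perform, and on whether you need to guarantee the total lookup time or the time of each individual lookup. If there are many operations and overall efficiency is what matters, use hash_map, which has a short average processing time. If there are only a few operations, hash_map may hit an unpredictable O(N) case, so use map, whose average time is somewhat slower but whose time per lookup is bounded. With few operations, overall stability should outweigh overall efficiency. If a handful of hash_map operations in one pass hit the O(N) worst case, the advantage of hash_map is lost entirely.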

Next, look at a piece of code from Jay Kint on Codeproject:

// familiar month example used
// mandatory contrived example to show a simple point
// compiled using MinGW gcc 3.2.3 with gcc -c -o file.o
// file.cpp

#include <string>
#include <ext/hash_map>
#include <iostream>
#include <numeric>     // accumulate
#include <functional>  // mem_fun

using namespace std;
// some STL implementations do not put hash_map in std
using namespace __gnu_cxx;

hash_map<const char*, int> days_in_month;

class MyClass {
    static int totalDaysInYear;
public:
    void add_days( int days ) { totalDaysInYear += days; }
    static void printTotalDaysInYear(void)
    {
        cout << "Total Days in a year are "
             << totalDaysInYear << endl;
    }
};

int MyClass::totalDaysInYear = 0;

int main(void)
{
    days_in_month["january"] = 31;
    days_in_month["february"] = 28;
    days_in_month["march"] = 31;
    days_in_month["april"] = 30;
    days_in_month["may"] = 31;
    days_in_month["june"] = 30;
    days_in_month["july"] = 31;
    days_in_month["august"] = 31;
    days_in_month["september"] = 30;
    days_in_month["october"] = 31;
    days_in_month["november"] = 30;
    days_in_month["december"] = 31;

    // ERROR: This line doesn't compile.
    accumulate( days_in_month.begin(), days_in_month.end(),
                mem_fun( &MyClass::add_days ));

    MyClass::printTotalDaysInYear();

    return 0;
}

Of course, the code above can be implemented entirely with the STL:

Quote

Standard C++ Solutions
The Standard C++ Library defines certain function adaptors, select1st, select2nd and compose1, that can be used to call a single parameter function with either the key or the data element of a pair associative container.

select1st and select2nd do pretty much what their respective names say they do. They return either the first or second parameter from a pair.

compose1 allows the use of functional composition, such that the return value of one function can be used as the argument to another. compose1(f,g) is the same as f(g(x)).

Using these function adaptors, we can use for_each to call our function.

// Key stands for whatever key type the map uses.
hash_map<Key, MyType> my_map;
for_each( my_map.begin(), my_map.end(),
          compose1( mem_fun( &MyType::do_something ),
                    select2nd<hash_map<Key, MyType>::value_type>()));
Certainly, this is much better than having to define helper functions for each pair, but it still seems a bit cumbersome, especially when compared with the clarity that a comparable for loop has.

// Key stands for whatever key type the map uses.
for( hash_map<Key, MyType>::iterator i =
         my_map.begin();
     i != my_map.end(); ++i ) {

    i->second.do_something();
}
Considering it was avoiding the for loop for clarity's sake that inspired the use of the STL algorithms in the first place, it doesn't help the case of algorithms vs. hand written loops that the for loop is more clear and concise.

with_data and with_key
with_data and with_key are function adaptors that strive for clarity while allowing the easy use of the STL algorithms with pair associative containers. They have been parameterized much the same way mem_fun has been. This is not exactly rocket science, but it is quickly easy to see that they are much cleaner than the standard function adaptor expansion using compose1 and select2nd.

Using with_data and with_key, any function can be called and will use the data_type or key_type as the function's argument respectively. This allows hash_map, map, and any other pair associative containers in the STL to be used easily with the standard algorithms. It is even possible to use it with other function adaptors, such as mem_fun.

// Key stands for whatever key type the map uses.
hash_map<Key, IDirect3DVertexBuffer9*> my_vert_buffers;

void ReleaseBuffers(void)
{
    // release the vertex buffers created so far.
    std::for_each( my_vert_buffers.begin(),
                   my_vert_buffers.end(),
                   with_data( boost::mem_fn(
                       &IDirect3DVertexBuffer9::Release )));
}
Here boost::mem_fn is used instead of mem_fun since it recognizes the __stdcall methods used by COM, if the BOOST_MEM_FN_ENABLE_STDCALL macro is defined.

Here are some hands-on examples as well.
The link is:
http://blog.sina.com.cn/u/4755b4ee010004hm

An excerpt follows:

Quote

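I had always used the STL map, until recently the amount of data in our store grew sharply. Colleagues who work on search mentioned hash_map and I had long meant to switch, so today I ran a proper experiment to test what hash_map can do, how it behaves, and how its performance compares with map.
First, both data structures provide key-value storage and lookup, but they are implemented differently. map uses a red-black tree, with a lookup time complexity of O(log n). hash_map uses a hash table. Its lookup time can in theory be constant, but it consumes more memory, so it trades space for time.
In terms of availability, map is already part of the standard library, while hash_map has not yet entered the standard. It is nevertheless a very widely used and very important library.
The test deduplicates a file list of roughly one million entries, that is, it builds a map keyed on the file-name string.
Headers used: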

#include <time.h>        // used to measure the elapsed time
#include <ext/hash_map>  // header that provides hash_map
#include <map>           // STL map
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
using namespace std;        // std namespace
using namespace __gnu_cxx;  // hash_map lives in the __gnu_cxx namespace

// Three cases are tested: map, hash_map with the built-in hash function,
// and hash_map with a hand-written hash function.

struct str_hash { // hand-written hash function
    size_t operator()(const string& str) const
    {
        unsigned long h = 0;
        for (size_t i = 0; i < str.size(); i++)
        {
            h = 107 * h + str[i];
        }
        return size_t(h);
    }
};

//struct str_hash { // built-in string hash function
//    size_t operator()(const string& str) const
//    {
//        return __stl_hash_string(str.c_str());
//    }
//};

struct str_equal { // string equality predicate
    bool operator()(const string& s1, const string& s2) const
    {
        return s1 == s2;
    }
};

// usage
int main(void)
{
    vector<string> filtered_list;
    hash_map<string, int, str_hash, str_equal> file_map;
    map<string, int> file2_map;
    ifstream in("/dev/shm/list");
    time_t now1 = time(NULL);
    struct tm* curtime;
    curtime = localtime(&now1);
    cout << now1 << endl;
    char ctemp[20];
    strftime(ctemp, 20, "%Y-%m-%d %H:%M:%S", curtime);
    cout << ctemp << endl;
    string temp;
    int i = 0;
    if (!in)
    {
        cout << "open failed!~" << endl;
        return 1;
    }
    while (in >> temp)
    {
        // only the first 65 characters are used as the key
        string sub = temp.substr(0, 65);
        if (file_map.find(sub) == file_map.end())
        // if (file2_map.find(sub) == file2_map.end())
        {
            file_map[sub] = i;
            // file2_map[sub] = i;
            filtered_list.push_back(temp);
            i++;
            // cout << sub << endl;
        }
    }
    in.close();
    cout << "the total unique file number is:" << i << endl;
    ofstream out("./file_list");
    if (!out)
    {
        cout << "failed open" << endl;
    }
    for (size_t j = 0; j < filtered_list.size(); j++)
    {
        out << filtered_list[j] << endl;
    }
    time_t now2 = time(NULL);
    cout << now2 << endl;
    curtime = localtime(&now2);
    strftime(ctemp, 20, "%Y-%m-%d %H:%M:%S", curtime);
    cout << now2 - now1 << "\t" << ctemp << endl;
    return 0;
}

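Quote

The results (the file list has 1.06 million entries, 510,000 after deduplication):
1. map took 34 seconds to complete the deduplication
2. hash_map with the built-in hash function took 22 seconds
3. hash_map with the hand-written hash function took 14 seconds
The results clearly show the advantage of hash_map over map. In addition, different hash functions give different performance gains. The hash function above is an empirical one that a colleague arrived at after testing it on a great deal of data.
As one would expect, the larger the data set, the more the advantage of hash_map shows.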

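Of course, that author's final conclusion is wrong and rests on a misunderstanding of how hash_map works. The first friend's explanation above makes this clear.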

Finally, C++Builder users should include it as follows:
#include "stlport/hash_map"
so that hash_map can be used correctly.
