Indexing and Hashing

DATABASE SYSTEM CONCEPTS, SIXTH EDITION
11.1 Basic Concepts
An index for a ﬁle in a database system works in much the same way as the index
in this textbook. If we want to learn about a particular topic (speciﬁed by a word
or a phrase) in this textbook, we can search for the topic in the index at the back
of the book, ﬁnd the pages where it occurs, and then read the pages to ﬁnd the
information for which we are looking. The words in the index are in sorted order,
making it easy to ﬁnd the word we want. Moreover, the index is much smaller
than the book, further reducing the effort needed.
Database-system indices play the same role as book indices in libraries. For
example, to retrieve a student record given an
ID
, the database system would look
up an index to ﬁnd on which disk block the corresponding record resides, and
then fetch the disk block, to get the appropriate student record.
Keeping a sorted list of students’
ID
would not work well on very large
databases with thousands of students, since the index would itself be very big;
further, even though keeping the index sorted reduces the search time, ﬁnding a
student can still be rather time-consuming. Instead, more sophisticated indexing
techniques may be used. We shall discuss several of these techniques in this
chapter.
There are two basic kinds of indices:

•
Ordered indices. Based on a sorted ordering of the values.

•
Hash indices. Based on a uniform distribution of values across a range of
buckets. The bucket to which a value is assigned is determined by a function,
called a hash function.

We shall consider several techniques for both ordered indexing and hashing.
No one technique is the best. Rather, each technique is best suited to particular
database applications. Each technique must be evaluated on the basis of these
factors:

•
Access types: The types of access that are supported efﬁciently. Access types
can include ﬁnding records with a speciﬁed attribute value and ﬁnding
records whose attribute values fall in a speciﬁed range.
•
Access time: The time it takes to ﬁnd a particular data item, or set of items,
using the technique in question.
•
Insertion time: The time it takes to insert a new data item. This value includes
the time it takes to ﬁnd the correct place to insert the new data item, as well
as the time it takes to update the index structure.
•
Deletion time: The time it takes to delete a data item. This value includes
the time it takes to ﬁnd the item to be deleted, as well as the time it takes to
update the index structure.
•
Space overhead: The additional space occupied by an index structure. Pro-
vided that the amount of additional space is moderate, it is usually worth-
while to sacriﬁce the space to achieve improved performance.
We often want to have more than one index for a ﬁle. For example, we may
wish to search for a book by author, by subject, or by title.
An attribute or set of attributes used to look up records in a ﬁle is called a
search key. Note that this deﬁnition of key differs from that used in primary key,
candidate key, and superkey. This duplicate meaning for key is (unfortunately) well
established in practice. Using our notion of a search key, we see that if there are
several indices on a ﬁle, there are several search keys.

Indexing and Hashing的更多相关文章

局部敏感哈希-Locality Sensitive Hashing
局部敏感哈希转载请注明http://blog.csdn.net/stdcoutzyx/article/details/44456679 在检索技术中,索引一直须要研究的核心技术.当下,索引技术主要分 ...
局部敏感哈希(Locality-Sensitive Hashing, LSH)方法介绍
局部敏感哈希(Locality-Sensitive Hashing, LSH)方法介绍本文主要介绍一种用于海量高维数据的近似近期邻高速查找技术--局部敏感哈希(Locality-Sensitive ...
局部敏感哈希(Locality-Sensitive Hashing, LSH)
本文主要介绍一种用于海量高维数据的近似最近邻快速查找技术——局部敏感哈希(Locality-Sensitive Hashing, LSH),内容包括了LSH的原理.LSH哈希函数集.以及LSH的一些参 ...
Post Tuned Hashing，PTH
[ACM 2018] Post Tuned Hashing_A New Approach to Indexing High-dimensional Data [paper] [code] Zhendo ...
哈希学习（2）—— Hashing图像检索资源
CVPR14 图像检索papers——图像检索 1. Triangulation embedding and democratic aggregation for imagesearch (Oral ...
局部敏感哈希 Kernelized Locality-Sensitive Hashing Page
Kernelized Locality-Sensitive Hashing Page Brian Kulis (1) and Kristen Grauman (2)(1) UC Berkeley ...
局部敏感哈希(Locality-Sensitive Hashing, LSH)方法介绍（转）
局部敏感哈希(Locality-Sensitive Hashing, LSH)方法介绍本文主要介绍一种用于海量高维数据的近似最近邻快速查找技术——局部敏感哈希(Locality-Sensitive ...
单细胞分析实录(1): 认识Cell Hashing
这是一个新系列差不多是一年以前,我定导后没多久,接手了读研后的第一个课题.合作方是医院,和我对接的是一名博一的医学生,最开始两边的老师很排斥常规的单细胞文章思路,即各大类细胞分群.注释.描述,所以起 ...
[Algorithm] 局部敏感哈希算法(Locality Sensitive Hashing)
局部敏感哈希(Locality Sensitive Hashing,LSH)算法是我在前一段时间找工作时接触到的一种衡量文本相似度的算法.局部敏感哈希是近似最近邻搜索算法中最流行的一种,它有坚实的理论 ...

随机推荐

JavaScript 笔记 ( Prototype )
这阵子实在好忙 ( 这样说好像也不是一两个月了... ),然后因为工作伙伴都是 JavaScript 神之等级的工程师,从中也学到不少知识,毕竟就是要和强者工作才会成长呀!为了想好好瞭解他们写的程式码 ...
Android 自动化测试—robotium(八) 拖拽
本文来源于:http://xiaomaozi.blog.51cto.com/925779/933056 SeekBar控件代码实现:http://luwenjie.blog.51cto.com/92 ...
Python基础1-Python环境搭建
Python环境搭建首先通过终端窗口输入 "python" 命令来查看本地是否已经安装Python以及Python的安装版本: 若未安装则需要下载安装,下面为linux和windo ...
UVa11082 Matrix Decompressing（最小费用最大流）
题目大概有一个n*m的矩阵,已知各行所有数的和的前缀和和各列所有数的和的前缀和,且矩阵各个数都在1到20的范围内,求该矩阵的一个可能的情况. POJ2396的弱化版本吧..建图的关键在于: 把行.列看 ...
如何在2016年成为一个更好的Node.js开发者
本文主要讨论一些进行Node.js开发的最佳实践和建议,这些建议不仅仅适合开发者,还适合那些管理与维护Node.js基础架构的工作人员.遵循本文提供的这些建议,能够让你更好的进行日常的开发工作. St ...
20145304 Java第四周学习报告
20145304<Java程序设计>第四周学习总结教材学习内容总结 1.继承共同行为: 继承基本上就是避免多个类间重复定义共同行为,关键词为extends. 代码如下: //继承共同行为 ...
3分钟wamp安装redis扩展超级简单
windows10(win8.1等系统应该是一样的) wampserver2.5 -Apache-2.4.9-Mysql-5.6.17-php5.5.12-64b 很简单只需要3步,主要是安装redi ...
Android -- ListView(SimpleAdapter) 自定义适配器
aaarticlea/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBA ...
Linux进程含义知多少
理想情况下,您应该明白在您的系统中运行的每一个进程.要获得所有进程的列表,可以执行命令 ps -ef(POSIX 风格)或 ps ax(BSD 风格).进程名有方括号的是内核级的进程,执行辅助功能(比 ...
linux 运行可执行文件version `GLIBC_2.17' not found
http://www.cnblogs.com/q191201771/p/3875316.html root@socfpga:/media/ram/nfs/dvb# ./a.out ./a.: vers ...

Indexing and Hashing

Indexing and Hashing的更多相关文章

随机推荐

热门专题