Study notes for B-tree and R-tree

B-tree

A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
- B-trees are balanced search trees: a B-tree holding $n \ge 1$ keys has height $h \le \log_t \frac{n+1}{2} = O(\log_t n)$ in the worst case, where $t \ge 2$ is the minimum degree of the tree, i.e., every internal node other than the root has at least $t$ and at most $2t$ children (pointers).
- Note that t is typically set so that one node fits into one disk block or page.
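To get a feel for this bound, the sketch below computes the largest height it allows for given n and t. It is a minimal illustration, assuming t is the minimum degree (every non-root node holds between t-1 and 2t-1 keys) and h counts edges from the root to a leaf; the function name `max_btree_height` is hypothetical. It tests $2t^h \le n+1$ in integers to avoid floating-point rounding in the logarithm.

```python
def max_btree_height(n: int, t: int) -> int:
    """Largest integer h with h <= log_t((n + 1) / 2).

    Equivalent integer test: 2 * t**h <= n + 1.
    Assumes n >= 1 keys and minimum degree t >= 2.
    """
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 and t >= 2")
    h = 0
    # Increase h while the next height still satisfies the bound.
    while 2 * t ** (h + 1) <= n + 1:
        h += 1
    return h


print(max_btree_height(10**9, 1000))
print(max_btree_height(10**9, 2))
```

With t = 1000, a tree holding $10^9$ keys has height at most 2 (three levels), while t = 2 allows a height of up to 28, which is why large, page-sized nodes pay off on disk.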
A B-tree is a generalization of a binary search tree into a multiway tree: a node can have more than two children.
- B-trees are balanced like red-black trees, but they perform much better on disk I/O: because each node has many children, the tree is much shallower, so a search touches far fewer disk blocks.
- Binary trees may be useful for rapid searching in main memory, but they are not appropriate for data stored on disks.
- When accessing data on a disk, an entire block (or page) is input at once, so it makes sense to design the tree so that each node essentially occupies one entire block.
The B-tree is optimized for systems that read and write large blocks of data. B-trees (and their variants) are commonly used in databases and file systems.
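As a rough illustration of fitting one node into one page, the sketch below picks the largest minimum degree t for which a full node (at most 2t-1 keys and 2t child pointers) fits in a page. The byte layout (a fixed header followed by keys and pointers) and the sizes used are assumptions for illustration, not taken from any particular database; both function names are hypothetical.

```python
def max_min_degree(page_size: int, key_size: int, ptr_size: int, header_size: int = 0) -> int:
    """Largest t such that a full node fits into one page (assumed layout).

    Constraint: header_size + (2t - 1) * key_size + 2t * ptr_size <= page_size,
    which rearranges to t <= (page_size - header_size + key_size) / (2 * (key_size + ptr_size)).
    All sizes are in bytes.
    """
    t = (page_size - header_size + key_size) // (2 * (key_size + ptr_size))
    if t < 2:
        raise ValueError("page too small for a B-tree node with t >= 2")
    return t


def node_bytes(t: int, key_size: int, ptr_size: int, header_size: int = 0) -> int:
    """Size of a full node with minimum degree t under the same assumed layout."""
    return header_size + (2 * t - 1) * key_size + 2 * t * ptr_size


t = max_min_degree(4096, key_size=8, ptr_size=8, header_size=16)
print(t, node_bytes(t, 8, 8, 16))
```

With a 4096-byte page, 8-byte keys, 8-byte pointers and a 16-byte header, this gives t = 127, i.e., up to 253 keys per node in 4072 bytes.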
Structure: every node x has four fields
1. The number of keys currently stored in node x, i.e., n, which is between t-1 and 2t-1 for every node other than the root (the root of a non-empty tree may have as few as one key).
2. The n keys themselves, stored in non-decreasing order:
  $key_1\le key_2\le\ldots\le key_{n}$
3. A boolean value
  $leaf[x]=\left\{\begin{array}{ll}True & \mbox{if } x \mbox{ is a leaf,} \\ False & \mbox{if } x \mbox{ is an internal node.}\end{array}\right.$
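4. If x is an internal node, n+1 pointers $c_1, c_2, \ldots, c_{n+1}$ to its children (a leaf has no children). The keys of x separate the key ranges of the subtrees: every key $k$ stored in the subtree rooted at $c_i$ satisfies $key_{i-1} \le k \le key_i$ (with no lower bound for $c_1$ and no upper bound for $c_{n+1}$).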
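A minimal in-memory sketch of this node layout and of the usual search procedure (at each node, either find the key or descend into the one child whose key range can contain it) might look as follows. Assumptions: keys are integers, Python lists stand in for disk pages, and indices are 0-based, so `children[i]` plays the role of $c_{i+1}$; the names `BTreeNode` and `btree_search` are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)                # key_1 <= ... <= key_n
    children: List["BTreeNode"] = field(default_factory=list)    # c_1 .. c_{n+1}; empty in a leaf
    leaf: bool = True                                            # leaf[x]

    @property
    def n(self) -> int:
        return len(self.keys)


def btree_search(x: BTreeNode, k: int) -> Optional[Tuple[BTreeNode, int]]:
    """Return (node, i) with node.keys[i] == k, or None if k is absent."""
    while True:
        i = bisect_left(x.keys, k)          # number of keys smaller than k
        if i < x.n and x.keys[i] == k:
            return x, i
        if x.leaf:
            return None
        x = x.children[i]                   # subtree between keys[i-1] and keys[i]


# Usage: a height-1 tree with a two-key root and three leaves.
leaves = [BTreeNode([1, 5]), BTreeNode([12, 15]), BTreeNode([25, 30])]
root = BTreeNode([10, 20], leaves, leaf=False)
print(btree_search(root, 15) is not None, btree_search(root, 13))
```

Each loop iteration examines one node, so a search visits at most h + 1 nodes, where h is the height of the tree.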
Properties:
- All leaves have the same depth, which is the tree's height h.
- A B-tree guarantees a storage utilization of at least 50%, i.e., at least half of each allocated page (other than the root's) actually stores index entries.
- There are upper and lower bounds on the number of keys in a node.
  - Lower bound: every node other than the root must have at least t-1 keys, so every internal node other than the root has at least t children. If the tree is non-empty, the root has at least one key.
  - Upper bound: every node can contain at most 2t-1 keys, so every internal node has at most 2t children.
  - [Figure: an example B-tree]
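Insertion keeps both bounds intact by splitting nodes. The sketch below follows the single-pass approach described in CLRS (Introduction to Algorithms): any full node (2t-1 keys) met on the way down is split around its median key into two nodes of t-1 keys each, and the median moves up into the parent. The tree only grows taller when the root itself is split, which keeps all leaves at the same depth. It is an in-memory illustration with integer keys; the class and function names are hypothetical, and `check_invariants` simply asserts the properties listed above.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    leaf: bool = True


class BTree:
    """In-memory B-tree with minimum degree t: non-root nodes hold t-1 to 2t-1 keys."""

    def __init__(self, t: int) -> None:
        if t < 2:
            raise ValueError("minimum degree t must be >= 2")
        self.t = t
        self.root = Node()

    def insert(self, k: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: put a new root above it and split the old one.
            # This is the only way the tree grows taller, so leaves stay level.
            self.root = Node(children=[self.root], leaf=False)
            self._split_child(self.root, 0)
        self._insert_nonfull(self.root, k)

    def _split_child(self, x: Node, i: int) -> None:
        """Split the full child x.children[i] around its median key."""
        t = self.t
        y = x.children[i]
        z = Node(leaf=y.leaf)
        median = y.keys[t - 1]
        z.keys = y.keys[t:]             # upper t-1 keys move to the new sibling
        y.keys = y.keys[: t - 1]        # lower t-1 keys stay in y
        if not y.leaf:
            z.children = y.children[t:]
            y.children = y.children[:t]
        x.keys.insert(i, median)        # the median moves up into the parent
        x.children.insert(i + 1, z)

    def _insert_nonfull(self, x: Node, k: int) -> None:
        while not x.leaf:
            i = bisect_right(x.keys, k)
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)  # split before descending so the child has room
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        x.keys.insert(bisect_right(x.keys, k), k)


def check_invariants(tree: BTree) -> int:
    """Assert the B-tree properties; return the common depth of all leaves."""
    t = tree.t
    depths = set()

    def walk(x: Node, depth: int, is_root: bool) -> None:
        assert x.keys == sorted(x.keys)
        assert len(x.keys) <= 2 * t - 1
        if not is_root:
            assert len(x.keys) >= t - 1
        if x.leaf:
            depths.add(depth)
        else:
            assert len(x.children) == len(x.keys) + 1
            for c in x.children:
                walk(c, depth + 1, False)

    walk(tree.root, 0, True)
    assert len(depths) == 1, "all leaves must have the same depth"
    return depths.pop()


def inorder(x: Node) -> List[int]:
    """All keys of the subtree rooted at x, in sorted order if the tree is valid."""
    if x.leaf:
        return list(x.keys)
    out: List[int] = []
    for i, k in enumerate(x.keys):
        out += inorder(x.children[i])
        out.append(k)
    return out + inorder(x.children[-1])
```

Usage: create `BTree(2)`, insert keys in any order, then call `check_invariants(tree)` to confirm every node respects the t-1 / 2t-1 bounds and all leaves sit at the same depth.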
Conventions:
- The root of the B-tree is always kept in main memory.
- Any node passed as a parameter must already have had a Disk-Read operation performed on it.

B+-tree

A B-tree is very efficient for search and modification operations that involve a single record.
However, it is not particularly well suited to sequential operations or range searches; the B+-tree addresses this issue.
The B+-tree is one of the most widely used index structures in databases.
Main idea:
- The leaf nodes contain all the key values (and the associated information)
- The internal nodes (organized as a B-tree) store separators whose only function is to determine the path to follow during a search.
- The leaf nodes are linked in a (doubly linked) list, in order to efficiently support range searches or sequential searches.
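A small sketch of how the linked leaves support range queries: descend once to the leaf where the lower bound would be, then walk the leaf chain until a key exceeds the upper bound. Simplifying assumptions: a two-level tree built by a hypothetical `bulk_load` helper from sorted, distinct integer keys; each separator equals the first key of the leaf to its right (keys greater than or equal to a separator go right); and a singly linked chain, since a forward scan only needs the `next_leaf` link. All names are illustrative.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple, Union


@dataclass
class Leaf:
    keys: List[int] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)   # associated information per key
    next_leaf: Optional["Leaf"] = None                 # link to the right neighbour leaf


@dataclass
class Internal:
    separators: List[int]                              # only guide the search
    children: List[Union["Internal", Leaf]]


def find_leaf(node: Union[Internal, Leaf], key: int) -> Leaf:
    """Descend from the root to the leaf whose key range contains key."""
    while isinstance(node, Internal):
        # Keys >= separators[j] live to the right of separators[j].
        node = node.children[bisect_right(node.separators, key)]
    return node


def range_search(root: Union[Internal, Leaf], lo: int, hi: int) -> List[Tuple[int, Any]]:
    """Return all (key, value) pairs with lo <= key <= hi, in key order."""
    leaf: Optional[Leaf] = find_leaf(root, lo)
    i = bisect_left(leaf.keys, lo)
    out: List[Tuple[int, Any]] = []
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > hi:
                return out
            out.append((leaf.keys[i], leaf.values[i]))
            i += 1
        leaf, i = leaf.next_leaf, 0    # follow the chain instead of re-descending
    return out


def bulk_load(pairs: List[Tuple[int, Any]], leaf_size: int) -> Internal:
    """Hypothetical helper: two-level B+-tree from non-empty, key-sorted pairs."""
    leaves = [Leaf([k for k, _ in pairs[i:i + leaf_size]], [v for _, v in pairs[i:i + leaf_size]])
              for i in range(0, len(pairs), leaf_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next_leaf = right
    return Internal([leaf.keys[0] for leaf in leaves[1:]], leaves)


root = bulk_load([(k, str(k)) for k in range(0, 100, 5)], leaf_size=3)
print(range_search(root, 12, 31))
```

`find_leaf` always descends all the way to a leaf, even when a separator equals the key being looked up; that is the extra cost of single-key search in a B+-tree.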
Comparison with B-trees:
- Searching for a single key value is in general more expensive in a B+-tree, because the search must always reach a leaf node to fetch the pointer to the data file (in a B-tree it can stop at an internal node that holds the key).
- For operations that require the retrieved records to be ordered by the search key values, or for range queries, the B+-tree is preferred.
- The B-tree requires less storage, since each key value is stored only once (in a B+-tree, the separators in internal nodes duplicate keys stored in the leaves).

B*-tree

A B*-tree is a variation of the B+-tree in which the storage utilization of nodes must be at least two-thirds (about 66%) instead of 50%.
The non-root, non-leaf nodes of a B*-tree contain pointers to their sibling nodes.
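One way to see where the two-thirds figure comes from: in Knuth's B*-tree, a full node first tries to move keys into an adjacent sibling, and only when that sibling is also full are the two nodes split into three. The sketch below shows just that redistribution step for two full nodes of capacity m plus their separator from the parent (2m+1 keys in total); the function name and the list-of-keys representation are assumptions for illustration.

```python
from typing import List, Tuple


def split_two_into_three(left: List[int], sep: int, right: List[int]) -> Tuple[List[List[int]], List[int]]:
    """Redistribute two full sibling nodes plus their parent separator into
    three nodes and two new parent separators (B*-tree style 2-to-3 split)."""
    keys = left + [sep] + right            # 2m + 1 keys, already in sorted order
    node_keys = len(keys) - 2              # two keys move up as separators
    base, extra = divmod(node_keys, 3)
    sizes = [base + (1 if i < extra else 0) for i in range(3)]
    nodes: List[List[int]] = []
    seps: List[int] = []
    pos = 0
    for i, size in enumerate(sizes):
        nodes.append(keys[pos:pos + size])
        pos += size
        if i < 2:
            seps.append(keys[pos])         # the next key goes up to the parent
            pos += 1
    return nodes, seps


nodes, seps = split_two_into_three(list(range(1, 9)), 9, list(range(10, 18)))
print([len(n) for n in nodes], seps)
```

With m = 8 the three new nodes get 5 keys each; with m = 100 they get 67, 66 and 66, so each is about two-thirds full, compared with roughly half full after an ordinary B-tree split.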

R-tree

The B-tree and its variants are useful for indexing and searching data in one-dimensional space (where the data is stored on disk rather than in main memory). The basic idea is to divide a line into several segments and repeatedly narrow the search down to the smallest segment that contains the searched data.
[Figure: narrowing a search on a line to successively smaller segments (from July et al.'s blog)]
However, B-trees and their variants are not efficient for multi-dimensional data. Other tree index structures, such as the R-tree and the kd-tree, are better suited to this case.
The R-tree is a generalization of the B-tree for indexing and searching multi-dimensional data such as geographical coordinates, rectangles or polygons.
A common real-world use of an R-tree is to store spatial objects, such as restaurant locations or the polygons that typical maps are made of (streets, buildings, outlines of lakes, coastlines, etc.), and then quickly answer queries such as "Find all museums within 2 km of my current location". This makes R-trees well suited to map applications.
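A query like the museum example is usually answered in two steps: a cheap filter using the query circle's bounding rectangle (the kind of rectangle test an R-tree search prunes with), followed by an exact distance check on the survivors. The sketch below shows only that filter-and-refine logic over a plain list; it assumes planar (x, y) coordinates in kilometres, so it ignores the Earth's curvature, and the function names and sample places are hypothetical. An R-tree would replace the linear scan in the filter step.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]                 # (x, y) in km on a local flat map (assumption)
Rect = Tuple[float, float, float, float]    # (min_x, min_y, max_x, max_y)


def circle_bbox(center: Point, radius: float) -> Rect:
    cx, cy = center
    return (cx - radius, cy - radius, cx + radius, cy + radius)


def in_rect(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def within_radius(places: List[Tuple[str, Point]], center: Point, radius: float) -> List[str]:
    box = circle_bbox(center, radius)
    # Filter: cheap rectangle test; this is the step an R-tree index accelerates.
    candidates = [(name, p) for name, p in places if in_rect(p, box)]
    # Refine: exact Euclidean distance on the remaining candidates.
    return [name for name, p in candidates if math.dist(p, center) <= radius]


museums = [("A", (0.5, 0.5)), ("B", (1.9, 1.9)), ("C", (2.5, 0.0)), ("D", (1.5, -1.0))]
print(within_radius(museums, (0.0, 0.0), 2.0))
```

Here "B" passes the rectangle filter but is about 2.69 km away, so the refine step drops it, while "C" is already rejected by the filter.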
Main points:
- The key idea is to group nearby objects and represent them with their minimum bounding rectangle (MBR) in the next higher level of the tree.
- At the leaf level, each rectangle describes a single object; at higher levels, each rectangle describes the aggregation of an increasing number of objects.
- The R-tree is a balanced search tree (all leaf nodes are at the same depth) that organizes the data in pages and is designed for storage on disk (as used in databases).
- The R-tree only guarantees a minimum storage utilization of 30-40%. The reason is the more complex balancing required for spatial data, as opposed to the linear data stored in B-trees.
- The key difficulty with R-trees is building a tree that is balanced while its rectangles neither cover too much empty space nor overlap too much, since both force a search to descend into more subtrees.
- [Figure: a typical R-tree (originally from Wikipedia)]
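The sketch below illustrates these points with a hand-built two-level tree: `mbr` computes the minimum bounding rectangle of a node's entries, and `search` skips every subtree whose rectangle does not intersect the query window. Overlap and empty space make more rectangles intersect a query, which is why they slow searches down. `enlargement` is the area-increase measure that Guttman's original insertion algorithm minimizes when choosing a subtree for a new rectangle. Rectangles are assumed to be axis-aligned (min_x, min_y, max_x, max_y) tuples; node splitting and disk paging are omitted, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Rect = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


@dataclass
class RNode:
    # Internal node: entries are (rectangle, child node).
    # Leaf node: entries are (rectangle, object id).
    entries: List[Tuple[Rect, Union["RNode", str]]] = field(default_factory=list)
    leaf: bool = True

    def bounds(self) -> Rect:
        return mbr([rect for rect, _ in self.entries])


def search(node: RNode, query: Rect) -> List[str]:
    """Return ids of all objects whose rectangle intersects the query window."""
    hits: List[str] = []
    for rect, item in node.entries:
        if not intersects(rect, query):
            continue                        # prune: nothing below can intersect
        if node.leaf:
            hits.append(item)
        else:
            hits.extend(search(item, query))
    return hits


def enlargement(entry_rect: Rect, new: Rect) -> float:
    """Extra area needed for entry_rect to also cover new."""
    return area(mbr([entry_rect, new])) - area(entry_rect)


# Usage: two leaves of nearby objects under one root.
leaf1 = RNode([((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf2 = RNode([((10, 10, 11, 11), "c"), ((12, 10, 13, 12), "d")])
root = RNode([(leaf1.bounds(), leaf1), (leaf2.bounds(), leaf2)], leaf=False)
print(search(root, (0.5, 0.5, 2.5, 2.5)))
```

A query near the origin never visits `leaf2`, because its bounding rectangle (10, 10, 13, 12) does not intersect the window.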

References

B-tree: http://en.wikipedia.org/wiki/B-tree
R-tree: http://en.wikipedia.org/wiki/R-tree
Lecture notes, CMSC 420, B-trees
Andreas Kaltenbrunner et al., B-trees
Other online tutorials
July et al., 从B树、B+树、B*树谈到R树 (From B-trees, B+-trees and B*-trees to R-trees)
