Source address: http://en.wikipedia.org/wiki/Radix_tree

In computer science, a radix tree (also patricia trie, radix trie, or compact prefix tree) is a space-optimized trie data structure where each node with only one child is merged with its child. As a result, every internal node has at most r children, where r is the radix of the radix trie: a positive integer that is a power of 2, r = 2^x with x ≥ 1. Unlike in regular tries, edges can be labeled with sequences of elements as well as single elements. This makes radix trees much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.

Unlike regular trees (where whole keys are compared en masse from their beginning up to the point of inequality), the key at each node is compared chunk of bits by chunk of bits, where the number of bits in each chunk is x = log2 r for a trie of radix r. When r is 2, the radix trie is binary (i.e., each node compares a 1-bit portion of the key), which minimizes sparseness at the expense of maximizing trie depth: the depth is as large as it can be, reduced only where runs of non-diverging bits in the key are merged into a single edge. When r is an integer power of 2 greater than or equal to 4, the radix trie is an r-ary trie, which lessens the depth of the radix trie at the expense of potential sparseness.
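To make the role of the radix concrete, the following minimal sketch splits a key into the chunks that a trie of radix r would compare one level at a time. It assumes r = 2^x, so that each chunk is x bits wide, and reads bits most significant first. The function name `chunks` and the zero-padding of the final chunk are assumptions made for illustration only.

```python
def chunks(key: bytes, x: int) -> list[int]:
    """Split key into x-bit chunks, most significant bit first.

    Each chunk is one digit in base r = 2**x, i.e. the value a trie of
    radix r would branch on at one level.
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    bits = "".join(f"{byte:08b}" for byte in key)
    # Assumption: pad the last chunk with zero bits when x does not
    # divide the key length evenly.
    bits += "0" * (-len(bits) % x)
    return [int(bits[i:i + x], 2) for i in range(0, len(bits), x)]


print(chunks(b"A", 1))  # r = 2: eight 1-bit steps for one byte
print(chunks(b"A", 4))  # r = 16: two 4-bit steps for the same byte
```

A smaller x gives more, narrower steps (a deeper trie with little wasted fan-out), while a larger x gives fewer, wider steps (a shallower trie whose nodes may be sparsely populated).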

As an optimization, edge labels can be stored in constant size by using two pointers to a string (for the first and last elements).[1]
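A minimal sketch of this idea, with Python indices standing in for the two pointers: the label records only where it starts and ends inside a string that is stored elsewhere. The class name `EdgeLabel` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeLabel:
    source: str  # the string the label points into (shared, not copied)
    first: int   # index of the label's first element
    last: int    # index of the label's last element (inclusive)

    def __len__(self) -> int:
        return self.last - self.first + 1

    def text(self) -> str:
        # Materialise the label only when it is actually needed.
        return self.source[self.first:self.last + 1]


key = "tester"
test_label = EdgeLabel(key, 0, 3)  # covers "test"
er_label = EdgeLabel(key, 4, 5)    # covers "er"
print(test_label.text(), er_label.text(), len(test_label))
```

Whatever the length of the label, the edge itself stores one reference and two integers.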

Note that although the examples in this article show strings as sequences of characters, the type of the string elements can be chosen arbitrarily; for example, as a bit or byte of the string representation when using multibyte character encodings or Unicode.

Applications

Radix trees are useful for constructing associative arrays with keys that can be expressed as strings. They find particular application in the area of IP routing, where the ability to contain large ranges of values with a few exceptions is particularly suited to the hierarchical organization of IP addresses.[2] They are also used for inverted indexes of text documents in information retrieval.

Operations

Radix trees support insertion, deletion, and searching operations. Insertion adds a new string to the trie while trying to minimize the amount of data stored. Deletion removes a string from the trie. Searching operations include (but are not necessarily limited to) exact lookup, find predecessor, find successor, and find all strings with a prefix. All of these operations are O(k), where k is the maximum length of the strings in the set, with length measured in chunks of log2 r bits for a trie of radix r.

Lookup

[Figure: finding a string in a Patricia trie]

The lookup operation determines whether a string exists in a trie. Most other operations modify this approach in some way to handle their specific tasks; for instance, the node where a string terminates may be of importance. The operation is similar to lookup in an ordinary trie, except that some edges consume multiple elements.

The following pseudocode assumes that these classes exist.

Edge

  • Node targetNode
  • string label

Node

  • Array of Edges edges
  • function isLeaf()

function lookup(string x)
{
    // Begin at the root with no elements found
    Node traverseNode := root;
    int elementsFound := 0;

    // Traverse until a leaf is found or it is not possible to continue
    while (traverseNode != null && !traverseNode.isLeaf() && elementsFound < x.length)
    {
        // Get the next edge to explore based on the elements not yet found in x
        // x.suffix(elementsFound) returns the last (x.length - elementsFound) elements of x
        Edge nextEdge := select edge from traverseNode.edges where edge.label is a prefix of x.suffix(elementsFound)

        // Was an edge found?
        if (nextEdge != null)
        {
            // Set the next node to explore
            traverseNode := nextEdge.targetNode;

            // Increment elements found based on the label stored at the edge
            elementsFound += nextEdge.label.length;
        }
        else
        {
            // Terminate loop
            traverseNode := null;
        }
    }

    // A match is found if we arrive at a leaf node and have used up exactly x.length elements
    return (traverseNode != null && traverseNode.isLeaf() && elementsFound == x.length);
}
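The pseudocode can be turned into a short runnable sketch. The Python version below keeps the Edge and Node classes but is not a line-for-line translation: as an assumption of this sketch, it keeps walking after the input is used up so that an empty-label edge (used to terminate a string that is a prefix of another) can still be followed, and it lets an empty label match only when nothing of the input remains. The example tree is built by hand and holds 'test', 'tester', 'team' and 'toast'.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    edges: list["Edge"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.edges


@dataclass(eq=False)
class Edge:
    label: str
    target: Node


def lookup(root: Node, x: str) -> bool:
    node, found = root, 0
    # Walk down until a leaf is reached or no edge fits.
    while not node.is_leaf():
        rest = x[found:]
        next_edge = None
        for edge in node.edges:
            # Assumption: an empty label matches only an empty remainder.
            if edge.label == rest or (edge.label and rest.startswith(edge.label)):
                next_edge = edge
                break
        if next_edge is None:
            return False
        node = next_edge.target
        found += len(next_edge.label)
    # A match requires that exactly len(x) elements were consumed.
    return found == len(x)


# Hand-built tree: t -> (e -> (st -> ('' | er) | am) | oast)
c = Node([Edge("", Node()), Edge("er", Node())])
b = Node([Edge("st", c), Edge("am", Node())])
a = Node([Edge("e", b), Edge("oast", Node())])
root = Node([Edge("t", a)])

for word in ["test", "tester", "team", "toast", "te", "toaster"]:
    print(word, lookup(root, word))
```

The first four words are stored in the tree; 'te' stops at an internal node and 'toaster' runs past a leaf, so neither is reported as present.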

Insertion

To insert a string, we search the tree until we can make no further progress. At this point we either add a new outgoing edge labeled with all remaining elements in the input string, or if there is already an outgoing edge sharing a prefix with the remaining input string, we split it into two edges (the first labeled with the common prefix) and proceed. This splitting step ensures that no node has more children than there are possible string elements.

Several cases of insertion are shown below, though more may exist. Note that r simply represents the root. It is assumed that edges can be labelled with empty strings to terminate strings where necessary and that the root has no incoming edge.

  • Insert 'water' at the root

  • Insert 'slower' while keeping 'slow'

  • Insert 'test' which is a prefix of 'tester'

  • Insert 'team' while splitting 'test' and creating a new edge label 'st'

  • Insert 'toast' while splitting 'te' and moving previous strings a level lower
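The cases above can be reproduced with a compact sketch of insertion. It follows the convention stated earlier that an empty-label edge terminates a string that is a prefix of another string. The names `Node`, `Edge`, `shared_len`, `insert` and `keys` are hypothetical, and this is one possible way to write the operation rather than a reference implementation.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    edges: list["Edge"] = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    label: str
    target: Node


def shared_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def insert(root: Node, key: str) -> None:
    node, rest = root, key
    while True:
        # An edge matches if it shares a non-empty prefix with the rest of
        # the key, or if label and rest are both empty (key already stored).
        match = next((e for e in node.edges
                      if shared_len(e.label, rest) > 0 or e.label == rest), None)
        if match is None:
            break
        n = shared_len(match.label, rest)
        if n < len(match.label):
            # Split the edge: keep the common prefix, push the remainder down.
            middle = Node([Edge(match.label[n:], match.target)])
            match.label, match.target = match.label[:n], middle
        node, rest = match.target, rest[n:]
    if not node.edges and node is not root:
        # node is a leaf that ends an already stored key.
        if not rest:
            return  # the key is already present
        node.edges.append(Edge("", Node()))  # keep the shorter key
    node.edges.append(Edge(rest, Node()))


def keys(node: Node, prefix: str = "") -> list[str]:
    """List stored keys; assumes the tree is not empty."""
    if not node.edges:
        return [prefix]
    return [k for e in node.edges for k in keys(e.target, prefix + e.label)]


root = Node()
for word in ["tester", "test", "team", "toast"]:
    insert(root, word)
print(sorted(keys(root)))
print([e.label for e in root.edges])
```

Inserting 'test' after 'tester' splits the single edge and adds an empty-label edge; 'team' then splits 'test' into 'te' and 'st'; 'toast' splits 'te' into 't' and 'e', leaving one edge labeled 't' at the root.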

Deletion

To delete a string x from a tree, we first locate the leaf representing x. Then, assuming x exists, we remove the corresponding leaf node. If the parent of our leaf node has only one other child, then that child's incoming label is appended to the parent's incoming label and the child is removed.
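A minimal sketch of this procedure follows. It records the path from the root so that the parent's incoming edge is at hand when a merge is needed, and it never merges at the root, which has no incoming edge. The class and function names are hypothetical, and the example tree (holding 'test', 'tester', 'team' and 'toast') is built by hand to keep the block short.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    edges: list["Edge"] = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    label: str
    target: Node


def delete(root: Node, key: str) -> bool:
    """Remove key; return False if it was not stored."""
    path = []  # (node, edge taken out of that node)
    node, found = root, 0
    while node.edges:
        rest = key[found:]
        edge = next((e for e in node.edges
                     if e.label == rest or (e.label and rest.startswith(e.label))),
                    None)
        if edge is None:
            return False
        path.append((node, edge))
        node, found = edge.target, found + len(edge.label)
    if not path or found != len(key):
        return False
    parent, leaf_edge = path[-1]
    parent.edges.remove(leaf_edge)  # drop the leaf
    if len(path) >= 2 and len(parent.edges) == 1:
        # The parent has a single child left: append that child's label to
        # the parent's incoming label and bypass the parent.
        _, incoming = path[-2]
        only = parent.edges[0]
        incoming.label += only.label
        incoming.target = only.target
    return True


# Hand-built tree: t -> (e -> (st -> ('' | er) | am) | oast)
c = Node([Edge("", Node()), Edge("er", Node())])
b = Node([Edge("st", c), Edge("am", Node())])
a = Node([Edge("e", b), Edge("oast", Node())])
root = Node([Edge("t", a)])

print(delete(root, "team"), [e.label for e in a.edges])
print(delete(root, "te"))
```

After 'team' is removed, the node below 'e' has only the 'st' child left, so the two labels are joined into a single edge 'est'. Deleting 'te' fails because no stored string ends there.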

Additional operations

  • Find all strings with common prefix: Returns an array of strings which begin with the same prefix.
  • Find predecessor: Locates the largest string less than a given string, by lexicographic order.
  • Find successor: Locates the smallest string greater than a given string, by lexicographic order.
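These operations can be sketched on top of an ordered walk of the tree. In the block below, `keys_with_prefix` descends along the prefix and then enumerates the subtree it reaches, while `successor` and `predecessor` are deliberately simplified: they scan the ordered walk linearly instead of descending in O(k), which is an assumption made to keep the example short. All names are hypothetical, and the tree is built by hand.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    edges: list["Edge"] = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    label: str
    target: Node


def walk(node: Node, prefix: str = ""):
    """Yield stored keys in lexicographic order (non-empty tree assumed)."""
    if not node.edges:
        yield prefix
        return
    # Sibling labels differ in their first element, and the empty label
    # sorts first, so visiting edges in sorted order yields sorted keys.
    for edge in sorted(node.edges, key=lambda e: e.label):
        yield from walk(edge.target, prefix + edge.label)


def keys_with_prefix(root: Node, prefix: str) -> list[str]:
    node, consumed, rest = root, "", prefix
    while rest:
        for edge in node.edges:
            # The prefix may end in the middle of an edge label.
            if edge.label and (rest.startswith(edge.label)
                               or edge.label.startswith(rest)):
                break
        else:
            return []
        node, consumed = edge.target, consumed + edge.label
        rest = rest[len(edge.label):]
    if node is root and not root.edges:
        return []  # empty tree
    return list(walk(node, consumed))


def successor(root: Node, x: str):
    # Simplification: linear scan over the ordered walk.
    return next((k for k in walk(root) if k > x), None)


def predecessor(root: Node, x: str):
    best = None
    for k in walk(root):
        if k >= x:
            break
        best = k
    return best


# Hand-built tree holding test, tester, team, toast
c = Node([Edge("", Node()), Edge("er", Node())])
b = Node([Edge("st", c), Edge("am", Node())])
a = Node([Edge("e", b), Edge("oast", Node())])
root = Node([Edge("t", a)])

print(keys_with_prefix(root, "te"))
print(predecessor(root, "test"), successor(root, "test"))
```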

History

Donald R. Morrison first described what he called "Patricia trees" in 1968;[3] the name comes from the acronym PATRICIA, which stands for "Practical Algorithm To Retrieve Information Coded In Alphanumeric". Gernot Gwehenberger independently invented and described the data structure at about the same time.[4] PATRICIA tries are radix tries with radix equal to 2, which means that each bit of the key is compared individually and each node is a two-way (i.e., left versus right) branch.

Comparison to other data structures

(In the following comparisons, it is assumed that the keys are of length k and the data structure contains n members.)

Unlike balanced trees, radix trees permit lookup, insertion, and deletion in O(k) time rather than O(log n). This doesn't seem like an advantage, since normally k ≥ log n, but in a balanced tree every comparison is a string comparison requiring O(k) worst-case time, many of which are slow in practice due to long common prefixes (in the case where comparisons begin at the start of the string). In a trie, all comparisons require constant time, but it takes m comparisons to look up a string of length m. Radix trees can perform these operations with fewer comparisons, and require many fewer nodes.
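The difference in node count can be illustrated with a small calculation. The sketch below counts the nodes of a plain trie (one per distinct prefix, including the empty prefix for the root) and the nodes that survive once every non-root, non-key node with a single child is merged away. The function name `count_nodes` is hypothetical, and the count assumes a representation in which a node can itself mark the end of a key, so no extra empty-label leaves are counted.

```python
def count_nodes(keys: list[str]) -> tuple[int, int]:
    """Return (plain trie nodes, radix tree nodes) for the given keys."""
    stored = set(keys)
    # A plain trie has one node per distinct prefix; "" is the root.
    prefixes = {k[:i] for k in stored for i in range(len(k) + 1)} | {""}
    children: dict[str, set[str]] = {}
    for p in prefixes:
        if p:
            children.setdefault(p[:-1], set()).add(p)
    # After merging, only the root, the ends of keys, and branching
    # points remain as nodes.
    radix = sum(1 for p in prefixes
                if p == "" or p in stored or len(children.get(p, ())) >= 2)
    return len(prefixes), radix


print(count_nodes(["test", "tester", "team", "toast"]))
```

For these four keys the plain trie needs 13 nodes, while the merged structure keeps 7: the root, the branching points after 't' and 'te', and the four key ends.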

Radix trees also share the disadvantages of tries, however: as they can only be applied to strings of elements or elements with an efficiently reversible mapping to strings, they lack the full generality of balanced search trees, which apply to any data type with a total ordering. A reversible mapping to strings can be used to produce the required total ordering for balanced search trees, but not the other way around. This can also be problematic if a data type only provides a comparison operation, but not a (de)serialization operation.

Hash tables are commonly said to have expected O(1) insertion and deletion times, but this is only true when considering computation of the hash of the key to be a constant time operation. When hashing the key is taken into account, hash tables have expected O(k) insertion and deletion times, but may take longer in the worst-case depending on how collisions are handled. Radix trees have worst-case O(k) insertion and deletion. The successor/predecessor operations of radix trees are also not implemented by hash tables.

Variants

A common extension of radix trees uses two colors of nodes, 'black' and 'white'. To check if a given string is stored in the tree, the search starts from the top and follows the edges of the input string until no further progress can be made. If the search-string is consumed and the final node is a black node, the search has failed; if it is white, the search has succeeded. This enables us to add a large range of strings with a common prefix to the tree, using white nodes, then remove a small set of "exceptions" in a space-efficient manner by inserting them using black nodes.

The HAT-trie is a cache-conscious data structure based on radix trees that offers efficient string storage and retrieval, and ordered iteration. Its performance, with respect to both time and space, is comparable to the cache-conscious hash table.[5][6]
