Trie树
       Trie树,就是字母树。Trie树是多叉树,每个节点为一个字母。其根节点为象征节点(就是说没有含义,但是存在这个节点),从根节点开始建立,每个节点至多为26个子节点(不要我说为什么吧),这样,我们就可以用这种方便快捷的方式存储字符串。其应用也不言而喻,用于保存,统计,排序,查找大量字符串。因为很简单,我们不讲太多,根据图像,自己造几个字符串,慢慢理解,看看代码,一下就懂了。

       如图所示,该字符串保存了say,she,shr,her四个字符串。有个小小的问题:在建树的时候,我们注意到最坏情况可能为二十六叉树,空间复杂度可想而知。所以,如果用指针可能更省空间。
 
 
 

Applications 应用

Trie (we pronounce "try") or prefix tree is a tree data structure, which is used for retrieval of a key in a dataset of strings. There are various applications of this very efficient data structure such as :

1. Autocomplete

Figure 1. Google Suggest in action.

2. Spell checker

Figure 2. A spell checker used in word processor.

3. IP routing (Longest prefix matching)

Figure 3. Longest prefix matching algorithm uses Tries in Internet Protocol (IP) routing to select an entry from a forwarding table.

4. T9 predictive text

Figure 4. T9 which stands for Text on 9 keys, was used on phones to input texts during the late 1990s.

5. Solving word games

Figure 5. Tries is used to solve Boggle efficiently by pruning the search space.

There are several other data structures, like balanced trees and hash tables, which give us the possibility to search for a word in a dataset of strings. Then why do we need trie? Although hash table has O(1)O(1) time complexity for looking for a key, it is not efficient in the following operations :

平衡树根hash表可以实现字符串搜索,为什么还需要trie?在以下操作时很低效

  • Finding all keys with a common prefix.
  • Enumerating a dataset of strings in lexicographical order.
  • 寻找所有keys的共同前缀
  • 以编辑顺序枚举所有的字符串

Another reason why trie outperforms hash table, is that as hash table increases in size, there are lots of hash collisions and the search time complexity could deteriorate to O(n)O(n), where nn is the number of keys inserted. Trie could use less space compared to Hash Table when storing many keys with the same prefix. In this case using trie has only O(m)O(m) time complexity, where mm is the key length. Searching for a key in a balanced tree costs O(m \log n)O(mlogn) time complexity.

另一个原因,key变多之后,hash表会不断变大,导致冲突,时间复杂度会退化到O(n).

Trie node structure

Trie is a rooted tree. Its nodes have the following fields:

  • Maximum of RR links to its children, where each link corresponds to one of RR character values from dataset alphabet. In this article we assume that RR is 26, the number of lowercase latin letters.
  • Boolean field which specifies whether the node corresponds to the end of the key, or is just a key prefix.

Figure 6. Representation of a key "leet" in trie.

Insertion of a key to a trie

We insert a key by searching into the trie. We start from the root and search a link, which corresponds to the first key character. There are two cases :

  • A link exists. Then we move down the tree following the link to the next child level. The algorithm continues with searching for the next key character.
  • A link does not exist. Then we create a new node and link it with the parent's link matching the current key character. We repeat this step until we encounter the last character of the key, then we mark the current node as an end node and the algorithm finishes.

Figure 7. Insertion of keys into a trie.

 

Complexity Analysis

  • Time complexity : O(m)O(m), where m is the key length.

In each iteration of the algorithm, we either examine or create a node in the trie till we reach the end of the key. This takes only mm operations.

  • Space complexity : O(m)O(m).

In the worst case newly inserted key doesn't share a prefix with the the keys already inserted in the trie. We have to add mm new nodes, which takes us O(m)O(m) space.

Search for a key in a trie

Each key is represented in the trie as a path from the root to the internal node or leaf. We start from the root with the first key character. We examine the current node for a link corresponding to the key character. There are two cases :

  • A link exist. We move to the next node in the path following this link, and proceed searching for the next key character.
  • A link does not exist. If there are no available key characters and current node is marked as isEnd we return true. Otherwise there are possible two cases in each of them we return false :

    • There are key characters left, but it is impossible to follow the key path in the trie, and the key is missing.
    • No key characters left, but current node is not marked as isEnd. Therefore the search key is only a prefix of another key in the trie.

Figure 8. Search for a key in a trie.

 

Complexity Analysis

  • Time complexity : O(m)O(m) In each step of the algorithm we search for the next key character. In the worst case the algorithm performs mm operations.

  • Space complexity : O(1)O(1)

Search for a key prefix in a trie

The approach is very similar to the one we used for searching a key in a trie. We traverse the trie from the root, till there are no characters left in key prefix or it is impossible to continue the path in the trie with the current key character. The only difference with the mentioned above search for a key algorithm is that when we come to an end of the key prefix, we always return true. We don't need to consider the isEnd mark of the current trie node, because we are searching for a prefix of a key, not for a whole key.

Figure 9. Search for a key prefix in a trie.

Complexity Analysis

  • Time complexity : O(m)O(m)

  • Space complexity : O(1)O(1)

Practice Problems

Here are some wonderful problems for you to practice which uses the Trie data structure.

  1. Add and Search Word - Data structure design - Pretty much a direct application of Trie.
  2. Word Search II - Similar to Boggle.

Analysis written by: @elmirap.

字典树 trie的更多相关文章

  1. [POJ] #1002# 487-3279 : 桶排序/字典树(Trie树)/快速排序

    一. 题目 487-3279 Time Limit: 2000MS   Memory Limit: 65536K Total Submissions: 274040   Accepted: 48891 ...

  2. 『字典树 trie』

    字典树 (trie) 字典树,又名\(trie\)树,是一种用于实现字符串快速检索的树形数据结构.核心思想为利用若干字符串的公共前缀来节约储存空间以及实现快速检索. \(trie\)树可以在\(O(( ...

  3. 字典树trie学习

    字典树trie的思想就是利用节点来记录单词,这样重复的单词可以很快速统计,单词也可以快速的索引.缺点是内存消耗大 http://blog.csdn.net/chenleixing/article/de ...

  4. 字典树(Trie)详解

    详解字典树(Trie) 本篇随笔简单讲解一下信息学奥林匹克竞赛中的较为常用的数据结构--字典树.字典树也叫Trie树.前缀树.顾名思义,它是一种针对字符串进行维护的数据结构.并且,它的用途超级广泛.建 ...

  5. 字典树(Trie Tree)

    在图示中,键标注在节点中,值标注在节点之下.每一个完整的英文单词对应一个特定的整数.Trie 可以看作是一个确定有限状态自动机,尽管边上的符号一般是隐含在分支的顺序中的.键不需要被显式地保存在节点中. ...

  6. 字典树(Trie树)实现与应用

    一.概述 1.基本概念 字典树,又称为单词查找树,Tire数,是一种树形结构,它是一种哈希树的变种. 2.基本性质 根节点不包含字符,除根节点外的每一个子节点都包含一个字符 从根节点到某一节点.路径上 ...

  7. 字典树(Trie树)的实现及应用

    >>字典树的概念 Trie树,又称字典树,单词查找树或者前缀树,是一种用于快速检索的多叉树结构,如英文字母的字典树是一个26叉树,数字的字典树是一个10叉树.与二叉查找树不同,Trie树的 ...

  8. 字典树trie的学习与练习题

    博客详解: http://www.cnblogs.com/huangxincheng/archive/2012/11/25/2788268.html http://eriol.iteye.com/bl ...

  9. [转载]字典树(trie树)、后缀树

    (1)字典树(Trie树) Trie是个简单但实用的数据结构,通常用于实现字典查询.我们做即时响应用户输入的AJAX搜索框时,就是Trie开始.本质上,Trie是一颗存储多个字符串的树.相邻节点间的边 ...

  10. Codevs 4189 字典(字典树Trie)

    4189 字典 时间限制: 1 s 空间限制: 256000 KB 题目等级 : 大师 Master 传送门 题目描述 Description 最经,skyzhong得到了一本好厉害的字典,这个字典里 ...

随机推荐

  1. 7 -- Spring的基本用法 -- 4... 使用 Spring 容器:Spring 容器BeanFactory、ApplicationContext;ApplicationContext 的国际化支持;ApplicationContext 的事件机制;让Bean获取Spring容器;Spring容器中的Bean

    7.4 使用 Spring 容器 Spring 有两个核心接口:BeanFactory 和 ApplicationContext,其中ApplicationContext 是 BeanFactory ...

  2. Python 数据类型:字典

    一.字典简介 1. 字典由键值对组成,每个键与值用冒号隔开,每对用逗号分割,整体放在花括号中,如 {"name": "Tom", "age" ...

  3. 【MySQL】查询时强制区分大小写的方法

    MySQL默认的查询也不区分大小写.但作为用户信息,一旦用户名重复,又会浪费很多资源.再者,李逵.李鬼的多起来,侦辨起来很困难.要做到这一点,要么在建表时,明确大小写敏感(字段明确大小写敏感) sql ...

  4. 【python系列】SyntaxError:Missing parentheses in call to 'print'

    打印python2和python3的区别 如上图所示,我的 PyCharm安装的是python3.6如果使用print 10会出现语法错误,这是python2.x和python3.x的区别所导致的.

  5. Android 简单案例:可移动的View

    CrossCompatibility.rar 1. VersionedGestureDetector.java import android.content.Context; import andro ...

  6. sencha touch 入门系列 (九) sencha touch 布局layout

    布局用来描述你应用程序中组件的大小和位置,在sencha touch中,为我们提供了下面几种布局: 1.HBox: HBox及horizontal box布局,我们这里将其称为水平布局,下面是一段演示 ...

  7. 【BZOJ4418】[Shoi2013]扇形面积并 扫描线+线段树

    [BZOJ4418][Shoi2013]扇形面积并 Description 给定N个同心的扇形,求有多少面积,被至少K个扇形所覆盖. Input 第一行是三个整数n,m,k.n代表同心扇形的个数,m用 ...

  8. HTML5+CSS3 表格设计(Table)

    <style> body { width: 600px; margin: 40px auto; font-family: 'trebuchet MS', 'Lucida sans', Ar ...

  9. Sql Server 关于列名带中括号"[]"的问题

    1.如果列名为数据库的关键字则自动加上中括号“[]” 例如[level] 2.如果列名中带有特殊符号.[date(a)] 数据存储的过程: 1.在添加数据的时候:要带有中括号,有必要在添加参数的时候不 ...

  10. java设计模式----迭代子模式

    顺序访问聚集中的对象,主要用于集合中.一是需要遍历的对象,即聚集对象,二是迭代器对象,用于对聚集对象进行遍历访问. 迭代子模式为遍历集合提供了统一的接口方法.从而使得客户端不需要知道聚集的内部结构就能 ...