Prefix tree

The trie, or prefix tree, is a data structure for storing strings or other sequences in a way that allows for a fast look-up. In its simplest form it can be used as a list of keywords or a dictionary.
By associating each string with an object it can be used as an alternative to a hashmap. The name 'trie' comes from the word 'retrieval'.

The basic idea behind a trie is that each successive letter is stored as a separate node. To find out whether the word 'cat' is in the list, you start at the root and look up the 'c' node. Having found
the 'c' node, you search the list of its children for an 'a' node, and so on. To differentiate between 'cat' and 'catalog', each word is terminated by a special delimiter.
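
The walk described above can be sketched in a few lines. This is only an illustration, not the implementation given later in this article: the class name LetterTrie, the END constant and the use of a HashMap for the children are assumptions made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch: one node per letter, with an end-of-word delimiter child. */
final class LetterTrie
{
    // Hypothetical delimiter; it must never occur inside a stored word.
    static final char END = '\u0001';

    private static final class Node
    {
        final Map<Character, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    /** Adds a word by walking, and creating where needed, one node per character. */
    void add(String word)
    {
        Node node = root;
        for (char c : (word + END).toCharArray())
        {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
    }

    /** Returns true only if the word followed by the delimiter is present. */
    boolean contains(String word)
    {
        Node node = root;
        for (char c : (word + END).toCharArray())
        {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return true;
    }

    public static void main(String[] args)
    {
        LetterTrie trie = new LetterTrie();
        trie.add("catalog");
        System.out.println(trie.contains("cat"));     // false: no delimiter after 't' yet
        trie.add("cat");
        System.out.println(trie.contains("cat"));     // true
        System.out.println(trie.contains("catalog")); // true
    }
}
```

Without the delimiter, the first lookup for 'cat' would succeed merely because 'catalog' shares its first three nodes.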

The figure below shows a schematic representation of a partial trie:

Implementation

The fastest way to implement this is with fixed-size arrays. Unfortunately this only works if you know in advance which characters can show up in the sequences. For keywords drawn from a 26-letter alphabet it's a fast but
space-consuming option; for Unicode strings it's pretty much impractical.
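
As a rough sketch of the fixed-size array approach, assuming words consist only of the lowercase letters 'a' to 'z': every node holds 26 child slots, so each step of a lookup is a single array index. The names ArrayTrie and slotOf are hypothetical, and a boolean flag stands in for the delimiter to keep the array at exactly 26 entries.

```java
/** Sketch of a fixed-size array trie for words made of 'a'..'z' only. */
final class ArrayTrie
{
    private static final int ALPHABET = 26; // assumption: lowercase ASCII letters only

    private static final class Node
    {
        // One slot per letter; most slots stay null, which is where the space goes.
        final Node[] children = new Node[ALPHABET];
        boolean endOfWord; // plays the role of the delimiter
    }

    private final Node root = new Node();

    void add(String word)
    {
        Node node = root;
        for (int i = 0; i < word.length(); i++)
        {
            int slot = slotOf(word.charAt(i));
            if (node.children[slot] == null) node.children[slot] = new Node();
            node = node.children[slot];
        }
        node.endOfWord = true;
    }

    boolean contains(String word)
    {
        Node node = root;
        for (int i = 0; i < word.length(); i++)
        {
            char c = word.charAt(i);
            if (c < 'a' || c > 'z') return false; // outside the known alphabet
            node = node.children[c - 'a'];        // one array index per character
            if (node == null) return false;
        }
        return node.endOfWord;
    }

    private static int slotOf(char c)
    {
        if (c < 'a' || c > 'z')
            throw new IllegalArgumentException("Unsupported character: " + c);
        return c - 'a';
    }
}
```

Finding 'zzz' here takes three array accesses regardless of how many other words are stored, at the cost of 26 references per node.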

Instead of fixed-size arrays you can use a linked list at each node. This has obvious space advantages, since no empty slots are stored. Unfortunately, searching a long linked list is rather
slow. For example, to find the word 'zzz' in a trie over 26 letters you might need 3 times 26 steps.

Faster trie variants have been devised that lie somewhere between these two extremes in terms of speed and space consumption, such as double-array tries. Descriptions of them are easy to find with a web search.

Fun & games with prefix trees

Prefix trees are a bit of an overlooked data structure with lots of interesting possibilities.

Storage

By storing a value at each leaf node you can use a trie as a kind of alternative hash map, although when working with Unicode strings a hash map will usually outperform a trie.
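
A minimal sketch of this idea, assuming a hypothetical TrieMap class with put and get methods: the value is kept on the node where the key ends, and a hasValue flag (used here instead of a delimiter leaf) tells a stored key apart from a mere prefix.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a trie used as a map from strings to values. */
final class TrieMap<V>
{
    private static final class Node<V>
    {
        final Map<Character, Node<V>> children = new HashMap<>();
        V value;          // the value stored for the key ending at this node
        boolean hasValue; // distinguishes a stored key from a mere prefix
    }

    private final Node<V> root = new Node<V>();

    /** Stores a value under the key and returns the previous value, or null. */
    V put(String key, V value)
    {
        Node<V> node = root;
        for (char c : key.toCharArray())
        {
            node = node.children.computeIfAbsent(c, k -> new Node<V>());
        }
        V old = node.value;
        node.value = value;
        node.hasValue = true;
        return old;
    }

    /** Returns the value stored under the key, or null if the key is absent. */
    V get(String key)
    {
        Node<V> node = root;
        for (char c : key.toCharArray())
        {
            node = node.children.get(c);
            if (node == null) return null;
        }
        return node.hasValue ? node.value : null;
    }
}
```

For example, after put("cat", 1) and put("catalog", 2), get("cat") returns 1 while get("cata") returns null.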

As a dictionary

Checking whether a word is in a trie takes O(n) operations, where n is the length of the word. Thus, for array implementations, the lookup speed doesn't change as the trie grows.

Word completion

Word completion is straightforward to implement using a trie: simply find the node corresponding to the first few letters, and then collapse the subtree into a list of possible endings.

This can be used for autocompleting user input in text editors or for the T9 dictionary on your phone.
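
The two steps can be sketched as follows. The class CompletionTrie and its methods are hypothetical names for this illustration; a TreeMap is assumed for the children so that completions come out in alphabetical order, and an empty list is returned when nothing matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of prefix completion: find the prefix node, then collect its subtree. */
final class CompletionTrie
{
    private static final class Node
    {
        final Map<Character, Node> children = new TreeMap<>(); // sorted children
        boolean endOfWord;
    }

    private final Node root = new Node();

    void add(String word)
    {
        Node node = root;
        for (char c : word.toCharArray())
        {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.endOfWord = true;
    }

    /** Returns every stored word that starts with the given prefix. */
    List<String> complete(String prefix)
    {
        List<String> result = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray())
        {
            node = node.children.get(c);
            if (node == null) return result; // no word starts with this prefix
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    /** Depth-first walk that turns the subtree below node into whole words. */
    private void collect(Node node, StringBuilder path, List<String> out)
    {
        if (node.endOfWord) out.add(path.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet())
        {
            char c = e.getKey();
            path.append(c);
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1); // backtrack before the next sibling
        }
    }

    public static void main(String[] args)
    {
        CompletionTrie trie = new CompletionTrie();
        trie.add("cat");
        trie.add("catalog");
        trie.add("car");
        trie.add("dog");
        System.out.println(trie.complete("ca")); // [car, cat, catalog]
    }
}
```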

Censoring strings

Given a large list of swear words and a string to censor, a trie offers a speed advantage over a simple array of strings. If a swear word can appear anywhere in the string, you'll need to attempt
to match it from every possible starting offset. With a string of m characters and a list of n words, this means m*n string comparisons.

Using a trie you can attempt to find a match from each offset in the string, which means m trie lookups. Since the speed of a trie lookup scales well with an increasing number of words, this is
considerably faster than the array lookup.
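
A compact sketch of the "one trie lookup per offset" idea follows. The Censor class and its method names are hypothetical, and the code assumes that lowercasing does not change the length of the text, which holds for plain ASCII input. At each offset it walks the trie as far as the text allows and remembers the longest listed word seen on the way.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch: replace every listed word in a text with asterisks. */
final class Censor
{
    private static final class Node
    {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    void addWord(String word)
    {
        Node node = root;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray())
        {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.endOfWord = true;
    }

    /** Length of the longest listed word starting at offset, or 0 if there is none. */
    private int longestMatch(String text, int offset)
    {
        Node node = root;
        int best = 0;
        for (int i = offset; i < text.length(); i++)
        {
            node = node.children.get(text.charAt(i));
            if (node == null) break;
            if (node.endOfWord) best = i - offset + 1;
        }
        return best;
    }

    String censor(String text)
    {
        // Assumption: lowercasing keeps the length, so offsets line up with the original.
        String lower = text.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length())
        {
            int match = longestMatch(lower, i); // one trie walk per offset
            if (match > 0)
            {
                for (int j = 0; j < match; j++) out.append('*');
                i += match;
            }
            else
            {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args)
    {
        Censor censor = new Censor();
        censor.addWord("darn");
        censor.addWord("heck");
        System.out.println(censor.censor("Oh DARN, what the heck!")); // Oh ****, what the ****!
    }
}
```

The cost of each walk depends on the length of the longest listed word, not on how many words are in the list.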

Java linked list implementation

Just for fun, here's a Java linked list implementation. Keep in mind that this is a fairly slow implementation. For serious speed boosts you'll need to investigate double-array or triple-array tries.

Please note: the version below is a simplified version intended only to give some insight into the workings of the Trie. For the full version please see the Downloads section.

import java.util.ArrayList;

public class Trie
{
    /**
     * The delimiter used in this trie to tell where words end. Without a proper delimiter either A.
     * a lookup for 'win' would return false if the list also contained 'windows', or B. a lookup
     * for 'mag' would return true if the only word in the list was 'magnolia'.
     *
     * The delimiter should never occur in a word added to the trie.
     */
    public final static char DELIMITER = '\u0001';

    /**
     * Creates a new Trie.
     */
    public Trie()
    {
        root = new Node('r');
        size = 0;
    }

    /**
     * Adds a word to the list.
     * @param word The word to add.
     * @return True if the word wasn't in the list yet
     */
    public boolean add(String word)
    {
        if (add(root, word + DELIMITER, 0))
        {
            size++;
            int n = word.length();
            if (n > maxDepth) maxDepth = n;
            return true;
        }
        return false;
    }

    /*
     * Does the real work of adding a word to the trie
     */
    private boolean add(Node root, String word, int offset)
    {
        if (offset == word.length()) return false;
        int c = word.charAt(offset);

        // Search for node to add to
        Node last = null, next = root.firstChild;
        while (next != null)
        {
            if (next.value < c)
            {
                // Not found yet, continue searching
                last = next;
                next = next.nextSibling;
            }
            else if (next.value == c)
            {
                // Match found, add remaining word to this node
                return add(next, word, offset + 1);
            }
            // Because of the ordering of the list getting here means we won't
            // find a match
            else break;
        }

        // No match found, create a new node and insert
        Node node = new Node(c);
        if (last == null)
        {
            // Insert node at the beginning of the list (Works for next == null
            // too)
            root.firstChild = node;
            node.nextSibling = next;
        }
        else
        {
            // Insert between last and next
            last.nextSibling = node;
            node.nextSibling = next;
        }

        // Add remaining letters
        for (int i = offset + 1; i < word.length(); i++)
        {
            node.firstChild = new Node(word.charAt(i));
            node = node.firstChild;
        }
        return true;
    }

    /**
     * Searches for a word in the list.
     *
     * @param word The word to search for.
     * @return True if the word was found.
     */
    public boolean isEntry(String word)
    {
        if (word.length() == 0)
            throw new IllegalArgumentException("Word can't be empty");
        return isEntry(root, word + DELIMITER, 0);
    }

    /*
     * Does the real work of determining if a word is in the list
     */
    private boolean isEntry(Node root, String word, int offset)
    {
        if (offset == word.length()) return true;
        int c = word.charAt(offset);

        // Search the sorted sibling list for the current character
        Node next = root.firstChild;
        while (next != null)
        {
            if (next.value < c) next = next.nextSibling;
            else if (next.value == c) return isEntry(next, word, offset + 1);
            else return false;
        }
        return false;
    }

    /**
     * Returns the size of this list.
     */
    public int size()
    {
        return size;
    }

    /**
     * Returns all words in this list starting with the given prefix
     *
     * @param prefix The prefix to search for.
     * @return All words in this list starting with the given prefix, or if no such words are found,
     *         an array containing only the suggested prefix.
     */
    public String[] suggest(String prefix)
    {
        return suggest(root, prefix, 0);
    }

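    /*
     * Recursive function for finding all words starting with the given prefix
     */
    private String[] suggest(Node root, String word, int offset)
    {
        if (offset == word.length())
        {
            ArrayList<String> words = new ArrayList<String>(size);
            char[] chars = new char[maxDepth];
            for (int i = 0; i < offset; i++)
                chars[i] = word.charAt(i);
            getAll(root, words, chars, offset);
            return words.toArray(new String[words.size()]);
        }
        int c = word.charAt(offset);

        // Search the sorted sibling list for the current character
        Node next = root.firstChild;
        while (next != null)
        {
            if (next.value < c) next = next.nextSibling;
            else if (next.value == c) return suggest(next, word, offset + 1);
            else break;
        }
        return new String[] { word };
    }

    /*
     * Collects every word stored below the given node. This helper is not part of the simplified
     * listing; it is a minimal reconstruction (an assumption) so that suggest() compiles.
     */
    private void getAll(Node root, ArrayList<String> words, char[] chars, int pointer)
    {
        Node n = root.firstChild;
        while (n != null)
        {
            if (n.value == DELIMITER)
            {
                // chars[0..pointer) spells a complete word
                words.add(new String(chars, 0, pointer));
            }
            else
            {
                chars[pointer] = (char) n.value;
                getAll(n, words, chars, pointer + 1);
            }
            n = n.nextSibling;
        }
    }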

    /**
     * Searches a string for words present in the trie and replaces them with stars (asterisks).
     * @param s The string to censor
     * @return The censored string
     */
    public String censor(String s)
    {
        if (size == 0) return s;
        String z = s.toLowerCase();
        int n = z.length();
        StringBuilder buffer = new StringBuilder(n);
        int match;
        char star = '*';
        for (int i = 0; i < n;)
        {
            match = longestMatch(root, z, i, 0, 0);
            if (match > 0)
            {
                for (int j = 0; j < match; j++)
                {
                    buffer.append(star);
                    i++;
                }
            }
            else
            {
                buffer.append(s.charAt(i++));
            }
        }
        return buffer.toString();
    }

    /*
     * Finds the longest matching word in the trie that starts at the given offset...
     */
    private int longestMatch(Node root, String word, int offset, int depth, int maxFound)
    {
        // The delimiter sorts before every other character, so if a word ends
        // here the delimiter is the first child in the list. The null check
        // guards against text that itself contains the delimiter.
        Node next = root.firstChild;
        if (next != null && next.value == DELIMITER) maxFound = depth;
        if (offset == word.length()) return maxFound;
        int c = word.charAt(offset);

        while (next != null)
        {
            if (next.value < c) next = next.nextSibling;
            else if (next.value == c)
                return longestMatch(next, word, offset + 1, depth + 1, maxFound);
            else return maxFound;
        }
        return maxFound;
    }

    /*
     * Represents a node in the trie. Because a node's children are stored in a linked list this
     * data structure takes the odd structure of node with a firstChild and a nextSibling.
     */
    private class Node
    {
        public int value;
        public Node firstChild;
        public Node nextSibling;

        public Node(int value)
        {
            this.value = value;
            firstChild = null;
            nextSibling = null;
        }
    }

    private Node root;
    private int size;
    private int maxDepth; // Not exact, but an upper bound on the maximum word length
}

Please note: the code given above is intended only to give some insight into the workings of the Trie. For the full version of the class please see the Downloads section.
