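How can the naive string-search algorithm be sped up? One answer is KMP; there are other algorithms too, which will be covered later.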

Knuth–Morris–Pratt string search algorithm

Start at the left-hand side of the string, string[0], and try to match the pattern, working to the right.
At each step we are trying to match string[i] == pattern[j].

Given a search pattern, pre-build a table, next[], that tells us where to reset j to when there is a mismatch at pattern position j.

If the match fails at pattern position j > 0, keep i the same and reset j to next[j-1]. If it fails at j = 0, move on to i+1.
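A minimal Java sketch of the search loop just described. It assumes the next[] table is passed in already built, using the same convention as the table-building code further down (on a mismatch at pattern position j > 0, j is reset to next[j-1]). The class and method names are hypothetical.

```java
final class KmpSearchSketch {
    /**
     * Returns the index in text where pattern first occurs, or -1.
     * Assumes next[] is the prefix-suffix table: on a mismatch at
     * pattern position j > 0, j is reset to next[j - 1].
     */
    static int firstMatch(String text, String pattern, int[] next) {
        int m = pattern.length();
        if (m == 0) return 0;               // the empty pattern matches at 0
        int i = 0;                          // index into text; it never moves backwards
        int j = 0;                          // index into pattern
        while (i < text.length()) {
            if (text.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
                if (j == m) return i - m;   // the whole pattern has matched
            } else if (j > 0) {
                j = next[j - 1];            // keep i, fall back within the pattern
            } else {
                i++;                        // mismatch at pattern[0]: move on to i+1
            }
        }
        return -1;
    }
}
```

Note that i only ever moves forward; all the backing up happens in j. For example, with pattern 110 and its table 0, 1, 0, searching the text 11100000 returns 1.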

How to build the table

Everything else below is just how to build the table.

Construct a table showing where to reset j to

If string[i] != pattern[0], just move on to i+1, with j = 0.
If string[i] != pattern[1], we leave i the same and set j = 0:
pattern = 10
string = ... 1100000
If string[i] != pattern[2], we leave i the same and change j, but we need to consider repeats in pattern[0] .. pattern[1]:
pattern = 110
string = ... 11100000
i stays the same, j goes from 2 back to 1

pattern = 100
string = ... 10100000
i stays the same, j goes from 2 back to 0
In general, if string[i] != pattern[j], we leave i the same and change j, but we need to consider repeats in pattern[0] .. pattern[j-1].
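To check these reset rules against the examples, here is a brute-force Java sketch (the name `resetTo` is hypothetical). For a mismatch at pattern position j > 0, it returns the length of the longest proper prefix of pattern[0] .. pattern[j-1] that is also a suffix of it, which is where j goes while i stays put. It is deliberately slow and only meant for small hand-checks.

```java
final class MismatchReset {
    /**
     * Where to reset j after string[i] != pattern[j] (j > 0), keeping i:
     * the length of the longest proper prefix of pattern[0..j-1] that is
     * also a suffix of it. For j = 0 the rule is different: move i instead.
     */
    static int resetTo(String pattern, int j) {
        String matched = pattern.substring(0, j);       // pattern[0] .. pattern[j-1]
        for (int len = j - 1; len > 0; len--) {         // try the longest candidate first
            String suffix = matched.substring(j - len); // last len matched characters
            if (matched.startsWith(suffix)) {
                return len;                             // prefix of length len == suffix
            }
        }
        return 0;                                       // no repeat: restart at pattern[0]
    }
}
```

With this, resetTo("10", 1) and resetTo("100", 2) give 0, and resetTo("110", 2) gives 1, matching the three examples.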

Given a certain pattern, construct a table showing where to reset j to.

Construct a table of next[j]

For each j, figure out:
next[j] = length of the longest prefix in "pattern[0] .. pattern[j-1]" that matches a suffix of "pattern[1] .. pattern[j]"

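next[j] = "the length of the longest matching substring", i.e. of the longest prefix that is also a suffix.
That is:

prefix must include pattern[0]
suffix must include pattern[j]
prefix and suffix are not the same substring, so neither may be the whole of pattern[0] .. pattern[j] (they may overlap, as with ABA in ABABA)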
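The definition translates directly into code. This brute-force Java sketch (hypothetical class name) tries every prefix length from longest to shortest; limiting the length to at most j is what keeps the prefix and suffix from being the same substring. It takes O(m^3) time, so it is only useful for checking the fast construction on small patterns.

```java
final class NextByDefinition {
    /**
     * next[j] = length of the longest prefix in pattern[0..j-1] that matches
     * a suffix of pattern[1..j], computed straight from the definition.
     */
    static int[] next(String pattern) {
        int m = pattern.length();
        int[] next = new int[m];                              // next[0] stays 0
        for (int j = 0; j < m; j++) {
            String upToJ = pattern.substring(0, j + 1);       // pattern[0] .. pattern[j]
            for (int len = j; len > 0; len--) {               // len <= j: never the whole string
                String prefix = upToJ.substring(0, len);      // includes pattern[0]
                String suffix = upToJ.substring(j + 1 - len); // includes pattern[j]
                if (prefix.equals(suffix)) {
                    next[j] = len;                            // first hit is the longest
                    break;
                }
            }
        }
        return next;
    }
}
```

For "ABABAC" this yields 0, 0, 1, 2, 3, 0.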


Example for pattern "ABABAC":


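When pattern position j+1 is compared with s[k] and they do not match:

set j' = next[j]; pattern[j'] is then compared with s[k], i.e. j' moves into the slot previously occupied by j+1.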

j                             0      1      2      3      4       5
substring 0 to j              A      AB     ABA    ABAB   ABABA   ABABAC
longest prefix-suffix match   none   none   A      AB     ABA     none
next[j]                       0      0      1      2      3       0

Note: for j = 0 there is no prefix and suffix that are different, i.e. next[0] = 0 for all patterns.

Given j, let n = next[j]
"pattern[0] .. pattern[n-1]" = "pattern[j-(n-1)] .. pattern[j]"

"pattern[0] .. pattern[next[j]-1]" = "pattern[j-(next[j]-1)] .. pattern[j]"

e.g. j = 4, n = 3,

"pattern[0] .. pattern[2]" = "pattern[2] .. pattern[4]"
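The identity can be checked mechanically for a whole table. This Java sketch (hypothetical name) only verifies that each nonzero next[j] describes a genuine prefix-suffix match; it does not check that the match is the longest one.

```java
final class PrefixSuffixCheck {
    /**
     * For every j with n = next[j] > 0, checks that
     * "pattern[0] .. pattern[n-1]" equals "pattern[j-(n-1)] .. pattern[j]".
     */
    static boolean holds(String pattern, int[] next) {
        for (int j = 0; j < pattern.length(); j++) {
            int n = next[j];
            if (n == 0) continue;                                  // nothing to compare
            if (n > j) return false;                               // prefix and suffix would coincide
            String prefix = pattern.substring(0, n);               // pattern[0 .. n-1]
            String suffix = pattern.substring(j - (n - 1), j + 1); // pattern[j-(n-1) .. j]
            if (!prefix.equals(suffix)) return false;
        }
        return true;
    }
}
```

For "ABABAC" with next = 0, 0, 1, 2, 3, 0 it returns true; at j = 4, n = 3 it compares "ABA" (positions 0 to 2) with "ABA" (positions 2 to 4).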

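If the match fails at pattern position j+1 (i.e. pattern[j+1] != string[i]), keep i the same and reset the pattern position to n = next[j].
We know pattern[0] .. pattern[n-1] is already matched, because pattern[0] .. pattern[n-1] = pattern[j-(n-1)] .. pattern[j], and pattern[j-(n-1)] .. pattern[j] has just been matched against the string.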

e.g. We have matched ABABA so far.
If the next character fails, we can say we have matched ABA so far and then check whether the next character matches.
That is, keep i the same and just reset j to 3 (= precisely the length of the longest prefix-suffix match).
Then, if the match after ABA fails too, by the same rule we say we have matched A so far, reset to j = 1, and try again from there.
In other words, it starts by trying the longest prefix-suffix match, and if that fails it works down to shorter ones until none are left.
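The fallback order can be listed explicitly. This Java sketch (hypothetical name) starts from a given number of matched characters and follows j = next[j-1] until it reaches 0, the same rule used in the table-building code below.

```java
import java.util.ArrayList;
import java.util.List;

final class FallbackChain {
    /**
     * After `matched` pattern characters have matched and the next one fails,
     * lists the positions j is reset to, longest prefix-suffix first.
     */
    static List<Integer> chain(int[] next, int matched) {
        List<Integer> resets = new ArrayList<>();
        int j = matched;
        while (j > 0) {
            j = next[j - 1];   // fall back to the next shorter prefix-suffix
            resets.add(j);
        }
        return resets;         // ends with 0: no prefix-suffix match left
    }
}
```

For ABABAC (next = 0, 0, 1, 2, 3, 0) with ABABA matched, the chain is [3, 1, 0]: ABA, then A, then nothing.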

Algorithm to construct table of next[j]

Do this once, when the pattern comes in.
pattern[0] ... pattern[m-1]
Here, i and j both index the pattern.

In other words, the pattern is being compared against a shifted copy of itself.

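static int[] buildNext(String pattern) {
    int m = pattern.length();
    int[] next = new int[m];   // next[0] = 0: Java zero-initializes int arrays
    int i = 1;
    int j = 0;
    while (i < m) {
        // on the first step, i = 1 and j = 0
        if (pattern.charAt(j) == pattern.charAt(i)) {
            next[i] = j + 1;   // it's next[i], not next[j]
            i++;
            j++;
        } else if (j > 0) {
            // e.g. [0],[1],[2] == [4],[5],[6] but now [3] != [7]:
            // maybe there is another, shorter prefix-suffix match we can shift right to
            j = next[j - 1];
            // next[j-1] is used because next[k] is what position k+1 relies on (a rule
            // worth remembering), and because only pattern[0] .. pattern[j-1] has matched,
            // so that is where prefix-suffix matching applies; pattern[j] itself did not
            // match, as a quick sketch shows
        } else {
            // j == 0: pattern[0] does not match the character at index i, so i moves
            // right by one (the pattern shifts right by one) and j stays at 0
            next[i] = 0;
            i++;
            j = 0;             // redundant, just to make it clear what we are looping with
        }
    }
    return next;
}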
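Putting the pieces together, here is a self-contained Java sketch that builds the table with the loop above and then uses it to report every occurrence of the pattern in a text. Two choices are assumptions not taken from these notes: after a full match the search continues from j = next[m-1] so overlapping matches are found, and an empty pattern reports no matches. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class KmpAllMatches {
    // Same table-building loop as above.
    static int[] buildNext(String pattern) {
        int m = pattern.length();
        int[] next = new int[m];
        int i = 1, j = 0;
        while (i < m) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                next[i] = j + 1;
                i++;
                j++;
            } else if (j > 0) {
                j = next[j - 1];
            } else {
                next[i] = 0;
                i++;
            }
        }
        return next;
    }

    // Returns every start index of pattern in text, overlapping matches included.
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int m = pattern.length();
        if (m == 0) return hits;                 // assumption: empty pattern, no hits
        int[] next = buildNext(pattern);
        int j = 0;
        for (int i = 0; i < text.length(); i++) {
            while (j > 0 && text.charAt(i) != pattern.charAt(j)) {
                j = next[j - 1];                 // keep i, work down the prefix-suffix chain
            }
            if (text.charAt(i) == pattern.charAt(j)) j++;
            if (j == m) {
                hits.add(i - m + 1);
                j = next[m - 1];                 // keep the longest prefix-suffix for overlaps
            }
        }
        return hits;
    }
}
```

For example, findAll("AAAA", "AA") returns [0, 1, 2] and findAll("11100000", "110") returns [1]. Because i never moves backwards and each fallback undoes at most one earlier j++, the total work is linear in the combined length of the text and the pattern.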
