KMP算法的改进

KMP算法已经在极大程度上提高了子符串的匹配效率，但是仍然有改进的余地。

1. 引入的情景

下面我们就其中的一种情况进行分析：

主串T为"aaaabcde…"
子串S为"aaaade"

那么容易求得子串的next[]={0,1,2,3,4}

下标	1	2	3	4	5
子串	a	a	a	a	d
next	0	1	2	3	4

当使用KMP算法进行匹配时，

由于T[5]!=S[5], 因此子串指针回溯，
子串回溯后变为T[5]与S[4]的关系，依然不等, 子串继续回溯
子串回溯后变为T[5]与S[3]的关系，依然不等，子串继续回溯
子串回溯后变为T[5]与S[2]的关系，依然不等, 子串继续回溯
子串回溯后变为T[5]与S[1]的关系，依然不等，子串继续回溯
由于子串指针的值为0(j=0)，主串和子串的指针同时向前移动一个位置，变为T[6]与S[1]的关系
… …

效果图如下：

实际上我们可以看出，S[1]=S[2]=S[3]=S[4], 既然S[4]!=T[5], 那么前面的几个根本无需比较。因此KMP 算法可进一步优化的地方便在于此。

2. 对KMP算法不足的说明

之所以出现上述问题，个人分析，原因在于：KMP的next数组只分析了当前字符之前的字符串的相似度，而没有把当前字符考虑进去, 从而导致上述没有意义的比较操作。

那么如何才能把当前字符也考虑进去呢?

基本原理就是，在需要子串指针回溯时，进行当前位置元素与回溯之后位置元素比较，如果相等，那么就没有必要再进行比较了，子串的指针继续回溯。如此往复

因此，改进的KMP算法又添加了一个数组nextval, 它是在next基础之上计算出来的。

[

]

{

[

]

[

]

[

]

[

]

[

]

[

]

nextval[i]\begin{cases}next[i], &if\ S\ [i\ ]\ != \ S\ [ next\ [i\ ] \ ] \\nextval[next[i]], &if\ S\ [i\ ]\ == \ S\ [ next\ [i\ ] \ ] \\\end{cases}

nextval[i]{next[i],nextval[next[i]],if S [i ] != S [next [i ] ]if S [i ] == S [next [i ] ]

3. 改进KMP算法实现

/*************************************************************************

             > File Name: kmp_pro.c

             > Author: Toney Sun

             > Mail: vip_13031075266@163.com

       > Created Time: 2020年06月27日 星期六 21时07分12秒

 ************************************************************************/

#include<stdio.h>

#include<string.h>

#include<stdlib.h>

int getNextVal(char *str, int nextval[])

{

	int i = 0;

	int j = -1;

	if(!str || !nextval){

		printf("Parameters can't be NULL or can't be zero\n");

		return -1;

	}

	nextval[0] = -1;

	printf("%2.2d  ", nextval[0]);

	while(i < strlen(str)-1){

		if(j == -1 || str[i] == str[j]){

			i++;

			j++;

/*****************************************/

			if(str[i]!=str[j]){

				nextval[i]=j;

			}else{

				nextval[i]=nextval[j];

			}

/*****************************************/

			printf("%2.2d  ", nextval[i]);

		}else{

			j = nextval[j];

		}

	}

	printf("\n");

	return 0;

}

int kmp_pro(char *Str, char *match)

{

	int i=0,j=0;

	int nextval[100] = {0};

	int ret =getNextVal(match, nextval);

	if(ret != 0){

		printf("Get nextval error\n");

		return -1;

	}

	while(i<(int)strlen(Str) && j<(int)strlen(match)){

			if(j == -1 || Str[i] == match[j]){

				i++;

				j++;

			}else{

				j = nextval[j];

			}

	}

	if(j == strlen(match)){

		return i - j;

	}else{

		return -1;

	}

}

void main(int argc, char *argv[])

{

	char *str="ababaaaaba";

	char *match="aba";

	int index = kmp_pro(str, match);

	printf("-------index=%d------\n",index);

	match="aaa";

	index = kmp_pro(str, match);

	printf("-------index=%d------\n",index);

	match="aab";

	index = kmp_pro(str, match);

	printf("-------index=%d------\n",index);

}