Codeforces Round #246 (Div. 2) D. Prefixes and Suffixes(后缀数组orKMP)

D. Prefixes and Suffixes

time limit per test

1 second

memory limit per test

256 megabytes

input

standard input

output

standard output

You have a string s = s₁s₂...s_|s|,
where |s| is the length of string s,
and s_i its i-th
character.

Let's introduce several definitions:

A substring s[i..j] (1 ≤ i ≤ j ≤ |s|) of
string s is string s_is_i + 1...s_j.
The prefix of string s of length l (1 ≤ l ≤ |s|) is
string s[1..l].
The suffix of string s of length l (1 ≤ l ≤ |s|) is
string s[|s| - l + 1..|s|].

Your task is, for any prefix of string s which matches a suffix of string s,
print the number of times it occurs in strings as a substring.

Input

The single line contains a sequence of characters s₁s₂...s_|s| (1 ≤ |s| ≤ 10⁵) —
string s. The string only consists of uppercase English letters.

Output

In the first line, print integer k (0 ≤ k ≤ |s|) —
the number of prefixes that match a suffix of string s. Next print klines,
in each line print two integers l_i c_i.
Numbers l_i c_i mean
that the prefix of the length l_i matches
the suffix of length l_i and
occurs in string s as a substring c_i times.
Print pairs l_i c_i in
the order of increasing l_i.

Sample test(s)

input

ABACABA

output

input

AAA

output

题意：

给你一个长度不超过10^5的字符串。要你按长度输出和后缀全然匹配的的前缀的长度。和该前缀在整个串中出现的次数。（可重叠）

思路：

比赛时一看到前缀后缀。心里一阵窃喜。

哈哈。刚好学过后缀数组。正好实用武仅仅地了，一番思索后算法已成型。0号后缀就是整个字符串。

和它求公共前缀能和整个后缀匹配的后缀一定有一个前缀能和这个后缀全然匹配。然后再确定出现了多少次。当你知道某个后缀是目标后缀时。你能够知道到它的rank值。然后要全然包括一个后缀的后缀一定在它后面。

依据排名规则。你想啊。假设后缀a的前缀包括后缀b。a还会排在b前面吗？明显长度短的排前面了。所以剩下工作就是确定能够向下扩展的最大距离了。
这个能够依据height数据的值确定。要用到二分+rmq。二分确定位置。

rmq推断是否满足条件。思路尽管正确可是到比赛结束都一直是错的。到后面调试出来才知道还是对后缀数组的理解不够深刻。问题就出在倍增算法为什么要规定txt[n-1]=0.还有j=sa[rank[i]-1];rank[i]=0怎么处理。我们把原来的字符串末尾加个0就可解决。具体见代码：

#include<bits/stdc++.h>

using namespace std;

const int INF=0x3f3f3f3f;

const double eps=1e-8;

const double PI=acos(-1.0);

const int maxn=150010;

char txt[maxn];

int sa[maxn],T1[maxn],T2[maxn],ct[maxn],he[maxn],rk[maxn],ans,n,m;//sa[i]表示排名第i的后缀的起始位置。

int rmq[25][maxn],lg[maxn],ansn[maxn],ansp[maxn],ptr;

void getsa(char *st)//注意m为ASCII码的范围

{

    int i,k,p,*x=T1,*y=T2;

    for(i=0; i<m; i++) ct[i]=0;

    for(i=0; i<n; i++) ct[x[i]=st[i]]++;

    for(i=1; i<m; i++) ct[i]+=ct[i-1];

    for(i=n-1; i>=0; i--)//倒着枚举保证相对顺序

        sa[--ct[x[i]]]=i;

    for(k=1,p=1; p<n; k<<=1,m=p)//枚举长度

    {

        for(p=0,i=n-k; i<n; i++) y[p++]=i;

        for(i=0; i<n; i++) if(sa[i]>=k) y[p++]=sa[i]-k;//按第二keyword排序.y[i]表示第二keyword排名第i的后缀起始位置

        for(i=0; i<m; i++) ct[i]=0;

        for(i=0; i<n; i++) ct[x[y[i]]]++;//x[i]表示起始位置为i的后缀的第一keyword排序

        for(i=1; i<m; i++) ct[i]+=ct[i-1];

        for(i=n-1; i>=0; i--) sa[--ct[x[y[i]]]]=y[i];//接着按第一keyword排序

        for(swap(x,y),p=1,x[sa[0]]=0,i=1; i<n; i++)

            x[sa[i]]=y[sa[i-1]]==y[sa[i]]&&y[sa[i-1]+k]==y[sa[i]+k]?p-1:p++;//x[i]存排名第i后缀的排名

        //保证txt[n-1]=0.规定txt[n-1]=0 的优点，假设y[sa[i-1]]=y[sa[i]]。说明以y[sa[i-1]]或y[sa[i]]

        //开头的长度为k的字符串肯定不包含字符txt[n-1]由于包含了肯定就不相等了。所以调用变量y[sa[i]+l]和y[sa[i-1]+l]

        //不会导致数组下标越界，这样就不须要做特殊推断。

    }

}

void gethe(char *st)//求height数组

{

    int i,j,k=0;

    for(i=0;i<n;i++) rk[sa[i]]=i;

    for(i=0;i<n-1;i++)

    {

        if(k) k--;

        j=sa[rk[i]-1];

        while(st[i+k]==st[j+k]) k++;

        he[rk[i]]=k;

    }

}

void rmq_init()

{

    int i,j;

    for(i=0;i<n;i++)

        rmq[0][i]=he[i];//单个元素

    for(i=1;i<=lg[n];i++)//枚举长度

        for(j=0;j+(1<<i)-1<n;j++)//枚举起点注意边界

            rmq[i][j]=min(rmq[i-1][j],rmq[i-1][j+(1<<(i-1))]);

}

int rmq_min(int l,int r)

{

    int tmp=lg[r-l+1];

    return min(rmq[tmp][l],rmq[tmp][r-(1<<tmp)+1]);

}

void prermq()

{

    int  i;

    lg[0]=-1;

    for(i=1;i<maxn;i++)

        lg[i]=lg[i>>1]+1;

}

void solve()

{

    int low,hi,mid,p,pos,a,b,ans,tp,i;

    getsa(txt),gethe(txt),rmq_init();

    ptr=0,pos=rk[0];

    for(i=n-2;i>0;i--)

    {

        if(rk[i]<pos)

            a=rk[i]+1,b=pos;

        else

            a=pos+1,b=rk[i];

        p=rmq_min(a,b);

        if(p>=n-i-1)

        {

            ansp[ptr]=p,tp=rk[i]+1;

            low=rk[i]+1,hi=n-1,ans=-1;

            while(low<=hi)

            {

                mid=(low+hi)>>1;

                if(rmq_min(tp,mid)>=p)

                    ans=mid,low=mid+1;

                else

                    hi=mid-1;

            }

            ansn[ptr++]=ans-rk[i]+1;

        }

    }

}

int main()

{

    int i;

    prermq();

    while(~scanf("%s",txt))

    {

        m=150,n=strlen(txt);

        n++;

        solve();

        ansp[ptr]=n-1;

        ansn[ptr++]=1;

        printf("%d\n",ptr);

        for(i=0;i<ptr;i++)

            printf("%d %d\n",ansp[i],ansn[i]);

    }

    return 0;

}

接下来是另外一种思路。

比赛完后。第一个思路调不出来。于是就去群里问了下。结果被叉姐歧视了。扔了句kmp就走了。

定神一想是啊。自己智商被深深得歧视了。kmp能够非常轻松得统计出每一个前缀在原串中出现的次数。详细做法是对原串求个失配数组。

然后自己和自己匹配。若第i个位置和第j个位置匹配了说明前缀j在第i个位置出现了一次。我们用cnt[i]记录。前缀i出现的次数。最后统计cnt[next[i]]+=cnt[i]。这个非常好理解。

假设前缀j能在i这个位置出现一次那么next[j]一定能在i这个位置出现一次。

统计完每一个前缀在原串中出现次数后。如今就要找钱缀和后缀匹配的前缀的个数了。

这个非常easy。

自己和自己匹配不就是拿自己的前半部分和自己的其它部分匹配么。所以我们仅仅须要匹配第n+1个位置就能够找出全部和后缀匹配的前缀了。华丽的O(n)就过掉了。。。

。

具体见代码：

#include<bits/stdc++.h>

using namespace std;

const int INF=0x3f3f3f3f;

const double eps=1e-8;

const double PI=acos(-1.0);

const int maxn=150010;

char txt[maxn];

int f[maxn],cnt[maxn],ansp[maxn],ansn[maxn],ct,n;

void getf(char *p)

{

    int i,j;

    f[0]=f[1]=0;

    for(i=1;i<n;i++)

    {

        j=f[i];

        while(j&&p[j]!=p[i])

            j=f[j];

        f[i+1]=p[j]==p[i]?

j+1:0;

    }

}

void KMP()

{

    int i,j,t;

    for(i=0,j=0;i<n;i++)

    {

        while(j&&txt[j]!=txt[i])

            j=f[j];

        if(txt[j]==txt[i])

           cnt[j]++,j++;//cnt[j]表示前缀j出现次数。由于i不同所以终点不同。

    }

    t=j,ct=0;;

    for(j=n;j>0;j--)//为什么能够这样做呢。终点不同的串一定是不同的串。

kmp保证了终点不同。

if(f[j])//f[j]表示下个比較的位置。说明前f[j]-1一定是同样的。

            cnt[f[j]-1]+=cnt[j-1];

    while(t)//前缀匹配后缀

    {

        ansp[ct]=t;

        ansn[ct++]=cnt[t-1];

        t=f[t];

    }

    printf("%d\n",ct);

    for(i=ct-1;i>=0;i--)

        printf("%d %d\n",ansp[i],ansn[i]);

}

int main()

{

    while(~scanf("%s",txt))

    {

        n=strlen(txt);

        memset(cnt,0,sizeof cnt);

        getf(txt);

        KMP();

    }

}

Codeforces Round #246 (Div. 2) D. Prefixes and Suffixes(后缀数组orKMP)的更多相关文章

Codeforces Round #246 (Div. 2) D. Prefixes and Suffixes
D. Prefixes and Suffixes You have a string s = s ...
Codeforces Round #527 (Div. 3) C. Prefixes and Suffixes
题目链接题意:给你一个长度n,还有2*n-2个字符串,长度相同的字符串一个数前缀一个是后缀,让你把每个串标一下是前缀还是后缀,输出任意解即可. 思路;因为不知道前缀还是后缀所以只能搜,但可以肯定的是 ...
Codeforces Round #527 (Div. 3) C. Prefixes and Suffixes (思维,字符串)
题意:给你某个字符串的\(n-1\)个前缀和\(n-1\)个后缀,保证每个所给的前缀后缀长度从\([1,n-1]\)都有,问你所给的子串是前缀还是后缀. 题解:这题最关键的是那两个长度为\(n-1\) ...
Codeforces Round #246 (Div. 2)
题目链接:Codeforces Round #246 (Div. 2) A:直接找满足的人数,然后整除3就是答案 B:开一个vis数组记录每一个衣服的主场和客场出现次数.然后输出的时候主场数量加上反复 ...
Codeforces Round #587 (Div. 3) A. Prefixes
链接: https://codeforces.com/contest/1216/problem/A 题意: Nikolay got a string s of even length n, which ...
Codeforces Round #246 (Div. 2) C. Prime Swaps（贪心，数论）
题目链接:http://codeforces.com/contest/432/problem/C 首先由题意分析出:这些数是从1到n且各不相同,所以最后结果肯定是第i位的数就是i. 采用这样一种贪心策 ...
Codeforces Round #246 (Div. 2) B. Football Kit
题目的意思是求出每个队穿主场衣服和客场衣服的次数每个队作为主场的次数是n-1,作为客场的次数是n-1 当每个队打主场的时候肯定穿的主场衣服当每个队打客场时,如果客场与主场的衣服不同,则穿客场衣服 ...
Codeforces Round #246 (Div. 2) A. Choosing Teams
给定n k以及n个人已参加的比赛数,让你判断最少还能参加k次比赛的队伍数,每对3人,每个人最多参加5次比赛 #include <iostream> using namespace std; ...
Codeforces Round #246 (Div. 2)——Ｄ题
KMP算法,没写出来,完全不理解NEXT数组.现在理解了很多答案都在程序中 ,不过这个思想真的很神奇, 还有毛语不好,一直没看懂题目,现在懂了, 大概是:S中前缀等于后缀,求其长度,和其在S中出现了 ...

随机推荐

high volume logging
logging 是现在系统中必不可少的组件了.市面上已经有很多很多非常成熟的日志产品,log*系列就是一个典型代表.对于erlang系统来说,也有很多,比如error_logger, disk_log ...
Windows8 Metro快捷键 | Win8迷
Windows8 Metro快捷键 | Win8迷 Win + Q : 打开搜索面板 Win + C : 打开屏幕右侧的Charms简化菜单 Win + 空格 : 切换输入语言和键盘布局
JavaScript编程：使用DOM操作样式表
6.使用DOM操作样式表: 操纵元素的Style样式属性: background-color:style.backgroundColor color:style.col ...
KMP原理、分析及C语言实现
(是在matrix67博客基础上整理而来,整理着:华科小涛@http://www.cnblogs.com/hust-ghtao/) 有些算法可以让人发疯,KMP算法就是一个.在网上找了很多资料讲的都让 ...
有空研究一下OwnerDraw和三种Windows风格CS_OWNDC和CS_PARENTDC和CS_CLASSDC
就在StdCtrls.pas, ExtCtrls.pas和Buttons.pas三个类里研究就够了,这是初步的搜索结果: ---------- 在打开的文档中查找 ---------- "G ...
CF 258B Little Elephant and Elections [dp+组合]
给出1,2,3...m 任取7个互不同样的数a1,a2,a3,a4,a5,a6,a7 一个数的幸运度是数位上4或7的个数比方244.470幸运度是2. 44434,7276727.4747,7474 ...
jsp获得本地及serverIP的方法
InetAddress addr = InetAddress.getLocalHost(); String ip = addr.getHostAddress().toString();//获得本机IP ...
ALV编辑行内容有改变时候操作
ALV编辑行内容时,调用方法 check_changed_data返回变量 gf_valid = 'X'的话说明alv行有变化. 以下拿alv维护表程序部分代码做例: DATA: gr_alvgrid ...
很好用的一个类：TJvAppXMLFileStorage
以前写软件的时候,在处理软件配置信息的时候,总是要写很多重复的代码,把配置信息写到Ini文件当中.昨天到网上查了一下,发现JVCL中提供了一个非常好用的类TJvAppXMLFileStorage,可以 ...
【ASP.NET Web API教程】3.4 HttpClient消息处理器
原文:[ASP.NET Web API教程]3.4 HttpClient消息处理器注:本文是[ASP.NET Web API系列教程]的一部分,如果您是第一次看本博客文章,请先看前面的内容. 3.4 ...

Codeforces Round #246 (Div. 2) D. Prefixes and Suffixes(后缀数组orKMP)

Codeforces Round #246 (Div. 2) D. Prefixes and Suffixes(后缀数组orKMP)的更多相关文章

随机推荐

热门专题