CSU-1632 Repeated Substrings (后缀数组)
Description
String analysis often arises in applications from biology and chemistry, such as the study of DNA and protein molecules. One interesting problem is to find how many substrings are repeated (at least twice) in a long string. In this problem, you will write a program to find the total number of repeated substrings in a string of at most 100 000 alphabetic characters. Any unique substring that occurs more than once is counted. As an example, if the string is “aabaab”, there are 5 repeated substrings: “a”, “aa”, “aab”, “ab”, “b”. If the string is “aaaaa”, the repeated substrings are “a”, “aa”, “aaa”, “aaaa”. Note that repeated occurrences of a substring may overlap (e.g. “aaaa” in the second case).
Input
The input consists of at most 10 cases. The first line contains a positive integer, specifying the number of
cases to follow. Each of the following line contains a nonempty string of up to 100 000 alphabetic characters.
Output
For each line of input, output one line containing the number of unique substrings that are repeated. You
may assume that the correct answer fits in a signed 32-bit integer.
Sample Input
3
aabaab
aaaaa
AaAaA
Sample Output
5
4
5 题目大意:统计字符串中重复出现的子串数目。
题目分析:sum(max(height(i)-height(i-1),0))即为答案。 代码如下:
//# define AC # ifndef AC # include<iostream>
# include<cstdio>
# include<cstring>
# include<vector>
# include<queue>
# include<list>
# include<cmath>
# include<set>
# include<map>
# include<string>
# include<cstdlib>
# include<algorithm>
using namespace std;
# define mid (l+(r-l)/2) typedef long long LL;
typedef unsigned long long ULL; const int N=100000;
const int mod=1e9+7;
const int INF=0x7fffffff;
const LL oo=0x7fffffffffffffff; int SA[N+5];
int tSA[N+5];
int cnt[N+5];
int rk[N+5];
int *x,*y;
int height[N+5]; int idx(char c)
{
if('a'<=c&&c<='z') return c-'a';
return c-'A'+26;
} bool same(int i,int j,int k,int n)
{
if(y[i]-y[j]) return false;
if(i+k<n&&j+k>=n) return false;
if(i+k>=n&&j+k<n) return false;
return y[i+k]==y[j+k];
} void buildSA(char *s)
{
int n=strlen(s);
int m=52;
x=rk,y=tSA;
for(int i=0;i<m;++i) cnt[i]=0;
for(int i=0;i<n;++i) ++cnt[x[i]=idx(s[i])];
for(int i=1;i<m;++i) cnt[i]+=cnt[i-1];
for(int i=n-1;i>=0;--i) SA[--cnt[x[i]]]=i; for(int k=1;k<=n;k<<=1){
int p=0;
for(int i=n-k;i<n;++i) y[p++]=i;
for(int i=0;i<n;++i) if(SA[i]>=k) y[p++]=SA[i]-k; for(int i=0;i<m;++i) cnt[i]=0;
for(int i=0;i<n;++i) ++cnt[x[y[i]]];
for(int i=1;i<m;++i) cnt[i]+=cnt[i-1];
for(int i=n-1;i>=0;--i) SA[--cnt[x[y[i]]]]=y[i]; p=1;
swap(x,y);
x[SA[0]]=0;
for(int i=1;i<n;++i)
x[SA[i]]=same(SA[i],SA[i-1],k,n)?p-1:p++;
if(p>=n) break;
m=p;
}
} void getHeight(char *s)
{
int n=strlen(s);
for(int i=0;i<n;++i) rk[SA[i]]=i;
int k=0;
for(int i=0;i<n;++i){
if(rk[i]==0){
height[rk[i]]=k=0;
}else{
if(k) --k;
int j=SA[rk[i]-1];
while(i+k<n&&j+k<n&&s[i+k]==s[j+k])
++k;
height[rk[i]]=k;
}
}
} char str[N+5]; void solve()
{
int n=strlen(str);
int ans=0;
for(int i=0;i<n;++i){
if(height[i]>height[i-1])
ans+=height[i]-height[i-1];
}
printf("%d\n",ans);
} int main()
{
int T;
scanf("%d",&T);
while(T--)
{
scanf("%s",str);
buildSA(str);
getHeight(str);
solve();
}
return 0;
} # endif
CSU-1632 Repeated Substrings (后缀数组)的更多相关文章
- UVALive - 6869 Repeated Substrings 后缀数组
题目链接: http://acm.hust.edu.cn/vjudge/problem/113725 Repeated Substrings Time Limit: 3000MS 样例 sample ...
- CSU-1632 Repeated Substrings[后缀数组求重复出现的子串数目]
评测地址:https://cn.vjudge.net/problem/CSU-1632 Description 求字符串中所有出现至少2次的子串个数 Input 第一行为一整数T(T<=10)表 ...
- csu 1305 Substring (后缀数组)
http://acm.csu.edu.cn/OnlineJudge/problem.php?id=1305 1305: Substring Time Limit: 2 Sec Memory Limi ...
- POJ3415 Common Substrings —— 后缀数组 + 单调栈 公共子串个数
题目链接:https://vjudge.net/problem/POJ-3415 Common Substrings Time Limit: 5000MS Memory Limit: 65536K ...
- POJ1226 Substrings ——后缀数组 or 暴力+strstr()函数 最长公共子串
题目链接:https://vjudge.net/problem/POJ-1226 Substrings Time Limit: 1000MS Memory Limit: 10000K Total ...
- SPOJ - SUBST1 New Distinct Substrings —— 后缀数组 单个字符串的子串个数
题目链接:https://vjudge.net/problem/SPOJ-SUBST1 SUBST1 - New Distinct Substrings #suffix-array-8 Given a ...
- SPOJ- Distinct Substrings(后缀数组&后缀自动机)
Given a string, we need to find the total number of its distinct substrings. Input T- number of test ...
- SPOJ - DISUBSTR Distinct Substrings (后缀数组)
Given a string, we need to find the total number of its distinct substrings. Input T- number of test ...
- POJ 3415 Common Substrings 后缀数组+并查集
后缀数组,看到网上很多题解都是单调栈,这里提供一个不是单调栈的做法, 首先将两个串 连接起来求height 求完之后按height值从大往小合并. height值代表的是 sa[i]和sa[i ...
- POJ1226:Substrings(后缀数组)
Description You are given a number of case-sensitive strings of alphabetic characters, find the larg ...
随机推荐
- C#对Dictionary的按Value排序
使用List对其进行排序 using System; using System.Collections.Generic; using System.Text; namespace ConsoleApp ...
- web 乱码摘抄
JavaWeb--中文乱码小结 JavaWeb--中文乱码小结 0.纯粹html乱码: 换个editor吧(有时候notepad都比sublime_text好用),最好是在<head>&l ...
- impress.js webslide 参数
data-transition-duration="2000" 改变切换场景的速度,默认1000data-perspective="500" 改变透视的深度,默 ...
- DOM扩展之 专有扩展
11.4.3 contains() 方法 用来确定某个节点是不是另一个节点的后代. 注:a.contains(a) 也是返回true.说明contains方法搜索是从自身开始的. DOM Level ...
- sudo: /etc/sudoers is world writable
错误信息: sudo: /etc/sudoers is world writable sudo: no valid sudoers sources found, quitting 解决办法: 修复磁盘 ...
- header、footer、hgroup、address
header:整个页面或者一个块级区域的头部区域,通常用来放置标题等信息: footer:整个页面或者一个块级区域的底部区域,通常用来放置版权信息.联系方式等: hgroup:用来对属于一个块级区域的 ...
- Uber优步宁波司机注册正式开始啦! UBER宁波司机注册指南!
自2012年Uber开始向全球进军以来,目前已进入全球56个国家和地区的市场,在全球超过270个城市提供服务, 而Uber公司的估值已高达412亿美元. [目前开通Uber优步叫车服务的中国城市] ...
- asp.net页面事件执行顺序
转自http://www.cnblogs.com/hnlyh/articles/4230388.html C#代码 using System; using System.Data; using Sys ...
- Uncaught ReferenceError: WebForm_DoPostBackWithOptions is not defined
环境:Asp.Net网站,Framework版本4.0,IIS版本7.0问题:按钮失效,下面是按钮代码: <a id="dnn_ctr1161_Login_Login_DNN_cmdL ...
- XML代码生成器——XMLFACTORY 简介(三)
XML代码生成器——XMLFACTORY 简介(三) 这一篇我们讲“类名称”页签 的配置功能,您将了解到:如何为Xml元素指定对应的类名称及脱壳功能. 如果,你没看过这个系列的第一篇文章,请先去看这篇 ...