CSU-1632 Repeated Substrings (Suffix Array)
Description
String analysis often arises in applications from biology and chemistry, such as the study of DNA and protein molecules. One interesting problem is to find how many substrings are repeated (at least twice) in a long string. In this problem, you will write a program to find the total number of repeated substrings in a string of at most 100 000 alphabetic characters. Any unique substring that occurs more than once is counted. As an example, if the string is “aabaab”, there are 5 repeated substrings: “a”, “aa”, “aab”, “ab”, “b”. If the string is “aaaaa”, the repeated substrings are “a”, “aa”, “aaa”, “aaaa”. Note that repeated occurrences of a substring may overlap (e.g. “aaaa” in the second case).
Input
The input consists of at most 10 cases. The first line contains a positive integer, specifying the number of
cases to follow. Each of the following lines contains a nonempty string of up to 100 000 alphabetic characters.
Output
For each line of input, output one line containing the number of unique substrings that are repeated. You
may assume that the correct answer fits in a signed 32-bit integer.
Sample Input
3
aabaab
aaaaa
AaAaA
Sample Output
5
4
5
Problem summary: count the distinct substrings that occur at least twice in the string.
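For short strings the definition can be checked directly. The sketch below is a brute-force reference, not the intended solution: it enumerates every substring, so it is only practical for inputs far smaller than the 100 000-character limit. The function name countRepeatedBrute is made up for this illustration.

```cpp
#include <string>
#include <unordered_map>

// Brute-force reference (hypothetical helper, for small inputs only):
// count the distinct substrings that occur at least twice, overlaps allowed.
int countRepeatedBrute(const std::string& s) {
    std::unordered_map<std::string, int> occurrences;
    const int n = static_cast<int>(s.size());
    // Record one occurrence for every (start, length) pair.
    for (int i = 0; i < n; ++i)
        for (int len = 1; i + len <= n; ++len)
            ++occurrences[s.substr(i, len)];
    // A substring is "repeated" if it was seen at two or more start positions.
    int repeated = 0;
    for (const auto& entry : occurrences)
        if (entry.second >= 2) ++repeated;
    return repeated;
}
```

On the three sample strings this gives 5, 4 and 5, matching the sample output, which makes it a handy cross-check for a faster solution on small random inputs.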
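Analysis: build the suffix array and let height(i) be the length of the longest common prefix of the suffixes ranked i and i-1. The answer is sum(max(height(i)-height(i-1),0)). The code is as follows: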
//# define AC
# ifndef AC
# include<iostream>
# include<cstdio>
# include<cstring>
# include<vector>
# include<queue>
# include<list>
# include<cmath>
# include<set>
# include<map>
# include<string>
# include<cstdlib>
# include<algorithm>
using namespace std;

# define mid (l+(r-l)/2)
typedef long long LL;
typedef unsigned long long ULL;

const int N=100000;
const int mod=1e9+7;
const int INF=0x7fffffff;
const LL oo=0x7fffffffffffffff;

int SA[N+5];     // SA[i]: start index of the suffix with rank i
int tSA[N+5];    // scratch buffer for the doubling sort
int cnt[N+5];    // counting-sort buckets
int rk[N+5];     // rk[i]: rank of the suffix starting at i
int *x,*y;       // x: current ranks; y: second-key order, then previous ranks
int height[N+5]; // height[i]: LCP of the suffixes ranked i and i-1

// Map a letter to 0..51 (lowercase first, then uppercase).
int idx(char c)
{
    if('a'<=c&&c<='z') return c-'a';
    return c-'A'+26;
}

// Do suffixes i and j agree on both sort keys (first k chars, next k chars)?
bool same(int i,int j,int k,int n)
{
    if(y[i]-y[j]) return false;
    if(i+k<n&&j+k>=n) return false;
    if(i+k>=n&&j+k<n) return false;
    return y[i+k]==y[j+k];
}

// Prefix-doubling suffix array construction with radix sort.
void buildSA(char *s)
{
    int n=strlen(s);
    int m=52;
    x=rk,y=tSA;
    // Initial counting sort by the first character.
    for(int i=0;i<m;++i) cnt[i]=0;
    for(int i=0;i<n;++i) ++cnt[x[i]=idx(s[i])];
    for(int i=1;i<m;++i) cnt[i]+=cnt[i-1];
    for(int i=n-1;i>=0;--i) SA[--cnt[x[i]]]=i;

    for(int k=1;k<=n;k<<=1){
        // Order by second key: suffixes with no second half come first.
        int p=0;
        for(int i=n-k;i<n;++i) y[p++]=i;
        for(int i=0;i<n;++i) if(SA[i]>=k) y[p++]=SA[i]-k;

        // Stable counting sort by first key.
        for(int i=0;i<m;++i) cnt[i]=0;
        for(int i=0;i<n;++i) ++cnt[x[y[i]]];
        for(int i=1;i<m;++i) cnt[i]+=cnt[i-1];
        for(int i=n-1;i>=0;--i) SA[--cnt[x[y[i]]]]=y[i];

        // Recompute ranks; y now holds the previous ranks.
        p=1;
        swap(x,y);
        x[SA[0]]=0;
        for(int i=1;i<n;++i)
            x[SA[i]]=same(SA[i],SA[i-1],k,n)?p-1:p++;
        if(p>=n) break; // all ranks distinct: sorting is finished
        m=p;
    }
}

// Compute the height array in O(n) by visiting suffixes in text order.
void getHeight(char *s)
{
    int n=strlen(s);
    for(int i=0;i<n;++i) rk[SA[i]]=i;
    int k=0;
    for(int i=0;i<n;++i){
        if(rk[i]==0){
            height[rk[i]]=k=0;
        }else{
            if(k) --k;
            int j=SA[rk[i]-1];
            while(i+k<n&&j+k<n&&s[i+k]==s[j+k])
                ++k;
            height[rk[i]]=k;
        }
    }
}

char str[N+5];

void solve()
{
    int n=strlen(str);
    int ans=0;
    // Start at i=1: height[0] is always 0, and i=0 would read height[-1].
    for(int i=1;i<n;++i){
        if(height[i]>height[i-1])
            ans+=height[i]-height[i-1];
    }
    printf("%d\n",ans);
}

int main()
{
    int T;
    scanf("%d",&T);
    while(T--)
    {
        scanf("%s",str);
        buildSA(str);
        getHeight(str);
        solve();
    }
    return 0;
}
# endif
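Why the sum works: a repeated substring w is a common prefix of a contiguous block of suffixes in sorted order, say ranks l..r with r>l. The height at rank l is below |w| while the height at rank l+1 is at least |w|, so |w| lies in the interval (height(l), height(l+1)] and is counted exactly once, at rank l+1. The sketch below illustrates the same formula with a deliberately naive suffix sort (direct string comparison instead of prefix doubling), so it is only suitable for small inputs. The name countRepeatedByHeight is an assumption made for this example.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive illustration of sum(max(height(i)-height(i-1),0)).
// Suffixes are sorted by direct comparison, which is too slow for n = 100000;
// this is a small-input sketch, not the solution's prefix-doubling sort.
int countRepeatedByHeight(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    // Sort suffix start positions lexicographically.
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    // height[i] = LCP of the suffixes ranked i and i-1; height[0] = 0.
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; ++i) {
        const int a = sa[i - 1], b = sa[i];
        int k = 0;
        while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
        height[i] = k;
    }
    // Each positive increase introduces that many new repeated substrings.
    int answer = 0;
    for (int i = 1; i < n; ++i)
        answer += std::max(height[i] - height[i - 1], 0);
    return answer;
}
```

For "aabaab" the sorted suffixes are aab, aabaab, ab, abaab, b, baab, with heights 0, 3, 1, 2, 0, 1. The positive increases are 3, 1 and 1, giving 5.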