UVALive 6869 Repeated Substrings
Repeated Substrings
Time Limit: 3000MS Memory Limit: Unknown 64bit IO Format: %lld & %llu
Description
String analysis often arises in applications from biology and chemistry, such as the study of DNA and protein molecules. One interesting problem is to find how many substrings are repeated (at least twice) in a long string. In this problem, you will write a program to find the total number of repeated substrings in a string of at most 100 000 alphabetic characters. Any unique substring that occurs more than once is counted. As an example, if the string is “aabaab”, there are 5 repeated substrings: “a”, “aa”, “aab”, “ab”, “b”. If the string is “aaaaa”, the repeated substrings are “a”, “aa”, “aaa”, “aaaa”. Note that repeated occurrences of a substring may overlap (e.g. “aaaa” in the second case).
Input
The input consists of at most 10 cases. The first line contains a positive integer, specifying the number of
cases to follow. Each of the following line contains a nonempty string of up to 100 000 alphabetic characters.
Output
For each line of input, output one line containing the number of unique substrings that are repeated. You
may assume that the correct answer fits in a signed 32-bit integer.
Sample Input
3
aabaab
aaaaa
AaAaA
Sample Output
5
4
5
HINT
Source
解题:后缀数组lcp的应用,如果lcp[i] > lcp[i-1]那么累加lcp[i] - lcp[i-1]
#include <bits/stdc++.h>
using namespace std;
const int maxn = ;
int rk[maxn],wb[maxn],wv[maxn],wd[maxn],lcp[maxn];
bool cmp(int *r,int i,int j,int k) {
return r[i] == r[j] && r[i+k] == r[j+k];
}
void da(int *r,int *sa,int n,int m) {
int i,k,p,*x = rk,*y = wb;
for(i = ; i < m; ++i) wd[i] = ;
for(i = ; i < n; ++i) wd[x[i] = r[i]]++;
for(i = ; i < m; ++i) wd[i] += wd[i-];
for(i = n-; i >= ; --i) sa[--wd[x[i]]] = i; for(p = k = ; p < n; k <<= ,m = p) {
for(p = ,i = n-k; i < n; ++i) y[p++] = i;
for(i = ; i < n; ++i) if(sa[i] >= k) y[p++] = sa[i] - k;
for(i = ; i < n; ++i) wv[i] = x[y[i]]; for(i = ; i < m; ++i) wd[i] = ;
for(i = ; i < n; ++i) wd[wv[i]]++;
for(i = ; i < m; ++i) wd[i] += wd[i-];
for(i = n-; i >= ; --i) sa[--wd[wv[i]]] = y[i]; swap(x,y);
x[sa[]] = ;
for(p = i = ; i < n; ++i)
x[sa[i]] = cmp(y,sa[i-],sa[i],k)?p-:p++;
}
}
void calcp(int *r,int *sa,int n) {
for(int i = ; i <= n; ++i) rk[sa[i]] = i;
int h = ;
for(int i = ; i < n; ++i) {
if(h > ) h--;
for(int j = sa[rk[i]-]; i+h < n && j+h < n; h++)
if(r[i+h] != r[j+h]) break;
lcp[rk[i]] = h;
}
}
int r[maxn],sa[maxn];
char str[maxn];
int main() {
int hn,x,y,cs,ret;
scanf("%d",&cs);
while(cs--) {
scanf("%s",str);
int len = strlen(str);
for(int i = ; str[i]; ++i)
r[i] = str[i];
ret = r[len] = ;
da(r,sa,len+,);
calcp(r,sa,len);
for(int i = ; i <= len; ++i)
if(lcp[i] > lcp[i-]) ret += lcp[i] - lcp[i-];
printf("%d\n",ret);
}
return ;
}
后缀自动机
#include <bits/stdc++.h>
using namespace std;
const int maxn = ;
int cnt[maxn],c[maxn],sa[maxn];
struct node{
int son[],f,len;
void init(){
memset(son,-,sizeof son);
f = -;
len = ;
}
};
struct SAM{
node e[maxn];
int tot,last;
int newnode(int len = ){
e[tot].init();
e[tot].len = len;
return tot++;
}
void init(){
tot = last = ;
newnode();
}
void add(int c){
int p = last,np = newnode(e[p].len + );
while(p != - && e[p].son[c] == -){
e[p].son[c] = np;
p = e[p].f;
}
if(p == -) e[np].f = ;
else{
int q = e[p].son[c];
if(e[p].len + == e[q].len) e[np].f = q;
else{
int nq = newnode();
e[nq] = e[q];
e[nq].len = e[p].len + ;
e[q].f = e[np].f = nq;
while(p != - && e[p].son[c] == q){
e[p].son[c] = nq;
p = e[p].f;
}
}
}
last = np;
cnt[np] = ;
}
}sam;
char str[maxn];
int main(){
int kase;
scanf("%d",&kase);
while(kase--){
scanf("%s",str);
sam.init();
memset(cnt,,sizeof cnt);
int len = strlen(str);
for(int i = ; str[i]; ++i)
sam.add(str[i]);
node *e = sam.e;
memset(c,,sizeof c);
for(int i = ; i < sam.tot; ++i) c[e[i].len]++;
for(int i = ; i <= len; ++i) c[i] += c[i-];
for(int i = sam.tot-; i >= ; --i) sa[--c[e[i].len]] = i;
for(int i = sam.tot-; i > ; --i){
int v = sa[i];
cnt[e[v].f] += cnt[v];
}
int ret = ;
for(int i = ; i < sam.tot; ++i){
if(cnt[i] <= ) continue;
ret += e[i].len - e[e[i].f].len;
}
printf("%d\n",ret);
}
return ;
}
UVALive 6869 Repeated Substrings的更多相关文章
- UVALive - 6869 Repeated Substrings 后缀数组
题目链接: http://acm.hust.edu.cn/vjudge/problem/113725 Repeated Substrings Time Limit: 3000MS 样例 sample ...
- CSU-1632 Repeated Substrings (后缀数组)
Description String analysis often arises in applications from biology and chemistry, such as the stu ...
- UVALive 6869(后缀数组)
传送门:Repeated Substrings 题意:给定一个字符串,求至少重复一次的不同子串个数. 分析:模拟写出子符串后缀并排好序可以发现,每次出现新的重复子串个数都是由现在的height值减去前 ...
- Repeated Substrings(UVAlive 6869)
题意:求出现过两次以上的不同子串有多少种. /* 用后缀数组求出height[]数组,然后扫一遍, 发现height[i]-height[i-1]>=0,就ans+=height[i]-heig ...
- UVALive 4671 K-neighbor substrings 巧用FFT
UVALive4671 K-neighbor substrings 给定一个两个字符串A和B B为模式串.问A中有多少不同子串与B的距离小于k 所谓距离就是不同位的个数. 由于字符串只包含a和 ...
- UVALive - 4671 K-neighbor substrings (FFT+哈希)
题意:海明距离的定义:两个相同长度的字符串中不同的字符数.现给出母串A和模式串B,求A中有多少与B海明距离<=k的不同子串 分析:将字符a视作1,b视作0.则A与B中都是a的位置乘积是1.现将B ...
- CSU-1632 Repeated Substrings[后缀数组求重复出现的子串数目]
评测地址:https://cn.vjudge.net/problem/CSU-1632 Description 求字符串中所有出现至少2次的子串个数 Input 第一行为一整数T(T<=10)表 ...
- LeetCode 1100. Find K-Length Substrings With No Repeated Characters
原题链接在这里:https://leetcode.com/problems/find-k-length-substrings-with-no-repeated-characters/ 题目: Give ...
- [LeetCode] Repeated DNA Sequences 求重复的DNA序列
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
随机推荐
- php使用flock堵塞写入文件和非堵塞写入文件
php使用flock堵塞写入文件和非堵塞写入文件 堵塞写入代码:(全部程序会等待上次程序运行结束才会运行,30秒会超时) <?php $file = fopen("test.txt&q ...
- rest_framework (版本)
请求进来 封装request. 版本限制 认证 权限 节流 版本 self.version_param url中版本的key self.default_version self.is_allowed_ ...
- python 同步IO
IO在计算机中指Input/Output 由CPU这个超快的计算核心来执行,涉及到数据交换的地方,通常是磁盘.网络等,就需要IO接口.IO编程中,Stream(流)是一个很重要的概念,可以把流想象成一 ...
- [poj 2480] Longge's problem 解题报告 (欧拉函数)
题目链接:http://poj.org/problem?id=2480 题目大意: 题解: 我一直很欣赏数学题完美的复杂度 #include<cstring> #include<al ...
- BZOJ 2424 DP OR 费用流
思路: 1.DP f[i][j]表示第i个月的月底 还剩j的容量 转移还是相对比较好想的-- f[i][j+1]=min(f[i][j+1],f[i][j]+d[i]); if(j>=u[i+1 ...
- BZOJ 3209 数位DP
思路: 先预处理出来组合数 按位做 枚举sum[x]是多少 注意Mod不是一个质数 //By SiriusRen #include <cstdio> using namespace std ...
- 求推荐go语言开发工具及go语言应该以哪种目录结构组织代码?
go语言的开发工具推荐? go语言开发普通程序及开发web程序的时候,应该以哪种目录结构组织代码? 求推荐go语言开发工具及go语言应该以哪种目录结构组织代码? >> golang这个答案 ...
- 运维派 企业面试题2 创建10个 "十个随机字母_test.html" 文件
Linux运维必会的实战编程笔试题(19题) 企业面试题2: 使用for循环在/tmp/www目录下通过随机小写10个字母加固定字符串test批量创建10个html文件,名称例如为: --[root@ ...
- mac ssh报错处理
总结一下 1.The authenticity of host '[192.168.1.100]:22 ([192.168.1.100]:22)' can't be established. RSA ...
- Playing With Stones UVALive - 5059 Nim SG函数 打表找规律
Code: #include<cstdio> #include<algorithm> using namespace std; typedef long long ll; ll ...