Repeated Substrings

Time Limit: 3000MS Memory Limit: Unknown 64bit IO Format: %lld & %llu

Description

String analysis often arises in applications from biology and chemistry, such as the study of DNA and protein molecules. One interesting problem is to find how many substrings are repeated (at least twice) in a long string. In this problem, you will write a program to find the total number of repeated substrings in a string of at most 100 000 alphabetic characters. Any unique substring that occurs more than once is counted. As an example, if the string is “aabaab”, there are 5 repeated substrings: “a”, “aa”, “aab”, “ab”, “b”. If the string is “aaaaa”, the repeated substrings are “a”, “aa”, “aaa”, “aaaa”. Note that repeated occurrences of a substring may overlap (e.g. “aaaa” in the second case).

Input

The input consists of at most 10 cases. The first line contains a positive integer, specifying the number of
cases to follow. Each of the following line contains a nonempty string of up to 100 000 alphabetic characters.

Output

For each line of input, output one line containing the number of unique substrings that are repeated. You
may assume that the correct answer fits in a signed 32-bit integer.

Sample Input

3
aabaab
aaaaa
AaAaA

Sample Output

5
4
5

HINT

 

Source

解题:后缀数组lcp的应用,如果lcp[i] > lcp[i-1]那么累加lcp[i] - lcp[i-1]

 #include <bits/stdc++.h>
using namespace std;
const int maxn = ;
int rk[maxn],wb[maxn],wv[maxn],wd[maxn],lcp[maxn];
bool cmp(int *r,int i,int j,int k) {
return r[i] == r[j] && r[i+k] == r[j+k];
}
void da(int *r,int *sa,int n,int m) {
int i,k,p,*x = rk,*y = wb;
for(i = ; i < m; ++i) wd[i] = ;
for(i = ; i < n; ++i) wd[x[i] = r[i]]++;
for(i = ; i < m; ++i) wd[i] += wd[i-];
for(i = n-; i >= ; --i) sa[--wd[x[i]]] = i; for(p = k = ; p < n; k <<= ,m = p) {
for(p = ,i = n-k; i < n; ++i) y[p++] = i;
for(i = ; i < n; ++i) if(sa[i] >= k) y[p++] = sa[i] - k;
for(i = ; i < n; ++i) wv[i] = x[y[i]]; for(i = ; i < m; ++i) wd[i] = ;
for(i = ; i < n; ++i) wd[wv[i]]++;
for(i = ; i < m; ++i) wd[i] += wd[i-];
for(i = n-; i >= ; --i) sa[--wd[wv[i]]] = y[i]; swap(x,y);
x[sa[]] = ;
for(p = i = ; i < n; ++i)
x[sa[i]] = cmp(y,sa[i-],sa[i],k)?p-:p++;
}
}
void calcp(int *r,int *sa,int n) {
for(int i = ; i <= n; ++i) rk[sa[i]] = i;
int h = ;
for(int i = ; i < n; ++i) {
if(h > ) h--;
for(int j = sa[rk[i]-]; i+h < n && j+h < n; h++)
if(r[i+h] != r[j+h]) break;
lcp[rk[i]] = h;
}
}
int r[maxn],sa[maxn];
char str[maxn];
int main() {
int hn,x,y,cs,ret;
scanf("%d",&cs);
while(cs--) {
scanf("%s",str);
int len = strlen(str);
for(int i = ; str[i]; ++i)
r[i] = str[i];
ret = r[len] = ;
da(r,sa,len+,);
calcp(r,sa,len);
for(int i = ; i <= len; ++i)
if(lcp[i] > lcp[i-]) ret += lcp[i] - lcp[i-];
printf("%d\n",ret);
}
return ;
}

后缀自动机

 #include <bits/stdc++.h>
using namespace std;
const int maxn = ;
int cnt[maxn],c[maxn],sa[maxn];
struct node{
int son[],f,len;
void init(){
memset(son,-,sizeof son);
f = -;
len = ;
}
};
struct SAM{
node e[maxn];
int tot,last;
int newnode(int len = ){
e[tot].init();
e[tot].len = len;
return tot++;
}
void init(){
tot = last = ;
newnode();
}
void add(int c){
int p = last,np = newnode(e[p].len + );
while(p != - && e[p].son[c] == -){
e[p].son[c] = np;
p = e[p].f;
}
if(p == -) e[np].f = ;
else{
int q = e[p].son[c];
if(e[p].len + == e[q].len) e[np].f = q;
else{
int nq = newnode();
e[nq] = e[q];
e[nq].len = e[p].len + ;
e[q].f = e[np].f = nq;
while(p != - && e[p].son[c] == q){
e[p].son[c] = nq;
p = e[p].f;
}
}
}
last = np;
cnt[np] = ;
}
}sam;
char str[maxn];
int main(){
int kase;
scanf("%d",&kase);
while(kase--){
scanf("%s",str);
sam.init();
memset(cnt,,sizeof cnt);
int len = strlen(str);
for(int i = ; str[i]; ++i)
sam.add(str[i]);
node *e = sam.e;
memset(c,,sizeof c);
for(int i = ; i < sam.tot; ++i) c[e[i].len]++;
for(int i = ; i <= len; ++i) c[i] += c[i-];
for(int i = sam.tot-; i >= ; --i) sa[--c[e[i].len]] = i;
for(int i = sam.tot-; i > ; --i){
int v = sa[i];
cnt[e[v].f] += cnt[v];
}
int ret = ;
for(int i = ; i < sam.tot; ++i){
if(cnt[i] <= ) continue;
ret += e[i].len - e[e[i].f].len;
}
printf("%d\n",ret);
}
return ;
}

UVALive 6869 Repeated Substrings的更多相关文章

  1. UVALive - 6869 Repeated Substrings 后缀数组

    题目链接: http://acm.hust.edu.cn/vjudge/problem/113725 Repeated Substrings Time Limit: 3000MS 样例 sample ...

  2. CSU-1632 Repeated Substrings (后缀数组)

    Description String analysis often arises in applications from biology and chemistry, such as the stu ...

  3. UVALive 6869(后缀数组)

    传送门:Repeated Substrings 题意:给定一个字符串,求至少重复一次的不同子串个数. 分析:模拟写出子符串后缀并排好序可以发现,每次出现新的重复子串个数都是由现在的height值减去前 ...

  4. Repeated Substrings(UVAlive 6869)

    题意:求出现过两次以上的不同子串有多少种. /* 用后缀数组求出height[]数组,然后扫一遍, 发现height[i]-height[i-1]>=0,就ans+=height[i]-heig ...

  5. UVALive 4671 K-neighbor substrings 巧用FFT

    UVALive4671   K-neighbor substrings   给定一个两个字符串A和B B为模式串.问A中有多少不同子串与B的距离小于k 所谓距离就是不同位的个数. 由于字符串只包含a和 ...

  6. UVALive - 4671 K-neighbor substrings (FFT+哈希)

    题意:海明距离的定义:两个相同长度的字符串中不同的字符数.现给出母串A和模式串B,求A中有多少与B海明距离<=k的不同子串 分析:将字符a视作1,b视作0.则A与B中都是a的位置乘积是1.现将B ...

  7. CSU-1632 Repeated Substrings[后缀数组求重复出现的子串数目]

    评测地址:https://cn.vjudge.net/problem/CSU-1632 Description 求字符串中所有出现至少2次的子串个数 Input 第一行为一整数T(T<=10)表 ...

  8. LeetCode 1100. Find K-Length Substrings With No Repeated Characters

    原题链接在这里:https://leetcode.com/problems/find-k-length-substrings-with-no-repeated-characters/ 题目: Give ...

  9. [LeetCode] Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

随机推荐

  1. nj04---事件回调函数

    一.回调函数 1.异步式读取文件 var fs=require('fs'); fs.readFile('file.txt','utf-8',function(err,data){ if(err){ c ...

  2. 卸载虚拟机时错误关闭了某个服务,使得电脑除了chrome浏览器都不能联网

    不是中毒,也不需要重装系统,可能是关闭了svchost服务 以下为搜索到的答案,亲测第一种好用 最近两周在三班四班有5位同学电脑7次出现网络故障,表现为能连上锐捷.DNS正常却不能上网,其中在我自己的 ...

  3. BZOJ4259: 残缺的字符串 & BZOJ4503: 两个串

    [传送门:BZOJ4259&BZOJ4503] 简要题意: 给出两个字符串,第一个串长度为m,第二个串长度为n,字符串中如果有*字符,则代表当前位置可以匹配任何字符 求出第一个字符串在第二个字 ...

  4. ConfigurationSection

    https://msdn.microsoft.com/en-us/library/system.configuration.configurationsection(v=vs.110).aspx Re ...

  5. js中如何获取对象的长度和名称

    js如何获取对象长度和名称 一.总结 一句话总结:对象的长度不能用.length获取,用js原生的Object.keys可以获取到 var obj = {'name' : 'Tom' , 'sex' ...

  6. for循环练习-----ATM取款

    要求: 代码: package com.jianglai.atm; import java.util.Scanner; public class ATM { public static void ma ...

  7. SQLServer中同义词Synonym的用法

    以前一直认为SqlServer中的同义词(Synonym)没有什么用处,所以也一直没有去查它的语法格式.今天碰到一个问题,用Synonym来解决再好不过了.问题是这样子的,我的系统中用到了多个数据库, ...

  8. 51Nod 1010 只包含因子2 3 5的数(打表+二分)

    K的因子中只包含2 3 5.满足条件的前10个数是:2,3,4,5,6,8,9,10,12,15. 所有这样的K组成了一个序列S,现在给出一个数n,求S中 >= 给定数的最小的数. 例如:n = ...

  9. ES6学习笔记(十七)Class 的基本语法

    1.简介 类的由来 JavaScript 语言中,生成实例对象的传统方法是通过构造函数.下面是一个例子. function Point(x, y) { this.x = x; this.y = y; ...

  10. ArchLinux 音乐播放客户端ncmpcpp和服务端mpd的配置

    Ncmcpp是一个mpd客户端,它提供了很多方便的操作 MPD是一个服务器-客户端架构的音频播放器.功能包括音频播放, 播放列表管理和音乐库维护,所有功能占用的资源都很少. --取自 wiki.arc ...