大致题意:

给出n个长度为60的DNA基因(A腺嘌呤 G鸟嘌呤 T胸腺嘧啶 C胞嘧啶)序列,求出他们的最长公共子序列

使用后缀数组解决

 #include<stdio.h>
#include<string.h>
char str[],res[];
int num[],loc[];
int sa[],rank[],height[];
int wa[],wb[],wv[],wd[];
int vis[];
int seq_num;
int cmp(int *r,int a,int b,int l){
return r[a]==r[b]&&r[a+l]==r[b+l];
}
void DA(int *r,int n,int m){
int i,j,p,*x=wa,*y=wb,*t;
for(i=;i<m;i++)wd[i]=;
for(i=;i<n;i++)wd[x[i]=r[i]]++;
for(i=;i<m;i++)wd[i]+=wd[i-];
for(i=n-;i>=;i--) sa[--wd[x[i]]]=i;
for(j=,p=;p<n;j*=,m=p){
for(p=,i=n-j;i<n;i++) y[p++]=i;
for(i=;i<n;i++) if(sa[i]>=j) y[p++] = sa[i] -j;
for(i=;i<n;i++)wv[i]=x[y[i]];
for(i=;i<m;i++) wd[i]=;
for(i=;i<n;i++)wd[wv[i]]++;
for(i=;i<m;i++)wd[i]+=wd[i-];
for(i=n-;i>=;i--) sa[--wd[wv[i]]]=y[i];
for(t=x,x=y,y=t,p=,x[sa[]]=,i=;i<n;i++){
x[sa[i]]=cmp(y,sa[i-],sa[i],j)?p-:p++;
}
}
}
void calHeight(int *r,int n){
int i,j,k=;
for(i=;i<=n;i++)rank[sa[i]]=i;
for(i=;i<n;height[rank[i++]]=k){
for(k?k--:,j=sa[rank[i]-];r[i+k]==r[j+k];k++);
}
}
int check(int mid,int len){
int i,j,tot;
tot=;
memset(vis,,sizeof(vis));
for(i=;i<=len;i++){
if(height[i]<mid){
memset(vis,,sizeof(vis));
tot=;
}else{
if(!vis[loc[sa[i-]]]){
vis[loc[sa[i-]]]=;
tot++;
}
if(!vis[loc[sa[i]]]){
vis[loc[sa[i]]]=;
tot++;
}
if(tot==seq_num){
for(j=;j<mid;j++){
res[j]=num[sa[i]+j]+'A'-;
}res[mid]='\0';
return ;
}
}
}
return ;
}
int main() {
int case_num,n,sp,ans;
scanf("%d",&case_num);
for(int i=;i<case_num;i++){
scanf("%d",&seq_num);
n=;
sp=;
ans=;
for(int j=;j<seq_num;j++){
scanf("%s",str);
for(int k=;k<;k++){
loc[n]=j;
num[n++]=str[k]-'A'+;
}
loc[n]=sp;
num[n++]=sp++;
}
num[n]=;
DA(num,n+,sp);
calHeight(num,n);
int left=,right=,mid; while(right>=left){
mid=(right+left)/;
int tt=check(mid,n);
if(tt){
left=mid+;
ans=mid;
}else{
right=mid-;
}
}
if(ans>=){
printf("%s\n",res);
}else{
printf("no significant commonalities\n");
}
}
return ;
}
 
附:原题目
 
Time Limit: 1000MS   Memory Limit: 65536K
Total Submissions: 14020   Accepted: 6227

Description

The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated.

As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers.

A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC.

Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.

Input

Input to this problem will begin with a line containing a single integer n indicating the number of datasets. Each dataset consists of the following components:

  • A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
  • m lines each containing a single base sequence consisting of 60 bases.

Output

For each dataset in the input, output the longest base subsequence common to all of the given base sequences. If the longest common subsequence is less than three bases in length, display the string "no significant commonalities" instead. If multiple subsequences of the same longest length exist, output only the subsequence that comes first in alphabetical order.

Sample Input

3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Sample Output

no significant commonalities
AGATAC
CATCATCAT

Blue Jeans - poj 3080(后缀数组)的更多相关文章

  1. (字符串 KMP)Blue Jeans -- POJ -- 3080:

    链接: http://poj.org/problem?id=3080 http://acm.hust.edu.cn/vjudge/contest/view.action?cid=88230#probl ...

  2. POJ - 3294~Relevant Phrases of Annihilation SPOJ - PHRASES~Substrings POJ - 1226~POJ - 3450 ~ POJ - 3080 (后缀数组求解多个串的公共字串问题)

    多个字符串的相关问题 这类问题的一个常用做法是,先将所有的字符串连接起来, 然后求后缀数组 和 height 数组,再利用 height 数组进行求解. 这中间可能需要二分答案. POJ - 3294 ...

  3. POJ 3080 后缀数组/KMP

    题目链接:http://poj.org/problem?id=3080 题意:给定n个DNA串,求最长公共子串.如果最长公共子串的长度小于3时输出no significant commonalitie ...

  4. Match:Blue Jeans(POJ 3080)

    DNA序列 题目大意:给你m串字符串,要你找最长的相同的连续字串 这题暴力kmp即可,注意要按字典序排序,同时,是len<3才输出no significant commonalities #in ...

  5. Blue Jeans - POJ 3080(多串的共同子串)

    题目大意:有M个串,每个串的长度都是60,查找这M个串的最长公共子串(连续的),长度不能小于3,如果同等长度的有多个输出字典序最小的那个.   分析:因为串不多,而且比较短,所致直接暴力枚举的第一个串 ...

  6. Blue Jeans POJ 3080 寻找多个串的最长相同子串

    Description The Genographic Project is a research partnership between IBM and The National Geographi ...

  7. poj 3693 后缀数组 重复次数最多的连续重复子串

    Maximum repetition substring Time Limit: 1000MS   Memory Limit: 65536K Total Submissions: 8669   Acc ...

  8. POJ 3415 后缀数组

    题目链接:http://poj.org/problem?id=3415 题意:给定2个串[A串和B串],求两个串公共子串长度大于等于k的个数. 思路:首先是两个字符串的问题.所以想用一个'#'把两个字 ...

  9. POJ 3450 后缀数组/KMP

    题目链接:http://poj.org/problem?id=3450 题意:给定n个字符串,求n个字符串的最长公共子串,无解输出IDENTITY LOST,否则最长的公共子串.有多组解时输出字典序最 ...

随机推荐

  1. Maven设置snapshot无法在远程仓库下载的问题解决

    检查步骤如下: 1.检查nexus是否纳入public版本中: 2.配置中是否启用snapshots功能.以下方法两种设置都可以,任选一个即可. 一种是在项目pom.xml使用: <reposi ...

  2. MySQL查询时区分大小写(转)

    说明:在MySQL查询时要区分大小写会涉及到两个概念character set和collation,这两个概念在表设计时或者在查询时都可以指定的,详细参考:http://www.cnblogs.com ...

  3. 【spring boot】4.spring boot配置多环境资源文件

    一个spring boot 项目在开发环境.测试环境.生产环境下,好多的配置都是不尽相同的.所以配置多分的资源文件,仅仅在部署在不同环境的时候,选择激活不同的资源文件就可以实现多环境的部署. 项目结构 ...

  4. EF4.4 升级EF6.0问题总结

    如出现下面代码错误,基本可能确定EF数据库配置错误 在 System.Data.Entity.Core.Metadata.Edm.MetadataArtifactLoaderCompositeReso ...

  5. Java 图片添加水印效果

    package com.xiaowu.drawwater.demo; import java.awt.AlphaComposite; import java.awt.Graphics2D; impor ...

  6. 如何使用ninja编译系统编译我们的程序?

    使用ninja 配置自己的环境来使用ninja 构建程序 Android使用ninja Windows使用 调试 不使用VS 技巧 问题 Ninja的原意是忍者,忍者神龟的忍者.这里被google拿来 ...

  7. Netty组件介绍(转)

    http://www.tuicool.com/articles/mEJvYb 为了更好的理解和进一步深入Netty,我们先总体认识一下Netty用到的组件及它们在整个Netty架构中是怎么协调工作的. ...

  8. 用table表格来调整控件的格式

    由于想自己写一个web,所以也在学习html语言的一些东西,让我回忆起了大学时代曾对网页设计产生过兴趣,无奈那时候还没有自己的电脑,还常去网吧买个软盘下载一些图片,然后用fontpage做一些网页.后 ...

  9. 记录一个奇妙的Bug, -1 &gt;= 2 ?

    直接上代码: #include <iostream> #include <vector> using namespace std; int main() { vector< ...

  10. 移动负载均衡技术(MBL)

    移动负载均衡技术(MBL)   转至元数据结尾 附件:5 被admin添加,被admin最后更新于四月 27, 2015 转至元数据起始 互联网技术发展到今天,已经进入移动时代,很多在传统CS和BS的 ...