Multiple Sequence Alignment Benchmark Data Sets

1. Overview of sequence alignment benchmark data sets: http://www.drive5.com/bench/

This is a collection of multiple alignment benchmarks in a uniform
format that is convenient for further analysis. All files are in
FASTA format, with upper-case letters used to indicate aligned
columns.

See References below for original sources of benchmark data.

Benchmarks are:

--------------------------1---------------------------

bali2dna
BALIBASE v2, reverse-translated to DNA

bali2dnaf
bali2dna, with frame-shifts induced by random insertions of one
or two nucleotides into the middle 50% of exactly one sequence
in each set.

bali3
BALIBASE v3.

bali3pdb
BALIS, the structural subset of BALIBASE v3.

bali3pdbm
MU-BALIS, i.e. BALIS re-aligned by MUSTANG.

---------------------------2--------------------------

ox
OXBENCH.

oxm
MU-OXBENCH, i.e. OXBENCH re-aligned by MUSTANG.

oxx
OXBENCH-X, i.e. the Extended set in OXBENCH.

---------------------------3--------------------------

prefab4
PREFAB v4.

prefab4ref
PREFAB-R, i.e. the pair-wise reference pairs in PREFAB v4.

prefab4refm
MU-PREFAB-R, i.e. PREFAB-R re-aligned by MUSTANG.

---------------------------4--------------------------

sabre
Consistent multiple alignments constructed from SABMARK v1.65.

sabrem
MU-SABRE, i.e. SABRE re-aligned by MUSTANG.

-----------------------------------------------------
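The following is a minimal Python sketch of how the two DNA variants (bali2dna and bali2dnaf) could be derived from protein data. Several choices here are hypothetical: the fixed one-codon-per-amino-acid table, the treatment of gaps, the assumption that the frame-shift is inserted into an unaligned input sequence, and the helper names `reverse_translate`, `induce_frameshift` and `frameshift_one`. The scripts actually used to build these benchmarks are not described here, and in particular the codons they chose are unknown.

```python
import random

# Hypothetical codon choice: one fixed codon per amino acid. The codons used
# to build bali2dna are not documented in this README.
CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}
GAPS = "-."


def reverse_translate(aligned_protein: str) -> str:
    """Replace each residue by a codon and each gap by three gaps, keeping
    the letter case so upper-case (aligned) columns stay marked."""
    out = []
    for ch in aligned_protein:
        if ch in GAPS:
            out.append(ch * 3)
            continue
        codon = CODON.get(ch.upper(), "NNN")  # unknown residues -> NNN
        out.append(codon if ch.isupper() else codon.lower())
    return "".join(out)


def induce_frameshift(dna: str, rng: random.Random) -> str:
    """Insert one or two random nucleotides at a random point in the
    middle 50% of an (assumed unaligned) DNA sequence."""
    n = len(dna)
    pos = rng.randint(n // 4, (3 * n) // 4)  # inclusive bounds
    insert = "".join(rng.choice("ACGT") for _ in range(rng.choice([1, 2])))
    return dna[:pos] + insert + dna[pos:]


def frameshift_one(seqs: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Apply a frame-shift to exactly one sequence of the set."""
    target = rng.choice(sorted(seqs))
    out = dict(seqs)
    out[target] = induce_frameshift(seqs[target], rng)
    return out
```

Passing an explicit `random.Random` instance makes a generated set reproducible from its seed. Because the insertion length is 1 or 2 and never 3, the reading frame of the chosen sequence is always broken downstream of the insertion point.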

Directory structure under each benchmark is:

in/
Input sequences.

ref/
Reference alignments. Upper-case letters mark conservative regions
that are intended for use in assessment. Lower-case regions
should not be used.

info/
Contains ids.txt (list of set identifiers that are filenames in ref/
and in/), nrseqs.txt (number of sequences in each set), and
pctids.txt (%id in conservative regions in each set).
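The ref/ and info/ conventions can be explored with a short Python sketch. It parses a reference FASTA file, finds the columns whose residues are all upper-case, and averages pairwise percent identity over those columns. The helper names are hypothetical. Two definitions are also assumptions: a column counts only if every non-gap residue in it is upper-case, and identity is identical pairs divided by positions where neither sequence has a gap. For these reasons the result need not match the values in pctids.txt exactly.

```python
from itertools import combinations
from typing import Iterable

GAPS = set("-.")


def parse_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Read FASTA records into {name: sequence}, joining wrapped lines."""
    seqs: dict[str, list[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def core_columns(aln: dict[str, str]) -> list[int]:
    """Indices of columns whose residues are all upper-case (gaps ignored)."""
    rows = list(aln.values())
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    cols = []
    for i in range(len(rows[0])):
        residues = [r[i] for r in rows if r[i] not in GAPS]
        if residues and all(c.isupper() for c in residues):
            cols.append(i)
    return cols


def mean_pct_id(aln: dict[str, str], cols: list[int]) -> float:
    """Mean pairwise percent identity over `cols`, counting only positions
    where neither sequence has a gap."""
    pcts = []
    for a, b in combinations(aln.values(), 2):
        both = [(a[i], b[i]) for i in cols if a[i] not in GAPS and b[i] not in GAPS]
        if both:
            same = sum(x.upper() == y.upper() for x, y in both)
            pcts.append(100.0 * same / len(both))
    return sum(pcts) / len(pcts) if pcts else 0.0
```

For a real set, open the file in ref/ named by an entry in info/ids.txt and pass the file handle to `parse_fasta`. Then pass the result to `core_columns` and `mean_pct_id`.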

Download page for qscore: http://www.drive5.com/bench/bench.tar.gz

qscore is a quality scoring program that compares two multiple sequence alignments: the alignment to be evaluated (the "test" alignment) and a second alignment that is believed to be correct (the "reference" alignment). It reports the following scores:
- The PREFAB Q score (also known as the BAliBASE SPS score or the Developer score)
- The Modeler score
- The Cline et al. shift score
- The BAliBASE TC (total column) score
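To make the Q and TC definitions concrete, here is a simplified Python sketch. It is not qscore's own implementation. Q is computed as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. TC is the fraction of reference columns that the test alignment reproduces exactly. Only reference columns whose residues are all upper-case and that hold at least two residues are scored; this is an assumption based on the ref/ convention. The function names are hypothetical, and the Modeler and Cline et al. shift scores are not covered.

```python
from itertools import combinations

GAPS = set("-.")


def residue_columns(aln: dict[str, str]) -> list[tuple[frozenset, bool]]:
    """For each column, return the set of (sequence name, residue index) it
    holds and whether every residue in the column is upper-case."""
    rows = list(aln.items())
    length = len(rows[0][1])
    if any(len(row) != length for _, row in rows):
        raise ValueError("all aligned rows must have the same length")
    next_idx = {name: 0 for name, _ in rows}
    cols = []
    for i in range(length):
        members, all_upper = set(), True
        for name, row in rows:
            ch = row[i]
            if ch in GAPS:
                continue
            members.add((name, next_idx[name]))
            next_idx[name] += 1
            all_upper = all_upper and ch.isupper()
        cols.append((frozenset(members), all_upper))
    return cols


def q_and_tc(test: dict[str, str], ref: dict[str, str]) -> tuple[float, float]:
    """Simplified Q (sum-of-pairs) and TC (total column) scores of `test`
    against `ref`. Both must contain the same sequences and residues."""
    test_cols = residue_columns(test)
    # Column index of every residue in the test alignment.
    where = {m: j for j, (members, _) in enumerate(test_cols) for m in members}
    ref_pairs = hit_pairs = ref_cols = hit_cols = 0
    for members, all_upper in residue_columns(ref):
        if not all_upper or len(members) < 2:
            continue  # lower-case or single-residue columns are not scored
        ref_cols += 1
        test_js = {where[m] for m in members}
        # TC: the test column must hold exactly the same residues.
        if len(test_js) == 1 and test_cols[next(iter(test_js))][0] == members:
            hit_cols += 1
        # Q: count reference residue pairs that share a test column.
        for a, b in combinations(members, 2):
            ref_pairs += 1
            if where[a] == where[b]:
                hit_pairs += 1
    q = hit_pairs / ref_pairs if ref_pairs else 0.0
    tc = hit_cols / ref_cols if ref_cols else 0.0
    return q, tc
```

Residues are identified by their position within each ungapped sequence rather than by column. This lets the two alignments have different lengths and gap placements.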


BAliBASE benchmark database: http://www.lbgi.fr/balibase/


References
----------

Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest
developments of the multiple sequence alignment benchmark. Proteins
61: 127-136.

Bahr A, Thompson JD, Thierry JC, Poch O (2001) BAliBASE (Benchmark
Alignment dataBASE): enhancements for repeats, transmembrane
sequences and circular permutations. Nucleic Acids Res 29: 323-326.

Thompson JD, Plewniak F, Poch O (1999) BAliBASE: a benchmark
alignment database for the evaluation of multiple alignment programs.
Bioinformatics 15: 87-88.

Van Walle I, Lasters I, Wyns L (2005) SABmark--a benchmark for
sequence alignment that covers the entire known fold space.
Bioinformatics 21: 1267-1268.

Raghava GP, Searle SM, Audley PC, Barber JD, Barton GJ (2003)
OXBench: a benchmark for evaluation of protein multiple sequence
alignment accuracy. BMC Bioinformatics 4: 47.

Edgar RC (2004) MUSCLE: multiple sequence alignment with high
accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.
