Multiple sequence alignment Benchmark Data set
1. Summary: standard benchmark data sets for sequence alignment: http://www.drive5.com/bench/

This is a collection of multiple alignment benchmarks in a uniform
format that is convenient for further analysis. All files are in
FASTA format, with upper-case letters used to indicate aligned
columns.
See References below for original sources of benchmark data.
Benchmarks are:
--------------------------1---------------------------
bali2dna
BALIBASE v2, reverse-translated to DNA
bali2dnaf
bali2dna, with frame-shifts induced by random insertions of one
or two nucleotides into the middle 50% of exactly one sequence
in each set.
bali3
BALIBASE v3.
bali3pdb
BALIS, the structural subset of BALIBASE v3.
bali3pdbm
MU-BALIS, i.e. BALIS re-aligned by MUSTANG.
---------------------------2--------------------------
ox
OXBENCH.
oxm
MU-OXBENCH, i.e. OXBENCH re-aligned by MUSTANG.
oxx
OXBENCH-X, i.e. the Extended set in OXBENCH.
---------------------------3--------------------------
prefab4
PREFAB v4.
prefab4ref
PREFAB-R, i.e. the pair-wise reference pairs in PREFAB v4.
prefab4refm
MU-PREFAB-R, i.e. PREFAB-R re-aligned by MUSTANG.
---------------------------4--------------------------
sabre
Consistent multiple alignments constructed from SABMARK v1.65.
sabrem
MU-SABRE, i.e. SABRE re-aligned by MUSTANG.
-----------------------------------------------------
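The frame-shift construction described for bali2dnaf above can be illustrated with a small Python sketch. This is not the script used to build the benchmark. It assumes a single insertion event of one or two random nucleotides per set, and it reads "middle 50%" as positions between one quarter and three quarters of the sequence length. The function name `induce_frameshift` is hypothetical.

```python
import random

def induce_frameshift(seqs, rng):
    """Insert 1 or 2 random nucleotides into the middle 50% of one sequence.

    seqs: list of unaligned DNA strings (one benchmark set).
    rng:  a random.Random instance, so the result is reproducible.
    Returns (new_seqs, index_of_modified_sequence, insertion_position).
    Assumption: exactly one insertion event per set.
    """
    new_seqs = list(seqs)
    idx = rng.randrange(len(new_seqs))       # exactly one sequence is modified
    seq = new_seqs[idx]
    n = len(seq)
    lo, hi = n // 4, n - n // 4              # bounds of the middle 50%
    pos = rng.randint(lo, hi)                # insertion point, inclusive bounds
    length = rng.choice((1, 2))              # 1 or 2 nt breaks the reading frame
    insert = "".join(rng.choice("ACGT") for _ in range(length))
    new_seqs[idx] = seq[:pos] + insert + seq[pos:]
    return new_seqs, idx, pos
```

Because the inserted length is never a multiple of three, everything downstream of the insertion point is translated in a different reading frame, which is the property this benchmark is meant to test.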
Directory structure under each benchmark is:
in/
Input sequences.
ref/
Reference alignments. Upper-case regions indicate conservative
regions that are intended for use in assessment. Lower-case regions
should not be used.
info/
Contains ids.txt (list of set identifiers that are filenames in ref/
and in/), nrseqs.txt (number of sequences in each set), and
pctids.txt (%id in conservative regions in each set).
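As a rough illustration of how this layout might be consumed, the Python sketch below lists the set identifiers, loads the input and reference files for one set, and finds the reference columns marked for assessment. It rests on several assumptions: ids.txt holds one identifier per line, the files in in/ and ref/ are named exactly by that identifier, and a column counts as assessed when it holds at least one letter and all of its letters are upper-case. The function names are hypothetical.

```python
from pathlib import Path

def read_fasta(text):
    """Parse FASTA text into a list of (label, sequence) pairs."""
    records, label, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records

def list_set_ids(bench_dir):
    """Read info/ids.txt (assumed: one set identifier per line)."""
    text = (Path(bench_dir) / "info" / "ids.txt").read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]

def load_set(bench_dir, set_id):
    """Return (input sequences, reference alignment) for one set."""
    bench_dir = Path(bench_dir)
    inputs = read_fasta((bench_dir / "in" / set_id).read_text())
    ref = read_fasta((bench_dir / "ref" / set_id).read_text())
    return inputs, ref

def assessed_columns(rows):
    """Indices of columns whose letters are all upper-case.

    rows: aligned strings of equal length; gap characters are ignored.
    """
    cols = []
    for i, column in enumerate(zip(*rows)):
        letters = [ch for ch in column if ch.isalpha()]
        if letters and all(ch.isupper() for ch in letters):
            cols.append(i)
    return cols
```

A typical use would be to loop over `list_set_ids("bali3")`, call `load_set` for each identifier, and pass the reference rows to `assessed_columns` before scoring.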
Download page for qscore: http://www.drive5.com/bench/bench.tar.gz
This is a quality scoring program that compares two multiple sequence alignments: an alignment to be evaluated (the "test" alignment) and a second alignment that is believed to be correct (the "reference" alignment). The program outputs the following scores:
- The PREFAB Q score (aka the Balibase SPS score or the Developer score).
- The Modeler score
- The Cline et al. shift score
- The Balibase TC (total column) score
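To make two of these scores concrete, here is a minimal Python sketch of the Q (sum-of-pairs) and TC scores. It is not the qscore source code. It assumes the usual definitions: Q is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC is the fraction of reference columns reproduced exactly in the test alignment. Only upper-case reference residues are counted, and reference columns with fewer than two such residues are skipped. The Modeler and shift scores are not covered, and `q_and_tc` is a hypothetical name.

```python
from itertools import combinations

GAPS = "-."

def residue_columns(row):
    """For one aligned row, list the column index of each residue in order."""
    return [col for col, ch in enumerate(row) if ch not in GAPS]

def q_and_tc(ref, test):
    """Compute (Q, TC) for a test alignment against a reference.

    ref, test: dicts mapping sequence label -> aligned row. Both must
    contain the same sequences with the same residues, gaps aside.
    """
    test_cols = {label: residue_columns(row) for label, row in test.items()}
    ncols = len(next(iter(ref.values())))
    seen = {label: 0 for label in ref}       # residues consumed per sequence
    pairs_total = pairs_ok = cols_total = cols_ok = 0
    for col in range(ncols):
        placed = []   # test column of each assessed residue in this ref column
        for label, row in ref.items():
            ch = row[col]
            if ch in GAPS:
                continue
            if ch.isupper():                 # lower-case residues are not scored
                placed.append(test_cols[label][seen[label]])
            seen[label] += 1
        if len(placed) < 2:
            continue
        cols_total += 1
        cols_ok += len(set(placed)) == 1     # whole column kept together
        for a, b in combinations(placed, 2):
            pairs_total += 1
            pairs_ok += a == b               # this pair is still aligned
    q = pairs_ok / pairs_total if pairs_total else 0.0
    tc = cols_ok / cols_total if cols_total else 0.0
    return q, tc
```

TC is the stricter of the two: a single misplaced residue makes the whole column count as wrong, while Q still credits the pairs in that column that remain aligned.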
Address of the standard BAliBASE database: http://www.lbgi.fr/balibase/

References
----------
Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest
developments of the multiple sequence alignment benchmark. Proteins
61: 127-136.
Bahr A, Thompson JD, Thierry JC, Poch O (2001) BAliBASE (Benchmark
Alignment dataBASE): enhancements for repeats, transmembrane
sequences and circular permutations. Nucleic Acids Res 29: 323-326.
Thompson JD, Plewniak F, Poch O (1999) BAliBASE: a benchmark
alignment database for the evaluation of multiple alignment programs.
Bioinformatics 15: 87-88.
Van Walle I, Lasters I, Wyns L (2005) SABmark--a benchmark for
sequence alignment that covers the entire known fold space.
Bioinformatics 21: 1267-1268.
Raghava GP, Searle SM, Audley PC, Barber JD, Barton GJ (2003)
OXBench: a benchmark for evaluation of protein multiple sequence
alignment accuracy. BMC Bioinformatics 4: 47.
Edgar RC (2004) MUSCLE: multiple sequence alignment with high
accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.