FASTQ format

每个FASTQ文件中每个序列通常有四行信息:

1: 以 '@' 字符开头,后面紧接着的是序列标识符和可选字段的描述(类似FASTA title line).

2: 序列

3: 以 '+' 字符开头, 后面紧接着的是可选字段的描述性信息

4: 第二行序列的质量信息

Illumina sequence identifiers

@HWUSI-EAS100R:6:73:941:1973#0/1

sequence identifiers description
HWUSI-EAS100R the unique instrument name
6 flowcell lane
73 tile number within the flowcell lane
941 'x'-coordinate of the cluster within the tile
1973 'y'-coordinate of the cluster within the tile
#0 index number for a multiplexed sample (0 for no indexing)
/1 the member of a pair, /1 or /2 (paired-end or mate-pair reads only)

Versions of the Illumina pipeline since 1.4 appear to use #NNNNNN instead of #0 for the multiplex ID, where NNNNNN is the sequence of the multiplex tag.

With Casava 1.8 the format of the '@' line has changed:

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG

sequence identifiers description
EAS139 the unique instrument name
136 the run id
FC706VJ the flowcell id
2 flowcell lane
2104 tile number within the flowcell lane
15343 'x'-coordinate of the cluster within the tile
197393 'y'-coordinate of the cluster within the tile
1 the member of a pair, 1 or 2 (paired-end or mate-pair reads only)
Y Y if the read is filtered, N otherwise
18 0 when none of the control bits are on, otherwise it is an even number(偶数)
ATCACG index sequence

将FASTQ 转换为 FASTA 格式:

zcat input_file.fastq.gz | awk 'NR%4==1{printf ">%s\n", substr($0,2)}NR%4==2{print}' > output_file.fa

#printf 命令的语法:format-string 为格式控制字符串,arguments 为参数列表。
printf format-string [arguments...] #substr(s,p) 返回字符串s中从p开始的后缀部分
#substr(s,p,n) 返回字符串s中从p开始长度为n的后缀部分。

FASTQ format的更多相关文章

  1. 怎么检测自己fastq的Phred类型 | phred33 phred64

    http://wiki.bits.vib.be/index.php/Identify_the_Phred_scale_of_quality_scores_used_in_fastQ # S - San ...

  2. Quality assessment and quality control of NGS data

    http://www.molecularevolution.org/resources/activities/QC_of_NGS_data_activity_new table of contents ...

  3. Canu Tutorial(canu指导手册)

    链接:Canu Tutorial Canu assembles reads from PacBio RS II or Oxford Nanopore MinION instruments into u ...

  4. het smooth 组装高杂合度二倍体基因组前期数据处理

    http://sourceforge.net/projects/het-smooth/ equencing technologies, such as Illumina sequencing, pro ...

  5. 去除reads中的pcr 重复,fastquniq

    改编: python ~/tools2assemble/run_fastuniq.py SHT-3K-1_1.fq.gz SHT-3K-1_2.fq.gz 好像不支持gz文件,要先解压 http:// ...

  6. Question: Should I use reads with good quality but failed-vendor flag?--biostart for vendor quality

    https://www.biostars.org/p/198405/ Quick question is: I have some mapped reads in bam file which hav ...

  7. <二代測序> 下载 NCBI sra 文件

    本文近期更新地址: http://blog.csdn.net/tanzuozhev/article/details/51077222 随着測序技术的不断提高.二代測序数据成指数增长. NCBI提供了S ...

  8. 利用Bioperl的SeqIO模块解析fastq文件

    测序数据中经常会接触到fastq格式的文件,比如说拿到fastq格式的原始数据后希望查看测序碱基的质量并去除低质量碱基.一般而言大家都是用现有的工具,比如说fastqc这个Java写的小程序,确实很好 ...

  9. fasta/fastq格式解读

    1)知识简介--------------------------------------------------------1.1)测序质量值 首先在了解fastq,fasta之前,了解一下什么是质量 ...

随机推荐

  1. A Secure Cookie Protocol 安全cookie协议 配置服务器Cookie

    Title http://www.cse.msu.edu/~alexliu/publications/Cookie/cookie.pdf AbstractCookies are the primary ...

  2. Java 之 JUC

    1. JUC 简介 在 Java 5.0 提供了 java.util.concurrent(简称JUC)包,在此包中增加了在并发编程中很常用的工具类, 用于定义类似于线程的自定义子系统,包括线程池,异 ...

  3. 通过virt-manager 利用NFS创建、迁移虚拟机2

    前面一篇文章介绍了利用NFS创建虚拟机的过程,本文就介绍下如何利用NFS对虚拟机进行动态迁移. 动态迁移就是把一台虚拟机在不停止其运行的情况下迁移到另一台物理机上.这看起来似乎不太可能,不过还好kvm ...

  4. ssm之mapper异常 Result Maps collection already contains value for com.ssj.mapper.UserMapper 和 Type interface com.ssj.mapper.UserMapper is already known to the MapperRegistry.

    异常一:Result Maps collection already contains value for com.ssj.mapper.XXXMapper 原因分析:XXXmapper.xml文件中 ...

  5. XVII Open Cup named after E.V. Pankratiev Stage 14, Grand Prix of Tatarstan, Sunday, April 2, 2017 Problem J. Terminal

    题目:Problem J. TerminalInput file: standard inputOutput file: standard inputTime limit: 2 secondsMemo ...

  6. TOSCA自动化测试工具--Log defect

    1.执行完用例后,对于失败的用例进行分析,如果有缺陷,可以提对应的缺陷 2.在issues模块, 右键创建自己需要的文件夹,然后在文件夹上右键找到虫子图形点下,就可以创建缺陷了,填上对应的内容 3.如 ...

  7. cgwin的ssh错误解决办法

    参考博客    http://hi.baidu.com/luckygirl/item/bd00a6d8a05c310d20e25039 方法一(推荐): 修改/etc/passwd文件,在其中加入 s ...

  8. linux系统上使用unzip命令

    最近在本地使用maven打包工程后,将工程部署到linux服务器的tomcat上,使用unzip解压工程报--->未找到命令.即该命名文件未安装,需要安装一下.安装命令如下: yum insta ...

  9. 使用selenium前学习HTML(2)——标签

    <!-- HTML 标题(Heading)是通过 <h1> - <h6> 等标签进行定义的. HTML 段落是通过 <p> 标签进行定义的. HTML 链接是 ...

  10. USB Transfer and Packet Sizes

    https://msdn.microsoft.com/en-us/library/ff538112.aspx http://blog.csdn.net/chenyujing1234/article/d ...