默认参数:

java -jar trimmomatic-0.30.jar PE s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz
lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz lane1_reverse_paired.fq.gz
lane1_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3
TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

This will perform the following:

  • Remove adapters (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10)
  • Remove leading low quality or N bases (below quality 3) (LEADING:3)
  • Remove trailing low quality or N bases (below quality 3) (TRAILING:3)
  • Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
  • Drop reads below the 36 bases long (MINLEN:36)

根据自己的数据调整一下MINLEN

用途:trim, crop, remove adapters

支持gz, bz2的fastq

http://www.usadellab.org/cms/?page=trimmomatic

http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf

先下载二进制包

http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.33.zip

然后解压unzip

解压后直接就可以用了

从0.32版本,软件会自动识别是33还是64.

ILLUMINACLIP: Cut adapter and other illumina-specific sequences from the read.

  • ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>

    • fastaWithAdaptersEtc: specifies the path to a fasta file containing all the adapters, PCR sequences etc. The naming of the various sequences within this file determines how they are used. See below.
    • seedMismatches: specifies the maximum mismatch count which will still allow a full match to be performed
    • palindromeClipThreshold: specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment.
    • simpleClipThreshold: specifies how accurate the match between any adapter etc. sequence must be against a read.

SLIDINGWINDOW: Performs a sliding window trimming approach. It starts scanning at the 5‟ end and clips the read once the average quality within the window falls below a threshold.

MAXINFO: An adaptive quality trimmer which balances read length and error rate to maximise the value of each read

LEADING: Cut bases off the start of a read, if below a threshold quality

TRAILING: Cut bases off the end of a read, if below a threshold quality

CROP: Cut the read to a specified length by removing bases from the end

HEADCROP: Cut the specified number of bases from the start of the read

MINLEN: Drop the read if it is below a specified length

AVGQUAL: Drop the read if the average quality is below the specified level

TOPHRED33: Convert quality scores to Phred-33

TOPHRED64: Convert quality scores to Phred-64

顺序:

Trimming occur in the order in which the steps are specified on the command line. It is recommended in most cases that adapter clipping, if required, is done as early as possible, since correctly identifying adapters using partial matches is more difficult.

The Adapter Fasta

Illumina adapter and other technical sequences are copyrighted by Illumina,but we have been granted permission to distribute them with Trimmomatic. Suggested adapter sequences are provided for TruSeq2 (as used in GAII machines) and TruSeq3 (as used by HiSeq and MiSeq machines), for both single-end and paired-end mode. These sequences have not been extensively tested, and depending on specific issues which may occur in library preparation, other sequences may work better for a given dataset.

To make a custom version of fasta, you must first understand how it will be used. Trimmomatic uses two strategies for adapter trimming: Palindrome and Simple

With 'simple' trimming, each adapter sequence is tested against the reads, and if a sufficiently accurate match is detected, the read is clipped appropriately.

'Palindrome' trimming is specifically designed for the case of 'reading through' a short fragment into the adapter sequence on the other end. In this approach, the appropriate adapter sequences are 'in silico ligated' onto the start of the reads, and the combined adapter+read sequences, forward and reverse are aligned. If they align in a manner which indicates 'read-through', the forward read is clipped and the reverse read dropped (since it contains no new data).

Naming of the sequences indicates how they should be used. For 'Palindrome' clipping, the sequence names should both start with 'Prefix', and end in '/1' for the forward adapter and '/2' for the reverse adapter. All other sequences are checked using 'simple' mode. Sequences with names ending in '/1' or '/2' will be checked only against the forward or reverse read. Sequences not ending in '/1' or '/2' will be checked against both the forward and reverse read. If you want to check for the reverse-complement of a specific sequence, you need to specifically include the reverse-complemented form of the sequence as well, with another name.

The thresholds used are a simplified log-likelihood approach. Each matching base adds just over 0.6, while each mismatch reduces the alignment score by Q/10. Therefore, a perfect match of a 12 base sequence will score just over 7, while 25 bases are needed to score 15. As such we recommend values between 7 - 15 for this parameter. For palindromic matches, a longer alignment is possible - therefore this threshold can be higher, in the range of 30. The 'seed mismatch' parameter is used to make alignments more efficient, specifying the maximum base mismatch count in the 'seed' (16 bases). Typical values here are 1 or 2.

输入和输出的名字要起好:

For input files,

either of the following can be used:

Explicitly naming the 2 input files Naming the forward file using the -basein flag, where the reverse file can be determined automatically.

The second file is determined by looking for common patterns of file naming, and changing the appropriate character to reference the reverse file.

Examples which should be correctly handled include:

Sample_Name_R1_001.fq.gh -> Sample_Name_R2_001.fq.gz

Sample_Name.f.fastq -> Sample_Name.r.fastq

Sample_Name.1.sequence.txt -> Sample_Name.2.sequence.txt

For output files, either of the following can be used:

Explicity naming the 4 output files Providing a base file name using the –baseout flag,

from which the 4 output files can be derived. If the name “mySampleFiltered.fq.gz” is provided, the following 4 file names will be used:

mySampleFiltered_1P.fq.gz - for paired forward reads

mySampleFiltered_1U.fq.gz - for unpaired forward reads

mySampleFiltered_2P.fq.gz - for paired reverse reads

mySampleFiltered_2U.fq.gz - for unpaired reverse reads

Most processing steps take one or more settings, delimited by ':' (a colon)

SLIDINGWINDOW
Perform a sliding window trimming, cutting once the average quality within the window falls
below a threshold. By considering multiple bases, a single poor quality base will not cause the
removal of high quality data later in the read.
SLIDINGWINDOW:<windowSize>:<requiredQuality>
windowSize: specifies the number of bases to average across
requiredQuality: specifies the average quality required.

The SLIDINGWINDOW trimmer will cut the leftmost position in the window where the average quality drops below the threshold and remove the rest of the read. However if there is low quality in the very beginning of the read then it will fail the minimum length tests and be removed completely - the remaining 3-prime end (even if it is good quality will not be printed)

Consider the following test file t.fq:

@1\1
AATGATCGTAGCGATGCAAGCTAGCCCGATGCCCGATCGCATCG
+
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeEFCB
@2\1
AATGATCGTAGCGATGCAAGCTAGCCCGATGCCCGATCGCATCG
+
EFCBeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

and processing with:

$ java -jar trimmomatic.jar SE -phred64 t.fq tt.fq SLIDINGWINDOW:4:15
TrimmomaticSE: Started with arguments: -phred64 t.fq tt.fq SLIDINGWINDOW:4:15
Automatically using 16 threads
Input Reads: 2 Surviving: 1 (50.00%) Dropped: 1 (50.00%)
TrimmomaticSE: Completed successfully

The output file looks like the following:

@1\1
AATGATCGTAGCGATGCAAGCTAGCCCGATGCCCGATCGC
+
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

As you can see the read with the poor quality at the beginning has been removed completely

LEADING
Remove low quality bases from the beginning. As long as a base has a value below this
threshold the base is removed and the next base will be investigated.
LEADING:<quality>
quality: Specifies the minimum quality required to keep a base.

TRAILING
Remove low quality bases from the end. As long as a base has a value below this threshold
the base is removed and the next base (which as trimmomatic is starting from the 3‟ prime end
would be base preceding the just removed base) will be investigated. This approach can be
used removing the special illumina „low quality segment‟ regions (which are marked with
quality score of 2), but we recommend Sliding Window or MaxInfo instead
TRAILING:<quality>
quality: Specifies the minimum quality required to keep a base.

CROP
Removes bases regardless of quality from the end of the read, so that the read has maximally
the specified length after this step has been performed. Steps performed after CROP might of
course further shorten the read.
CROP:<length>
length: The number of bases to keep, from the start of the read.

HEADCROP
Removes the specified number of bases, regardless of quality, from the beginning of the read.
HEADCROP:<length>
length: The number of bases to remove from the start of the read.

Paired End:
java -jar trimmomatic-0.30.jar PE s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz
lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz lane1_reverse_paired.fq.gz
lane1_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3
TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

This will perform the following in this order

1,Remove Illumina adapters provided in the TruSeq3-PE.fa file (provided). Initially Trimmomatic will look for seed matches (16 bases) allowing maximally 2 mismatches. These seeds will be extended and clipped if in the case of paired end reads a score of 30 is reached (about 50 bases), or in the case of single ended reads a score of 10, (about 17 bases).
2,Remove leading low quality or N bases (below quality 3)
3,Remove trailing low quality or N bases (below quality 3)
4,Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15
5, Drop reads which are less than 36 bases long after these steps

最后我的命令如下:

先把开头N去掉,然后前40个碱基,缺点是有些第六个碱基是N的,没法去掉! 估计要写脚本,但是脚本太慢了。。。改天问问老师有没有什么好工具!

freemao

FAFU

Trimmomatic安装与使用的更多相关文章

  1. Trimmomatic过滤Illumina低质量序列

    1. 下载安装 直接去官网下载二进制软件,解压后的trimmomatic-0.36.jar即为我们需要的软件 官网: http://www.usadellab.org/cms/index.php?pa ...

  2. docker——容器安装tomcat

    写在前面: 继续docker的学习,学习了docker的基本常用命令之后,我在docker上安装jdk,tomcat两个基本的java web工具,这里对操作流程记录一下. 软件准备: 1.jdk-7 ...

  3. 网络原因导致 npm 软件包 node-sass / gulp-sass 安装失败的处理办法

    如果你正在构建一个基于 gulp 的前端自动化开发环境,那么极有可能会用到 gulp-sass ,由于网络原因你可能会安装失败,因为安装过程中部分细节会到亚马逊云服务器上获取文件.本文主要讨论在不变更 ...

  4. Sublime Text3安装JsHint

    介绍 Sublime Text3使用jshint依赖Nodejs,SublimeLinter和Sublimelinter-jshint. NodeJs的安装省略. 安装SublimeLinter Su ...

  5. Fabio 安装和简单使用

    Fabio(Go 语言):https://github.com/eBay/fabio Fabio 是一个快速.现代.zero-conf 负载均衡 HTTP(S) 路由器,用于部署 Consul 管理的 ...

  6. gentoo 安装

    加载完光驱后 1进行ping命令查看网络是否通畅 2设置硬盘的标识为GPT(主要用于64位且启动模式为UEFI,还有一个是MBR,主要用于32位且启动模式为bois) parted -a optima ...

  7. Linux平台 Oracle 10gR2(10.2.0.5)RAC安装 Part3:db安装和升级

    Linux平台 Oracle 10gR2(10.2.0.5)RAC安装 Part3:db安装和升级 环境:OEL 5.7 + Oracle 10.2.0.5 RAC 5.安装Database软件 5. ...

  8. Linux平台 Oracle 10gR2(10.2.0.5)RAC安装 Part1:准备工作

    Linux平台 Oracle 10gR2(10.2.0.5)RAC安装 Part1:准备工作 环境:OEL 5.7 + Oracle 10.2.0.5 RAC 1.实施前准备工作 1.1 服务器安装操 ...

  9. 【原】nodejs全局安装和本地安装的区别

    来微信支付有2年多了,从2年前的互联网模式转变为O2O模式,主要的场景是跟线下的商户去打交道,不像以往的互联网模式,有产品经理提需求,我们帮忙去解决问题. 转型后是这样的,团队成员更多需要去寻找业务的 ...

随机推荐

  1. BPMX3模拟登录

    实现功能只需要输入一个帐号即可登录系统. 需要实现上面的功能需要: 1.编辑imitate.jsp页面 <%@page import="com.hotent.core.util.Con ...

  2. JavaScript基本类型值与引用类型值

    前言 JS变量可以用来保存两种类型的值:基本类型值和引用类型值.基本类型的值源自一下5种基本数据类型:Underfined.Null.Boolean.Number和String. 基本类型值和引用类型 ...

  3. Shell脚本:使用rsync备份文件/目录

    本文我们介绍一个shell脚本,用来使用rsync命令将你本地Linux机器上的文件/目录备份到远程Linux服务器上.使用该脚本会以交互的方式实施备份,你需要提供远程备份服务器的主机名/ip地址和文 ...

  4. 对Docker的价值和应用场景分析

    近年来,Docker在IT界可谓风光十足,各大技术论坛上赚足了眼球,公司内外也有相当多的介绍和尝试,看上去如此高大上的技术,貌似会给云.服务部署.运维等领域带来颠覆性的创新. 近期查阅了一些文档,较深 ...

  5. 浅谈C++源码的过国内杀软的免杀

    以下只是简单的思路和定位.也许有人秒过,但是不要笑话我写的笨方法.定位永远是过期不了的. 其实这里废话一下 , 本人并不是大牛 ,今天跟大家分享下 .所以写出这篇文章.(大牛飘过) 只是个人实战的经验 ...

  6. POJ 1083 Moving Tables 思路 难度:0

    http://poj.org/problem?id=1083 这道题题意是有若干段线段,每次要求线段不重叠地取,问最少取多少次. 因为这些线段都是必须取的,所以需要让空隙最小 思路: 循环直到线段全部 ...

  7. 启动BPM的5个步骤

    在大部分业务中,我们通常认为:一个主要的业务流程管理项目从设计时间开始会比较好.我们知道很多方式来提高效率,增加生产力以及简化我们员工的工 作 - 这正是业务流程管理所做的.不幸的是,不管我们意图多好 ...

  8. 20145338 《Java程序设计》第1周学习总结

    教材学习内容总结 第一章 java平台概论 1.1Java不只是语言 Java最早是Sun公司"绿色项目"中撰写Star应用程序的程序语言,当时叫Oak.1995年5月23日改名为 ...

  9. “System.Threading.ThreadAbortException”类型的第一次机会异常在 mscorlib.dll 中发

    问题原因: Thread.Abort 方法 .NET Framework 4  其他版本   1(共 1)对本文的评价是有帮助 - 评价此主题 在调用此方法的线程上引发 ThreadAbortExce ...

  10. matlab图像剪裁命令imcrop()

    调用格式: I2=imcrop(I,RECT): X2=imcrop(X,MAP,RECT): RGB2=imcrop(RGB,RECT): 其中,I.X.RGB分别对应灰度图像.索引图像.RGB图像 ...