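1. Introduction

MS-GF+ (MSGF+) is another protein identification tool that has been widely used in recent years. It is written in Java, was first published in the Journal of Proteome Research (JPR) in 2008, and an upgraded version was published in Nature Communications (NC) in 2014. It is free, open source, and actively maintained. Researchers who benchmarked different proteomics identification tools also found that MSGF+ performs very well (I cannot locate the reference at the moment).

GitHub source: https://github.com/MSGFPlus/msgfplus

Supported input formats: mzML, mzXML, Mascot Generic File (mgf), MS2 files, Micromass Peak List files (pkl), and concatenated DTA files (_dta.txt).

MS-GF+ primarily works with the HUPO PSI standard formats: it reads mzML and writes mzIdentML (abbreviated mzid), which is easy to convert to TSV.

For details on the mzIdentML format, see http://www.psidev.info/mzidentml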

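2. Installation and running

Download: https://github.com/MSGFPlus/msgfplus/releases

For usage, MS-GF+ has very detailed documentation: MS-GF+ Documentation

Parameter configuration files:

https://github.com/MSGFPlus/msgfplus/tree/master/docs/ParameterFiles

For running the software, the following page gives many examples and explains the parameters:

https://msgfplus.github.io/msgfplus/MSGFPlus.html

Run example 1: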

java -Xmx4000M -jar MSGFPlus.jar \
  -s test.mzML \
  -d uniprot_swissprot_human_20190313_20417.fasta \
  -t 20ppm -ti -1,2 -ntt 0 -tda 1 -e 0 -m 3 -inst 3 -minCharge 1 -maxCharge 6 -addFeatures 1 \
  -mod Mods.txt \
  -o test.mzid

The modification file Mods.txt contains:

# This file is used to specify modifications

# # for comments

#

# Max Number of Modifications per peptide

# If this value is large, the search takes long.

NumMods=2

# To input a modification, use the following command:

# Mass or CompositionStr, Residues, ModType, Position, Name (all the five fields are required).

# CompositionStr (C[Num]H[Num]N[Num]O[Num]S[Num]P[Num]Br[Num]Cl[Num]Fe[Num])

#       - C (Carbon), H (Hydrogen), N (Nitrogen), O (Oxygen), S (Sulfur), P (Phosphorus), Br (Bromine), Cl (Chlorine), Fe (Iron), and Se (Selenium) are allowed.

#       - Negative numbers are allowed.

#       - E.g. C2H2O1 (valid), H2C1O1 (invalid)

# Mass can be used instead of CompositionStr. It is important to specify accurate masses (integer masses are insufficient).

#       - E.g. 15.994915

# Residues: affected amino acids (must be uppercase letters)

#       - Must be uppercase letters or *

#       - Use * if this modification is applicable to any residue.

#       - * cannot be combined with the "any" position (e.g. "15.994915, *, opt, any, Oxidation" is not allowed.)

#       - E.g. NQ, *

# ModType: "fix" for fixed modifications, "opt" for variable modifications (case insensitive)

# Position: position in the peptide where the modification can be attached.

#       - One of the following five values should be used:

#       - any (anywhere), N-term (peptide N-term), C-term (peptide C-term), Prot-N-term (protein N-term), Prot-C-term (protein C-term)

#       - Case insensitive

#       - "-" can be omitted

#       - E.g. any, Any, Prot-n-Term, ProtNTerm => all valid

# Name: name of the modification (Unimod PSI-MS name)

#       - For proper mzIdentML output, this name should be the same as the Unimod PSI-MS name

#       - E.g. Phospho, Acetyl

#       - Visit http://www.unimod.org to get PSI-MS names.

C2H3N1O1,C,fix,any,Carbamidomethyl              # Fixed Carbamidomethyl C

#144.102063,*,fix,N-term,iTRAQ4plex             # iTRAQ 4 plex

#144.102063,K,fix,any,iTRAQ4plex                        # iTRAQ 4 plex

# Variable Modifications (default: none)

O1,M,opt,any,Oxidation                          # Oxidation M

#15.994915,M,opt,any,Oxidation                  # Oxidation M (mass is used instead of CompositionStr)

H-1N-1O1,NQ,opt,any,Deamidated                  # Negative numbers are allowed.

#C2H3NO,*,opt,N-term,Carbamidomethyl            # Variable Carbamidomethyl N-term

#H-2O-1,E,opt,N-term,Glu->pyro-Glu                      # Pyro-glu from E

#H-3N-1,Q,opt,N-term,Gln->pyro-Glu                      # Pyro-glu from Q

#C2H2O,*,opt,Prot-N-term,Acetyl                 # Acetylation Protein N-term

#C2H2O1,K,opt,any,Acetyl                        # Acetylation K

#CH2,K,opt,any,Methyl                           # Methylation K

#HO3P,STY,opt,any,Phospho                       # Phosphorylation STY
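To make the CompositionStr notation in Mods.txt more concrete, here is a minimal Python sketch that turns a composition string into a monoisotopic mass shift. It is only an illustration, not how MS-GF+ itself parses the file: the function name `composition_mass` and the mass table are assumptions introduced here, the table covers only C, H, N, O, S, and P, and the sketch does not enforce the element order that MS-GF+ requires (for example, it would accept H2C1O1).

```python
import re

# Standard monoisotopic masses in Da (assumed values; only a subset of the allowed elements).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "P": 30.97376163,
}

# One element symbol followed by an optional (possibly negative) count.
TOKEN = re.compile(r"([A-Z][a-z]?)(-?\d+)?")


def composition_mass(composition: str) -> float:
    """Return the monoisotopic mass shift of a CompositionStr such as 'H-1N-1O1'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?(?:-?\d+)?)+", composition):
        raise ValueError(f"not a composition string: {composition!r}")
    mass = 0.0
    for element, count in TOKEN.findall(composition):
        # A missing count means one atom, e.g. the N in 'C2H3NO'.
        # Elements missing from the table raise KeyError.
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


for comp in ("C2H3N1O1", "O1", "H-1N-1O1", "HO3P"):
    print(f"{comp:10s} {composition_mass(comp):.6f}")
```

The value computed for O1 agrees with the 15.994915 used in the commented-out Oxidation line, which is why either form can be given in the first field.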

Run example 2:

java -Xmx4g -Xms1g -jar MSGFPlus.jar \
  -conf MSGFPlus_Parameters.txt \
  -d test.fasta \
  -s test.mzML \
  -o test.mzid
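When several mzML files share one parameter file, the same command can be assembled programmatically. The sketch below is a hypothetical helper, not part of MS-GF+: `build_command` and the file name `sample2.mzML` are made up for illustration, and only the flags shown in the command above (-conf, -d, -s, -o) are used. It prints the commands rather than running them.

```python
import shlex
from pathlib import Path


def build_command(spectrum, fasta, conf="MSGFPlus_Parameters.txt", jar="MSGFPlus.jar"):
    """Build the argument list for one search; the output name mirrors the input name."""
    output = Path(spectrum).with_suffix(".mzid")
    return [
        "java", "-Xmx4g", "-Xms1g", "-jar", jar,
        "-conf", conf,
        "-d", fasta,
        "-s", str(spectrum),
        "-o", str(output),
    ]


for mzml in ["test.mzML", "sample2.mzML"]:
    # Print a shell-safe version of each command for inspection.
    print(shlex.join(build_command(mzml, "test.fasta")))
    # To actually run a search: subprocess.run(build_command(...), check=True)
```

Passing the arguments as a list avoids the line-continuation pitfalls of a multi-line shell command.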

The parameter configuration file MSGFPlus_Parameters.txt contains:

#Parent mass tolerance

#  Examples: 2.5Da or 30ppm

#  Use comma to set asymmetric values, for example "0.5Da,2.5Da" will set 0.5Da to the left (expMass<theoMass) and 2.5Da to the right (expMass>theoMass)

PrecursorMassTolerance=20ppm

#Max Number of Modifications per peptide

# If this value is large, the search will be slow

NumMods=5

#Modifications (see below for examples)

StaticMod=C2H3N1O1,  C,   fix,  any,  Carbamidomethyl              # Fixed Carbamidomethyl C

DynamicMod=O1,       M,   opt,  any,  Oxidation                    # Oxidized methionine

DynamicMod=H-1N-1O1, NQ,  opt,  any,  Deamidated                   # Deamidation of N and Q (+0.984016)

#Custom amino acids

CustomAA=C3H5NO,     U,  custom, U,   Selenocysteine               # Custom amino acids can only have C, H, N, O, and S

#CustomAA=H0,        X,  custom, X,   RemoveAA                     # Remove AA

#Fragmentation Method

#  0 means as written in the spectrum or CID if no info (Default)

#  1 means CID

#  2 means ETD

#  3 means HCD

#  4 means Merge spectra from the same precursor (e.g. CID/ETD pairs, CID/HCD/ETD triplets)

FragmentationMethodID=3

#Instrument ID

#  0 means Low-res LCQ/LTQ (Default for CID and ETD); use InstrumentID=0 if analyzing a dataset with low-res CID and high-res HCD spectra

#  1 means High-res LTQ (Default for HCD; also appropriate for high res CID); use InstrumentID=1 for Orbitrap, Lumos, and QEHFX instruments

#  2 means TOF

#  3 means Q-Exactive

InstrumentID=1

#Enzyme ID

#  0 means No enzyme used

#  1 means Trypsin (Default); use this along with NTT=0 for a no-enzyme search of a tryptically digested sample

#  2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: Glu-C, 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: No Enzyme (for peptidomics)

EnzymeID=1

#Isotope error range

#  Accounts for the error introduced by choosing a non-monoisotopic peak for fragmentation.

#  Useful for accurate precursor ion masses

#  Ignored if the parent mass tolerance is > 0.5Da or 500ppm

#  The combination of -t and -ti determines the precursor mass tolerance.

#  e.g. "-t 20ppm -ti -1,2" tests abs(exp-calc-n*1.00335Da)<20ppm for n=-1, 0, 1, 2.

IsotopeErrorRange=0,3

#Number of tolerable termini

#  The number of peptide termini that must have been cleaved by the enzyme (default 1)

#  For trypsin, 2 means fully tryptic only, 1 means partially tryptic, and 0 means no-enzyme search

NTT=2

#Target/Decoy search mode

#  0 means don't search decoy database (default)

#  1 means search decoy database to compute FDR (source FASTA file must be forward-only proteins)

TDA=1

#Number of Threads (by default, uses all available cores)

NumThreads=8

#Minimum peptide length to consider

MinPepLength=6

#Maximum peptide length to consider

MaxPepLength=50

#Minimum precursor charge to consider (if not specified in the spectrum)

MinCharge=1

#Maximum precursor charge to consider (if not specified in the spectrum)

MaxCharge=6

#Number of matches per spectrum to be reported

#If this value is greater than 1 then the FDR values computed by MS-GF+ will be skewed by high-scoring 2nd and 3rd hits

NumMatchesPerSpec=1

#Amino Acid Modification Examples

# Specific static modifications using one or more StaticMod= entries

# Specific dynamic modifications using one or more DynamicMod= entries

# Modification format is:

# Mass or CompositionStr, Residues, ModType, Position, Name (all the five fields are required).

# Examples:

#   C2H3N1O1,  C,  fix, any,         Carbamidomethyl    # Fixed Carbamidomethyl C (alkylation)

#   O1,        M,  opt, any,         Oxidation          # Oxidation M

#   15.994915, M,  opt, any,         Oxidation          # Oxidation M (mass is used instead of CompositionStr)

#   H-1N-1O1,  NQ, opt, any,         Deamidated         # Negative numbers are allowed.

#   CH2,       K,  opt, any,         Methyl             # Methylation K

#   C2H2O1,    K,  opt, any,         Acetyl             # Acetylation K

#   HO3P,      STY,opt, any,         Phospho            # Phosphorylation STY

#   C2H3NO,    *,  opt, N-term,      Carbamidomethyl    # Variable Carbamidomethyl N-term

#   H-2O-1,    E,  opt, N-term,      Glu->pyro-Glu      # Pyro-glu from E

#   H-3N-1,    Q,  opt, N-term,      Gln->pyro-Glu      # Pyro-glu from Q

#   C2H2O,     *,  opt, Prot-N-term, Acetyl             # Acetylation Protein N-term

#Custom amino acids examples

# Only supports empirical formulas of elements C H N O S.

# If other elements are needed, or a specific mass is needed, they can be added as fixed modifications on the custom AA

# Maximum atom counts: 255 C, 255 H, 63 N, 63 O, 15 S

# Format spec is:

# EmpiricalFormula, ResidueSymbol, custom, OriginalAA, Name (all the five fields are required, though OriginalAA is not actually used for anything)

# Examples:

#   C5H7N1O2S0,J,custom,P,Hydroxylation     # Hydroxyproline

#   C3H6N2O0S1,X,custom,C,Amidation         # C-terminal amidation of Cys

#   C5H5N1O1S0,Z,custom,E,Glu->pyro-Glu     # N-terminal pyroGlu residue, from either Glu OR Gln
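The isotope error comment in the parameter file describes the test abs(exp-calc-n*1.00335Da)<20ppm for each n in the allowed range. A rough Python sketch of that check follows. The function name `matching_isotope_offset` is hypothetical, and expressing the ppm error relative to the calculated mass is an assumption made here; MS-GF+ may compute it differently internally.

```python
ISOTOPE_SPACING = 1.00335  # Da, the value quoted in the parameter file comment


def matching_isotope_offset(exp_mass, calc_mass, tol_ppm=20.0, iso_range=(-1, 2)):
    """Return the first isotope offset n that brings exp_mass within tol_ppm, else None."""
    low, high = iso_range
    for n in range(low, high + 1):
        # Assumption: ppm error is taken relative to the calculated mass.
        error_ppm = abs(exp_mass - calc_mass - n * ISOTOPE_SPACING) / calc_mass * 1e6
        if error_ppm < tol_ppm:
            return n
    return None


# A precursor picked one isotope peak above the monoisotopic mass.
print(matching_isotope_offset(2001.0034, 2000.0))
# The same precursor is rejected when only n=0 is allowed.
print(matching_isotope_offset(2001.0034, 2000.0, iso_range=(0, 0)))
```

Widening the range (for example IsotopeErrorRange=0,3) accepts more precursors, at the cost of testing more candidate masses per spectrum.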

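3. Results

The native output format is mzIdentML; example file: test.mzid.

There are two ways to convert an mzid file to TSV for easier reading; see https://msgfplus.github.io/msgfplus/MzidToTsv.html for details:

The first is the MzIDToTsv tool built into MSGFPlus.jar; it is easy to use but slow on large files.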

Usage: java -Xmx3500M -cp MSGFPlus.jar edu.ucsd.msjava.ui.MzIDToTsv

	-i MzIDFile (MS-GF+ output file (*.mzid))

	[-o TSVFile] (TSV output file (*.tsv) (Default: MzIDFileName.tsv))

	[-showQValue 0/1] (0: do not show Q-values, 1: show Q-values (Default))

	[-showDecoy 0/1] (0: do not show decoy PSMs (Default), 1: show decoy PSMs)

	[-unroll 0/1] (0: merge shared peptides (Default), 1: unroll shared peptides)

The second is the standalone MzidToTsvConverter.exe tool, which converts quickly and handles large files but is Windows-only (on Linux it requires mono).

MzidToTsvConverter.exe -mzid:SearchResults.mzid -unroll -showDecoy

Example file after conversion to TSV: test_Unrolled.tsv

The header contains the following columns:

      1 #SpecFile

      2 SpecID

      3 ScanNum

      4 FragMethod

      5 Precursor

      6 IsotopeError

      7 PrecursorError(ppm)

      8 Charge

      9 Peptide

     10 Protein

     11 DeNovoScore

     12 MSGFScore

     13 SpecEValue

     14 EValue

     15 QValue

     16 PepQValue
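Once the results are in TSV form, a common next step is to keep only the PSMs below a Q-value cutoff. The sketch below shows one way to do this with Python's standard csv module. The function name `filter_psms`, the 0.01 cutoff, and the tiny example table are assumptions for illustration; only the column names (#SpecFile, ScanNum, Peptide, QValue) come from the header listed above, and the sketch assumes the file was written with Q-values shown.

```python
import csv
import io


def filter_psms(handle, max_qvalue=0.01):
    """Yield rows (as dicts) whose QValue is at or below max_qvalue."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if float(row["QValue"]) <= max_qvalue:
            yield row


# Tiny made-up table with a subset of the columns, for illustration only.
example = (
    "#SpecFile\tScanNum\tPeptide\tQValue\n"
    "test.mzML\t101\tPEPTIDEK\t0.0\n"
    "test.mzML\t102\tSAMPLER\t0.25\n"
)

for psm in filter_psms(io.StringIO(example)):
    print(psm["ScanNum"], psm["Peptide"], psm["QValue"])
```

For a real file, replace the in-memory table with something like `open("test_Unrolled.tsv", newline="")`.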

References:

https://msgfplus.github.io/msgfplus/index.html

http://www.psidev.info/mzidentml

https://omics.pnl.gov/software/ms-gf

https://github.com/MSGFPlus/msgfplus

https://github.com/MSGFPlus/msgfplus/tree/master/docs/ParameterFiles

https://msgfplus.github.io/msgfplus/MzidToTsv.html

https://github.com/MSGFPlus/msgfplus/releases

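Summary of the proteomics identification and quantification software series:

【1】Protein identification software: X!Tandem

【2】Protein identification software: Comet

【3】Protein identification software: Mascot

【4】Proteomics identification software: MSGFPlus

【5】Proteomics identification and quantification software: PD

【6】Proteomics identification and quantification software: MaxQuant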
