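[2] Protein identification software: Comet
1. Introduction
Official site: http://comet-ms.sourceforge.net/
- Development dates back to 1993; actively maintained, free and open source
- Runs on Windows and Linux
- Multithreaded, with support for multiple input and output formats: spectra are read from mzXML, mzML, mgf, or ms2/cms2 files, and results are written as .pep.xml, .pin.xml, .sqt, or .out files
Running it: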
comet.exe input.mzXML
comet.exe input.mzML
comet.exe input.mgf
comet.exe input.ms2
comet.exe *.ms2 # multiple input files are supported
Other tools that integrate Comet:
- Crux
- Bio Docker
- LabKey
- MASSyPup
- PatternLab
- ProHits
- SearchGUI
- PeptideShaker
- Trans-Proteomic Pipeline (TPP)
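2. Download and installation
GUI version: download setup.exe. User guide: http://comet-ms.sourceforge.net/CometUI/CometUI-User-Guide.pdf
Linux version: https://sourceforge.net/projects/comet-ms/files/
As elsewhere in this series, only the Linux version is tried here.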
unzip comet_2019015.zip
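3. Usage
Running Comet is simple: call the executable with a parameter file and the spectrum file(s).
The parameter file is documented in detail on the official site (see the Search parameters page). Because instruments differ in MS1 and MS2 mass accuracy, three sample parameter files are provided:
- comet.params.low-low: low-resolution MS1 and low-resolution MS2, e.g. ion trap
- comet.params.high-low: high-resolution MS1 and low-resolution MS2, e.g. Velos-Orbitrap
- comet.params.high-high: high-resolution MS1 and high-resolution MS2, e.g. Q Exactive or Q-Tof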
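As a rough illustration of the choice above, the hypothetical helper below maps the MS1/MS2 resolution of an instrument to one of the three sample file names. The function name and its boolean arguments are assumptions made for this sketch; only the three file names come from the Comet distribution.

```python
def pick_sample_params(high_res_ms1: bool, high_res_ms2: bool) -> str:
    """Return the sample params file name matching the instrument type.

    Hypothetical helper for illustration; not part of Comet itself.
    """
    if high_res_ms1 and high_res_ms2:
        return "comet.params.high-high"  # e.g. Q Exactive or Q-Tof
    if high_res_ms1:
        return "comet.params.high-low"   # e.g. Velos-Orbitrap
    if not high_res_ms2:
        return "comet.params.low-low"    # e.g. ion trap
    # Low-res MS1 with high-res MS2 has no matching sample file.
    raise ValueError("no sample params file for low-res MS1 / high-res MS2")


print(pick_sample_params(True, True))
```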
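Taking a high-resolution instrument as the example, most of the parameters below can be left at their defaults; the database setting is the main one to change: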
# comet_version 2019.01 rev. 0
# Comet MS/MS search engine parameters file.
# Everything following the '#' symbol is treated as a comment.
database_name = /some/path/db.fasta
decoy_search = 0 # 0=no (default), 1=concatenated search, 2=separate search
peff_format = 0 # 0=no (normal fasta, default), 1=PEFF PSI-MOD, 2=PEFF Unimod
peff_obo = # path to PSI Mod or Unimod OBO file
num_threads = 0 # 0=poll CPU to set num threads; else specify num threads directly (max 128)
#
# masses
#
peptide_mass_tolerance = 20.00
peptide_mass_units = 2 # 0=amu, 1=mmu, 2=ppm
mass_type_parent = 1 # 0=average masses, 1=monoisotopic masses
mass_type_fragment = 1 # 0=average masses, 1=monoisotopic masses
precursor_tolerance_type = 1 # 0=MH+ (default), 1=precursor m/z; only valid for amu/mmu tolerances
isotope_error = 3 # 0=off, 1=0/1 (C13 error), 2=0/1/2, 3=0/1/2/3, 4=-8/-4/0/4/8 (for +4/+8 labeling)
#
# search enzyme
#
search_enzyme_number = 1 # choose from list at end of this params file
search_enzyme2_number = 0 # second enzyme; set to 0 if no second enzyme
num_enzyme_termini = 2 # 1 (semi-digested), 2 (fully digested, default), 8 C-term unspecific , 9 N-term unspecific
allowed_missed_cleavage = 2 # maximum value is 5; for enzyme search
#
# Up to 9 variable modifications are supported
# format: <mass> <residues> <0=variable/else binary> <max_mods_per_peptide> <term_distance> <n/c-term> <required> <neutral_loss>
# e.g. 79.966331 STY 0 3 -1 0 0 97.976896
#
variable_mod01 = 15.9949 M 0 3 -1 0 0 0.0
variable_mod02 = 0.0 X 0 3 -1 0 0 0.0
variable_mod03 = 0.0 X 0 3 -1 0 0 0.0
variable_mod04 = 0.0 X 0 3 -1 0 0 0.0
variable_mod05 = 0.0 X 0 3 -1 0 0 0.0
variable_mod06 = 0.0 X 0 3 -1 0 0 0.0
variable_mod07 = 0.0 X 0 3 -1 0 0 0.0
variable_mod08 = 0.0 X 0 3 -1 0 0 0.0
variable_mod09 = 0.0 X 0 3 -1 0 0 0.0
max_variable_mods_in_peptide = 5
require_variable_mod = 0
#
# fragment ions
#
# ion trap ms/ms: 1.0005 tolerance, 0.4 offset (mono masses), theoretical_fragment_ions = 1
# high res ms/ms: 0.02 tolerance, 0.0 offset (mono masses), theoretical_fragment_ions = 0, spectrum_batch_size = 10000
#
fragment_bin_tol = 0.02 # binning to use on fragment ions
fragment_bin_offset = 0.0 # offset position to start the binning (0.0 to 1.0)
theoretical_fragment_ions = 0 # 0=use flanking peaks, 1=M peak only
use_A_ions = 0
use_B_ions = 1
use_C_ions = 0
use_X_ions = 0
use_Y_ions = 1
use_Z_ions = 0
use_NL_ions = 0 # 0=no, 1=yes to consider NH3/H2O neutral loss peaks
#
# output
#
output_sqtstream = 0 # 0=no, 1=yes write sqt to standard output
output_sqtfile = 0 # 0=no, 1=yes write sqt file
output_txtfile = 0 # 0=no, 1=yes write tab-delimited txt file
output_pepxmlfile = 1 # 0=no, 1=yes write pep.xml file
output_percolatorfile = 0 # 0=no, 1=yes write Percolator tab-delimited input file
print_expect_score = 1 # 0=no, 1=yes to replace Sp with expect in out & sqt
num_output_lines = 5 # num peptide results to show
show_fragment_ions = 0 # 0=no, 1=yes for out files only
sample_enzyme_number = 1 # Sample enzyme which is possibly different than the one applied to the search.
# Used to calculate NTT & NMC in pepXML output (default=1 for trypsin).
#
# mzXML parameters
#
scan_range = 0 0 # start and end scan range to search; either entry can be set independently
precursor_charge = 0 0 # precursor charge range to analyze; does not override any existing charge; 0 as 1st entry ignores parameter
override_charge = 0 # 0=no, 1=override precursor charge states, 2=ignore precursor charges outside precursor_charge range, 3=see online
ms_level = 2 # MS level to analyze, valid are levels 2 (default) or 3
activation_method = ALL # activation method; used if activation method set; allowed ALL, CID, ECD, ETD, ETD+SA, PQD, HCD, IRMPD
#
# misc parameters
#
digest_mass_range = 600.0 5000.0 # MH+ peptide mass range to analyze
peptide_length_range = 5 63 # minimum and maximum peptide length to analyze (default 1 63; max length 63)
num_results = 100 # number of search hits to store internally
max_duplicate_proteins = 20 # maximum number of protein names to report for each peptide identification; -1 reports all duplicates
skip_researching = 1 # for '.out' file output only, 0=search everything again (default), 1=don't search if .out exists
max_fragment_charge = 3 # set maximum fragment charge state to analyze (allowed max 5)
max_precursor_charge = 6 # set maximum precursor charge state to analyze (allowed max 9)
nucleotide_reading_frame = 0 # 0=proteinDB, 1-6, 7=forward three, 8=reverse three, 9=all six
clip_nterm_methionine = 0 # 0=leave sequences as-is; 1=also consider sequence w/o N-term methionine
spectrum_batch_size = 15000 # max. # of spectra to search at a time; 0 to search the entire scan range in one loop
decoy_prefix = DECOY_ # decoy entries are denoted by this string which is pre-pended to each protein accession
equal_I_and_L = 1 # 0=treat I and L as different; 1=treat I and L as same
output_suffix = # add a suffix to output base names i.e. suffix "-C" generates base-C.pep.xml from base.mzXML input
mass_offsets = # one or more mass offsets to search (values substracted from deconvoluted precursor mass)
precursor_NL_ions = # one or more precursor neutral loss masses, will be added to xcorr analysis
#
# spectral processing
#
minimum_peaks = 10 # required minimum number of peaks in spectrum to search (default 10)
minimum_intensity = 0 # minimum intensity value to read in
remove_precursor_peak = 0 # 0=no, 1=yes, 2=all charge reduced precursor peaks (for ETD), 3=phosphate neutral loss peaks
remove_precursor_tolerance = 1.5 # +- Da tolerance for precursor removal
clear_mz_range = 0.0 0.0 # for iTRAQ/TMT type data; will clear out all peaks in the specified m/z range
#
# additional modifications
#
add_Cterm_peptide = 0.0
add_Nterm_peptide = 0.0
add_Cterm_protein = 0.0
add_Nterm_protein = 0.0
add_G_glycine = 0.0000 # added to G - avg. 57.0513, mono. 57.02146
add_A_alanine = 0.0000 # added to A - avg. 71.0779, mono. 71.03711
add_S_serine = 0.0000 # added to S - avg. 87.0773, mono. 87.03203
add_P_proline = 0.0000 # added to P - avg. 97.1152, mono. 97.05276
add_V_valine = 0.0000 # added to V - avg. 99.1311, mono. 99.06841
add_T_threonine = 0.0000 # added to T - avg. 101.1038, mono. 101.04768
add_C_cysteine = 57.021464 # added to C - avg. 103.1429, mono. 103.00918
add_L_leucine = 0.0000 # added to L - avg. 113.1576, mono. 113.08406
add_I_isoleucine = 0.0000 # added to I - avg. 113.1576, mono. 113.08406
add_N_asparagine = 0.0000 # added to N - avg. 114.1026, mono. 114.04293
add_D_aspartic_acid = 0.0000 # added to D - avg. 115.0874, mono. 115.02694
add_Q_glutamine = 0.0000 # added to Q - avg. 128.1292, mono. 128.05858
add_K_lysine = 0.0000 # added to K - avg. 128.1723, mono. 128.09496
add_E_glutamic_acid = 0.0000 # added to E - avg. 129.1140, mono. 129.04259
add_M_methionine = 0.0000 # added to M - avg. 131.1961, mono. 131.04048
add_O_ornithine = 0.0000 # added to O - avg. 132.1610, mono 132.08988
add_H_histidine = 0.0000 # added to H - avg. 137.1393, mono. 137.05891
add_F_phenylalanine = 0.0000 # added to F - avg. 147.1739, mono. 147.06841
add_U_selenocysteine = 0.0000 # added to U - avg. 150.0379, mono. 150.95363
add_R_arginine = 0.0000 # added to R - avg. 156.1857, mono. 156.10111
add_Y_tyrosine = 0.0000 # added to Y - avg. 163.0633, mono. 163.06333
add_W_tryptophan = 0.0000 # added to W - avg. 186.0793, mono. 186.07931
add_B_user_amino_acid = 0.0000 # added to B - avg. 0.0000, mono. 0.00000
add_J_user_amino_acid = 0.0000 # added to J - avg. 0.0000, mono. 0.00000
add_X_user_amino_acid = 0.0000 # added to X - avg. 0.0000, mono. 0.00000
add_Z_user_amino_acid = 0.0000 # added to Z - avg. 0.0000, mono. 0.00000
#
# COMET_ENZYME_INFO _must_ be at the end of this parameters file
#
[COMET_ENZYME_INFO]
0. No_enzyme 0 - -
1. Trypsin 1 KR P
2. Trypsin/P 1 KR -
3. Lys_C 1 K P
4. Lys_N 0 K -
5. Arg_C 1 R P
6. Asp_N 0 D -
7. CNBr 1 M -
8. Glu_C 1 DE P
9. PepsinA 1 FL P
10. Chymotrypsin 1 FWYL P
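In most cases only the database (database_name), the number of threads (num_threads), and the enzyme (search_enzyme_number = 1, i.e. trypsin) need to be set. For peptidomics data, switch to a non-specific search with search_enzyme_number = 0.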
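If these few settings change from run to run, they can be patched into a copy of the sample file programmatically. The following is only a minimal sketch: `set_param` is a hypothetical helper, the path `/data/db/human.fasta` is a placeholder, and the code assumes each parameter appears exactly once as a `name = value # comment` line, as in the listing above.

```python
import re


def set_param(params_text: str, name: str, value: str) -> str:
    """Replace the value on the `name = value` line, keeping any trailing comment."""
    pattern = re.compile(
        rf"^({re.escape(name)}[ \t]*=[ \t]*)([^#\n]*?)([ \t]*(?:#.*)?)$",
        re.MULTILINE,
    )
    # A function is used as the replacement so that backslashes in `value`
    # are inserted literally.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value + m.group(3), params_text
    )
    if count != 1:
        raise KeyError(f"expected exactly one '{name}' line, found {count}")
    return new_text


# Three lines copied from the params listing, used here as test input.
example = (
    "database_name = /some/path/db.fasta\n"
    "num_threads = 0 # 0=poll CPU to set num threads\n"
    "search_enzyme_number = 1 # choose from list at end of this params file\n"
)

updated = set_param(example, "database_name", "/data/db/human.fasta")  # placeholder path
updated = set_param(updated, "num_threads", "8")
updated = set_param(updated, "search_enzyme_number", "0")  # non-specific search
print(updated)
```

For a real run, the text would be read from one of the sample params files and written to a new file, leaving the original untouched.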
Run command:
comet.2019015.linux.exe -P./comet.params.high-high test_1.mzML
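Spectrum files can be in mzXML, mzML, mgf, or ms2/cms2 format; raw files (.raw) from high-resolution Orbitrap instruments have to be converted first. For converting raw mass spectrometry data on Linux, see the post "[ThermoRawFileParser] Converting raw files to mgf" (setting the -f option to 1 yields mzML instead).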
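When many files are searched from a script, the command can be assembled and checked before it is launched. The sketch below is an assumption-laden example rather than an official wrapper: `build_comet_command` is a hypothetical helper, and the extension check simply mirrors the formats listed in this post. The `-P<params>` form (no space) follows the command shown above.

```python
from __future__ import annotations

from pathlib import Path

# Input formats named in this post; .raw files must be converted beforehand.
SUPPORTED = {".mzxml", ".mzml", ".mgf", ".ms2", ".cms2"}


def build_comet_command(exe: str, params: str, spectra: list[str]) -> list[str]:
    """Build the argument list for one Comet run over one or more spectrum files."""
    for name in spectra:
        if Path(name).suffix.lower() not in SUPPORTED:
            raise ValueError(f"unsupported input format: {name}")
    return [exe, f"-P{params}", *spectra]


cmd = build_comet_command(
    "./comet.2019015.linux.exe", "./comet.params.high-high", ["test_1.mzML"]
)
print(" ".join(cmd))
# To actually run the search (requires the Comet binary to be present):
# import subprocess; subprocess.run(cmd, check=True)
```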
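4. Results
The run produces files such as test_1.pep.xml, test_1.pin, and test_1.txt, depending on which output_* switches are enabled in the params file (the txt file requires output_txtfile = 1). The txt file is the main one to look at; it contains the identification results:
First line: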
CometVersion 2019.01 rev. 5 test_1 07/28/2020, 02:12:23 PM /path/to/database/test.fasta
Column headers:
1 scan
2 num
3 charge
4 exp_neutral_mass
5 calc_neutral_mass
6 e-value
7 xcorr
8 delta_cn
9 sp_score
10 ions_matched
11 ions_total
12 plain_peptide
13 modified_peptide
14 prev_aa
15 next_aa
16 protein
17 protein_count
18 modifications
Post-processing is usually still needed, depending on the goal of the analysis.
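As one possible starting point for such post-processing, the sketch below reads the txt output (one run-information line, then a tab-delimited table with the header shown above) and keeps top-ranked hits under an e-value cutoff. The two data rows are made-up values for illustration only, the function names are hypothetical, and the 0.01 cutoff is an arbitrary assumption; a fixed e-value threshold is not a substitute for proper FDR estimation.

```python
import csv
import io


def read_comet_txt(handle):
    """Return (run_info_line, list of row dicts) from a Comet .txt result."""
    run_info = handle.readline().rstrip("\n")
    rows = list(csv.DictReader(handle, delimiter="\t"))
    return run_info, rows


def best_hits(rows, max_evalue=0.01):
    """Keep rank-1 hits (num == 1) whose e-value is at or below the cutoff."""
    return [r for r in rows if r["num"] == "1" and float(r["e-value"]) <= max_evalue]


header = ["scan", "num", "charge", "exp_neutral_mass", "calc_neutral_mass",
          "e-value", "xcorr", "delta_cn", "sp_score", "ions_matched",
          "ions_total", "plain_peptide", "modified_peptide", "prev_aa",
          "next_aa", "protein", "protein_count", "modifications"]
# Made-up rows, not real search results.
row1 = ["100", "1", "2", "927.4550", "927.4549", "1.2E-04", "2.51", "0.45",
        "310.2", "10", "14", "PEPTIDEK", "K.PEPTIDEK.A", "K", "A", "protA", "1", "-"]
row2 = ["101", "1", "2", "927.4550", "927.4549", "5.0E+00", "0.80", "0.02",
        "20.1", "3", "14", "PEPTIDEK", "K.PEPTIDEK.A", "K", "A", "protB", "1", "-"]
sample = "\n".join([
    "CometVersion 2019.01 rev. 5\ttest_1\t07/28/2020, 02:12:23 PM\t/path/to/database/test.fasta",
    "\t".join(header), "\t".join(row1), "\t".join(row2),
]) + "\n"

run_info, rows = read_comet_txt(io.StringIO(sample))
hits = best_hits(rows)
print(run_info)
print([h["scan"] for h in hits])
```

With a real file, `open("test_1.txt", newline="")` would replace the `io.StringIO` object.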
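Posts in this series on proteomics identification and quantification software:
[1] Protein identification software: X!Tandem
[2] Protein identification software: Comet
[3] Protein identification software: Mascot
[4] Proteomics identification software: MSGFPlus
[5] Proteomics identification and quantification software: PD
[6] Proteomics identification and quantification software: MaxQuant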