1. Introduction

Official site: http://comet-ms.sourceforge.net/

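Development dates back to 1993; still actively updated, free and open source
Runs on Windows and Linux
Multithreaded, with support for several input and output formats: input spectrum files (mzXML, mzML, mgf, or ms2/cms2); output files such as .pep.xml, .pin.xml, .sqt and .out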

Basic usage:

comet.exe input.mzXML

comet.exe input.mzML

comet.exe input.mgf

comet.exe input.ms2

comet.exe *.ms2   # multiple input files are supported

Other tools that integrate Comet:

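2. Download and installation

Download the GUI version: setup.exe. User guide: http://comet-ms.sourceforge.net/CometUI/CometUI-User-Guide.pdf

Download the Linux version: https://sourceforge.net/projects/comet-ms/files/

As in the previous posts, only the Linux version is tried here.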

unzip  comet_2019015.zip

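3. Usage

Running Comet is simple: pass the parameter file and the spectrum file(s) to the executable.

The parameter file is documented in detail on the official site under "Search parameters". The developers also provide three example parameter files, matched to the MS1 and MS2 mass accuracy of different instruments: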

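●  comet.params.low-low: low-resolution MS1 and low-resolution MS2, e.g. ion trap

●  comet.params.high-low: high-resolution MS1 and low-resolution MS2, e.g. Velos-Orbitrap

●  comet.params.high-high: high-resolution MS1 and high-resolution MS2, e.g. Q Exactive or Q-Tof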
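To make the difference in mass accuracy concrete, here is a minimal Python sketch that converts a ppm tolerance into an absolute mass window. The function name `ppm_window` and the 1500 Da example mass are illustrative assumptions; the 20 ppm value matches the `peptide_mass_tolerance` used in the high-resolution parameter file.

```python
def ppm_window(mass_da, tol_ppm):
    """Return the (low, high) mass window in Da for a symmetric ppm tolerance."""
    delta = mass_da * tol_ppm / 1e6  # 1 ppm = one millionth of the mass
    return mass_da - delta, mass_da + delta


# Illustrative values: a 1500 Da peptide searched with a 20 ppm tolerance.
low, high = ppm_window(1500.0, 20.0)
print(f"{low:.4f} .. {high:.4f}")  # window is +/- 0.03 Da around 1500 Da
```

A tolerance in ppm scales with the peptide mass, so heavier precursors get a proportionally wider absolute window. That is why it is the usual unit for high-resolution MS1 data.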

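Taking a high-resolution instrument as an example, most of the parameters below can stay at their defaults, apart from the database setting: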

# comet_version 2019.01 rev. 0

# Comet MS/MS search engine parameters file.

# Everything following the '#' symbol is treated as a comment.

database_name = /some/path/db.fasta

decoy_search = 0                       # 0=no (default), 1=concatenated search, 2=separate search

peff_format = 0                        # 0=no (normal fasta, default), 1=PEFF PSI-MOD, 2=PEFF Unimod

peff_obo =                             # path to PSI Mod or Unimod OBO file

num_threads = 0                        # 0=poll CPU to set num threads; else specify num threads directly (max 128)

#

# masses

#

peptide_mass_tolerance = 20.00

peptide_mass_units = 2                 # 0=amu, 1=mmu, 2=ppm

mass_type_parent = 1                   # 0=average masses, 1=monoisotopic masses

mass_type_fragment = 1                 # 0=average masses, 1=monoisotopic masses

precursor_tolerance_type = 1           # 0=MH+ (default), 1=precursor m/z; only valid for amu/mmu tolerances

isotope_error = 3                      # 0=off, 1=0/1 (C13 error), 2=0/1/2, 3=0/1/2/3, 4=-8/-4/0/4/8 (for +4/+8 labeling)

#

# search enzyme

#

search_enzyme_number = 1               # choose from list at end of this params file

search_enzyme2_number = 0              # second enzyme; set to 0 if no second enzyme

num_enzyme_termini = 2                 # 1 (semi-digested), 2 (fully digested, default), 8 C-term unspecific , 9 N-term unspecific

allowed_missed_cleavage = 2            # maximum value is 5; for enzyme search

#

# Up to 9 variable modifications are supported

# format:  <mass> <residues> <0=variable/else binary> <max_mods_per_peptide> <term_distance> <n/c-term> <required> <neutral_loss>

#     e.g. 79.966331 STY 0 3 -1 0 0 97.976896

#

variable_mod01 = 15.9949 M 0 3 -1 0 0 0.0

variable_mod02 = 0.0 X 0 3 -1 0 0 0.0

variable_mod03 = 0.0 X 0 3 -1 0 0 0.0

variable_mod04 = 0.0 X 0 3 -1 0 0 0.0

variable_mod05 = 0.0 X 0 3 -1 0 0 0.0

variable_mod06 = 0.0 X 0 3 -1 0 0 0.0

variable_mod07 = 0.0 X 0 3 -1 0 0 0.0

variable_mod08 = 0.0 X 0 3 -1 0 0 0.0

variable_mod09 = 0.0 X 0 3 -1 0 0 0.0

max_variable_mods_in_peptide = 5

require_variable_mod = 0

#

# fragment ions

#

# ion trap ms/ms:  1.0005 tolerance, 0.4 offset (mono masses), theoretical_fragment_ions = 1

# high res ms/ms:    0.02 tolerance, 0.0 offset (mono masses), theoretical_fragment_ions = 0, spectrum_batch_size = 10000

#

fragment_bin_tol = 0.02                # binning to use on fragment ions

fragment_bin_offset = 0.0              # offset position to start the binning (0.0 to 1.0)

theoretical_fragment_ions = 0          # 0=use flanking peaks, 1=M peak only

use_A_ions = 0

use_B_ions = 1

use_C_ions = 0

use_X_ions = 0

use_Y_ions = 1

use_Z_ions = 0

use_NL_ions = 0                        # 0=no, 1=yes to consider NH3/H2O neutral loss peaks

#

# output

#

output_sqtstream = 0                   # 0=no, 1=yes  write sqt to standard output

output_sqtfile = 0                     # 0=no, 1=yes  write sqt file

output_txtfile = 0                     # 0=no, 1=yes  write tab-delimited txt file

output_pepxmlfile = 1                  # 0=no, 1=yes  write pep.xml file

output_percolatorfile = 0              # 0=no, 1=yes  write Percolator tab-delimited input file

print_expect_score = 1                 # 0=no, 1=yes to replace Sp with expect in out & sqt

num_output_lines = 5                   # num peptide results to show

show_fragment_ions = 0                 # 0=no, 1=yes for out files only

sample_enzyme_number = 1               # Sample enzyme which is possibly different than the one applied to the search.

                                       # Used to calculate NTT & NMC in pepXML output (default=1 for trypsin).

#

# mzXML parameters

#

scan_range = 0 0                       # start and end scan range to search; either entry can be set independently

precursor_charge = 0 0                 # precursor charge range to analyze; does not override any existing charge; 0 as 1st entry ignores parameter

override_charge = 0                    # 0=no, 1=override precursor charge states, 2=ignore precursor charges outside precursor_charge range, 3=see online

ms_level = 2                           # MS level to analyze, valid are levels 2 (default) or 3

activation_method = ALL                # activation method; used if activation method set; allowed ALL, CID, ECD, ETD, ETD+SA, PQD, HCD, IRMPD

#

# misc parameters

#

digest_mass_range = 600.0 5000.0       # MH+ peptide mass range to analyze

peptide_length_range = 5 63            # minimum and maximum peptide length to analyze (default 1 63; max length 63)

num_results = 100                      # number of search hits to store internally

max_duplicate_proteins = 20            # maximum number of protein names to report for each peptide identification; -1 reports all duplicates

skip_researching = 1                   # for '.out' file output only, 0=search everything again (default), 1=don't search if .out exists

max_fragment_charge = 3                # set maximum fragment charge state to analyze (allowed max 5)

max_precursor_charge = 6               # set maximum precursor charge state to analyze (allowed max 9)

nucleotide_reading_frame = 0           # 0=proteinDB, 1-6, 7=forward three, 8=reverse three, 9=all six

clip_nterm_methionine = 0              # 0=leave sequences as-is; 1=also consider sequence w/o N-term methionine

spectrum_batch_size = 15000            # max. # of spectra to search at a time; 0 to search the entire scan range in one loop

decoy_prefix = DECOY_                  # decoy entries are denoted by this string which is pre-pended to each protein accession

equal_I_and_L = 1                      # 0=treat I and L as different; 1=treat I and L as same

output_suffix =                        # add a suffix to output base names i.e. suffix "-C" generates base-C.pep.xml from base.mzXML input

mass_offsets =                         # one or more mass offsets to search (values subtracted from deconvoluted precursor mass)

precursor_NL_ions =                    # one or more precursor neutral loss masses, will be added to xcorr analysis

#

# spectral processing

#

minimum_peaks = 10                     # required minimum number of peaks in spectrum to search (default 10)

minimum_intensity = 0                  # minimum intensity value to read in

remove_precursor_peak = 0              # 0=no, 1=yes, 2=all charge reduced precursor peaks (for ETD), 3=phosphate neutral loss peaks

remove_precursor_tolerance = 1.5       # +- Da tolerance for precursor removal

clear_mz_range = 0.0 0.0               # for iTRAQ/TMT type data; will clear out all peaks in the specified m/z range

#

# additional modifications

#

add_Cterm_peptide = 0.0

add_Nterm_peptide = 0.0

add_Cterm_protein = 0.0

add_Nterm_protein = 0.0

add_G_glycine = 0.0000                 # added to G - avg.  57.0513, mono.  57.02146

add_A_alanine = 0.0000                 # added to A - avg.  71.0779, mono.  71.03711

add_S_serine = 0.0000                  # added to S - avg.  87.0773, mono.  87.03203

add_P_proline = 0.0000                 # added to P - avg.  97.1152, mono.  97.05276

add_V_valine = 0.0000                  # added to V - avg.  99.1311, mono.  99.06841

add_T_threonine = 0.0000               # added to T - avg. 101.1038, mono. 101.04768

add_C_cysteine = 57.021464             # added to C - avg. 103.1429, mono. 103.00918

add_L_leucine = 0.0000                 # added to L - avg. 113.1576, mono. 113.08406

add_I_isoleucine = 0.0000              # added to I - avg. 113.1576, mono. 113.08406

add_N_asparagine = 0.0000              # added to N - avg. 114.1026, mono. 114.04293

add_D_aspartic_acid = 0.0000           # added to D - avg. 115.0874, mono. 115.02694

add_Q_glutamine = 0.0000               # added to Q - avg. 128.1292, mono. 128.05858

add_K_lysine = 0.0000                  # added to K - avg. 128.1723, mono. 128.09496

add_E_glutamic_acid = 0.0000           # added to E - avg. 129.1140, mono. 129.04259

add_M_methionine = 0.0000              # added to M - avg. 131.1961, mono. 131.04048

add_O_ornithine = 0.0000               # added to O - avg. 132.1610, mono  132.08988

add_H_histidine = 0.0000               # added to H - avg. 137.1393, mono. 137.05891

add_F_phenylalanine = 0.0000           # added to F - avg. 147.1739, mono. 147.06841

add_U_selenocysteine = 0.0000          # added to U - avg. 150.0379, mono. 150.95363

add_R_arginine = 0.0000                # added to R - avg. 156.1857, mono. 156.10111

add_Y_tyrosine = 0.0000                # added to Y - avg. 163.0633, mono. 163.06333

add_W_tryptophan = 0.0000              # added to W - avg. 186.0793, mono. 186.07931

add_B_user_amino_acid = 0.0000         # added to B - avg.   0.0000, mono.   0.00000

add_J_user_amino_acid = 0.0000         # added to J - avg.   0.0000, mono.   0.00000

add_X_user_amino_acid = 0.0000         # added to X - avg.   0.0000, mono.   0.00000

add_Z_user_amino_acid = 0.0000         # added to Z - avg.   0.0000, mono.   0.00000

#

# COMET_ENZYME_INFO _must_ be at the end of this parameters file

#

[COMET_ENZYME_INFO]

0.  No_enzyme              0      -           -

1.  Trypsin                1      KR          P

2.  Trypsin/P              1      KR          -

3.  Lys_C                  1      K           P

4.  Lys_N                  0      K           -

5.  Arg_C                  1      R           P

6.  Asp_N                  0      D           -

7.  CNBr                   1      M           -

8.  Glu_C                  1      DE          P

9.  PepsinA                1      FL          P

10. Chymotrypsin           1      FWYL        P
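Each row of the enzyme table reads: number, name, cleavage sense (1 = cut C-terminal to the listed residues, 0 = cut N-terminal), the residues to cut at, and the residue that blocks cleavage. The Python sketch below illustrates what the Trypsin row (cut after K/R unless followed by P) means together with `allowed_missed_cleavage` and `peptide_length_range`. It is an illustration of the rule only, not Comet's implementation; the function name `digest` and the test sequence are hypothetical, and only sense 1 enzymes are handled.

```python
def digest(sequence, cut_after="KR", not_before="P",
           missed_cleavages=2, min_len=5, max_len=63):
    """Fully enzymatic in-silico digest for a C-terminal cutter (sense = 1)."""
    # Collect cleavage positions: after a cut residue, unless the next
    # residue blocks cleavage (e.g. proline for Trypsin).
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in cut_after and sequence[i + 1] not in not_before:
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        # Joining k+1 consecutive fragments gives a peptide with k missed cleavages.
        last = min(start + 1 + missed_cleavages, len(sites) - 1)
        for end in range(start + 1, last + 1):
            peptide = sequence[sites[start]:sites[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


# Hypothetical sequence: K at position 3 is followed by P, so it is not cut.
print(digest("AAKPGGRCCK", missed_cleavages=1, min_len=1))
```

Passing `not_before=""` gives the Trypsin/P behaviour (row 2), where proline no longer blocks cleavage.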

Typically you set the database (database_name), the number of threads (num_threads), and a specific enzyme (search_enzyme_number = 1). For peptidomics, use the non-specific setting search_enzyme_number = 0.
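When the same few settings are changed for every project, editing the parameter file by script can save time. The following is a minimal Python sketch, assuming the `key = value   # comment` layout shown in the parameter file; the helper name `set_params` and the path `/data/human.fasta` are hypothetical.

```python
import re

# Matches "key = value   # optional trailing comment" lines.
PARAM_LINE = re.compile(r"^(\w+)\s*=\s*([^#]*?)(\s*#.*)?$")


def set_params(text, updates):
    """Return params text with the given keys replaced, keeping trailing comments."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key} = {remaining.pop(key)}{comment}"
        out.append(line)
    if remaining:
        # Fail loudly rather than silently ignoring a misspelled parameter.
        raise KeyError(f"parameters not found: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Example with a two-line excerpt of the params file; the new values are hypothetical.
excerpt = (
    "database_name = /some/path/db.fasta\n"
    "num_threads = 0                        # 0=poll CPU to set num threads\n"
)
print(set_params(excerpt, {"database_name": "/data/human.fasta", "num_threads": 8}))
```

Comment lines and the `[COMET_ENZYME_INFO]` table do not match the pattern, so they pass through unchanged.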

Run command:

comet.2019015.linux.exe -P./comet.params.high-high test_1.mzML
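For batch runs, the same command can be assembled from Python. This is a minimal sketch that only mirrors the command line above (the `-P` flag followed directly by the params path, then the spectrum files); the helper names `comet_command` and `run_comet` are hypothetical.

```python
import shlex
import subprocess


def comet_command(executable, params_file, spectra):
    """Build the Comet argument list: executable, -P<params>, then input files."""
    # -P is followed directly by the params path, with no space in between.
    return [executable, f"-P{params_file}", *spectra]


def run_comet(executable, params_file, spectra):
    """Launch the search; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(comet_command(executable, params_file, spectra), check=True)


cmd = comet_command("./comet.2019015.linux.exe", "./comet.params.high-high",
                    ["test_1.mzML"])
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.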

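Spectrum files can be in mzXML, mzML, mgf or ms2/cms2 format; raw files (.raw) from high-resolution Orbitrap instruments must be converted first. For converting raw mass spectrometry data on Linux, see the blog post "[ThermoRawFileParser] Converting raw mass spectrometry files to mgf" (setting the -f parameter to 1 yields mzML).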

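4. Results

The run produces files such as test_1.pep.xml, test_1.pin and test_1.txt (the .pin and .txt files are written only when output_percolatorfile and output_txtfile are set to 1 in the parameter file). The .txt file is the main one to look at; it holds the identification results:

First line: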

CometVersion 2019.01 rev. 5     test_1       07/28/2020, 02:12:23 PM  /path/to/database/test.fasta

Column headers:

      1 scan

      2 num

      3 charge

      4 exp_neutral_mass

      5 calc_neutral_mass

      6 e-value

      7 xcorr

      8 delta_cn

      9 sp_score

     10 ions_matched

     11 ions_total

     12 plain_peptide

     13 modified_peptide

     14 prev_aa

     15 next_aa

     16 protein

     17 protein_count

     18 modifications

The results usually need some post-processing as well, depending on your goals.
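As a starting point for post-processing, the .txt file can be read with Python's standard `csv` module. The sketch below assumes the layout described above (one information line, then a tab-delimited header row) and that `num` is the rank of a hit within its scan; the function names and the 0.01 e-value cutoff are illustrative assumptions, and a plain e-value cutoff is not a substitute for proper FDR control.

```python
import csv


def read_comet_txt(handle):
    """Yield one dict per result row from a Comet tab-delimited .txt file."""
    next(handle)  # skip the first line (version, base name, date, database)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield row


def best_hits(rows, max_evalue=0.01):
    """Keep top-ranked hits (num == 1) whose e-value is at or below the cutoff."""
    return [
        row for row in rows
        if row["num"] == "1" and float(row["e-value"]) <= max_evalue
    ]


# Usage (the file name follows the example run above):
# with open("test_1.txt", newline="") as handle:
#     for hit in best_hits(read_comet_txt(handle)):
#         print(hit["scan"], hit["plain_peptide"], hit["xcorr"], hit["e-value"])
```

Reading the rows as dictionaries keyed by the header names keeps the code independent of column order.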

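Posts in this series on proteomics identification and quantification software:

[1] Protein identification software: X!Tandem

[2] Protein identification software: Comet

[3] Protein identification software: Mascot

[4] Proteomics identification software: MSGFPlus

[5] Proteomics identification and quantification software: PD

[6] Proteomics identification and quantification software: MaxQuant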
