EasyProt — An easy-to-use graphical platform for proteomics data analysis
Highlights
► Fully graphical, easy-to-use and multi-user platform for proteomics data analysis from MS/MS data
► Protein identification featuring on-the-fly false positives removal
► Protein quantification through both isobaric tagging and label-free methods
► Characterization of unexpected post-translational modifications
► Peak lists conversion and processing
Introduction
In the field of proteomics, a large number of software tools exist [1] for the analysis of mass spectrometry-based data. The number of tools available can be overwhelming, and it is often unclear which one to use. Which tool is most appropriate for a given problem? There is no obvious answer to this simple question. Several web portals, such as ExPASy (http://expasy.org), now feature a vast array of tools that can potentially be used. However, the selection remains difficult, especially given the lack of fully integrated and easy-to-use software covering the entire proteomic data analysis workflow. Given the diversity of available software and their specificities, there is a need for an integrated, easy-to-use software platform. Unfortunately, building a full-fledged pipeline by connecting different pieces of software is a daunting task for anyone but programmers. Likewise, developing graphical interfaces for existing command-line tools is a non-trivial and time-consuming task that requires extensive programming skills.
Software such as the Trans-Proteomic Pipeline [2], the Computational Proteomics Analysis System [3], the Systems Biology Experiment Analysis Management System (http://www.sbeams.org/Proteomics/) or the Virtual Expert Mass Spectrometrist [4] are obvious candidates. Although highly customizable, they generally require nontrivial configuration work as well as various external dependencies to work properly. EasyProt, on the other hand, is a fully integrated solution in which the underlying technologies (such as the search engine) and their complexities are embedded in the platform and thus hidden from the end user. Compared with other web-based software, EasyProt offers a more modern and dynamic web interface as well as some unique features, such as on-the-fly false discovery rate (FDR) computation and fully integrated isobaric and label-free quantification processing.
Our goal was to develop a software platform that would fulfill the needs of scientists in the field, while emphasizing ease-of-use for non-bioinformatician users. To accomplish this objective, we worked in close collaboration with researchers from proteomic laboratories, starting from a basic protein identification workflow, while incrementally adding new features over time, such as exports, visualizers and quantification pipelines. The EasyProt platform covers the whole workflow from proprietary data file formats produced by mass-spectrometers, to identification and quantification results, ready to be analyzed by researchers and scientists with various backgrounds. EasyProt is implemented in the Java language and is structured around two distinct graphical applications.
The first one, EasyprotConv, is a standalone desktop application that processes the proprietary data formats of mass spectrometers. EasyprotConv features peak list processing such as precursor-ion isotope correction [5], spectra filtering, charge state deconvolution, and the merging of low collision energy and higher collision energy spectra for isobaric relative peptide quantification with Orbitrap-hybrid instruments (LTQ-OT) [6]. Through the use of SuperHirn [7], EasyprotConv performs label-free [8] processing, such as peak detection, liquid chromatography alignment and feature map normalization.
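As an illustration of what charge state deconvolution involves, the following minimal Python sketch converts an m/z value observed at a given charge into a neutral mass and into its singly protonated equivalent. It is not EasyprotConv's implementation, which is not described here; the function names are hypothetical, and the only assumption is that the charge is carried by protons.

```python
# Minimal sketch of charge state deconvolution (illustrative only; not
# EasyprotConv's code). Function names are hypothetical.

PROTON_MASS = 1.007276  # mass of a proton in Da, rounded to 6 decimals


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral (uncharged) mass of an ion observed at `mz` with `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mz - PROTON_MASS) * charge


def to_singly_charged(mz: float, charge: int) -> float:
    """m/z the same ion would have if it carried a single proton."""
    return neutral_mass(mz, charge) + PROTON_MASS


if __name__ == "__main__":
    # A doubly charged peak is mapped onto the singly charged m/z scale.
    print(round(to_singly_charged(501.007276, 2), 4))
```

Mapping all fragment peaks onto a single charge scale in this way lets downstream steps compare peaks without tracking their original charge states.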
The second component, EasyProt, is a multi-user web application implementing peptide and protein identification through Olav [9], [10], unexpected post-translational modification identification with Popitam [11], isobaric quantification (TMT [12] and iTRAQ [13]) with IsoQuant and Isobar [14], label-free quantification, and several viewers and exports. A particularity of EasyProt resides in how one can set the false discovery rate [15] on-the-fly, after the identification search has completed rather than at submission time. When dealing with multiple identification searches spread across multiple files (e.g., various peptide fractionation methods), EasyProt transparently merges all results from several searches into a single result that can then be exported.
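Setting the FDR after the search is possible when all scored matches are kept, so that a score cutoff can be recomputed for any requested rate. The Python sketch below illustrates the idea under stated assumptions: it uses a generic target-decoy estimate (decoys divided by targets above the cutoff), which may differ from the way EasyProt computes the FDR, and the `Psm` type and function names are hypothetical.

```python
# Minimal sketch of post-search FDR thresholding with a target-decoy estimate.
# Assumption: FDR is estimated as decoys / targets above a score cutoff.
# This is illustrative and not EasyProt's actual implementation.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Psm:
    """Hypothetical peptide-spectrum match record."""
    peptide: str
    score: float
    is_decoy: bool


def score_threshold_for_fdr(psms: List[Psm], max_fdr: float) -> Optional[float]:
    """Lowest score cutoff whose estimated FDR is <= max_fdr, or None if there is none."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    best = None
    for i, psm in enumerate(ranked):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate a cutoff only after the last match of a group of tied scores.
        if i + 1 < len(ranked) and ranked[i + 1].score == psm.score:
            continue
        if targets > 0 and decoys / targets <= max_fdr:
            best = psm.score
    return best


def accepted_targets(psms: List[Psm], max_fdr: float) -> List[Psm]:
    """Target matches that pass the cutoff for the requested FDR."""
    cutoff = score_threshold_for_fdr(psms, max_fdr)
    if cutoff is None:
        return []
    return [p for p in psms if not p.is_decoy and p.score >= cutoff]
```

Because the ranked matches are stored once, calling `accepted_targets` again with a different `max_fdr` only repeats the linear scan and does not require a new database search.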
Both EasyProt and EasyprotConv are freely available to academic institutions at the following web address: http://easyprot.unige.ch. This website also features video tutorials showing how to use EasyProt to perform common tasks such as protein identification and quantification.
To illustrate the EasyProt platform, two identification and quantification workflows are presented. The first one is based on a labeled quantification method by isobaric tagging, while the second one is based on a label-free quantification approach. Both quantification workflows were conducted and validated against the same samples from ProteoRed Multicentric Proteomic Experiment 2009 PME5 (http://www.proteored.org).
Section snippets
Sample preparation
Our laboratory participated in the ProteoRed multicentric experiment 2009 PME5 initiative and received two identical complex protein mixtures, labeled A and B, each containing 100 μg of total protein. The mixture consisted of a single ion-exchange chromatographic fraction of a soluble Escherichia coli digest. Four mammalian proteins, cytochrome C, apomyoglobin, aldolase and serum albumin, were spiked at different concentration levels into samples A and B. For the TMT sample preparation, off-gel
Software architecture
The EasyProt platform is structured around two distinct applications: EasyprotConv and EasyProt. The reason the software platform is divided into two parts is twofold. First, it allows users to perform conversions on their own workstations if desired, and second, it decreases the load on the server in case of heavy data processing, such as when performing searches on large peak lists with several variable modifications, or when performing label-free analysis on numerous data sets.
Results
To illustrate the versatility of the EasyProt platform, two quantification workflows, one using 6-plex Tandem Mass Tags (TMT [22]) and one label-free, were applied to data from the “ProteoRed multicentric experiment 2009 (PME5)” study (http://www.proteored.org). These workflows were performed entirely with EasyProt, from the processing of the RAW files available after acquisition to the final protein expression results exported as Excel sheets.
The PME5 ProteoRed study consisted of
Discussions
The quantification results obtained in the analysis of both isobaric and label-free methods with EasyProt were very conclusive since the ratios were close to the theoretical ones. However, our label-free and isobaric workflows followed different strategies. The former is based on an iterative filtering approach that tries to incrementally reduce the list of potential candidates while minimizing the number of false positives. Given the flexibility of our label-free workflow architecture with its
Conclusions
The EasyProt platform was successfully used for the identification and quantification of proteins using two different quantification methods: isobaric tagging with TMT, and label-free. Both quantification workflows were able to quantify the four spiked proteins from ProteoRed Multicentric Experiment 2009 with ratios close to the theoretical ones. During the whole process, from data pre-processing to identification and quantification, every single step was easily performed using EasyProt's
Acknowledgment
We thank everyone at the Biomedical Proteomics Research Group (BPRG) at the University of Geneva, Switzerland, particularly Virginie Licker, Dr. Natacha Turck, and Dr. Priscille Giron. Likewise, we thank everyone at Geneva Bioinformatics SA, Switzerland, for their valuable input and contribution, especially Dr. Alexandre Masselot, Dr. Nicolas Budin, and Dr. Pierre-Alain Binz. In addition, we thank the ProteoRed consortium for their contribution.
References (22)
- et al. Combining low- and high-energy tandem mass spectra for optimized peptide quantification with isobaric tags. J Proteomics (2010)
- et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics (2004)
- et al. Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition. Mol Cell Proteomics (2006)
- et al. A face in the crowd: recognizing peptides through database search. Mol Cell Proteomics (2011)
- et al. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol (2005)
- et al. Computational Proteomics Analysis System (CPAS): an extensible, open-source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J Proteome Res (2006)
- Virtual Expert Mass Spectrometrist v3.0: an integrated tool for proteome analysis. Methods Mol Biol (2007)
- et al. Increasing information from shotgun proteomic data by accounting for misassigned precursor ion masses. Proteomics (2008)
- et al. SuperHirn: a novel tool for high resolution LC–MS-based peptide/protein profiling. Proteomics (2007)
- et al. Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal Chem (2003)
- OLAV: towards high-throughput tandem mass spectrometry data identification. Proteomics
1 Present address: Queensland Institute of Medical Research, Brisbane, Australia.
2 Present address: Proteomics and Metabolomics Core, Nestlé Institute of Health Sciences, Lausanne, Switzerland.