Proteomics Platform
Bioinformatics Unit
Mass spectrometry-based proteomics presents a daunting bioinformatics challenge. Data, whether from a few or several hundred samples, must be stored, analyzed and annotated in a way that is secure, scalable and flexible. The GQ Proteomics Platform at the McGill University and Genome Quebec Innovation Centre provides this functionality through CellMapBase, an Oracle relational database. The CellMapBase application, developed by Robert Kearney as part of the CellMap project led by John Bergeron at McGill University, is now available to all Platform clientele.
The GQ Proteomics Platform uses an automated data processing and storage pipeline that captures and permanently stores important MS parameters, such as ion intensity and retention time, for downstream analysis and data mining. Peptides and proteins are identified using Mascot (www.matrixscience.com) and results are stored and accessed through a secure web-based interface (https://portal.proteomics.mcgill.ca). Since raw data are permanently archived, it is possible to perform new searches using different search parameters, including the presence of post-translational modifications such as phosphorylation, newly annotated databases, or different taxonomies and protein databases.
CellMapBase allows the researcher to:
- Analyze large data sets.
- Eliminate redundancy in Mascot results resulting from multiple database entries.
- Generate results in a format that conforms to standards established by MIAPE (Minimum Information About a Proteomics Experiment) and the Human Proteome Organization’s (HUPO) Proteomics Standards Initiative (PSI).
- Track samples through all steps from submission to protein annotation, and maintain a complete, accurate and permanent record of methods, data and analysis protocols.
- Compare results from different data sets.
A key feature of CellMapBase is a tool that generates, with 95% confidence, a minimal list of proteins that explains all the peptides identified by Mascot for any experiment. Because CellMapBase tracks associations between spectra, peptide sequences and identified proteins, the researcher can also obtain relative protein quantification data by spectral counts (Blondeau et al., Proc Natl Acad Sci USA. 2004 March 16;101(11):3833-8; Liu et al., Anal Chem. 2004 Jul 15;76(14):4193-201; Gilchrist et al., Cell, 2006 December 15;127(6):1265-81). An additional quantification method based on the measurement of precursor ion intensity is also being implemented.
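The idea of a minimal protein list and of spectral counting can be illustrated with a short Python sketch. This is not CellMapBase's actual algorithm, which is not described here: it assumes a simple greedy set-cover heuristic (which approximates, but does not guarantee, the smallest list), ignores the 95% confidence filter, and counts a shared peptide toward every listed protein that contains it. The function names, peptide sequences and protein identifiers are all hypothetical.

```python
from collections import Counter, defaultdict


def minimal_protein_list(peptide_to_proteins):
    """Greedy set cover: pick proteins until every peptide is explained."""
    # Invert the mapping: protein -> set of peptides it could explain.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    unexplained = set(peptide_to_proteins)
    chosen = []
    while unexplained and protein_to_peptides:
        # Take the protein covering the most still-unexplained peptides;
        # ties are broken alphabetically so the result is deterministic.
        best = min(
            protein_to_peptides,
            key=lambda p: (-len(protein_to_peptides[p] & unexplained), p),
        )
        covered = protein_to_peptides[best] & unexplained
        if not covered:
            break  # the remaining peptides map to no protein at all
        chosen.append(best)
        unexplained -= covered
    return chosen


def spectral_counts(spectra, peptide_to_proteins, proteins):
    """Count identified spectra per listed protein.

    `spectra` holds one peptide sequence per identified spectrum.
    """
    counts = Counter({protein: 0 for protein in proteins})
    for peptide in spectra:
        for protein in peptide_to_proteins.get(peptide, ()):
            if protein in counts:
                counts[protein] += 1
    return dict(counts)


# Hypothetical data: P2 is redundant because P1 and P3 explain everything.
peptide_to_proteins = {
    "AAK": ["P1", "P2"],
    "CCR": ["P1"],
    "DDK": ["P3", "P2"],
    "EEK": ["P3"],
}
spectra = ["AAK", "AAK", "CCR", "DDK", "EEK", "EEK", "EEK"]

proteins = minimal_protein_list(peptide_to_proteins)
print(proteins)                                              # ['P1', 'P3']
print(spectral_counts(spectra, peptide_to_proteins, proteins))
```

In this toy data set, the redundant entry P2 drops out of the list, and the spectral counts give a rough measure of relative abundance for the remaining proteins.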
Annotated protein lists created by CellMapBase can include the following information:
- Protein names and information on relative abundance.
- References to major protein databases (UniProt, RefSeq, IPI, NCBI nr, ENSEMBL).
- Predicted protein functions, including InterPro matches and GO terms (optionally summarized by a user-defined “GO Slim”).
Using CellMapBase, the researcher can also make proteomics data publicly available by uploading it directly to the PRIDE database at the European Bioinformatics Institute (EBI), a central, international repository of proteomics data.
Access to data generated by the Proteomics Platform relies on two levels of security. First, web-based viewing of Mascot and CellMapBase results is protected by a login and password. Second, each sample is associated with a “project identifier number” that allows for controlled access to data and data analysis results. Researchers can restrict who has access to their project data and can grant permissions for different activities (e.g. read-only). Where a higher level of security is required, CellMapBase can be modified so that automatic annotation information from a project is shared while protein identification and other associated information remain private.
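The two-level access model described above can be sketched in Python as follows. This is only an illustration of the concept, not the Platform's implementation: the class, function and user names are hypothetical, and it assumes that a project owner always has full access.

```python
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class Project:
    project_id: int          # the "project identifier number"
    owner: str
    grants: dict = field(default_factory=dict)  # user -> set of Permission

    def grant(self, user, *permissions):
        """Give a user one or more permissions on this project."""
        self.grants.setdefault(user, set()).update(permissions)

    def allows(self, user, permission):
        # Assumption: the owner implicitly holds every permission.
        if user == self.owner:
            return True
        return permission in self.grants.get(user, set())


def can_access(authenticated_user, project, permission):
    """Level 1: the user must be logged in. Level 2: project-level grant."""
    if authenticated_user is None:
        return False
    return project.allows(authenticated_user, permission)


# Hypothetical usage: bob gets read-only access to alice's project.
project = Project(project_id=42, owner="alice")
project.grant("bob", Permission.READ)

print(can_access("bob", project, Permission.READ))    # True
print(can_access("bob", project, Permission.WRITE))   # False
print(can_access(None, project, Permission.READ))     # False
```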
This integrated bioinformatics pipeline relies on state-of-the-art 30-terabyte mass storage facilities coupled to robust, automated data backup and archiving located at McGill University.