Proteomics Platform
Bioinformatics and Data Analysis
Mass spectrometry-based proteomics presents a daunting bioinformatics challenge. Data, whether from a few or several hundred samples, must be stored, analyzed and annotated in a way that is secure, scalable and flexible. The Genome Québec Proteomics Platform at the McGill University and Genome Québec Innovation Centre provides this functionality through CellMapBase, an Oracle relational database. The CellMapBase application, developed by Professor Robert Kearney as part of the CellMap project led by Dr John Bergeron at McGill University, is now available to all Platform clientele.
The Genome Québec Proteomics Platform uses an automated data processing and storage pipeline that captures and permanently stores important MS parameters, such as ion intensity and retention time, for downstream analysis and data mining. Peptides and proteins are identified using Mascot (www.matrixscience.com), and results are stored and accessed through a secure web-based interface (https://portal.proteomics.mcgill.ca). Because raw data are permanently archived, new searches can be run with different parameters, for example to screen for post-translational modifications such as phosphorylation, to query newly annotated protein databases, or to use different taxonomies.
CellMapBase allows scientists to:
- Analyse large data sets.
- Eliminate redundancy in protein identification resulting from multiple database entries.
- Generate results in a format that conforms to the standards established by MIAPE (Minimum Information About a Proteomics Experiment) and by the Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO).
- Track samples through all steps from submission to protein annotation and maintain a complete, accurate and permanent record of methods, data and analysis protocols.
- Compare results from different data sets.
A key feature of CellMapBase is a tool that, for any experiment, generates a condensed list of proteins supported by peptides identified by Mascot at the 95% confidence level. Because CellMapBase tracks associations between spectra, peptide sequences and identified proteins, the researcher can also obtain relative protein quantification data by redundant peptide counting (Blondeau et al., Proc. Natl Acad. Sci. USA (2004) 101:3833-8; Liu et al., Anal. Chem. (2004) 76:4193-201; Gilchrist et al., Cell (2006) 127:1265-81). An additional quantification method based on the measurement of precursor ion intensity is also being implemented.
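As a rough illustration of redundant peptide counting, the following Python sketch keeps only peptide-spectrum matches at or above a confidence cutoff and then counts, for each protein, the supporting spectra and the distinct peptide sequences. It is not CellMapBase code: the function name, the tuple layout and the numeric `confidence` field are assumptions made for this example, and the field is a stand-in for however Mascot significance is actually recorded in the database.

```python
from collections import defaultdict


def condensed_protein_counts(psms, min_confidence=0.95):
    """Count confident peptide-spectrum matches per protein.

    psms: iterable of (spectrum_id, peptide, protein, confidence) tuples.
          This layout is hypothetical, chosen only for the example.
    Returns {protein: {"spectra": n, "unique_peptides": m}}.
    """
    spectra = defaultdict(set)
    peptides = defaultdict(set)
    for spectrum_id, peptide, protein, confidence in psms:
        if confidence < min_confidence:
            continue  # drop matches below the confidence cutoff
        spectra[protein].add(spectrum_id)   # redundant count: one per spectrum
        peptides[protein].add(peptide)      # distinct peptide sequences
    return {
        protein: {
            "spectra": len(spectra[protein]),
            "unique_peptides": len(peptides[protein]),
        }
        for protein in spectra
    }
```

The spectrum count is the quantity used as a relative abundance estimate in redundant peptide counting; the distinct peptide count is reported alongside it only to show how strongly each identification is supported.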
Annotated protein lists created by CellMapBase can include the following information:
- Protein names and information on relative abundance.
- References to major protein databases (UniProt, RefSeq, IPI, NCBI nr, ENSEMBL).
- Predicted protein functions, including InterPro matches and GO terms (optionally summarized by a user-defined “GO Slim”).
Using CellMapBase, the researcher can also make proteomics data publicly available by uploading it directly to PRIDE (the PRoteomics IDEntifications database) at the European Bioinformatics Institute (EBI), a central, international repository of proteomics data.
Access to data generated by the Proteomics Platform relies on two levels of security. First, web-based viewing of Mascot and CellMapBase results is protected by login and password. Second, each sample is associated with a “project identifier number” that allows for controlled access to data and data analysis results. Researchers can restrict who has access to their project data and can grant permissions for different activities (e.g. read-only). Where a higher level of security is required, CellMapBase can be modified so that automatic annotation information from a project is shared while protein identification and other associated information remain private.
This integrated bioinformatics pipeline relies on a state-of-the-art 50-terabyte mass storage facility coupled to robust, automated data backup and archiving.
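The two security levels described above can be pictured with a minimal Python sketch. It assumes a per-project table that maps each user to a set of permitted activities. The `Project` class, the `can_access` function and the activity names are hypothetical and do not describe the actual CellMapBase implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    project_id: int  # stands in for the "project identifier number"
    # Maps username -> set of permitted activities, e.g. {"read"}.
    permissions: dict = field(default_factory=dict)


def can_access(user, authenticated, project, activity):
    """Return True only if both security levels are satisfied."""
    # Level 1: the user must have logged in successfully.
    if not authenticated:
        return False
    # Level 2: the project owner must have granted this activity.
    return activity in project.permissions.get(user, set())
```

For example, a user granted only `"read"` on a project would pass the check for viewing results but fail it for any other activity, and a user with no entry for the project would be refused outright.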