Refining transcriptome gene catalogs by MS-validation of expressed proteins


Article
Author's Original

Published in

Proteomics : proteomics and systems biology

Publisher

Wiley

Keywords

  • Dinoflagellate
  • MS-sequencing
  • Proteomics
  • Transcriptome

Abstract

Protein sequence identification by tandem mass spectrometry (LC-MS/MS) identifies thousands of protein sequences even in complex mixtures, and provides valuable insight into the biological functions of different cells. For non-model organisms, transcriptomes are generally used to allow peptide identification, an important addition to their use as a gene catalog allowing the potential metabolic activities of cells to be determined. We used LC-MS/MS data to identify which of the six possible reading frames of each transcriptome sequence was actually used by the cell to make protein, and asked whether this would have an impact on downstream analyses using the dataset. We combined results from several LC-MS/MS experiments designed to identify peptide sequences in extracts from the dinoflagellate Lingulodinium polyedra using a 74 655-sequence transcriptome. We compiled a list of 6628 translated nucleic acid sequences that contained the ensemble of peptide matches (termed MS-validated sequences) and assessed the similarity in downstream analyses between this dataset and the 6628 nucleic acid sequences from which they were derived. When compared with BLASTx analyses of the DNA sequences, the MS-validated protein sequences analyzed using BLASTp showed differences in gene ontology, had more identified BLAST hits, and contained more KEGG pathway enzymes. The MS-validated protein sequences also differed from datasets containing the longest open reading frame (ORF) protein sequences. We also note a poor correlation between the levels of protein and mRNA abundance, a comparison not previously performed for dinoflagellates. The differences observed between analyses of MS-validated protein sequence and nucleic acid sequence datasets suggest that use of the former may provide a more accurate representation of cellular capacity than the latter. Developing MS-validated protein sequence datasets may also speed interpretation of MS/MS spectra in bottom-up proteomics experiments.
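The idea of using peptide matches to pick the translated reading frame can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequence, the exact-substring matching rule, and the use of the standard genetic code are all assumptions made for illustration. Real LC-MS/MS search engines score spectra against candidate peptides rather than matching strings.

```python
from itertools import product

# Standard genetic code (assumed here), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to their protein translations."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


def ms_validated_frame(seq, peptides):
    """Return (frame, n_matches) for the frame containing the most peptides.

    Hypothetical helper: a peptide counts as matched if it occurs as an
    exact substring of the frame's translation. Returns (None, 0) if no
    frame contains any peptide.
    """
    counts = {
        name: sum(pep in protein for pep in peptides)
        for name, protein in six_frame_translations(seq).items()
    }
    best = max(counts, key=counts.get)
    return (best, counts[best]) if counts[best] > 0 else (None, 0)


# Toy transcript: one leading base, then codons for M K T A Y I A K.
transcript = "GATGAAAACCGCGTATATTGCGAAA"
frame, n_matches = ms_validated_frame(transcript, ["TAYIAK"])
print(frame, n_matches)
```

In this toy case the peptide is found only in the second forward frame, so that translation would be kept as the MS-validated sequence, whereas a longest-ORF rule or a BLASTx search could favor a different frame.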

This document, distributed on Papyrus, is the exclusive property of the copyright holders and is protected by the Copyright Act (R.S.C., 1985, c. C-42). Unless the document is distributed under a Creative Commons licence, it may be used only for fair dealing and non-commercial purposes as provided by the Act (i.e., for private study or research, criticism, or review). Any other use requires written authorization from the copyright holders.