Refining transcriptome gene catalogs by MS-validation of expressed proteins

UdeM.ReferenceFournieParDeposant: doi.org/10.1002/pmic.201700271
UdeM.VersionRioxx: Author's Original
dc.contributor.affiliation: Université de Montréal. Faculté des arts et des sciences. Département de sciences biologiques
dc.contributor.author: Tse, Sirius P. K.
dc.contributor.author: Beauchemin, Mathieu
dc.contributor.author: Morse, David
dc.contributor.author: Lo, Samuel C. L.
dc.date.accessioned: 2024-07-12T13:28:01Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2024-07-12T13:28:01Z
dc.date.issued: 2017-11-19
dc.description.abstract: Tandem mass spectrometry (LC-MS/MS) identifies thousands of protein sequences even in complex mixtures and provides valuable insight into the biological functions of different cells. For non-model organisms, transcriptomes are generally used to allow peptide identification, an important addition to their use as a gene catalog allowing the potential metabolic activities of cells to be determined. We used LC-MS/MS data to identify which of the six possible reading frames in the transcriptome was actually used by the cell to make protein, and asked whether this would have an impact on downstream analyses using the dataset. We combined results from several LC-MS/MS experiments designed to identify peptide sequences in extracts from the dinoflagellate Lingulodinium polyedra using a 74 655-sequence transcriptome. We compiled a list of 6628 translated nucleic acid sequences that contained the ensemble of peptide matches (termed MS-validated sequences) and assessed the similarity in downstream analyses between this dataset and the 6628 nucleic acid sequences from which they were derived. When compared with BLASTx analyses of the DNA sequences, the MS-validated protein sequences analyzed using BLASTp showed differences in gene ontology, had more identified BLAST hits, and contained more KEGG pathway enzymes. The MS-validated protein sequences also differ from datasets containing longest open reading frame (ORF) protein sequences. We also note a poor correlation between the levels of protein and mRNA abundance, a comparison not previously performed for dinoflagellates. The differences observed between analyses of MS-validated protein sequence and nucleic acid sequence datasets suggest that use of the former may provide a more accurate representation of cellular capacity than the latter. Developing MS-validated protein sequence datasets may also speed interpretation of MS/MS spectra in bottom-up proteomics experiments.
dc.identifier.doi: 10.1002/pmic.201700271
dc.identifier.uri: http://hdl.handle.net/1866/33555
dc.publisher: Wiley
dc.subject: Dinoflagellate
dc.subject: MS-sequencing
dc.subject: Proteomics
dc.subject: Transcriptome
dc.title: Refining transcriptome gene catalogs by MS-validation of expressed proteins
dc.type: Article
dcterms.isPartOf: urn:ISSN:1615-9853
dcterms.isPartOf: urn:ISSN:1615-9861
dcterms.language: eng
oaire.citationIssue: 1
oaire.citationTitle: Proteomics : proteomics and systems biology
oaire.citationVolume: 18
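
The central step described in the abstract, using peptide matches to decide which of the six reading frames of a transcript is actually translated, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the standard genetic code and exact substring matching of already-identified peptide sequences, whereas a real LC-MS/MS workflow scores spectra with a dedicated search engine. The function names, the toy contig, and the toy peptide are all hypothetical.

```python
from itertools import product

# Assumption: standard genetic code (NCBI translation table 1),
# with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated sequences."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def frames_supported_by_peptides(dna, peptides):
    """Return {frame: [matched peptides]} for frames containing any peptide."""
    supported = {}
    for frame, protein in six_frame_translations(dna).items():
        hits = [pep for pep in peptides if pep in protein]
        if hits:
            supported[frame] = hits
    return supported


# Hypothetical toy contig: one leading base shifts the coding region to frame +2.
toy_contig = "G" + "ATGAAAACCGCGTATATTGCGAAA"
toy_peptides = ["TAYIAK"]  # hypothetical peptide identified by MS
print(frames_supported_by_peptides(toy_contig, toy_peptides))
```

In this sketch a transcript counts as "MS-validated" when at least one frame is supported by a peptide; keeping only the supported translation is one plausible way to build a protein dataset like the one the abstract compares against the longest-ORF and nucleic acid datasets.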

Files

Original bundle

Name: Refining transcriptomes by MS V2.pdf
Size: 623.64 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission