Introduction to Mass Spectrometry in R

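This introduction provides hands-on examples of working with mass spectrometry data in R using the R for Mass Spectrometry ecosystem. We load example datasets, summarise their spectra, calculate theoretical peptide fragments, inspect peptide-spectrum matches, and look up Gene Ontology annotations.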

Loading Essential Packages

Packages loaded successfully!
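The code behind this message is not shown on this page. A minimal sketch of such a check follows. The package list is an assumption based on what the later sections use; org.Hs.eg.db in particular is inferred from the human gene symbols in the annotation output and is not named in the text.

```r
# Packages assumed for this chapter; org.Hs.eg.db is a guess based on the
# human gene symbols shown in the annotation section.
pkgs <- c("Spectra", "msdata", "PSMatch", "GO.db", "org.Hs.eg.db")

# requireNamespace() returns FALSE instead of stopping when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  message("Packages loaded successfully!")
} else {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "),
          "\nThey can be installed with BiocManager::install().")
}
```

Checking with requireNamespace() first reports every missing package at once, rather than stopping at the first failing library() call.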

Exploring Example Datasets

The msdata package provides various example MS datasets for learning and testing.

Proteomics Data

Available proteomics files:
1. MRM-standmix-5.mzML.gz
2. MS3TMT10_01022016_32917-33481.mzML.gz
3. MS3TMT11.mzML
4. TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz
5. TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz

Selected file: TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz 
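A listing like the one above could be produced roughly as follows. This is a sketch: the variable names are illustrative, and selecting the file by the pattern "20141210" is an assumption about how the author picked it.

```r
if (requireNamespace("msdata", quietly = TRUE)) {
  # All proteomics example files shipped with msdata, as full paths
  prot_files <- msdata::proteomics(full.names = TRUE)
  cat("Available proteomics files:\n")
  cat(paste0(seq_along(prot_files), ". ", basename(prot_files)), sep = "\n")

  # Pick the TMT Erwinia file by a distinctive part of its name
  tmt_file <- grep("20141210", prot_files, value = TRUE)
  cat("Selected file:", basename(tmt_file), "\n")
}
```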

Loading and Examining MS Data


Dataset summary:
  Total spectra: 7534 
  MS levels: 1, 2 
  RT range: 0.46 to 3601.98 seconds
  m/z range: 100 to 2008.5 
MSn data (Spectra) with 7534 spectra in a MsBackendMzR backend:
       msLevel     rtime scanIndex
     <integer> <numeric> <integer>
1            1    0.4584         1
2            1    0.9725         2
3            1    1.8524         3
4            1    2.7424         4
5            1    3.6124         5
...        ...       ...       ...
7530         2   3600.47      7530
7531         2   3600.83      7531
7532         2   3601.18      7532
7533         2   3601.57      7533
7534         2   3601.98      7534
 ... 34 more variables/columns.

file(s):
TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz
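The summary and object printout above can be reproduced along these lines. This is a hedged sketch, not the original chunk: it assumes the file is read with Spectra() and its default MsBackendMzR backend (which needs the mzR package), and the variable names are illustrative.

```r
if (requireNamespace("Spectra", quietly = TRUE) &&
    requireNamespace("mzR", quietly = TRUE) &&
    requireNamespace("msdata", quietly = TRUE)) {
  library(Spectra)

  f <- msdata::proteomics(full.names = TRUE, pattern = "20141210")
  sp <- Spectra(f)  # character input uses the MsBackendMzR backend by default

  cat("Dataset summary:\n")
  cat("  Total spectra:", length(sp), "\n")
  cat("  MS levels:", paste(sort(unique(msLevel(sp))), collapse = ", "), "\n")
  cat("  RT range:", round(range(rtime(sp)), 2), "seconds\n")
  # mz() returns one numeric vector per spectrum; flatten before taking the range
  cat("  m/z range:", round(range(unlist(mz(sp))), 2), "\n")

  print(sp)
}
```

MsBackendMzR keeps only the spectra metadata in memory and reads peak data from the file on demand, so the m/z range line is the only step that touches every peak.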

Different Types of MS Data


SWATH/DIA Dataset:
  File: 8d086cf5642a_7862 
  Spectra count: 8999 
  MS levels: 2, 1 

Metabolomics Dataset 1:
  File: 8d082daf3251_7859 
  Spectra count: 931 
  Polarity: 1 

Metabolomics Dataset 2:
  File: 8d08322782d_7860 
  Spectra count: 931 

TMT Proteomics Dataset:
  File: 8d0814e73714_7858 
  Spectra count: 7534 
  MS levels: 1, 2 
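Per-file summaries like these can be generated with a small helper. In the sketch below, describe_ms_file() is a hypothetical function, not part of any package. Using the files in the "sciex" folder of msdata as the metabolomics examples is an assumption, since the file names shown above are cache identifiers rather than original file names.

```r
if (requireNamespace("Spectra", quietly = TRUE) &&
    requireNamespace("mzR", quietly = TRUE) &&
    requireNamespace("msdata", quietly = TRUE)) {
  library(Spectra)

  # Hypothetical helper: one-row summary of a file readable by MsBackendMzR
  describe_ms_file <- function(path) {
    sp <- Spectra(path)
    data.frame(
      file      = basename(path),
      n_spectra = length(sp),
      ms_levels = paste(sort(unique(msLevel(sp))), collapse = ", "),
      polarity  = paste(sort(unique(polarity(sp))), collapse = ", ")
    )
  }

  # Assumption: the LC-MS metabolomics examples live in msdata's "sciex" folder
  sciex_files <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)
  ms_summary <- do.call(rbind, lapply(sciex_files, describe_ms_file))
  print(ms_summary)
}
```

In Spectra, a polarity of 1 denotes positive ion mode and 0 negative ion mode.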

Peptide Fragment Calculation

The PSMatch package provides tools for theoretical peptide fragmentation.

Calculating theoretical fragments for: THSQEEMQHMQR 
Theoretical fragments (first 15 of 58):
          mz ion type pos z         seq      peptide
1   102.0550  b1    b   1 1           T THSQEEMQHMQR
2   239.1139  b2    b   2 1          TH THSQEEMQHMQR
3   326.1459  b3    b   3 1         THS THSQEEMQHMQR
4   454.2045  b4    b   4 1        THSQ THSQEEMQHMQR
5   583.2471  b5    b   5 1       THSQE THSQEEMQHMQR
6   712.2897  b6    b   6 1      THSQEE THSQEEMQHMQR
7   843.3301  b7    b   7 1     THSQEEM THSQEEMQHMQR
8   971.3887  b8    b   8 1    THSQEEMQ THSQEEMQHMQR
9  1108.4476  b9    b   9 1   THSQEEMQH THSQEEMQHMQR
10 1239.4881 b10    b  10 1  THSQEEMQHM THSQEEMQHMQR
11 1367.5467 b11    b  11 1 THSQEEMQHMQ THSQEEMQHMQR
12  175.1190  y1    y   1 1           R THSQEEMQHMQR
13  303.1775  y2    y   2 1          QR THSQEEMQHMQR
14  434.2180  y3    y   3 1         MQR THSQEEMQHMQR
15  571.2769  y4    y   4 1        HMQR THSQEEMQHMQR

Fragment types: b, y, b_, y_, b*, y* 
Total fragments: 58 
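To make the numbers in the table less opaque, the singly charged b and y ions can be recomputed in base R from monoisotopic residue masses. This is an illustrative sketch, not the PSMatch implementation: by_ions() is a hypothetical helper, it ignores modifications and neutral losses (so it yields only the b and y rows, not the b_, y_, b* and y* types), and the masses are standard monoisotopic values rounded to five decimals.

```r
# Monoisotopic residue masses (Da) for the 20 standard amino acids
residue_mass <- c(
  G = 57.02146, A = 71.03711, S = 87.03203, P = 97.05276, V = 99.06841,
  T = 101.04768, C = 103.00919, L = 113.08406, I = 113.08406, N = 114.04293,
  D = 115.02694, Q = 128.05858, K = 128.09496, E = 129.04259, M = 131.04049,
  H = 137.05891, F = 147.06841, R = 156.10111, Y = 163.06333, W = 186.07931
)
proton <- 1.007276   # mass of the charge-carrying proton
water  <- 18.010565  # H2O retained by C-terminal (y) fragments

# Hypothetical helper: singly charged b and y ions of an unmodified peptide
by_ions <- function(peptide) {
  aa <- strsplit(peptide, "")[[1]]
  stopifnot(all(aa %in% names(residue_mass)))
  n <- length(aa)
  pos <- seq_len(n - 1)                       # fragments of length 1 .. n-1
  m <- unname(residue_mass[aa])
  b <- cumsum(m)[pos] + proton                # N-terminal prefix sums
  y <- cumsum(rev(m))[pos] + water + proton   # C-terminal suffix sums
  data.frame(
    mz   = c(b, y),
    ion  = c(paste0("b", pos), paste0("y", pos)),
    type = rep(c("b", "y"), each = n - 1),
    pos  = c(pos, pos),
    z    = rep(1L, 2 * (n - 1))
  )
}

frags <- by_ions("THSQEEMQHMQR")
print(head(frags, 4), digits = 8)
```

For this peptide the b1 and y1 values agree with the table above to the displayed precision (102.0550 and 175.1190). In practice PSMatch::calculateFragments("THSQEEMQHMQR") is the function to use, since it also handles neutral losses and modifications.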

Peptide-Spectrum Matching

This section works with identification results from a database search, stored in an mzIdentML (.mzid) file.

Available identification files:
1. TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid

PSM Summary:
  Total PSMs: 5802 
  Unique peptides: 4938 
  Unique proteins: 3148 

First few PSMs:
PSM with 5 rows and 35 columns.
names(35): sequence spectrumID ... subReplacementResidue subLocation
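A sketch of how such a summary might be obtained with PSMatch is shown below. The page does not say how "unique peptides" and "unique proteins" were counted, so counting distinct values of the sequence and DatabaseAccess columns is an assumption and may not reproduce the figures above exactly.

```r
if (requireNamespace("PSMatch", quietly = TRUE) &&
    requireNamespace("mzR", quietly = TRUE) &&
    requireNamespace("msdata", quietly = TRUE)) {
  library(PSMatch)

  # Identification file matching the TMT Erwinia raw data
  id_file <- msdata::ident(full.names = TRUE, pattern = "TMT")
  psms <- PSM(id_file)  # one row per peptide-spectrum match

  cat("PSM Summary:\n")
  cat("  Total PSMs:", nrow(psms), "\n")
  # Assumed definitions: distinct peptide sequences and protein accessions
  cat("  Unique peptides:", length(unique(psms$sequence)), "\n")
  cat("  Unique proteins:", length(unique(psms$DatabaseAccess)), "\n")

  print(head(psms, 5))
}
```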

Biological Annotation and Enrichment

This section connects MS data to biological databases for functional analysis, using Gene Ontology (GO) terms and the genes annotated to them.

Gene Ontology database loaded
Available columns: DEFINITION, GOID, ONTOLOGY, TERM 

GO Term Information for GO:0005925 :
  GOID:       GO:0005925
  TERM:       focal adhesion
  ONTOLOGY:   CC
  DEFINITION: A cell-substrate junction that anchors the cell to the extracellular matrix and that forms a point of termination of actin filaments. In insects focal adhesion has also been referred to as hemi-adherens junction (HAJ).

Genes associated with Focal Adhesion (GO:0005925):
  Total genes: 424 
# A tibble: 10 × 5
   GO         EVIDENCE ONTOLOGY ENTREZID SYMBOL
   <chr>      <chr>    <chr>    <chr>    <chr> 
 1 GO:0005925 HDA      CC       60       ACTB  
 2 GO:0005925 HDA      CC       70       ACTC1 
 3 GO:0005925 ISS      CC       71       ACTG1 
 4 GO:0005925 HDA      CC       81       ACTN4 
 5 GO:0005925 HDA      CC       87       ACTN1 
 6 GO:0005925 IMP      CC       88       ACTN2 
 7 GO:0005925 IMP      CC       89       ACTN3 
 8 GO:0005925 HDA      CC       102      ADAM10
 9 GO:0005925 HDA      CC       118      ADD1  
10 GO:0005925 HDA      CC       214      ALCAM 

Genes associated with Centrosome (GO:0005813):
  Total genes: 633 
# A tibble: 10 × 5
   GO         EVIDENCE ONTOLOGY ENTREZID SYMBOL
   <chr>      <chr>    <chr>    <chr>    <chr> 
 1 GO:0005813 IDA      CC       35       ACADS 
 2 GO:0005813 IDA      CC       324      APC   
 3 GO:0005813 IDA      CC       328      APEX1 
 4 GO:0005813 IDA      CC       402      ARL2  
 5 GO:0005813 IDA      CC       403      ARL3  
 6 GO:0005813 IDA      CC       468      ATF4  
 7 GO:0005813 ISS      CC       472      ATM   
 8 GO:0005813 IBA      CC       582      BBS1  
 9 GO:0005813 IDA      CC       585      BBS4  
10 GO:0005813 IDA      CC       598      BCL2L1
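The two lookups above can be sketched with AnnotationDbi::select(). The use of org.Hs.eg.db is an assumption inferred from the human gene symbols in the output, and how the "Total genes" figures were counted is not stated, so the count below (distinct Entrez IDs) may differ from them.

```r
if (requireNamespace("GO.db", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  go_id <- "GO:0005925"  # focal adhesion; use "GO:0005813" for centrosome

  # Term-level information from the Gene Ontology database
  term <- AnnotationDbi::select(
    GO.db::GO.db, keys = go_id,
    columns = c("DEFINITION", "ONTOLOGY", "TERM"), keytype = "GOID"
  )
  print(term)

  # Genes annotated to the term (assumed organism database: human)
  genes <- AnnotationDbi::select(
    org.Hs.eg.db::org.Hs.eg.db, keys = go_id,
    columns = c("ENTREZID", "SYMBOL"), keytype = "GO"
  )
  cat("Total genes:", length(unique(genes$ENTREZID)), "\n")
  print(head(genes, 10))
}
```

The EVIDENCE column holds GO evidence codes (for example IDA, inferred from direct assay), which can be used to restrict the gene list to experimentally supported annotations.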

Summary

This introduction demonstrated:

  • Loading and examining various MS datasets (proteomics, metabolomics, DIA/SWATH)
  • Using the Spectra infrastructure with the MsBackendMzR backend
  • Calculating theoretical peptide fragments with PSMatch
  • Working with peptide-spectrum match (PSM) data
  • Connecting MS results to biological annotations (GO terms)

These examples provide a foundation for the detailed analyses covered in subsequent chapters.