Introduction to Mass Spectrometry in R
This introduction provides hands-on examples of working with mass spectrometry (MS) data in R using the R for Mass Spectrometry ecosystem. We’ll load real example datasets, calculate theoretical peptide fragments, inspect peptide-spectrum matches, and look up Gene Ontology annotations.
Loading Essential Packages
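A minimal sketch of the loading step follows. The package list is an assumption inferred from the sections of this chapter (Spectra, msdata, PSMatch, GO.db, and the human annotation package org.Hs.eg.db); the original setup code is not shown, so the real list may differ.

```r
# Packages assumed to be needed for this chapter (inferred, not confirmed).
pkgs <- c("Spectra", "msdata", "PSMatch", "GO.db", "org.Hs.eg.db")

# Check which packages are installed before attaching them.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  for (p in pkgs) {
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  }
  message("Packages loaded successfully!")
} else {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "),
          "\nThey can be installed with BiocManager::install().")
}
```

Checking with `requireNamespace()` first keeps the script from stopping halfway when a single Bioconductor package is missing.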
Exploring Example Datasets
The msdata package provides various example MS datasets for learning and testing.
Proteomics Data
Available proteomics files:
1. MRM-standmix-5.mzML.gz
2. MS3TMT10_01022016_32917-33481.mzML.gz
3. MS3TMT11.mzML
4. TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz
5. TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz
Selected file: TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz
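The listing above can be reproduced roughly as follows. This is a sketch: `msdata::proteomics()` passes its arguments on to `dir()`, and the `"20141210"` pattern is our own choice for picking out the selected file, assuming the installed msdata version ships the same files.

```r
if (requireNamespace("msdata", quietly = TRUE)) {
  # Base names of all proteomics example files shipped with msdata.
  prot_files <- msdata::proteomics()
  cat("Available proteomics files:\n")
  cat(paste0(seq_along(prot_files), ". ", prot_files), sep = "\n")

  # Full path to the TMT Erwinia file; the pattern is an assumption that
  # matches only the file name containing the acquisition date.
  f <- msdata::proteomics(pattern = "20141210", full.names = TRUE)
  cat("Selected file:", basename(f), "\n")
}
```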
Loading and Examining MS Data
Dataset summary:
Total spectra: 7534
MS levels: 1, 2
RT range: 0.46 3601.98 seconds
m/z range: 100 2008.5
MSn data (Spectra) with 7534 spectra in a MsBackendMzR backend:
       msLevel     rtime scanIndex
     <integer> <numeric> <integer>
1            1    0.4584         1
2            1    0.9725         2
3            1    1.8524         3
4            1    2.7424         4
5            1    3.6124         5
...        ...       ...       ...
7530         2   3600.47      7530
7531         2   3600.83      7531
7532         2   3601.18      7532
7533         2   3601.57      7533
7534         2   3601.98      7534
... 34 more variables/columns.
file(s):
TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz
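A summary like the one above could be produced along these lines. This is a sketch under assumptions: it reads the file from msdata, names the `MsBackendMzR` backend explicitly (which requires the mzR package), and computes the m/z range from all peaks, which may not be how the original summary was derived.

```r
if (requireNamespace("Spectra", quietly = TRUE) &&
    requireNamespace("mzR", quietly = TRUE) &&
    requireNamespace("msdata", quietly = TRUE)) {
  suppressPackageStartupMessages(library(Spectra))

  f <- msdata::proteomics(pattern = "20141210", full.names = TRUE)

  # MsBackendMzR keeps peak data on disk and reads it on demand.
  sp <- Spectra(f, source = MsBackendMzR())

  cat("Total spectra:", length(sp), "\n")
  cat("MS levels:", paste(sort(unique(msLevel(sp))), collapse = ", "), "\n")
  cat("RT range:", round(range(rtime(sp)), 2), "seconds\n")

  # This step loads every peak from disk, so it can take a moment.
  cat("m/z range:", round(range(unlist(mz(sp))), 1), "\n")

  print(sp)
}
```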
Different Types of MS Data
SWATH/DIA Dataset:
File: 8d086cf5642a_7862
Spectra count: 8999
MS levels: 2, 1
Metabolomics Dataset 1:
File: 8d082daf3251_7859
Spectra count: 931
Polarity: 1
Metabolomics Dataset 2:
File: 8d08322782d_7860
Spectra count: 931
TMT Proteomics Dataset:
File: 8d0814e73714_7858
Spectra count: 7534
MS levels: 1, 2
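The per-dataset summaries above follow one pattern, which a small helper can capture. The helper name `describe_spectra()` is hypothetical, and the example reads a file from the `sciex` folder of msdata as a stand-in; the file names printed above look like cache entries, so the original data were probably obtained differently.

```r
if (requireNamespace("Spectra", quietly = TRUE) &&
    requireNamespace("mzR", quietly = TRUE) &&
    requireNamespace("msdata", quietly = TRUE)) {
  suppressPackageStartupMessages(library(Spectra))

  # Hypothetical helper: print a short summary of a Spectra object.
  describe_spectra <- function(sp, label) {
    cat(label, "\n")
    cat("File:", paste(unique(basename(dataOrigin(sp))), collapse = ", "), "\n")
    cat("Spectra count:", length(sp), "\n")
    cat("MS levels:", paste(unique(msLevel(sp)), collapse = ", "), "\n")
    cat("Polarity:", paste(unique(polarity(sp)), collapse = ", "), "\n")
    invisible(sp)
  }

  # Stand-in data: mzML files in the msdata "sciex" folder (an assumption).
  sciex <- dir(system.file("sciex", package = "msdata"),
               pattern = "mzML", full.names = TRUE)
  if (length(sciex) > 0) {
    describe_spectra(Spectra(sciex[1]), "Metabolomics Dataset 1:")
  }
}
```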
Peptide Fragment Calculation
The PSMatch package provides tools for theoretical peptide fragmentation.
Calculating theoretical fragments for: THSQEEMQHMQR
Theoretical fragments:
          mz ion type pos z         seq      peptide
 1  102.0550  b1    b   1 1           T THSQEEMQHMQR
 2  239.1139  b2    b   2 1          TH THSQEEMQHMQR
 3  326.1459  b3    b   3 1         THS THSQEEMQHMQR
 4  454.2045  b4    b   4 1        THSQ THSQEEMQHMQR
 5  583.2471  b5    b   5 1       THSQE THSQEEMQHMQR
 6  712.2897  b6    b   6 1      THSQEE THSQEEMQHMQR
 7  843.3301  b7    b   7 1     THSQEEM THSQEEMQHMQR
 8  971.3887  b8    b   8 1    THSQEEMQ THSQEEMQHMQR
 9 1108.4476  b9    b   9 1   THSQEEMQH THSQEEMQHMQR
10 1239.4881 b10    b  10 1  THSQEEMQHM THSQEEMQHMQR
11 1367.5467 b11    b  11 1 THSQEEMQHMQ THSQEEMQHMQR
12  175.1190  y1    y   1 1           R THSQEEMQHMQR
13  303.1775  y2    y   2 1          QR THSQEEMQHMQR
14  434.2180  y3    y   3 1         MQR THSQEEMQHMQR
15  571.2769  y4    y   4 1        HMQR THSQEEMQHMQR
Fragment types: b, y, b_, y_, b*, y*
Total fragments: 58
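To make the arithmetic behind the table concrete, here is a base R sketch that computes singly charged b and y ions by hand. It is not the PSMatch implementation (in PSMatch, `calculateFragments()` is the relevant function); the helper `by_ions()` is hypothetical, the mass table covers only the residues in this peptide, and neutral-loss ions (b_, y_, b*, y*) are left out, so it returns fewer rows than the 58 reported above.

```r
# Monoisotopic residue masses (Da) for the amino acids in the example peptide.
residue_mass <- c(T = 101.04768, H = 137.05891, S = 87.03203,
                  Q = 128.05858, E = 129.04259, M = 131.04049,
                  R = 156.10111)
proton <- 1.007276   # mass of a proton
water  <- 18.010565  # monoisotopic mass of H2O

# Hypothetical helper: singly charged b and y ions, no neutral losses.
by_ions <- function(peptide, masses = residue_mass) {
  aa <- strsplit(peptide, "")[[1]]
  stopifnot(all(aa %in% names(masses)))
  m <- unname(masses[aa])
  n <- length(aa)
  # b ions: N-terminal prefix masses plus one proton.
  b <- cumsum(m)[-n] + proton
  # y ions: C-terminal suffix masses plus water plus one proton.
  y <- cumsum(rev(m))[-n] + water + proton
  data.frame(
    mz   = c(b, y),
    ion  = c(paste0("b", seq_len(n - 1)), paste0("y", seq_len(n - 1))),
    type = rep(c("b", "y"), each = n - 1),
    pos  = rep(seq_len(n - 1), 2),
    z    = 1L
  )
}

frags <- by_ions("THSQEEMQHMQR")
head(frags, 3)
```

For example, b1 is the threonine residue mass plus a proton, 101.04768 + 1.007276 ≈ 102.0550, and y1 is arginine plus water plus a proton, 156.10111 + 18.010565 + 1.007276 ≈ 175.1190, matching the first b and y rows of the table.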
Peptide-Spectrum Matching
Working with identification results from database searches.
Available identification files:
1. TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid
PSM Summary:
Total PSMs: 5802
Unique peptides: 4938
Unique proteins: 3148
First few PSMs:
PSM with 5 rows and 35 columns.
names(35): sequence spectrumID ... subReplacementResidue subLocation
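A sketch of how such a summary could be obtained with PSMatch follows. It assumes the mzid file is the one shipped in msdata, that the default mzR parser is used, and that `sequence` and `DatabaseAccess` are the peptide and protein columns; the counts above may have been computed differently.

```r
if (requireNamespace("PSMatch", quietly = TRUE) &&
    requireNamespace("mzR", quietly = TRUE) &&
    requireNamespace("msdata", quietly = TRUE)) {
  suppressPackageStartupMessages(library(PSMatch))

  # Identification files shipped with msdata; the pattern is our assumption.
  idf <- msdata::ident(full.names = TRUE, pattern = "TMT")
  cat("Available identification files:", basename(idf), sep = "\n")

  # Read the identification results into a PSM object (a DataFrame subclass).
  psm <- PSM(idf)

  cat("Total PSMs:", nrow(psm), "\n")
  cat("Unique peptides:", length(unique(psm$sequence)), "\n")
  cat("Unique proteins:", length(unique(psm$DatabaseAccess)), "\n")

  print(head(psm, 5))
}
```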
Biological Annotation and Enrichment
Connecting MS data to biological databases for functional analysis.
Gene Ontology database loaded
Available columns: DEFINITION, GOID, ONTOLOGY, TERM
GO term information for GO:0005925:
GOID: GO:0005925
DEFINITION: A cell-substrate junction that anchors the cell to the extracellular matrix and that forms a point of termination of actin filaments. In insects focal adhesion has also been referred to as hemi-adherens junction (HAJ).
ONTOLOGY: CC
TERM: focal adhesion
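The lookup above can be sketched with the GO.db annotation package and `AnnotationDbi::select()`. We assume GO.db is the database behind the "Gene Ontology database loaded" message; the column names are the ones listed above.

```r
if (requireNamespace("GO.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(GO.db))

  # Columns that can be retrieved from the GO database.
  cat("Available columns:",
      paste(AnnotationDbi::columns(GO.db), collapse = ", "), "\n")

  # Look up one GO identifier; GOID is returned because it is the key type.
  go_info <- AnnotationDbi::select(
    GO.db,
    keys    = "GO:0005925",
    columns = c("DEFINITION", "ONTOLOGY", "TERM"),
    keytype = "GOID"
  )
  print(go_info)
}
```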
Genes associated with Focal Adhesion (GO:0005925):
Total genes: 424
# A tibble: 10 × 5
   GO         EVIDENCE ONTOLOGY ENTREZID SYMBOL
   <chr>      <chr>    <chr>    <chr>    <chr>
 1 GO:0005925 HDA      CC       60       ACTB
 2 GO:0005925 HDA      CC       70       ACTC1
 3 GO:0005925 ISS      CC       71       ACTG1
 4 GO:0005925 HDA      CC       81       ACTN4
 5 GO:0005925 HDA      CC       87       ACTN1
 6 GO:0005925 IMP      CC       88       ACTN2
 7 GO:0005925 IMP      CC       89       ACTN3
 8 GO:0005925 HDA      CC       102      ADAM10
 9 GO:0005925 HDA      CC       118      ADD1
10 GO:0005925 HDA      CC       214      ALCAM
Genes associated with Centrosome (GO:0005813):
Total genes: 633
# A tibble: 10 × 5
   GO         EVIDENCE ONTOLOGY ENTREZID SYMBOL
   <chr>      <chr>    <chr>    <chr>    <chr>
 1 GO:0005813 IDA      CC       35       ACADS
 2 GO:0005813 IDA      CC       324      APC
 3 GO:0005813 IDA      CC       328      APEX1
 4 GO:0005813 IDA      CC       402      ARL2
 5 GO:0005813 IDA      CC       403      ARL3
 6 GO:0005813 IDA      CC       468      ATF4
 7 GO:0005813 ISS      CC       472      ATM
 8 GO:0005813 IBA      CC       582      BBS1
 9 GO:0005813 IDA      CC       585      BBS4
10 GO:0005813 IDA      CC       598      BCL2L1
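Gene tables like the two above can be retrieved by querying an organism annotation package with the GO identifier as key. The sketch assumes the human package org.Hs.eg.db (the symbols and Entrez IDs shown are human) and the `"GO"` key type, which returns direct annotations only; the helper name `genes_for_go()` is hypothetical, and the reported totals may have been counted differently.

```r
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(org.Hs.eg.db))

  # Hypothetical helper: genes directly annotated with a GO identifier.
  genes_for_go <- function(go_id) {
    AnnotationDbi::select(
      org.Hs.eg.db,
      keys    = go_id,
      columns = c("ENTREZID", "SYMBOL"),
      keytype = "GO"
    )
  }

  focal_adhesion <- genes_for_go("GO:0005925")
  cat("Total genes:", length(unique(focal_adhesion$ENTREZID)), "\n")
  print(head(focal_adhesion, 10))
}
```

Using the `"GOALL"` key type instead would also include genes annotated to child terms of the queried GO term.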
Summary
This introduction demonstrated:
- Loading and examining various MS datasets (proteomics, metabolomics, DIA/SWATH)
- Using the Spectra infrastructure with different backends
- Calculating theoretical peptide fragments with PSMatch
- Working with peptide-spectrum match (PSM) data
- Connecting MS results to biological annotations (GO terms)
These examples provide a foundation for the detailed analyses covered in subsequent chapters.