3 R Fundamentals for Mass Spectrometry

This chapter introduces R programming concepts specifically relevant to mass spectrometry data analysis.

3.1 R Environment Setup

3.1.1 Installing Required Packages

The R for Mass Spectrometry ecosystem is built on Bioconductor, a collection of R packages for biological data analysis.
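
A minimal installation sketch, assuming an internet connection and a reasonably recent R release. The package list mirrors the four packages checked in the verification step of this chapter; the exact set the author installed is not shown, so treat the list as an assumption.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Spectra, QFeatures and xcms live on Bioconductor; tidyverse lives on CRAN.
# BiocManager::install() resolves packages from both repositories.
BiocManager::install(c("Spectra", "QFeatures", "xcms", "tidyverse"))
```

Installing through BiocManager rather than install.packages() keeps the Bioconductor packages on versions that match each other and the running R release.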

3.1.2 Verifying Installation

# Check if key packages are installed correctly
required_packages <- c("Spectra", "QFeatures", "xcms", "tidyverse")

for (pkg in required_packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat("✓", pkg, "is installed\n")
  } else {
    cat("✗", pkg, "is NOT installed\n")
  }
}

3.1.3 Loading Essential Libraries

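Loading the packages and printing their versions gives, for the environment used to build this chapter:

Spectra version: 1.18.2
tidyverse version: 2.0.0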
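
The chunk that printed these version lines is not shown. A sketch that would produce output of the same shape, assuming both packages are installed, might look like this:

```r
# Attach the packages quietly so startup messages do not clutter the output
suppressPackageStartupMessages({
  library(Spectra)
  library(tidyverse)
})

# packageVersion() returns a version object; as.character() makes it
# printable with cat()
cat("Spectra version:", as.character(packageVersion("Spectra")), "\n")
cat("tidyverse version:", as.character(packageVersion("tidyverse")), "\n")
```

The version numbers printed will reflect whatever is installed locally and need not match the ones above.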

3.2 Understanding the R for Mass Spectrometry Ecosystem

3.2.1 Package Architecture

The R for Mass Spectrometry initiative provides a modular ecosystem:

  • Core Infrastructure: Spectra, QFeatures for data structures
  • Data Access: MsBackendMzR, MsBackendSql for reading files
  • Proteomics: PSMatch for peptide-spectrum matches, with ProtGenerics supplying the S4 generics shared across packages
  • Metabolomics: xcms, CAMERA for small molecule analysis
  • Utilities: MsCoreUtils, MetaboCoreUtils for common operations

Available Spectra Backends:
  • MsBackendMzR - Read mzML/mzXML files 
  • MsBackendDataFrame - In-memory storage 
  • MsBackendHdf5Peaks - HDF5-based storage 
  • MsBackendSql - SQL database storage 
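
To make the backend idea concrete, the sketch below builds two toy spectra, stores them with MsBackendDataFrame, and then moves them to a different backend with setBackend(). The peak values are invented for illustration, and MsBackendMemory (another in-memory backend in Spectra that is not listed above) is used as the target only because it needs no extra files or packages.

```r
library(Spectra)

# Two toy spectra: spectrum-level metadata plus list columns for the peaks
spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100.0, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))

# Keep the data in memory with MsBackendDataFrame
sps_df <- Spectra(spd, backend = MsBackendDataFrame())

# Switch to another in-memory backend; the spectra themselves are unchanged
sps_mem <- setBackend(sps_df, MsBackendMemory())

# Printing a Spectra object reports which backend holds the data
sps_df
sps_mem
```

Because the same accessor functions (msLevel(), rtime(), mz(), intensity()) work regardless of backend, analysis code does not need to change when the storage does.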

3.3 Data Structures in R for MS

3.3.1 Vectors and Matrices

Mass spectra are fundamentally collections of m/z and intensity pairs, which map naturally to R’s vector and matrix structures.

     mz intensity
1 100.1      1000
2 200.2      2500
3 300.3       800
4 400.4      1200
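
The code behind the table above is not shown. One way to build it from two parallel vectors, using base R only, is sketched here; the variable names are illustrative.

```r
# Parallel vectors: element i of each vector describes peak i
mz <- c(100.1, 200.2, 300.3, 400.4)
intensity <- c(1000, 2500, 800, 1200)

# Data frame view, printed as in the table above
spectrum_df <- data.frame(mz = mz, intensity = intensity)
print(spectrum_df)

# Matrix view: a two-column numeric matrix, one row per peak
peak_matrix <- cbind(mz = mz, intensity = intensity)

# Vectorised operations work on all peaks at once
base_peak_mz <- mz[which.max(intensity)]                # m/z of the most intense peak
relative_intensity <- 100 * intensity / max(intensity)  # scaled to the base peak
```

Here base_peak_mz is 200.2 and relative_intensity is 40, 100, 32, 48.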

3.3.2 Lists for Complex Data

MS experiments often contain metadata alongside spectral data.

$instrument
[1] "Orbitrap Fusion"

$ionization
[1] "ESI"

$polarity
[1] "positive"

$acquisition_date
[1] "2025-11-26"

$spectra_count
[1] 1000
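
A list like the one printed above can be sketched as follows. The element names and values are taken from the output; storing the date as a character string is an assumption, since a Date object would print the same way.

```r
# A named list can mix types: character strings and numbers side by side
ms_experiment <- list(
  instrument = "Orbitrap Fusion",
  ionization = "ESI",
  polarity = "positive",
  acquisition_date = "2025-11-26",
  spectra_count = 1000
)

print(ms_experiment)

# Access single elements by name with $ or [[ ]]
ms_experiment$instrument
ms_experiment[["spectra_count"]]

# str() gives a compact overview of the structure
str(ms_experiment)
```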

3.4 Data Import/Export Basics

3.4.1 Common File Formats in Mass Spectrometry

Format  Type    Description                     Use Case
mzML    XML     Vendor-neutral standard format  Raw MS data storage
mzXML   XML     Older standard format           Legacy data
MGF     Text    Mascot Generic Format           MS/MS for database search
CDF     Binary  NetCDF format                   GC-MS data
mzTab   Text    Tab-delimited results           Analysis results

Example file: MRM-standmix-5.mzML.gz
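
The loading code itself is not shown in this chapter. As a hedged sketch of the usual pattern for reading mzML files into a Spectra object, the example below uses the Sciex mzML files that also ship with msdata; choosing those files instead of the one named above is an assumption made for illustration, and it requires the msdata and mzR packages.

```r
library(Spectra)

# Locate the example mzML files installed with msdata
# (Sciex data, an assumption for this sketch)
sciex_files <- dir(system.file("sciex", package = "msdata"),
                   pattern = "mzML$", full.names = TRUE)

# MsBackendMzR keeps spectrum metadata in memory and reads peak data
# from the files on demand
sps_sciex <- Spectra(sciex_files, source = MsBackendMzR())
sps_sciex

# Number of spectra per MS level
table(msLevel(sps_sciex))
```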

Note: the example file could not be read in the build environment because of an mzR compatibility issue, so the summary below is based on synthetic data. The reported error was:

Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in DataFrame(..., check.names = FALSE): different row counts implied by arguments

Dataset summary:
  Total spectra: 50 
  MS levels: 2, 1 
  RT range: 100 3000 seconds
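
How the synthetic dataset was generated is not shown. The sketch below is one way to build a comparable object with 50 spectra and retention times from 100 to 3000 seconds; the random seed, peak counts, and value distributions are all assumptions, so the peaks will differ from the chapter's.

```r
library(Spectra)

set.seed(42)
n_spectra <- 50

# Spectrum-level metadata: random MS level, evenly spaced retention times
spd <- DataFrame(
  msLevel = sample(1:2, n_spectra, replace = TRUE),
  rtime = seq(100, 3000, length.out = n_spectra)
)

# Peak data: per spectrum, sorted m/z values and matching intensities
n_peaks <- sample(50:150, n_spectra, replace = TRUE)
spd$mz <- lapply(n_peaks, function(n) sort(runif(n, min = 100, max = 2000)))
spd$intensity <- lapply(n_peaks, function(n) rlnorm(n, meanlog = 8, sdlog = 2))

ms_data <- Spectra(spd)

cat("Dataset summary:\n")
cat("  Total spectra:", length(ms_data), "\n")
cat("  MS levels:", paste(unique(msLevel(ms_data)), collapse = ", "), "\n")
cat("  RT range:", range(rtime(ms_data)), "seconds\n")
```

The m/z values are sorted within each spectrum because Spectra expects peaks ordered by increasing m/z.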

3.4.2 Working with Spectral Data

First spectrum has 98 peaks
m/z range: 100.45 1978.89 
Intensity range: 7.07 436079.1 
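
Lines like these can be produced by pulling the peak matrix of the first spectrum out of a Spectra object. The sketch below builds a tiny two-spectrum object so that it runs on its own; its peak values are invented, and only the accessor pattern is the point.

```r
library(Spectra)

# Small stand-in object (hypothetical values) so the example is self-contained
spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(100, 160))
spd$mz <- list(c(100.45, 512.30, 1978.89), c(150.10, 300.20))
spd$intensity <- list(c(7.07, 436079.1, 1520.5), c(250, 90))
ms_data <- Spectra(spd)

# peaksData() returns one numeric matrix per spectrum,
# with columns "mz" and "intensity"
first_peaks <- peaksData(ms_data)[[1]]

cat("First spectrum has", nrow(first_peaks), "peaks\n")
cat("m/z range:", range(first_peaks[, "mz"]), "\n")
cat("Intensity range:", range(first_peaks[, "intensity"]), "\n")
```

The same values are available separately through mz(ms_data)[[1]] and intensity(ms_data)[[1]] when only one of the two columns is needed.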

3.4.3 Exporting Data

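# Export spectra to different formats
# Assumes `ms_data` is an existing Spectra object
library(Spectra)
library(MsBackendMgf)  # provides the MGF backend used below
library(dplyr)         # provides %>% and select()

# Export the first 10 spectra to mzML; export() needs the target backend
export(ms_data[1:10], backend = MsBackendMzR(), file = "subset.mzML")

# Export to MGF (for MS2 spectra)
ms2_data <- filterMsLevel(ms_data, 2L)
export(ms2_data, backend = MsBackendMgf(), file = "ms2_spectra.mgf")

# Export metadata to CSV
metadata <- spectraData(ms_data) %>%
  as.data.frame() %>%
  dplyr::select(msLevel, rtime, precursorMz, precursorCharge)

write.csv(metadata, "spectra_metadata.csv", row.names = FALSE)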

3.5 Exercises

  1. Package Installation: Install the core R for Mass Spectrometry packages and verify the installation
  2. Data Structures: Create vectors representing m/z and intensity values for a hypothetical spectrum with at least 10 peaks
  3. Data Frames: Build a data frame combining multiple spectra with metadata (RT, precursor m/z, charge)
  4. File I/O: Practice loading MS data from the msdata package and explore different file formats
  5. Backend Comparison: Load the same file using different backends and compare memory usage

3.6 Summary

This chapter covered the fundamental R concepts needed for MS data analysis:

  • Package ecosystem: Core Bioconductor packages for MS analysis (Spectra, QFeatures, xcms)
  • Data structures: Vectors, matrices, data frames, and lists for MS data
  • R for MS architecture: Understanding backends and modular design
  • File formats: Common MS formats (mzML, MGF) and how to read/write them
  • Basic operations: Loading MS data and accessing spectral information

With these fundamentals in place, you’re ready to proceed to more advanced MS data processing and analysis workflows in the following chapters.