Proteomics focuses on the large-scale study of proteins, including their identification, quantification, and functional analysis. This chapter covers computational methods for bottom-up proteomics data analysis using the R for Mass Spectrometry ecosystem.
10.1 Setting Up Proteomics Environment
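The R for Mass Spectrometry ecosystem provides specialized packages for proteomics analysis, including Spectra for raw spectral data, PSMatch for peptide-spectrum matches, and QFeatures for quantitative features.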
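Before running any analysis it can help to confirm which packages are available. The sketch below assumes the package set named in this chapter's workflow diagram (Spectra, PSMatch, QFeatures, limma, DEqMS); it only reports availability and installs nothing.

```r
# Packages referenced in this chapter (an assumed, non-exhaustive set)
pkgs <- c("Spectra", "PSMatch", "QFeatures", "limma", "DEqMS")

# requireNamespace() returns TRUE/FALSE without attaching the package
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

status <- data.frame(package = pkgs, installed = unname(available))
print(status)

# Missing Bioconductor packages can then be installed, for example with:
# BiocManager::install(pkgs[!available])
```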
The typical bottom-up proteomics workflow involves:
Sample preparation: Protein extraction, digestion (usually with trypsin)
LC-MS/MS analysis: Liquid chromatography coupled to tandem mass spectrometry
Database searching: Matching MS/MS spectra to peptide sequences
Protein inference: Assembling peptides into protein identifications
Quantitative analysis: Comparing protein abundances across samples
flowchart TD
    subgraph Sample["Sample Preparation"]
        A[Protein Extraction] --> B[Reduction & Alkylation]
        B --> C[Enzymatic Digestion<br/>Trypsin]
        C --> D[Peptide Cleanup<br/>Desalting]
    end
    subgraph MS["LC-MS/MS Analysis"]
        D --> E[LC Separation<br/>Reverse Phase]
        E --> F[MS1 Scan<br/>Precursor Selection]
        F --> G[MS2 Fragmentation<br/>HCD/CID/ETD]
        G --> H[Raw Data<br/>mzML Files]
    end
    subgraph Search["Database Searching"]
        H --> I[Spectra Object<br/>R/Spectra]
        I --> J{Search Engine}
        J --> K1[Mascot]
        J --> K2[MaxQuant]
        J --> K3[MSFragger]
        K1 --> L[PSM Table<br/>PSMatch]
        K2 --> L
        K3 --> L
    end
    subgraph Inference["Protein Inference"]
        L --> M[Filter PSMs<br/>FDR < 1%]
        M --> N[Peptide Assembly<br/>Unique + Shared]
        N --> O[Protein Grouping<br/>Parsimony Principle]
    end
    subgraph Quant["Quantification"]
        O --> P{Quant Method?}
        P -->|Label-Free| Q1[XIC Integration<br/>MS1 Intensity]
        P -->|TMT/iTRAQ| Q2[Reporter Ions<br/>MS2 Intensity]
        P -->|SILAC| Q3[Heavy/Light Ratio<br/>MS1 Intensity]
        Q1 --> R[QFeatures Object]
        Q2 --> R
        Q3 --> R
    end
    subgraph Analysis["Statistical Analysis"]
        R --> S[PSM → Peptide<br/>Aggregation]
        S --> T[Peptide → Protein<br/>Summarization]
        T --> U[Differential Analysis<br/>limma/DEqMS]
        U --> V[Results<br/>Volcano/Heatmap]
    end
    style Sample fill:#D7E6FB,stroke:#27408B,stroke-width:3px,color:#102A43
    style MS fill:#FBE0FA,stroke:#B000B0,stroke-width:3px,color:#102A43
    style Search fill:#D7E6FB,stroke:#27408B,stroke-width:3px,color:#102A43
    style Inference fill:#FBE0FA,stroke:#B000B0,stroke-width:3px,color:#102A43
    style Quant fill:#D7E6FB,stroke:#27408B,stroke-width:3px,color:#102A43
    style Analysis fill:#FBE0FA,stroke:#B000B0,stroke-width:3px,color:#102A43
Key Proteomics Concepts
PSM (Peptide-Spectrum Match): One MS/MS spectrum matched to one peptide sequence
FDR (False Discovery Rate): Typically controlled at 1% using a target-decoy approach
Protein Parsimony: The minimal set of proteins that explains all observed peptides
Missing Values: Can occur at the PSM, peptide, or protein level; filter or impute them according to whether they are missing at random or because of low abundance
10.2.2 Data Structures in Proteomics
Proteomics data has a hierarchical structure:
- Spectra: Raw MS and MS/MS data
- PSMs: Peptide-Spectrum Matches from database search
- Peptides: Unique peptide sequences
- Proteins: Protein groups inferred from peptides
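The hierarchy can be illustrated by collapsing a small PSM table first to peptides and then to proteins. This is a minimal base R sketch; the table and its column names (`spectrum_id`, `peptide`, `protein`) are hypothetical and are not taken from any package.

```r
# Hypothetical PSM table: one row per spectrum-to-peptide match
psms <- data.frame(
  spectrum_id = paste0("scan_", 1:8),
  peptide = c("AAK", "AAK", "LLDR", "LLDR", "LLDR", "GTVK", "MSER", "MSER"),
  protein = c("P1", "P1", "P1", "P1", "P1", "P2", "P2", "P2"),
  stringsAsFactors = FALSE
)

# PSM level -> peptide level: count the PSMs supporting each peptide
peptides <- aggregate(spectrum_id ~ peptide + protein, data = psms, FUN = length)
names(peptides)[3] <- "n_psms"

# Peptide level -> protein level: distinct peptides and total PSMs per protein
proteins <- data.frame(protein = sort(unique(peptides$protein)),
                       stringsAsFactors = FALSE)
proteins$n_peptides <- as.integer(table(peptides$protein)[proteins$protein])
proteins$n_psms <- as.integer(
  tapply(peptides$n_psms, peptides$protein, sum)[proteins$protein]
)

print(peptides)
print(proteins)
```

In the R for Mass Spectrometry ecosystem the same levels are normally held in dedicated containers rather than plain data frames; the data frames here only make the parent-child relationships explicit.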
10.3 MS/MS Spectral Data Processing
10.3.1 Loading and Examining MS/MS Data
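As a rough illustration of what examining MS/MS data involves, the sketch below simulates a handful of centroided MS2 spectra in base R and summarizes them. The m/z range, the intensity distribution, and the helper `simulate_spectrum()` are arbitrary assumptions for demonstration, not the structure used by the Spectra package.

```r
set.seed(42)

# Simulate one centroided MS2 spectrum as a two-column matrix (mz, intensity)
simulate_spectrum <- function(n_peaks) {
  mz <- sort(runif(n_peaks, min = 100, max = 1500))
  intensity <- rexp(n_peaks, rate = 1 / 1e4)
  cbind(mz = mz, intensity = intensity)
}

# Five spectra with between 20 and 60 peaks each
spectra <- lapply(sample(20:60, size = 5), simulate_spectrum)

# Per-spectrum summary: number of peaks, total ion current, base peak m/z
summary_tbl <- data.frame(
  spectrum = seq_along(spectra),
  n_peaks = vapply(spectra, nrow, integer(1)),
  tic = vapply(spectra, function(s) sum(s[, "intensity"]), numeric(1)),
  base_peak_mz = vapply(
    spectra,
    function(s) s[which.max(s[, "intensity"]), "mz"],
    numeric(1)
  )
)
print(summary_tbl)
```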
Note: This section uses synthetic data because of mzR compatibility issues; the error details follow.
Error details: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in DataFrame(..., check.names = FALSE): different row counts implied by arguments
Created database with 100 proteins and 995 peptides
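A protein and peptide database of this kind could be generated by in-silico tryptic digestion. The sketch below is one possible approach under stated assumptions: random sequences stand in for a real FASTA file, missed cleavages are ignored, and `make_protein()` and `digest_trypsin()` are hypothetical helpers. Its counts will not match the figures reported above.

```r
set.seed(1)
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Random protein sequences stand in for a real sequence database (assumption)
make_protein <- function(len) {
  paste(sample(amino_acids, len, replace = TRUE), collapse = "")
}
proteins <- vapply(rep(200, 10), make_protein, character(1))
names(proteins) <- sprintf("PROT%03d", seq_along(proteins))

# Trypsin rule: cleave C-terminal to K or R unless the next residue is P
digest_trypsin <- function(sequence, min_length = 6) {
  # Mark every cleavage site with a space, then split on the marker
  marked <- gsub("([KR])(?!P)", "\\1 ", sequence, perl = TRUE)
  peptides <- strsplit(marked, " ", fixed = TRUE)[[1]]
  peptides[nchar(peptides) >= min_length]
}

peptide_list <- lapply(proteins, digest_trypsin)
peptide_db <- data.frame(
  protein = rep(names(peptide_list), lengths(peptide_list)),
  peptide = unlist(peptide_list, use.names = FALSE),
  stringsAsFactors = FALSE
)
cat("Created database with", length(proteins), "proteins and",
    nrow(peptide_db), "peptides\n")
```

The minimum length of six residues is a common but arbitrary cut-off; very short peptides are rarely identified with confidence.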
10.4.2 Simulate Peptide-Spectrum Matches (PSMs)
Generated 0 PSMs
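As a rough sketch of how a non-empty PSM table can be simulated, the code below draws search scores from an assumed two-component model: correct target matches score high, while incorrect target matches and decoy matches share one low-scoring distribution. All parameters and column names are hypothetical.

```r
set.seed(123)
n_target <- 1000      # spectra matched to target peptides
n_decoy  <- 400       # spectra matched to decoy peptides
frac_correct <- 0.6   # assumed share of correct target matches

# Correct matches score high; incorrect targets score like decoys
is_correct <- runif(n_target) < frac_correct
target_score <- ifelse(is_correct,
                       rnorm(n_target, mean = 40, sd = 8),
                       rnorm(n_target, mean = 15, sd = 5))
decoy_score <- rnorm(n_decoy, mean = 15, sd = 5)

psms <- data.frame(
  spectrum_id = seq_len(n_target + n_decoy),
  score = c(target_score, decoy_score),
  is_decoy = rep(c(FALSE, TRUE), c(n_target, n_decoy))
)
cat("Generated", nrow(psms), "PSMs\n")
```

Keeping the number of decoys close to the expected number of incorrect targets mirrors the assumption behind target-decoy FDR estimation.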
10.4.3 PSM Quality Assessment and Filtering
10.4.4 PSM Filtering and FDR Control
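Target-decoy FDR control can be sketched in a few lines of base R. The example assumes a hypothetical PSM table with a `score` column (higher is better) and an `is_decoy` flag, and it uses the simple decoys-over-targets estimator; search engines and dedicated packages may use refined variants.

```r
set.seed(7)
# Toy PSMs: 600 correct targets, 400 incorrect targets, 400 decoys
psms <- data.frame(
  score = c(rnorm(600, 40, 8), rnorm(400, 15, 5), rnorm(400, 15, 5)),
  is_decoy = rep(c(FALSE, FALSE, TRUE), c(600, 400, 400))
)

# Rank PSMs from best to worst score
psms <- psms[order(psms$score, decreasing = TRUE), ]

# FDR estimate at each score threshold: decoys / targets at or above it
n_decoy  <- cumsum(psms$is_decoy)
n_target <- cumsum(!psms$is_decoy)
fdr <- n_decoy / pmax(n_target, 1)

# q-value: the lowest FDR at which a PSM would still be accepted
psms$q_value <- rev(cummin(rev(fdr)))

# Keep target PSMs passing the 1% threshold
accepted <- psms[psms$q_value <= 0.01 & !psms$is_decoy, ]
cat("Accepted", nrow(accepted), "target PSMs at 1% FDR\n")
```

Converting the raw FDR curve to q-values makes the threshold monotone, so a PSM is never rejected while a lower-scoring one is accepted.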
10.5 Protein Inference and Quantification
10.5.1 Protein Grouping
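The parsimony principle can be approximated with a greedy set cover: repeatedly select the protein that explains the most peptides not yet explained. The sketch below is an illustrative approximation on a hypothetical peptide-to-protein map, not the algorithm of any particular package; `greedy_parsimony()` is an assumed helper name.

```r
# Hypothetical peptide-to-protein map; shared peptides appear more than once
pep2prot <- data.frame(
  peptide = c("pep1", "pep2", "pep3", "pep3", "pep4", "pep4", "pep5"),
  protein = c("A",    "A",    "A",    "B",    "B",    "C",    "C"),
  stringsAsFactors = FALSE
)

# Greedy set cover over the peptide sets of each protein
greedy_parsimony <- function(map) {
  by_protein <- split(map$peptide, map$protein)
  unexplained <- unique(map$peptide)
  selected <- character(0)
  while (length(unexplained) > 0) {
    # Number of still-unexplained peptides each protein would account for
    gain <- vapply(by_protein, function(p) sum(unexplained %in% p), integer(1))
    best <- names(gain)[which.max(gain)]
    selected <- c(selected, best)
    unexplained <- setdiff(unexplained, by_protein[[best]])
  }
  selected
}

protein_set <- greedy_parsimony(pep2prot)
print(protein_set)
```

Here proteins A and C together explain all five peptides, so B, whose peptides are all shared with A or C, is not reported.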
10.5.2 Label-Free Quantification
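One common way to turn peptide-level MS1 intensities into protein-level values is to sum the most intense peptides of each protein. The sketch below shows a top-3 summarization on simulated data; the matrix, the peptide-to-protein assignment, and the helper `top_n_sum()` are assumptions, and other summaries (median, robust regression) are equally valid choices.

```r
set.seed(10)
# Hypothetical peptide-level intensities: 12 peptides x 4 samples
peptide_protein <- rep(c("P1", "P2", "P3"), times = c(5, 4, 3))
intens <- matrix(rlnorm(12 * 4, meanlog = 12, sdlog = 1), nrow = 12,
                 dimnames = list(paste0("pep", 1:12), paste0("S", 1:4)))
intens[sample(length(intens), 5)] <- NA   # a few missing values

# Sum of the n most intense peptides, ignoring missing values
top_n_sum <- function(x, n = 3) {
  x <- sort(x, decreasing = TRUE)   # sort() drops NA values by default
  if (length(x) == 0) return(NA_real_)
  sum(head(x, n))
}

# Apply the summary per protein within each sample
protein_intens <- sapply(colnames(intens), function(s) {
  tapply(intens[, s], peptide_protein, top_n_sum)
})
print(round(protein_intens))
```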
10.5.3 Data Normalization and Preprocessing
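A typical preprocessing sequence is log transformation, sample-wise median centering, and filtering on missing values. The sketch below applies these three steps to a simulated protein matrix; the loading factors and the threshold of three valid values out of four samples are arbitrary assumptions.

```r
set.seed(3)
# Hypothetical protein intensities (proteins x samples) with loading bias
raw <- matrix(rlnorm(50 * 4, meanlog = 15, sdlog = 1), nrow = 50,
              dimnames = list(paste0("prot", 1:50), paste0("S", 1:4)))
raw <- sweep(raw, 2, c(1, 1.5, 0.7, 2), `*`)
raw[sample(length(raw), 10)] <- NA

# 1. log2-transform so that fold changes become additive
log_int <- log2(raw)

# 2. Median-centre each sample on the average of the sample medians
col_med <- apply(log_int, 2, median, na.rm = TRUE)
centred <- sweep(log_int, 2, col_med - mean(col_med), `-`)

# 3. Keep proteins quantified in at least 3 of the 4 samples
keep <- rowSums(!is.na(centred)) >= 3
filtered <- centred[keep, , drop = FALSE]

cat("Retained", nrow(filtered), "of", nrow(centred), "proteins\n")
```

Median centering assumes that most proteins do not change between samples; when that does not hold, a different normalization is needed.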
10.6 Differential Expression Analysis
10.6.1 Statistical Testing with limma
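A two-group comparison with limma usually follows the `lmFit()`, `eBayes()`, `topTable()` sequence. The sketch below runs that sequence on simulated log2 abundances when limma is installed and otherwise falls back to an ordinary Welch t-test per protein, which is a plainly different, unmoderated test. The data and the spiked-in effect for the first ten proteins are assumptions.

```r
set.seed(5)
# Hypothetical log2 protein abundances: 100 proteins, 3 vs 3 samples
group <- factor(rep(c("A", "B"), each = 3))
mat <- matrix(rnorm(100 * 6, mean = 20, sd = 0.5), nrow = 100,
              dimnames = list(paste0("prot", 1:100), paste0(group, 1:3)))
mat[1:10, group == "B"] <- mat[1:10, group == "B"] + 2   # spiked-in changes

design <- model.matrix(~ group)   # columns: (Intercept), groupB

if (requireNamespace("limma", quietly = TRUE)) {
  # Moderated t-test: protein variances are shrunk towards a common value
  fit <- limma::eBayes(limma::lmFit(mat, design))
  tt <- limma::topTable(fit, coef = "groupB", number = Inf, sort.by = "none")
  res <- data.frame(protein = rownames(tt), logFC = tt$logFC,
                    p_value = tt$P.Value, adj_p = tt$adj.P.Val)
} else {
  # Fallback without limma: ordinary Welch t-test per protein
  p <- apply(mat, 1, function(x) {
    t.test(x[group == "B"], x[group == "A"])$p.value
  })
  res <- data.frame(
    protein = rownames(mat),
    logFC = rowMeans(mat[, group == "B"]) - rowMeans(mat[, group == "A"]),
    p_value = p,
    adj_p = p.adjust(p, method = "BH")
  )
}
print(head(res[order(res$p_value), ]))
```

With only three replicates per group, the variance moderation in limma generally gives more stable results than the per-protein t-test.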
10.6.2 Volcano Plot
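A volcano plot shows effect size against statistical significance. The base R sketch below uses simulated results; the column names and the thresholds (adjusted p below 0.05 and an absolute log2 fold change of at least 1) are assumptions to adapt to the experiment.

```r
set.seed(11)
# Hypothetical differential analysis results
res <- data.frame(logFC = rnorm(500, sd = 1), p_value = runif(500))
res$p_value[1:30] <- runif(30, 0, 1e-4)   # some strong signals
res$logFC[1:30] <- sample(c(-1, 1), 30, replace = TRUE) * runif(30, 1.5, 3)
res$adj_p <- p.adjust(res$p_value, method = "BH")

# Significance call: adjusted p < 0.05 and at least a two-fold change
res$status <- "not significant"
res$status[res$adj_p < 0.05 & res$logFC >= 1] <- "up"
res$status[res$adj_p < 0.05 & res$logFC <= -1] <- "down"

cols <- c("not significant" = "grey60", up = "firebrick", down = "steelblue")
plot(res$logFC, -log10(res$p_value), col = cols[res$status], pch = 16,
     xlab = "log2 fold change", ylab = "-log10(p-value)",
     main = "Volcano plot")
abline(v = c(-1, 1), lty = 2)   # fold-change thresholds
legend("topright", legend = names(cols), col = cols, pch = 16, bty = "n")
```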
10.6.3 Protein Set Analysis
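The simplest form of protein set analysis is an over-representation test: is a predefined set found among the significant proteins more often than expected by chance? The sketch below computes the one-sided hypergeometric p-value and cross-checks it with Fisher's exact test. The universe, the significant list, and the protein set are all hypothetical.

```r
# Hypothetical inputs: background, significant proteins, and one protein set
universe    <- paste0("prot", 1:1000)
significant <- paste0("prot", 1:50)
protein_set <- paste0("prot", c(1:15, 501:535))   # 50 members

k <- length(intersect(significant, protein_set))  # significant and in the set
m <- length(protein_set)                          # set size
n <- length(universe) - m                         # proteins outside the set
N <- length(significant)                          # number of significant proteins

# P(X >= k) under the hypergeometric null (one-sided enrichment test)
p_enrich <- phyper(k - 1, m, n, N, lower.tail = FALSE)

# The same test expressed as Fisher's exact test on the 2 x 2 table
tab <- matrix(c(k, N - k, m - k, n - (N - k)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

cat("Overlap:", k, "of", m, "set members; p =", signif(p_enrich, 3), "\n")
```

When many sets are tested, the resulting p-values should be adjusted for multiple testing, for example with `p.adjust()`.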
10.7 Data Visualization and Reporting
10.7.1 Heat Map of Significant Proteins
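A heat map of significant proteins is usually drawn on row-wise z-scores so that proteins with different abundance levels become comparable. The sketch below uses the base R `heatmap()` function on simulated data; the matrix, the group labels, and the colors are assumptions.

```r
set.seed(21)
group <- rep(c("A", "B"), each = 3)
# Hypothetical log2 abundances of 20 significant proteins in 6 samples
mat <- matrix(rnorm(20 * 6, mean = 20), nrow = 20,
              dimnames = list(paste0("prot", 1:20), paste0(group, 1:3)))
mat[1:10, group == "B"] <- mat[1:10, group == "B"] + 3

# Row-wise z-scores: centre and scale each protein across samples
z <- t(scale(t(mat)))

heatmap(z, scale = "none",
        col = colorRampPalette(c("steelblue", "white", "firebrick"))(50),
        ColSideColors = ifelse(group == "A", "grey40", "orange"),
        margins = c(5, 6),
        main = "Significant proteins (row z-scores)")
```

Setting `scale = "none"` avoids scaling the data a second time, since the z-scores are computed explicitly beforehand.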
10.8 Exercises
Analyze real proteomics data from a public repository
Implement different protein inference algorithms
Compare various normalization methods for label-free quantification
Perform time-course proteomics analysis
Integrate proteomics with other omics data types
10.9 Summary
This chapter covered comprehensive proteomics data analysis workflows, including MS/MS data processing, protein identification, quantification, and differential expression analysis. These methods are essential for extracting biological insights from bottom-up proteomics experiments.