flowchart LR
subgraph Input["MS Feature Matrix"]
A[Features × Samples<br/>Intensity Data]
end
subgraph QC["Quality Control"]
B[Missing Value<br/>Assessment]
C[CV Analysis<br/>Technical Replicates]
D[Outlier Detection<br/>PCA/Clustering]
end
subgraph Norm["Normalization"]
E[Total Ion Current<br/>TIC]
F[Internal Standard<br/>IS]
G[Median/Quantile<br/>Normalization]
end
subgraph Univariate["Univariate Tests"]
H[t-test / Wilcoxon]
I[ANOVA / Kruskal-Wallis]
J[Linear Models<br/>limma]
end
subgraph Multivariate["Multivariate Analysis"]
K[PCA<br/>Dimensionality Reduction]
L[PLS-DA<br/>Supervised]
M[Hierarchical<br/>Clustering]
end
subgraph Results["Results & Interpretation"]
N[Volcano Plot<br/>FC vs p-value]
O[Heatmap<br/>Expression Patterns]
P[Pathway Analysis<br/>Enrichment]
end
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
G --> K
H --> I
I --> J
K --> L
L --> M
J --> N
M --> O
N --> P
O --> P
style Input fill:#D7E6FB,stroke:#27408B,stroke-width:2px,color:#102A43
style QC fill:#FBE0FA,stroke:#B000B0,stroke-width:2px,color:#102A43
style Norm fill:#D7E6FB,stroke:#27408B,stroke-width:2px,color:#102A43
style Univariate fill:#FBE0FA,stroke:#B000B0,stroke-width:2px,color:#102A43
style Multivariate fill:#D7E6FB,stroke:#27408B,stroke-width:2px,color:#102A43
style Results fill:#FBE0FA,stroke:#B000B0,stroke-width:2px,color:#102A43
8 Statistical Analysis of MS Data
Statistical analysis is fundamental to extracting meaningful biological insights from mass spectrometry data. This chapter covers statistical methods that integrate with the R for Mass Spectrometry ecosystem: descriptive statistics, univariate hypothesis tests, correlation analysis, multivariate methods (PCA, clustering, heat maps), and power analysis.
Statistical Analysis Best Practices
- Quality Control First: Remove low-quality features before analysis
- Appropriate Normalization: Choose a normalization method that suits the experimental design
- Multiple Testing Correction: Always correct p-values for multiple testing (e.g., Benjamini-Hochberg FDR or Bonferroni)
- Effect Size: Report fold changes alongside p-values
- Validation: Confirm findings with orthogonal methods
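The normalization step of the workflow diagram lists total ion current (TIC), internal-standard (IS), and median normalization. A minimal Python sketch of all three follows (the chapter itself works in R). It assumes raw, non-negative intensities with each inner list holding one sample, and `is_index`, the position of the internal-standard feature, is a hypothetical parameter.

```python
import statistics


def tic_normalize(samples):
    """Scale each sample so its total ion current (TIC) equals the mean TIC."""
    tics = [sum(s) for s in samples]
    target = statistics.fmean(tics)
    return [[x * target / tic for x in s] for s, tic in zip(samples, tics)]


def is_normalize(samples, is_index):
    """Divide every intensity by the sample's internal-standard intensity.

    is_index (hypothetical) is the position of the internal-standard feature.
    """
    return [[x / s[is_index] for x in s] for s in samples]


def median_normalize(samples):
    """Scale each sample so its median equals the median of all sample medians."""
    medians = [statistics.median(s) for s in samples]
    target = statistics.median(medians)
    return [[x * target / m for x in s] for s, m in zip(samples, medians)]


# Three samples x four features; sample 2 is sample 1 with twice the loading.
samples = [
    [100.0, 200.0, 300.0, 400.0],
    [200.0, 400.0, 600.0, 800.0],
    [110.0, 190.0, 310.0, 390.0],
]
tic_norm = tic_normalize(samples)
med_norm = median_normalize(samples)
is_norm = is_normalize(samples, is_index=0)
```

All three methods remove the twofold loading difference between samples 1 and 2; they differ in which quantity (total signal, a spiked standard, or the median) is assumed to be constant across samples.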
8.1 Setting Up the Statistical Environment
Dataset created:
Samples: 30
Features: 100
Design: 2 conditions × 3 timepoints × 5 replicates
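A design with this shape is a full crossing of condition, timepoint, and replicate. The Python sketch below is illustrative only: the labels `Control`/`Treatment` and `T1` to `T3`, the sample IDs, and the log-normal intensity model are assumptions, not the chapter's actual simulation.

```python
import itertools
import random

conditions = ["Control", "Treatment"]   # hypothetical condition labels
timepoints = ["T1", "T2", "T3"]         # hypothetical timepoint labels
replicates = range(1, 6)                # 5 replicates per condition x timepoint

# One metadata record per sample: 2 x 3 x 5 = 30 samples.
design = [
    {"sample": f"S{i + 1:02d}", "condition": c, "timepoint": t, "replicate": r}
    for i, (c, t, r) in enumerate(itertools.product(conditions, timepoints, replicates))
]

rng = random.Random(42)
n_features = 100
# Features x samples matrix of log-normal intensities (assumed distribution).
intensity = [[rng.lognormvariate(10.3, 0.8) for _ in design] for _ in range(n_features)]
```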
8.2 Descriptive Statistics
8.2.1 Basic Summary Statistics
Summary statistics for the first 10 of the 100 features (cv = 100 × sd / mean, in %):
feature         mean    median         sd     cv (%)        min       max
Feature_1   38539.95  30587.06   36945.31   95.86238   7683.549  154764.7
Feature_2   43532.54  32455.23   40455.91   92.93257   6380.475  205635.6
Feature_3   37879.54  27653.87   34593.29   91.32447   5644.510  184580.6
Feature_4   36481.56  23965.46   31656.02   86.77265   6037.189  131991.5
Feature_5   33341.32  23482.69   30827.79   92.46123   4261.610  118194.5
Feature_6   57532.78  31559.10  104430.56  181.51490   8037.520  568784.5
Feature_7   38466.44  23346.28   37830.52   98.34684   7718.372  165906.0
Feature_8   38978.13  25492.49   34694.03   89.00897   8192.558  173304.9
Feature_9   50905.26  27036.39   52624.87  103.37805  11720.590  224033.0
Feature_10  42747.82  33157.06   28850.95   67.49104   5802.359  116445.9
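Each row of the table can be reproduced from one feature's intensities. A minimal Python sketch, assuming the SD is the sample SD with an n - 1 denominator (as R's `sd()` computes) and CV is expressed in percent; `Feature_X` and its values are hypothetical.

```python
import statistics


def summarize(name, values):
    """Per-feature summary statistics; CV is reported in percent."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample SD, n - 1 denominator
    return {
        "feature": name,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "cv": 100 * sd / mean,      # coefficient of variation (%)
        "min": min(values),
        "max": max(values),
    }


row = summarize("Feature_X", [10.0, 12.0, 14.0, 16.0, 18.0])
```

As a check against the table, Feature_1 gives 100 × 36945.31 / 38539.95 ≈ 95.86, matching its cv column.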
8.2.2 Distribution Analysis
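In the summary table the mean exceeds the median for every feature and several CVs approach or exceed 100%, which points to right-skewed intensity distributions. A log transform is the usual remedy. The Python sketch below measures the moment-based sample skewness before and after `log2`, using hypothetical values that are evenly spaced on the log scale.

```python
import math
import statistics


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2^(3/2)."""
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


raw = [1000.0, 2000.0, 4000.0, 8000.0, 16000.0]   # hypothetical intensities
logged = [math.log2(x) for x in raw]              # evenly spaced after log2
```

The raw values have a clearly positive skewness, while the log2 values are symmetric, so their skewness is zero up to rounding.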

8.2.3 Missing Value Analysis
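Missing-value assessment usually means computing the fraction of missing values per feature and dropping features above a cutoff. A minimal Python sketch; treating both `None` and NaN as missing and the 20% cutoff (`max_missing=0.2`) are assumptions chosen for illustration.

```python
def missing_fraction(values):
    """Fraction of entries that are missing (None or NaN)."""
    missing = sum(1 for v in values if v is None or v != v)   # NaN != NaN
    return missing / len(values)


def filter_features(matrix, max_missing=0.2):
    """Keep features (rows) whose missing fraction is at most max_missing."""
    return {name: vals for name, vals in matrix.items()
            if missing_fraction(vals) <= max_missing}


matrix = {
    "F1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "F2": [1.0, None, 3.0, 4.0, 5.0],
    "F3": [None, None, float("nan"), 4.0, 5.0],
}
kept = filter_features(matrix)
```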

8.3 Hypothesis Testing
8.3.1 Two-Sample t-tests
Number of significant features (FDR < 0.05): 0
The filtered results table (columns p.value, statistic, estimate_diff, feature, p.adjusted) is therefore empty (0 rows).
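For reference, the per-feature test can be sketched in plain Python. This sketch implements Welch's unequal-variance t-test (R's `t.test()` also defaults to Welch) and obtains the two-sided p-value from the regularized incomplete beta function, P(|T| > t) = I_{ν/(ν+t²)}(ν/2, 1/2), evaluated with the standard continued-fraction method. It is an illustration, not the code that produced the output above.

```python
import math
import statistics


def _nonzero(v, tiny=1e-300):
    """Guard against division by (near) zero in the continued fraction."""
    return v if abs(v) >= tiny else tiny


def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t: I_{df/(df+t^2)}(df/2, 1/2)."""
    return incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(x, y):
    """Welch's unequal-variance t-test; returns (t, df, two-sided p)."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df, t_two_sided_p(t, df)


t, df, p = welch_t_test([1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0])
```

For x = 1..5 and y = 6..10 this returns t = -5 with df = 8. Running the test once per feature and then adjusting the resulting p-values for multiple testing gives the FDR count reported above.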
8.3.2 Volcano Plot
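A volcano plot places each feature at (log2 fold change, -log10 p-value). The Python sketch below computes those coordinates and a label per feature. Defining fold change as the ratio of group means on the raw scale, and the cutoffs |log2FC| ≥ 1 and adjusted p < 0.05, are assumptions.

```python
import math


def volcano_points(mean_a, mean_b, p_adjusted, fc_cutoff=1.0, alpha=0.05):
    """Return (log2 fold change, -log10 p, label) for each feature.

    Fold change is mean_b / mean_a, so positive log2FC means higher in group b.
    """
    points = []
    for ma, mb, p in zip(mean_a, mean_b, p_adjusted):
        log2fc = math.log2(mb / ma)
        neg_log10_p = -math.log10(p)
        if p < alpha and log2fc >= fc_cutoff:
            label = "up"
        elif p < alpha and log2fc <= -fc_cutoff:
            label = "down"
        else:
            label = "not flagged"   # fails the p-value or the fold-change cutoff
        points.append((log2fc, neg_log10_p, label))
    return points


points = volcano_points(
    mean_a=[100.0, 100.0, 100.0, 100.0],
    mean_b=[400.0, 25.0, 400.0, 150.0],
    p_adjusted=[0.001, 0.01, 0.2, 0.001],
)
```

The fourth feature has a small p-value but only a 1.5-fold change, so it is not flagged; requiring both criteria is what keeps tiny but precise shifts out of the "up" and "down" corners of the plot.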

8.4 ANOVA for Multiple Groups
Number of significant features (ANOVA FDR < 0.05): 0
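The ANOVA count above rests on two pieces: a one-way F statistic per feature and Benjamini-Hochberg adjustment across features. Below is a Python sketch of both. The BH function follows the same step-up rule as R's `p.adjust(method = "BH")`. Converting F to a p-value needs the F distribution (in R, `pf(f, df1, df2, lower.tail = FALSE)`), which this sketch leaves out. How the chapter grouped the samples for ANOVA is not shown, so the groups and p-values here are hypothetical.

```python
import statistics


def one_way_f(groups):
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def benjamini_hochberg(p_values):
    """BH-adjusted p-values: step-up cumulative minimum of p * m / rank."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position   # rank of p_values[i] in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


f, df1, df2 = one_way_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
n_significant = sum(q < 0.05 for q in adjusted)
```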
8.5 Correlation Analysis
8.5.1 Feature-Feature Correlations
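Feature-feature correlation is typically a Pearson correlation matrix computed across samples. A minimal Python sketch with hypothetical features F1 to F3; in R the equivalent for a features × samples matrix `x` would be `cor(t(x))`, since `cor()` correlates columns.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(features):
    """Pairwise Pearson correlations between features (rows) across samples."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}


features = {
    "F1": [1.0, 2.0, 3.0, 4.0],
    "F2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with F1
    "F3": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with F1
}
corr = correlation_matrix(features)
```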

8.5.2 Correlation with Experimental Factors
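To correlate a feature with an ordered experimental factor such as timepoint, a rank-based Spearman correlation is a common choice. The Python sketch below codes T1 to T3 as 1 to 3 (an assumed coding), gives tied values the mean of their positions (the default of R's `rank()`), and takes the Pearson correlation of the ranks. The feature values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


timepoint = [1, 1, 2, 2, 3, 3]   # T1..T3 coded as 1..3 (assumed coding)
feature = [10.0, 12.0, 15.0, 14.0, 30.0, 31.0]
rho = spearman(feature, timepoint)
```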

8.6 Principal Component Analysis (PCA)
8.6.1 Performing PCA
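PCA scores, the scree plot, and the loadings all come from the eigen-decomposition of the feature covariance matrix. The following dependency-free Python sketch uses power iteration with deflation, which is adequate for small examples; R's `prcomp()` uses an SVD instead. Data are centred but not scaled, matching the `prcomp()` defaults. The explained-variance ratios are what a scree plot shows, and the loadings are the eigenvectors.

```python
import math
import random


def pca(data, n_components=2, n_iter=500, seed=0):
    """PCA of a samples x features matrix by power iteration with deflation.

    Returns (scores, loadings, explained_variance_ratio); data are centred, not scaled.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix with an n - 1 denominator.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    total_var = sum(cov[j][j] for j in range(p))
    rng = random.Random(seed)
    loadings, eigenvalues = [], []
    for _ in range(n_components):
        v = [rng.random() + 0.5 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break   # no variance left in the deflated matrix
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along direction v.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        loadings.append(v)
        eigenvalues.append(lam)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    scores = [[sum(r[j] * v[j] for j in range(p)) for v in loadings] for r in centred]
    ratio = [lam / total_var for lam in eigenvalues]
    return scores, loadings, ratio


data = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
scores, loadings, ratio = pca(data)
```

Here the second coordinate carries four times the variance of the first, so PC1 explains 80% and PC2 20% of the total variance.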
8.6.2 PCA Visualization

8.6.3 Scree Plot

8.6.4 PCA Loadings

8.7 Clustering Analysis
8.7.1 Hierarchical Clustering
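Hierarchical clustering repeatedly merges the two closest clusters. Below is a naive Python sketch with Euclidean distance and average linkage. Average linkage is chosen for illustration only: R's `hclust()` defaults to complete linkage, and the chapter's settings are not shown. The points are hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(points, k=1):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Merges the two closest clusters until k remain; returns (clusters, merges),
    where merges lists (cluster_a, cluster_b, height) in merge order.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean of all pairwise distances between the two clusters.
                d = sum(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        height, a, b = best
        merges.append((clusters[a], clusters[b], height))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters, merges


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [20.0, 0.0]]
three, merges = average_linkage(points, k=3)
two, _ = average_linkage(points, k=2)
```

Stopping at k clusters is equivalent to cutting the dendrogram (R's `cutree()`) at k groups; the recorded merge heights are the dendrogram's branch heights.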

8.7.2 K-means Clustering
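K-means alternates between assigning each sample to its nearest center and moving each center to the mean of its members. This Python sketch implements Lloyd's algorithm with a seeded random start on hypothetical points; R's `kmeans()` defaults to the Hartigan-Wong algorithm, so results on real data can differ.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=1):
    """Lloyd's k-means: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]   # k distinct starting points
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break   # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:   # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centers = kmeans(points, k=2)
```

Because Lloyd's algorithm only finds a local optimum, practical use runs several random starts and keeps the solution with the lowest within-cluster sum of squares (the `nstart` argument of `kmeans()` in R).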

8.7.3 Cluster Validation
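cluster  size  ave.sil.width
      1    22           0.10
      2     7          -0.03
      3     1           0.00

Average silhouette width: 0.071

Silhouette widths range from -1 to 1. An average this close to 0 suggests the samples do not form well-separated clusters; cluster 3 is a singleton, whose silhouette width is defined as 0.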
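The silhouette width of a sample compares a, its mean distance to the other members of its own cluster, with b, its mean distance to the members of the nearest other cluster: s = (b - a) / max(a, b). A Python sketch that produces the same kind of per-cluster summary as the table above, including the convention s = 0 for singletons; Euclidean distance and the example points are assumptions.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_widths(points, labels):
    """Per-point silhouette width s = (b - a) / max(a, b); needs >= 2 clusters.

    a: mean distance to the other members of the point's own cluster.
    b: smallest mean distance to the members of any other cluster.
    """
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:
            widths.append(0.0)   # singleton cluster: s is defined as 0
            continue
        a = sum(euclidean(p, points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(p, points[j]) for j, lab in enumerate(labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def cluster_summary(labels, widths):
    """Map each cluster to (size, average silhouette width)."""
    summary = {}
    for c in sorted(set(labels)):
        ws = [w for w, lab in zip(widths, labels) if lab == c]
        summary[c] = (len(ws), sum(ws) / len(ws))
    return summary


points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
widths = silhouette_widths(points, labels)
summary = cluster_summary(labels, widths)
overall = sum(widths) / len(widths)
```

The overall average is taken over samples, not clusters, which is why 0.071 above is close to the size-weighted mean of the three cluster values rather than their plain mean.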
8.8 Heat Map Analysis
8.8.1 Feature Heat Map
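Heat maps of MS intensities are usually drawn after scaling each feature to z-scores, so that colour reflects relative rather than absolute abundance. A minimal Python sketch of row-wise scaling with the sample SD; mapping constant features to zeros is a choice made here. In R, `scale()` works on columns, so row scaling of a features × samples matrix `x` is `t(scale(t(x)))`.

```python
import statistics


def zscore_rows(matrix):
    """Scale each feature (row) to mean 0 and sample SD 1 across samples."""
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)
        # A constant feature has no spread; map it to zeros instead of dividing by 0.
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return scaled


z = zscore_rows([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0], [10.0, 30.0, 20.0]])
```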

8.9 Power Analysis
8.9.1 Sample Size Calculation
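A common starting point for sample-size planning is the normal approximation for a two-sided, two-sample comparison: n per group = 2(z_{1-α/2} + z_{1-β})² / d², where d is Cohen's d and 1 - β is the power. A Python sketch using only the standard library; it slightly underestimates the exact t-based answer from R's `power.t.test()`, and the Bonferroni example assumes 100 tests.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample t-test.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is Cohen's d (mean difference divided by the common SD).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


n_large = sample_size_per_group(1.0)                          # d = 1
n_medium = sample_size_per_group(0.5)                         # d = 0.5
n_bonferroni = sample_size_per_group(1.0, alpha=0.05 / 100)   # 100 features
```

With d = 1, α = 0.05 and 80% power this gives 16 samples per group, and d = 0.5 gives 63. Correcting α for the number of features tested raises the requirement further, which matters for the FDR-controlled screens in this chapter.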

8.10 Exercises
- Perform statistical analysis on your own MS dataset
- Implement different multiple testing correction methods and compare results
- Conduct time-series analysis for longitudinal MS data
- Apply machine learning classification to distinguish sample groups
- Develop quality control metrics based on statistical properties
8.11 Summary
This chapter covered essential statistical methods for MS data analysis: descriptive statistics, hypothesis testing with multiple-testing correction, correlation analysis, PCA, clustering with cluster validation, heat maps, and power analysis. Together these tools turn mass spectrometry measurements into interpretable biological results.