flowchart TD
subgraph QF["QFeatures Object Structure"]
direction TB
A[QFeatures Container] --> B[colData<br/>Sample Metadata]
A --> C[Assays<br/>Hierarchical Levels]
A --> D[rowData<br/>Feature Annotations]
C --> E1[PSMs Assay<br/>Rows: 5000 PSMs<br/>Cols: 10 Samples]
C --> E2[Peptides Assay<br/>Rows: 2500 Peptides<br/>Cols: 10 Samples]
C --> E3[Proteins Assay<br/>Rows: 800 Proteins<br/>Cols: 10 Samples]
E1 -->|aggregateFeatures<br/>by Sequence| E2
E2 -->|aggregateFeatures<br/>by Protein| E3
end
subgraph Meta["Metadata Propagation"]
F[Sample Info<br/>Condition, Batch, etc.] --> B
G1[PSM Annotations<br/>Scores, RT, m/z] --> D
G2[Peptide Info<br/>Sequence, Modifications] --> D
G3[Protein Info<br/>Accession, Gene] --> D
end
subgraph Process["Data Processing"]
E3 --> H[filterNA<br/>Remove Missing]
H --> I[logTransform<br/>log2]
I --> J[normalize<br/>Median/Quantile]
J --> K[impute<br/>KNN/MinProb]
K --> L[limma Analysis<br/>Differential Expression]
end
style QF fill:#D7E6FB,stroke:#27408B,stroke-width:3px,color:#102A43
style Meta fill:#FBE0FA,stroke:#B000B0,stroke-width:3px,color:#102A43
style Process fill:#D7E6FB,stroke:#27408B,stroke-width:3px,color:#102A43
11 Quantitative Proteomics with QFeatures
Quantitative proteomics involves the measurement and comparison of protein abundances across different conditions. This chapter introduces the QFeatures infrastructure for handling quantitative MS data.
11.1 Understanding Quantitative MS Data
11.1.1 Quantitation Methodologies
There are several approaches to quantitative proteomics, each with distinct advantages:
Label-free MS1: Extracted Ion Chromatograms (XIC)
In label-free MS1 quantitation, the precursor signal of each identified peptide is integrated over its retention-time window, and the resulting peak area serves as the abundance estimate.
Labelled MS2: Isobaric Tagging (TMT/iTRAQ)
Peptides from different samples are chemically labelled with isobaric tags, pooled, and analysed in a single run; the reporter ions released on fragmentation give the relative abundance in each sample.
Label-free MS2: Spectral Counting
The number of peptide-spectrum matches (PSMs) assigned to each protein is used as a proxy for its abundance.
Labelled MS1: SILAC
Samples are labelled with light or heavy stable isotopes and mixed, so abundances are compared directly from the intensities of the light and heavy precursor peaks.
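As an illustration of the label-free MS1 approach, the sketch below integrates a single extracted ion chromatogram with the trapezoidal rule. The retention times and intensities are made-up values for one hypothetical precursor; real tools also handle peak detection, smoothing, and background subtraction, none of which is shown here.

```r
# Hypothetical XIC for one peptide precursor: retention time (s) and intensity
rt        <- c(100, 102, 104, 106, 108, 110)
intensity <- c(0, 2e5, 8e5, 6e5, 1e5, 0)

# Trapezoidal rule: interval width times the mean height of its two end points
xic_area <- sum(diff(rt) * (head(intensity, -1) + tail(intensity, -1)) / 2)
xic_area
```

The resulting area is the quantity that would be stored for this peptide in one sample column of the quantitative matrix.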
11.2 The QFeatures Framework
11.2.1 QFeatures Class Structure
QFeatures extends the MultiAssayExperiment class to handle the hierarchical nature of MS data (spectra → peptides → proteins). Its main properties are:
- Traceability: links between PSMs, peptides, and proteins are maintained throughout processing
- Flexibility: multiple assays can coexist, for example to compare processing strategies
- Metadata: sample and feature annotations travel with the data
- Reproducibility: each processing step is stored as a new assay, so intermediate results stay in the object
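A minimal sketch of how such an object can be inspected, assuming the small `feat1` example dataset distributed with QFeatures (the printed output that follows is consistent with that dataset, but this is an assumption). Each accessor returns one of the components described above.

```r
library(QFeatures)   # also attaches MultiAssayExperiment and SummarizedExperiment

# feat1 is a small example QFeatures object shipped with the package
data(feat1)
feat1                      # overview of the object and the sets it contains

colData(feat1)             # sample metadata shared by all assays
psms <- feat1[["psms"]]    # extract one assay as a SummarizedExperiment
psms
assay(psms)                # quantitative matrix: PSMs in rows, samples in columns
rowData(psms)              # feature annotations attached to that assay
```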
An instance of class QFeatures containing 1 set(s):
[1] psms: SummarizedExperiment with 10 rows and 2 columns
DataFrame with 2 rows and 1 column
       Group
   <integer>
S1         1
S2         2
class: SummarizedExperiment
dim: 10 2
metadata(0):
assays(1): ''
rownames(10): PSM1 PSM2 ... PSM9 PSM10
rowData names(5): Sequence Protein Var location pval
colnames(2): S1 S2
colData names(0):
      S1 S2
PSM1   1 11
PSM2   2 12
PSM3   3 13
PSM4   4 14
PSM5   5 15
PSM6   6 16
PSM7   7 17
PSM8   8 18
PSM9   9 19
PSM10 10 20
DataFrame with 10 rows and 5 columns
           Sequence     Protein       Var      location      pval
        <character> <character> <integer>   <character> <numeric>
PSM1       SYGFNAAR       ProtA         1 Mitochondr...     0.084
PSM2       SYGFNAAR       ProtA         2 Mitochondr...     0.077
PSM3       SYGFNAAR       ProtA         3 Mitochondr...     0.063
PSM4       ELGNDAYK       ProtA         4 Mitochondr...     0.073
PSM5       ELGNDAYK       ProtA         5 Mitochondr...     0.012
PSM6       ELGNDAYK       ProtA         6 Mitochondr...     0.011
PSM7  IAEESNFPFI...       ProtB         7       unknown     0.075
PSM8  IAEESNFPFI...       ProtB         8       unknown     0.038
PSM9  IAEESNFPFI...       ProtB         9       unknown     0.028
PSM10 IAEESNFPFI...       ProtB        10       unknown     0.097
11.2.2 Feature Aggregation
A key feature of QFeatures is the ability to aggregate features from lower to higher levels while maintaining traceability.
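The two aggregation steps can be sketched as follows, again assuming the `feat1` example data. Using `colMeans` as the summary function is a choice made for this sketch; it reproduces the values printed in this section, but other functions could be passed through `fun`.

```r
library(QFeatures)
data(feat1)

# PSMs -> peptides: average all PSMs that share the same Sequence
feat1 <- aggregateFeatures(feat1, i = "psms", fcol = "Sequence",
                           name = "peptides", fun = colMeans)

# peptides -> proteins: average all peptides that share the same Protein
feat1 <- aggregateFeatures(feat1, i = "peptides", fcol = "Protein",
                           name = "proteins", fun = colMeans)

assay(feat1[["peptides"]])
rowData(feat1[["peptides"]])   # .n = number of PSMs aggregated into each peptide
assay(feat1[["proteins"]])
```

Row annotations that are not constant within a group (here `Var` and `pval`) are dropped from the aggregated assay, which is why the peptide-level `rowData` has fewer columns than the PSM-level one.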
An instance of class QFeatures containing 2 set(s):
[1] psms: SummarizedExperiment with 10 rows and 2 columns
[2] peptides: SummarizedExperiment with 3 rows and 2 columns
             S1   S2
ELGNDAYK    5.0 15.0
IAEESNFPFIK 8.5 18.5
SYGFNAAR    2.0 12.0
DataFrame with 3 rows and 4 columns
                 Sequence     Protein      location        .n
              <character> <character>   <character> <integer>
ELGNDAYK         ELGNDAYK       ProtA Mitochondr...         3
IAEESNFPFIK IAEESNFPFI...       ProtB       unknown         4
SYGFNAAR         SYGFNAAR       ProtA Mitochondr...         3
An instance of class QFeatures containing 3 set(s):
[1] psms: SummarizedExperiment with 10 rows and 2 columns
[2] peptides: SummarizedExperiment with 3 rows and 2 columns
[3] proteins: SummarizedExperiment with 2 rows and 2 columns
       S1   S2
ProtA 3.5 13.5
ProtB 8.5 18.5
11.2.3 Subsetting and Filtering
QFeatures maintains relationships between assays during subsetting operations.
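A sketch of the two operations, assuming the `feat1` example data aggregated to peptides and proteins with `colMeans`. Subsetting by a protein name follows the assay links downwards, and `filterFeatures()` applies a formula to the `rowData` of every assay.

```r
library(QFeatures)
data(feat1)
feat1 <- aggregateFeatures(feat1, i = "psms", fcol = "Sequence",
                           name = "peptides", fun = colMeans)
feat1 <- aggregateFeatures(feat1, i = "peptides", fcol = "Protein",
                           name = "proteins", fun = colMeans)

# Keep protein ProtA together with every peptide and PSM linked to it
prot_a <- feat1["ProtA", , ]
prot_a

# Keep features whose rowData variable pval is below 0.05. Assays that lack
# the variable (here peptides and proteins) end up with no features.
low_p <- filterFeatures(feat1, ~ pval < 0.05)
low_p
```

The empty peptide and protein assays in the second result are a consequence of `pval` existing only at the PSM level; filtering before aggregation avoids this.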
An instance of class QFeatures containing 3 set(s):
[1] psms: SummarizedExperiment with 6 rows and 2 columns
[2] peptides: SummarizedExperiment with 2 rows and 2 columns
[3] proteins: SummarizedExperiment with 1 rows and 2 columns
An instance of class QFeatures containing 3 set(s):
[1] psms: SummarizedExperiment with 4 rows and 2 columns
[2] peptides: SummarizedExperiment with 0 rows and 2 columns
[3] proteins: SummarizedExperiment with 0 rows and 2 columns
11.3 Working with Real Data: CPTAC Dataset
11.3.1 Data Import
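One way to build a QFeatures object from a peptide-level table is sketched below. It uses a simulated 1000 x 6 matrix as a stand-in and is not the import code used for the dataset in this section; the sample names, the `protein` and `condition` columns, and the simulated values are all hypothetical.

```r
library(QFeatures)

set.seed(1)
# Simulated stand-in for a peptide-level table: 1000 peptides x 6 samples
quant <- matrix(rlnorm(6000, meanlog = 10), nrow = 1000,
                dimnames = list(paste0("Peptide_", 1:1000),
                                paste0("Sample_", 1:6)))

# Hypothetical feature and sample annotations
feature_info <- S4Vectors::DataFrame(
  protein = sample(paste0("PROT", 1:200), 1000, replace = TRUE),
  row.names = rownames(quant))
sample_info <- S4Vectors::DataFrame(
  condition = rep(c("A", "B"), each = 3),
  row.names = colnames(quant))

# One SummarizedExperiment per assay, wrapped in a QFeatures container
se <- SummarizedExperiment(assays = list(intensity = quant),
                           rowData = feature_info)
qf <- QFeatures(list(peptides = se), colData = sample_info)
qf
```

For tables read from disk, QFeatures also provides `readQFeatures()`, whose argument names differ between package versions, so the constructor is used here to keep the sketch version-independent.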
An instance of class QFeatures containing 1 set(s):
[1] peptides: SummarizedExperiment with 1000 rows and 6 columns
11.3.2 Data Preprocessing Pipeline
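A log transformation followed by normalisation, each stored as a new assay, can be sketched as below. The sketch runs on the `feat1` example data rather than on the dataset of this section, and median centring is an assumed choice of normalisation method.

```r
library(QFeatures)
data(feat1)

# log2-transform the raw quantities into a new assay
feat1 <- logTransform(feat1, i = "psms", name = "log_psms", base = 2)

# Centre every sample on its median (on the log scale)
feat1 <- normalize(feat1, i = "log_psms", name = "norm_psms",
                   method = "center.median")

names(feat1)                                       # original and derived assays
apply(assay(feat1[["norm_psms"]]), 2, median)      # medians are now zero
```

Because every step adds an assay instead of overwriting one, the raw, log-transformed, and normalised data remain available side by side.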
An instance of class QFeatures containing 3 set(s):
[1] peptides: SummarizedExperiment with 1000 rows and 6 columns
[2] log_peptides: SummarizedExperiment with 1000 rows and 6 columns
[3] norm_peptides: SummarizedExperiment with 1000 rows and 6 columns
11.3.3 Missing Value Analysis
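Missing values can be summarised and filtered as sketched below. The example uses `ft_na`, a small QFeatures object with missing values that ships with the package, as a stand-in for the peptide data; the threshold `pNA = 0` is an illustrative choice.

```r
library(QFeatures)
data(ft_na)                      # small example object containing NAs

na_summary <- nNA(ft_na, i = 1)
na_summary$nNA                   # overall number and proportion of NAs
na_summary$nNArows               # per feature
na_summary$nNAcols               # per sample

# pNA is the maximum tolerated proportion of NAs per feature;
# pNA = 0 keeps only features without any missing value
complete <- filterNA(ft_na, i = 1, pNA = 0)
complete
```

A more lenient threshold keeps features with a few missing values, which can then be handled by imputation.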
Overall missing values: 10 %
DataFrame with 10 rows and 3 columns
          name       nNA       pNA
   <character> <integer> <numeric>
1    Peptide_1         1  0.166667
2    Peptide_2         2  0.333333
3    Peptide_3         1  0.166667
4    Peptide_4         2  0.333333
5    Peptide_5         2  0.333333
6    Peptide_6         1  0.166667
7    Peptide_7         1  0.166667
8    Peptide_8         0  0.000000
9    Peptide_9         1  0.166667
10  Peptide_10         0  0.000000
Peptides after filtering: 1000
11.3.4 Protein Aggregation
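`aggregateFeatures()` records in the `.n` row variable how many lower-level features were combined into each aggregated feature. A sketch on the `feat1` example data, aggregating PSMs directly to proteins with `colMeans` (an assumed summary function), shows how the distribution of `.n` can be tabulated:

```r
library(QFeatures)
data(feat1)

# Aggregate PSMs directly to proteins
feat1 <- aggregateFeatures(feat1, i = "psms", fcol = "Protein",
                           name = "proteins", fun = colMeans)

# Number of PSMs behind each protein, and how often each count occurs
n_per_protein <- rowData(feat1[["proteins"]])$.n
table(n_per_protein)
```

Proteins supported by a single feature are often treated with more caution than those backed by many.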
An instance of class QFeatures containing 4 set(s):
[1] peptides: SummarizedExperiment with 1000 rows and 6 columns
[2] log_peptides: SummarizedExperiment with 1000 rows and 6 columns
[3] norm_peptides: SummarizedExperiment with 1000 rows and 6 columns
[4] proteins: SummarizedExperiment with 197 rows and 6 columns
.n
 1  2  3  4  5  6  7  8  9 10 11 13 14
 6 16 27 42 32 28 18 13  6  5  1  2  1
11.4 Quality Control and Visualization
11.4.1 Principal Component Analysis
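A PCA of the samples can be sketched with base R's `prcomp()`. The matrix below is simulated and stands in for a log-transformed protein assay; the sample names and the two-condition design are hypothetical.

```r
set.seed(42)
# Simulated log2 protein matrix: 200 proteins x 6 samples
expr <- matrix(rnorm(1200, mean = 20), nrow = 200,
               dimnames = list(paste0("PROT", 1:200), paste0("S", 1:6)))
condition <- factor(rep(c("A", "B"), each = 3))

# prcomp() needs complete data and expects samples in rows, hence t()
pca <- prcomp(t(expr[complete.cases(expr), ]), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2], col = condition, pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
```

With a real assay, `expr` would be taken from the object with `assay()`; samples that cluster by batch rather than by condition in such a plot usually indicate a technical effect.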


11.4.2 Expression Profile Visualization

11.5 Statistical Analysis
11.5.1 Differential Expression with limma
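A two-group comparison with limma can be sketched as follows. The expression matrix is simulated noise standing in for a log-transformed protein assay, and the design (three samples per condition) is an assumption; the result has the same columns as the table in this section.

```r
library(limma)

set.seed(7)
# Simulated log2 protein matrix: 200 proteins x 6 samples, no true effect
expr <- matrix(rnorm(1200), nrow = 200,
               dimnames = list(paste0("PROT", 1:200), paste0("S", 1:6)))
condition <- factor(rep(c("A", "B"), each = 3))

design <- model.matrix(~ condition)   # second column: B versus A
fit <- eBayes(lmFit(expr, design))    # linear model plus moderated t-statistics

# All proteins, with Benjamini-Hochberg adjusted p-values
res <- topTable(fit, coef = 2, number = Inf, adjust.method = "BH")
head(res)

# Count proteins passing a 5% FDR threshold
sum(res$adj.P.Val < 0.05)
```

Significance is judged on `adj.P.Val`, not on `P.Value`: in the table of this section the smallest adjusted p-value is 0.301, which is why no protein is reported as significantly changed.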
# A tibble: 6 × 7
  protein logFC AveExpr     t P.Value adj.P.Val     B
  <chr>   <dbl>   <dbl> <dbl>   <dbl>     <dbl> <dbl>
1 PROT157 -2.27   0.226 -3.88 0.00153     0.301 -1.33
2 PROT137 -1.83   0.372 -3.24 0.00558     0.549 -2.22
3 PROT140  1.70   0.859  2.94  0.0102     0.650 -2.64
4 PROT188 -1.84  -0.551 -2.77  0.0144     0.650 -2.88
5 PROT78   1.51  -0.224  2.71  0.0165     0.650 -2.97
6 PROT187  1.93   0.220  2.46  0.0277     0.824 -3.36

Significantly changed proteins: 0
Up-regulated: 0
Down-regulated: 0
11.5.2 Heatmap of Significant Proteins
11.6 Advanced Aggregation Methods
11.6.1 Robust Summarization
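The summary function passed to `aggregateFeatures()` determines how sensitive the aggregated values are to outlying features. The sketch below compares the mean with Tukey's median polish (`MsCoreUtils::medianPolish`) on the `feat1` example data. Median polish is used here as one outlier-resistant option and is not necessarily the method used in this section; `MsCoreUtils::robustSummary`, the default of `aggregateFeatures()`, is another.

```r
library(QFeatures)
data(feat1)

# Mean of all PSMs per peptide: simple, but sensitive to outlying PSMs
feat1 <- aggregateFeatures(feat1, i = "psms", fcol = "Sequence",
                           name = "peptides_mean", fun = colMeans)

# Median polish per peptide: fits row and column effects using medians
feat1 <- aggregateFeatures(feat1, i = "psms", fcol = "Sequence",
                           name = "peptides_mp",
                           fun = MsCoreUtils::medianPolish)

assay(feat1[["peptides_mean"]])
assay(feat1[["peptides_mp"]])
```

Both results are kept as separate assays linked to the same PSMs, so the effect of the summary function can be compared directly.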

11.7 Working with QFeatures Workflows
11.7.1 Visualization of Data Relationships

11.7.2 Custom Processing Functions
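A custom processing step can be applied to one assay and the result added back to the object as a new, linked assay. The sketch below does this on the `feat1` example data with a hand-written quantile normalisation; `quantile_normalise()` is a hypothetical helper written for this example and assumes a numeric matrix without missing values.

```r
library(QFeatures)
data(feat1)

# Hypothetical helper: plain quantile normalisation of a complete matrix
quantile_normalise <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))     # mean of the k-th smallest values
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Apply the function to a copy of the assay, then add it to the object
se <- feat1[["psms"]]
assay(se) <- quantile_normalise(assay(se))
feat1 <- addAssay(feat1, se, name = "quantile_norm")

# Record that the new assay has the same features as its parent
feat1 <- addAssayLinkOneToOne(feat1, from = "psms", to = "quantile_norm")
names(feat1)
```

Adding the link keeps the new assay traceable to its parent, so subsetting by feature continues to work across both.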
Custom normalization applied successfully
Available assays: peptides log_peptides norm_peptides proteins quantile_norm
11.8 Exercises
- Load the CPTAC dataset and run the complete preprocessing pipeline
- Compare different aggregation methods (mean, median, robust)
- Implement missing value imputation strategies
- Perform differential expression analysis with multiple comparisons
- Create custom visualization functions for QFeatures objects
11.9 Summary
This chapter introduced the QFeatures framework for quantitative proteomics analysis. Key concepts covered include:
- Different quantitation methodologies in proteomics
- The hierarchical structure of MS quantitative data
- Feature aggregation strategies
- Quality control and missing value handling
- Statistical analysis workflows
- Visualization of quantitative proteomics data
The QFeatures infrastructure provides a robust foundation for reproducible quantitative proteomics analysis in R.