11 Quantitative Proteomics with QFeatures

Quantitative proteomics involves the measurement and comparison of protein abundances across different conditions. This chapter introduces the QFeatures infrastructure for handling quantitative MS data.

11.1 Understanding Quantitative MS Data

11.1.1 Quantitation Methodologies

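Quantitative proteomics approaches differ in two ways. Samples may or may not be labelled, and quantitation may be performed on MS1 (precursor) or MS2 (fragment) spectra. This gives four main combinations: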

Label-free MS1: Extracted Ion Chromatograms (XIC)

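In label-free quantitation, the signal of each precursor matching an identified peptide is extracted as an ion chromatogram and integrated over retention time. The resulting peak area serves as the peptide's abundance.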
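
The base R sketch below illustrates what "integrated over retention time" means. It computes the area under a toy chromatogram with the trapezoidal rule. The retention times and intensities are made-up values, and real software also handles peak detection and peak boundaries.

```r
# Hypothetical extracted ion chromatogram (XIC) of one precursor m/z:
# retention times (seconds) and the intensity recorded at each scan
rt <- c(600, 602, 604, 606, 608, 610)
intensity <- c(0, 1e5, 4e5, 3e5, 1e5, 0)

# Trapezoidal rule: for each RT step, width x mean of its two heights
trapz <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

peak_area <- trapz(rt, intensity)
peak_area
```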

Labelled MS2: Isobaric Tagging (TMT/iTRAQ)

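Isobaric tags allow multiplexed quantitation. Peptides from different samples are chemically labelled with tags of identical total mass, pooled, and analysed together. Upon fragmentation, each tag releases a reporter ion of distinct m/z. The reporter ion intensities in the MS2 spectrum reflect the peptide's relative abundance in each sample.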
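
This is a minimal sketch of how reporter ions translate into relative abundances. The intensities are hypothetical, and the channel names are modelled on a 6-plex TMT experiment. Each channel's share of the total reporter signal estimates the peptide's relative abundance in that sample.

```r
# Hypothetical reporter ion intensities from one MS2 spectrum
# (channel labels follow a 6-plex isobaric design)
reporters <- c("126" = 2e4, "127" = 4e4, "128" = 2e4,
               "129" = 6e4, "130" = 4e4, "131" = 4e4)

# Relative abundance per multiplexed sample: share of the total signal
rel <- reporters / sum(reporters)
round(rel, 3)
```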

Label-free MS2: Spectral Counting

Protein abundance is estimated by counting the peptide-spectrum matches (PSMs) assigned to each protein.
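
In base R, spectral counting reduces to tabulating PSM-to-protein assignments. The assignments below are hypothetical.

```r
# Hypothetical protein assignments of the PSMs identified in one run
psm_protein <- c("ProtA", "ProtA", "ProtB", "ProtA", "ProtC", "ProtB")

# Spectral count = number of PSMs assigned to each protein
spectral_counts <- table(psm_protein)
spectral_counts
```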

Labelled MS1: SILAC

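Stable isotope labelling by amino acids in cell culture (SILAC) incorporates light or heavy amino acids into the proteins of different samples. After the samples are combined, each peptide appears in MS1 spectra as a light/heavy pair separated by a known mass shift. The ratio of the two peaks compares the peptide's abundance between the samples.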
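
A small sketch of the SILAC comparison, using hypothetical MS1 peak areas for the light and heavy forms of three peptides. Ratios are usually reported on the log2 scale, so that increases and decreases are symmetric around zero.

```r
# Hypothetical MS1 peak areas of the light (sample 1) and heavy
# (sample 2) forms of the same peptides
light <- c(pep1 = 1.0e6, pep2 = 4.0e5, pep3 = 2.5e6)
heavy <- c(pep1 = 2.0e6, pep2 = 4.0e5, pep3 = 1.25e6)

# log2 heavy/light ratio: 0 = no change, +1 = doubling, -1 = halving
log2_ratio <- log2(heavy / light)
log2_ratio
```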

11.2 The QFeatures Framework

11.2.1 QFeatures Class Structure

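QFeatures extends the MultiAssayExperiment class to handle the hierarchical nature of MS data (PSMs → peptides → proteins). Each level is stored as a separate assay (a SummarizedExperiment). Assay links record which lower-level features were aggregated into each higher-level feature.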

flowchart TD
    subgraph QF["QFeatures Object Structure"]
        direction TB
        A[QFeatures Container] --> B[colData<br/>Sample Metadata]
        A --> C[Assays<br/>Hierarchical Levels]
        A --> D[rowData<br/>Feature Annotations]
        
        C --> E1[PSMs Assay<br/>Rows: 5000 PSMs<br/>Cols: 10 Samples]
        C --> E2[Peptides Assay<br/>Rows: 2500 Peptides<br/>Cols: 10 Samples]
        C --> E3[Proteins Assay<br/>Rows: 800 Proteins<br/>Cols: 10 Samples]
        
        E1 -->|aggregateFeatures<br/>by Sequence| E2
        E2 -->|aggregateFeatures<br/>by Protein| E3
    end
    
    subgraph Meta["Metadata Propagation"]
        F[Sample Info<br/>Condition, Batch, etc.] --> B
        G1[PSM Annotations<br/>Scores, RT, m/z] --> D
        G2[Peptide Info<br/>Sequence, Modifications] --> D
        G3[Protein Info<br/>Accession, Gene] --> D
    end
    
    subgraph Process["Data Processing"]
        E3 --> H[filterNA<br/>Remove Missing]
        H --> K[logTransform<br/>log2]
        K --> I[normalize<br/>Median/Quantile]
        I --> J[impute<br/>KNN/MinProb]
        J --> L[limma Analysis<br/>Differential Expression]
    end
    
  style QF fill:#D7E6FB,stroke:#27408B,stroke-width:3px,color:#102A43
  style Meta fill:#FBE0FA,stroke:#B000B0,stroke-width:3px,color:#102A43
  style Process fill:#D7E6FB,stroke:#27408B,stroke-width:3px,color:#102A43

QFeatures Advantages
  • Traceability: Links between PSMs, peptides, and proteins maintained throughout
  • Flexibility: Multiple assays can coexist (different processing strategies)
  • Metadata: Sample and feature annotations travel with the data
  • Reproducibility: Intermediate assays are kept alongside the processed ones, so each processing step can be traced
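
The object printed next has the same layout as the feat1 example data shipped with the QFeatures Bioconductor package. Assuming that is the dataset used, the following sketch loads and inspects it.

```r
library("QFeatures")

# feat1: small example QFeatures object distributed with the package,
# with one "psms" assay of 10 PSMs quantified in two samples (S1, S2)
data(feat1)
feat1

colData(feat1)            # sample annotations (a Group variable)
feat1[["psms"]]           # the PSM-level SummarizedExperiment
assay(feat1[["psms"]])    # quantitative values
rowData(feat1[["psms"]])  # PSM annotations: Sequence, Protein, Var, location, pval
```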

The example object holds a single assay, psms, with 10 PSMs quantified in two samples (S1 and S2):
An instance of class QFeatures containing 1 set(s):
 [1] psms: SummarizedExperiment with 10 rows and 2 columns 
Its sample annotations (colData) contain a single Group variable:
DataFrame with 2 rows and 1 column
       Group
   <integer>
S1         1
S2         2
The psms assay itself is a SummarizedExperiment:
class: SummarizedExperiment
dim: 10 2 
metadata(0):
assays(1): ''
rownames(10): PSM1 PSM2 ... PSM9 PSM10
rowData names(5): Sequence Protein Var location pval
colnames(2): S1 S2
colData names(0):
Its quantitative values:
      S1 S2
PSM1   1 11
PSM2   2 12
PSM3   3 13
PSM4   4 14
PSM5   5 15
PSM6   6 16
PSM7   7 17
PSM8   8 18
PSM9   9 19
PSM10 10 20
Its feature annotations (rowData), one row per PSM:
DataFrame with 10 rows and 5 columns
           Sequence     Protein       Var      location      pval
        <character> <character> <integer>   <character> <numeric>
PSM1       SYGFNAAR       ProtA         1 Mitochondr...     0.084
PSM2       SYGFNAAR       ProtA         2 Mitochondr...     0.077
PSM3       SYGFNAAR       ProtA         3 Mitochondr...     0.063
PSM4       ELGNDAYK       ProtA         4 Mitochondr...     0.073
PSM5       ELGNDAYK       ProtA         5 Mitochondr...     0.012
PSM6       ELGNDAYK       ProtA         6 Mitochondr...     0.011
PSM7  IAEESNFPFI...       ProtB         7       unknown     0.075
PSM8  IAEESNFPFI...       ProtB         8       unknown     0.038
PSM9  IAEESNFPFI...       ProtB         9       unknown     0.028
PSM10 IAEESNFPFI...       ProtB        10       unknown     0.097

11.2.2 Feature Aggregation

A key feature of QFeatures is the ability to aggregate features from lower to higher levels while maintaining traceability.
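
The sketch below reproduces the two aggregation steps on feat1. The summary functions are assumptions: colMeans for PSMs and matrixStats::colMedians for peptides. For this small dataset, means and medians give the same values as those shown below.

```r
library("QFeatures")
data(feat1)

# PSMs -> peptides: group rows by the Sequence rowData variable and
# summarise each group column-wise with colMeans
feat1 <- aggregateFeatures(feat1, i = "psms", fcol = "Sequence",
                           name = "peptides", fun = colMeans)

# peptides -> proteins: group by Protein, summarise with column medians
feat1 <- aggregateFeatures(feat1, i = "peptides", fcol = "Protein",
                           name = "proteins", fun = matrixStats::colMedians)

assay(feat1[["peptides"]])
rowData(feat1[["peptides"]])$.n   # number of PSMs behind each peptide
assay(feat1[["proteins"]])
```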

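After the PSMs are aggregated by peptide sequence, the object holds a second assay, peptides, linked to psms:
An instance of class QFeatures containing 2 set(s):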
 [1] psms: SummarizedExperiment with 10 rows and 2 columns 
 [2] peptides: SummarizedExperiment with 3 rows and 2 columns 
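Each peptide value summarises its PSMs. For example, ELGNDAYK in S1 combines PSM4 to PSM6 (values 4, 5 and 6) into 5.0:
             S1   S2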
ELGNDAYK    5.0 15.0
IAEESNFPFIK 8.5 18.5
SYGFNAAR    2.0 12.0
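The peptide rowData keeps only the variables that are constant within each peptide, so Var and pval are dropped. The new .n column counts the PSMs aggregated into each peptide:
DataFrame with 3 rows and 4 columns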
                 Sequence     Protein      location        .n
              <character> <character>   <character> <integer>
ELGNDAYK         ELGNDAYK       ProtA Mitochondr...         3
IAEESNFPFIK IAEESNFPFI...       ProtB       unknown         4
SYGFNAAR         SYGFNAAR       ProtA Mitochondr...         3
A second aggregation, by Protein, adds a proteins assay linked to peptides:
An instance of class QFeatures containing 3 set(s):
 [1] psms: SummarizedExperiment with 10 rows and 2 columns 
 [2] peptides: SummarizedExperiment with 3 rows and 2 columns 
 [3] proteins: SummarizedExperiment with 2 rows and 2 columns 
ProtA summarises its two peptides, ELGNDAYK and SYGFNAAR. In S1, combining 5.0 and 2.0 gives 3.5:
       S1   S2
ProtA 3.5 13.5
ProtB 8.5 18.5

11.2.3 Subsetting and Filtering

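QFeatures keeps the links between assays consistent during subsetting and filtering. Subsetting by a feature name follows these links, so the related features at the other levels are kept as well.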
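
The following sketch works on the aggregated feat1 object. The pval < 0.05 threshold is inferred from the 4 PSMs that remain in the output below; it is not taken from the original code.

```r
library("QFeatures")
data(feat1)
feat1 <- aggregateFeatures(feat1, i = "psms", fcol = "Sequence",
                           name = "peptides", fun = colMeans)
feat1 <- aggregateFeatures(feat1, i = "peptides", fcol = "Protein",
                           name = "proteins", fun = matrixStats::colMedians)

# Subset by a protein name: the assay links are followed, so the
# peptides and PSMs belonging to ProtA are kept too
protA <- feat1["ProtA", , ]
protA

# Filter on a rowData variable; pval only exists in the psms assay,
# so the peptides and proteins assays end up empty
filtered <- filterFeatures(feat1, ~ pval < 0.05)
filtered
```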

Subsetting to ProtA keeps the protein, its 2 peptides and their 6 PSMs:
An instance of class QFeatures containing 3 set(s):
 [1] psms: SummarizedExperiment with 6 rows and 2 columns 
 [2] peptides: SummarizedExperiment with 2 rows and 2 columns 
 [3] proteins: SummarizedExperiment with 1 rows and 2 columns 
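Filtering on a PSM-level variable behaves differently. The 4 remaining PSMs are those with pval below 0.05. The peptides and proteins assays become empty because pval is absent from their rowData:
An instance of class QFeatures containing 3 set(s):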
 [1] psms: SummarizedExperiment with 4 rows and 2 columns 
 [2] peptides: SummarizedExperiment with 0 rows and 2 columns 
 [3] proteins: SummarizedExperiment with 0 rows and 2 columns 

11.3 Working with Real Data: CPTAC Dataset

11.3.1 Data Import

The imported object holds a single peptides assay with 1000 peptides quantified in 6 samples:
An instance of class QFeatures containing 1 set(s):
 [1] peptides: SummarizedExperiment with 1000 rows and 6 columns 
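
The import code for this dataset is not reproduced here. As a hedged sketch, a peptide-by-sample matrix can be wrapped into a QFeatures object with the QFeatures() constructor. The small simulated matrix, the PROT1/PROT2 mapping and the condition labels are hypothetical. When starting from a tabular export, readQFeatures() is the usual entry point, but its argument names have changed between QFeatures versions.

```r
library("QFeatures")
library("SummarizedExperiment")

# Hypothetical peptide intensities: 8 peptides x 6 samples
set.seed(1)
m <- matrix(2^rnorm(48, mean = 20, sd = 1), nrow = 8,
            dimnames = list(paste0("Peptide_", 1:8), paste0("S", 1:6)))

# Peptide annotations, including the protein each peptide maps to
rd <- DataFrame(Protein = rep(c("PROT1", "PROT2"), each = 4),
                row.names = rownames(m))
se <- SummarizedExperiment(assays = list(intensity = m), rowData = rd)

# Sample annotations; row names must match the assay column names
cd <- DataFrame(condition = rep(c("A", "B"), each = 3),
                row.names = colnames(m))

qf <- QFeatures(list(peptides = se), colData = cd)
qf
```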

11.3.2 Data Preprocessing Pipeline

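Preprocessing adds a log-transformed assay (log_peptides) and a normalized assay (norm_peptides), each derived from the previous one:
An instance of class QFeatures containing 3 set(s):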
 [1] peptides: SummarizedExperiment with 1000 rows and 6 columns 
 [2] log_peptides: SummarizedExperiment with 1000 rows and 6 columns 
 [3] norm_peptides: SummarizedExperiment with 1000 rows and 6 columns 
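
Here is a minimal sketch of the two preprocessing steps on a small simulated peptide matrix. Log base 2 and the center.median method are assumptions, since the original parameters are not shown. Both functions add a new assay and keep the previous one.

```r
library("QFeatures")
library("SummarizedExperiment")

# Hypothetical raw peptide intensities: 8 peptides x 6 samples
set.seed(1)
m <- matrix(2^rnorm(48, mean = 20, sd = 1), nrow = 8,
            dimnames = list(paste0("Peptide_", 1:8), paste0("S", 1:6)))
qf <- QFeatures(list(peptides = SummarizedExperiment(assays = list(intensity = m))),
                colData = DataFrame(condition = rep(c("A", "B"), each = 3),
                                    row.names = colnames(m)))

# log2-transform into a new assay linked to its parent
qf <- logTransform(qf, i = "peptides", name = "log_peptides", base = 2)

# Centre each sample (column) on its median, on the log scale
qf <- normalize(qf, i = "log_peptides", name = "norm_peptides",
                method = "center.median")
qf
```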

11.3.3 Missing Value Analysis

Overall missing values: 10 %
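Number (nNA) and proportion (pNA) of missing values per peptide, out of 6 samples, for the first 10 peptides:
DataFrame with 10 rows and 3 columns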
          name       nNA       pNA
   <character> <integer> <numeric>
1    Peptide_1         1  0.166667
2    Peptide_2         2  0.333333
3    Peptide_3         1  0.166667
4    Peptide_4         2  0.333333
5    Peptide_5         2  0.333333
6    Peptide_6         1  0.166667
7    Peptide_7         1  0.166667
8    Peptide_8         0  0.000000
9    Peptide_9         1  0.166667
10  Peptide_10         0  0.000000
Peptides after filtering: 1000

No peptide exceeded the missing-value threshold applied, so all 1000 peptides were retained.
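
The following sketch works on a hypothetical 4 × 6 matrix. It first computes the overall and per-peptide proportions of missing values in base R, then applies nNA() and filterNA() from QFeatures. The 50% threshold (pNA = 0.5) is an assumption, not necessarily the value used above.

```r
library("QFeatures")
library("SummarizedExperiment")

# Hypothetical log-intensities with missing values (NA)
m <- matrix(c(20, 21, NA, 22, 20, 21,
              NA, NA, NA, NA, 19, 18,
              18, 18, 19, NA, 18, 19,
              23, 22, 23, 22, 23, 22),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("Peptide_", 1:4), paste0("S", 1:6)))

mean(is.na(m))       # overall proportion of missing values
rowMeans(is.na(m))   # per-peptide proportion (pNA)

qf <- QFeatures(list(peptides = SummarizedExperiment(assays = list(intensity = m))),
                colData = DataFrame(condition = rep(c("A", "B"), each = 3),
                                    row.names = colnames(m)))
nNA(qf, i = "peptides")

# Keep peptides with at most 50% missing values (Peptide_2 has 4/6)
qf <- filterNA(qf, i = "peptides", pNA = 0.5)
nrow(qf[["peptides"]])
```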

11.3.4 Protein Aggregation

Aggregating the peptides by protein adds a fourth assay with 197 proteins:
An instance of class QFeatures containing 4 set(s):
 [1] peptides: SummarizedExperiment with 1000 rows and 6 columns 
 [2] log_peptides: SummarizedExperiment with 1000 rows and 6 columns 
 [3] norm_peptides: SummarizedExperiment with 1000 rows and 6 columns 
 [4] proteins: SummarizedExperiment with 197 rows and 6 columns 
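The .n column of the protein rowData gives the number of peptides aggregated into each protein. Its distribution shows that the 197 proteins together account for all 1000 peptides:
.n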
 1  2  3  4  5  6  7  8  9 10 11 13 14 
 6 16 27 42 32 28 18 13  6  5  1  2  1 
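
The sketch below aggregates simulated peptides with a hypothetical peptide-to-protein mapping of 1 to 4 peptides per protein. Summarising with column medians is an assumption. Tabulating .n then gives a distribution like the one above.

```r
library("QFeatures")
library("SummarizedExperiment")

# Hypothetical normalized peptide log-intensities: 10 peptides x 6 samples
set.seed(1)
m <- matrix(rnorm(60, mean = 20), nrow = 10,
            dimnames = list(paste0("Peptide_", 1:10), paste0("S", 1:6)))
# Hypothetical mapping: PROT1 has 1 peptide, PROT2 has 2, ..., PROT4 has 4
rd <- DataFrame(Protein = rep(paste0("PROT", 1:4), times = 1:4),
                row.names = rownames(m))
qf <- QFeatures(list(norm_peptides = SummarizedExperiment(
                       assays = list(intensity = m), rowData = rd)),
                colData = DataFrame(condition = rep(c("A", "B"), each = 3),
                                    row.names = colnames(m)))

# Summarise each protein by the column-wise median of its peptides
qf <- aggregateFeatures(qf, i = "norm_peptides", fcol = "Protein",
                        name = "proteins", fun = matrixStats::colMedians)

# .n = number of peptides aggregated into each protein
table(rowData(qf[["proteins"]])$.n)
```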

11.4 Quality Control and Visualization

11.4.1 Principal Component Analysis
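
The PCA figure is not reproduced here. In the base R sketch below, samples are the observations, so the protein matrix is transposed before prcomp(). Rows with missing values are dropped because prcomp() does not accept NA. The simulated matrix is a hypothetical stand-in for the values of the proteins assay.

```r
# Hypothetical log2 protein intensities: 50 proteins x 6 samples,
# with the first 10 proteins shifted in the last 3 samples
set.seed(1)
m <- matrix(rnorm(300, mean = 20), nrow = 50,
            dimnames = list(paste0("PROT", 1:50), paste0("S", 1:6)))
m[1:10, 4:6] <- m[1:10, 4:6] + 3
group <- factor(rep(c("A", "B"), each = 3))

# prcomp() cannot handle NA: keep complete rows only
m <- m[complete.cases(m), ]

# Samples are the observations, hence the transpose
pca <- prcomp(t(m), center = TRUE, scale. = FALSE)
summary(pca)$importance[2, 1:2]   # proportion of variance, PC1 and PC2

plot(pca$x[, 1:2], col = as.integer(group), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(group), col = 1:2, pch = 19)
```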

11.4.2 Expression Profile Visualization
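
This base R sketch shows two common views on a hypothetical matrix of log2 protein intensities. Boxplots show the per-sample distributions, for example to check that normalization aligned them. matplot() draws a few protein profiles across samples.

```r
# Hypothetical log2 protein intensities: 10 proteins x 6 samples
set.seed(1)
m <- matrix(rnorm(60, mean = 20), nrow = 10,
            dimnames = list(paste0("PROT", 1:10), paste0("S", 1:6)))

# Per-sample distributions
boxplot(m, ylab = "log2 intensity", las = 2)

# Profiles of three proteins across samples: one line per protein
sel <- rownames(m)[1:3]
matplot(t(m[sel, ]), type = "b", lty = 1, pch = 19,
        xaxt = "n", xlab = "Sample", ylab = "log2 intensity")
axis(1, at = seq_len(ncol(m)), labels = colnames(m))
legend("topright", legend = sel, col = 1:3, lty = 1)
```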

11.5 Statistical Analysis

11.5.1 Differential Expression with limma

The six top-ranked proteins:
# A tibble: 6 × 7
  protein logFC AveExpr     t P.Value adj.P.Val     B
  <chr>   <dbl>   <dbl> <dbl>   <dbl>     <dbl> <dbl>
1 PROT157 -2.27   0.226 -3.88 0.00153     0.301 -1.33
2 PROT137 -1.83   0.372 -3.24 0.00558     0.549 -2.22
3 PROT140  1.70   0.859  2.94 0.0102      0.650 -2.64
4 PROT188 -1.84  -0.551 -2.77 0.0144      0.650 -2.88
5 PROT78   1.51  -0.224  2.71 0.0165      0.650 -2.97
6 PROT187  1.93   0.220  2.46 0.0277      0.824 -3.36

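None of the proteins is significant after multiple-testing adjustment; the smallest adjusted p-value (adj.P.Val) is 0.301:
Significantly changed proteins: 0 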
Up-regulated: 0 
Down-regulated: 0 
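
Below is a hedged sketch of a two-group limma analysis on simulated data (100 hypothetical proteins, 3 control and 3 treated samples). The 0.05 cutoff on adj.P.Val is an assumption. topTable() adjusts p-values with the Benjamini-Hochberg method by default.

```r
library("limma")

# Hypothetical log2 protein intensities: 100 proteins x 6 samples
set.seed(1)
m <- matrix(rnorm(600), nrow = 100,
            dimnames = list(paste0("PROT", 1:100), paste0("S", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))

design <- model.matrix(~ group)   # intercept + trt-vs-ctrl effect
fit <- eBayes(lmFit(m, design))   # moderated t-statistics

# All proteins for the trt-vs-ctrl coefficient
tt <- topTable(fit, coef = "grouptrt", number = Inf)
head(tt)

sig <- tt[tt$adj.P.Val < 0.05, ]
cat("Significantly changed proteins:", nrow(sig), "\n")
cat("Up-regulated:", sum(sig$logFC > 0), "\n")
cat("Down-regulated:", sum(sig$logFC < 0), "\n")
```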

11.5.2 Heatmap of Significant Proteins
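
When no protein passes the cutoff, one option is to display the top-ranked proteins instead. The base R sketch below uses a hypothetical matrix. Rows are scaled to z-scores so that colours show relative changes per protein, and heatmap() clusters both rows and columns.

```r
# Hypothetical log2 intensities of the proteins selected for display
set.seed(1)
m <- matrix(rnorm(60, mean = 20), nrow = 10,
            dimnames = list(paste0("PROT", 1:10), paste0("S", 1:6)))
m[1:5, 4:6] <- m[1:5, 4:6] + 2

# Row-wise z-scores (mean 0, sd 1 for each protein)
z <- t(scale(t(m)))

# scale = "none" because z is already scaled; side colours mark groups
heatmap(z, scale = "none", col = hcl.colors(50, "Blue-Red"),
        ColSideColors = rep(c("grey70", "grey20"), each = 3))
```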

11.6 Advanced Aggregation Methods

11.6.1 Robust Summarization
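
This sketch applies robust summarization to feat1. It assumes that aggregateFeatures() accepts MsCoreUtils::robustSummary as fun. robustSummary fits a robust linear model (MASS::rlm) with sample and feature effects, so a single outlying PSM has limited influence on the protein value. The model works on additive effects, so the data are log2-transformed first.

```r
library("QFeatures")
data(feat1)

# robustSummary expects log-scale values
feat1 <- logTransform(feat1, i = "psms", name = "log_psms", base = 2)

# Aggregate PSMs directly to proteins with a robust linear model;
# outlying PSMs are down-weighted rather than shifting the summary
feat1 <- aggregateFeatures(feat1, i = "log_psms", fcol = "Protein",
                           name = "proteins_robust",
                           fun = MsCoreUtils::robustSummary)
assay(feat1[["proteins_robust"]])
```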

11.7 Working with QFeatures Workflows

11.7.1 Visualization of Data Relationships
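
This sketch assumes a QFeatures version that provides a plot() method for QFeatures objects, which draws the assays as a graph, and the assayLink() accessor. The hierarchy is rebuilt from feat1 as in the aggregation example.

```r
library("QFeatures")

# Rebuild the psms -> peptides -> proteins hierarchy on feat1
data(feat1)
feat1 <- aggregateFeatures(feat1, i = "psms", fcol = "Sequence",
                           name = "peptides", fun = colMeans)
feat1 <- aggregateFeatures(feat1, i = "peptides", fcol = "Protein",
                           name = "proteins", fun = matrixStats::colMedians)

# Assays as nodes, assay links as edges
plot(feat1)

# One link: its parent assay and the rowData variable used (fcol)
assayLink(feat1, "proteins")
```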

11.7.2 Custom Processing Functions

Custom normalization applied successfully
Available assays: peptides log_peptides norm_peptides proteins quantile_norm 
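
The custom function used above is not shown. As a hypothetical example, quantile_normalize() below is a user-written quantile normalization without NA handling. Its result is attached as a new assay with addAssay() and linked to its parent with addAssayLinkOneToOne().

```r
library("QFeatures")
library("SummarizedExperiment")

# Hypothetical user-defined quantile normalization (complete data only):
# every column receives the same reference distribution (the mean of the
# sorted columns), placed back according to each column's ranks
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  ref <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
m <- matrix(rnorm(30, mean = 20), nrow = 5,
            dimnames = list(paste0("Peptide_", 1:5), paste0("S", 1:6)))
qf <- QFeatures(list(log_peptides = SummarizedExperiment(assays = list(intensity = m))),
                colData = DataFrame(condition = rep(c("A", "B"), each = 3),
                                    row.names = colnames(m)))

# Store the result as a new assay and record its one-to-one link
qn <- SummarizedExperiment(assays = list(intensity = quantile_normalize(m)))
qf <- addAssay(qf, qn, name = "quantile_norm")
qf <- addAssayLinkOneToOne(qf, from = "log_peptides", to = "quantile_norm")
names(qf)
```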

11.8 Exercises

  1. Load the CPTAC dataset and perform the complete preprocessing pipeline
  2. Compare different aggregation methods (mean, median, robust)
  3. Implement missing value imputation strategies
  4. Perform differential expression analysis with multiple comparisons
  5. Create custom visualization functions for QFeatures objects

11.9 Summary

This chapter introduced the QFeatures framework for quantitative proteomics analysis. Key concepts covered include:

  • Different quantitation methodologies in proteomics
  • The hierarchical structure of MS quantitative data
  • Feature aggregation strategies
  • Quality control and missing value handling
  • Statistical analysis workflows
  • Visualization of quantitative proteomics data

The QFeatures infrastructure provides a robust foundation for reproducible quantitative proteomics analysis in R.