Skip to contents

Introduction

METAFLUX employs constraint-based metabolic modeling to infer intracellular metabolic fluxes from gene expression data. This vignette provides a comprehensive overview of the mathematical framework underlying the algorithm.

Theoretical Background

Genome-Scale Metabolic Models

A genome-scale metabolic model (GEM) represents the complete set of metabolic reactions in an organism. Human-GEM, used by METAFLUX, contains:

  • 8,378 metabolites
  • 13,082 reactions
  • 3,625 metabolic genes

Stoichiometric Matrix

The stoichiometric matrix 𝐒\mathbf{S} encodes the structure of the metabolic network:

𝐒m×n\mathbf{S} \in \mathbb{R}^{m \times n}

where: - mm = number of metabolites (8,378) - nn = number of reactions (13,082) - SijS_{ij} = stoichiometric coefficient of metabolite ii in reaction jj

Flux Balance Analysis (FBA)

Steady-State Assumption

At metabolic steady state, the rate of production equals the rate of consumption for each metabolite:

𝐒𝐯=0\mathbf{S} \cdot \mathbf{v} = 0

where 𝐯\mathbf{v} is the flux vector.

Optimization Problem

METAFLUX solves a quadratic programming problem:

min𝐯𝐯T𝐏𝐯+𝐪T𝐯\min_{\mathbf{v}} \quad \mathbf{v}^T \mathbf{P} \mathbf{v} + \mathbf{q}^T \mathbf{v}

subject to: 𝐒𝐯=0\mathbf{S} \cdot \mathbf{v} = 0𝐥𝐛𝐯𝐮𝐛\mathbf{lb} \leq \mathbf{v} \leq \mathbf{ub}

where: - 𝐏\mathbf{P} = identity matrix (regularization) - 𝐪\mathbf{q} = objective coefficients (biomass maximization) - 𝐥𝐛,𝐮𝐛\mathbf{lb}, \mathbf{ub} = flux bounds from MRAS

Biomass Objective

The objective function maximizes biomass production:

𝐪={10000if reaction is biomass0otherwise\mathbf{q} = \begin{cases} -10000 & \text{if reaction is biomass} \\ 0 & \text{otherwise} \end{cases}

Metabolic Reaction Activity Scores (MRAS)

Gene-Protein-Reaction (GPR) Rules

GPR rules define the relationship between genes, proteins, and reactions:

  1. Isoenzymes (OR relationship): Multiple genes can catalyze the same reaction
  2. Enzyme complexes (AND relationship): Multiple genes required for one reaction

Scoring Algorithm

Isoenzyme Score (OR)

For reactions with multiple isoenzymes:

Scoreiso=i=1neiwi\text{Score}_{\text{iso}} = \sum_{i=1}^{n} \frac{e_i}{w_i}

where: - eie_i = expression level of gene ii - wiw_i = gene occurrence weight (frequency in GPR rules)

Complex Score (AND)

For enzyme complexes:

Scorecomplex=mini=1neiwi\text{Score}_{\text{complex}} = \min_{i=1}^{n} \frac{e_i}{w_i}

The rate-limiting subunit determines the complex activity.

Mixed Relationships

For complex GPR rules with both AND and OR:

Scoremixed=f(hierarchical combination)\text{Score}_{\text{mixed}} = f(\text{hierarchical combination})

Normalization

Scores are normalized per sample:

MRASij=Scoreijmaxj(Scoreij)\text{MRAS}_{ij} = \frac{\text{Score}_{ij}}{\max_j(\text{Score}_{ij})}

Flux Bounds Construction

Reversible Reactions

For reversible reactions (rev=1\text{rev} = 1):

lbi=MRASi,ubi=MRASilb_i = -\text{MRAS}_i, \quad ub_i = \text{MRAS}_i

Irreversible Reactions

For irreversible reactions (rev=0\text{rev} = 0):

lbi=0,ubi=MRASilb_i = 0, \quad ub_i = \text{MRAS}_i

Exchange Reactions

Exchange reactions represent nutrient uptake/secretion:

lbi={1if nutrient in medium0otherwiselb_i = \begin{cases} -1 & \text{if nutrient in medium} \\ 0 & \text{otherwise} \end{cases}

Community Modeling (Single-Cell)

Community Stoichiometric Matrix

For kk cell types, the community matrix is:

𝐒community=[𝐀1𝐀2𝐀k𝐄𝐒1𝟎𝟎𝟎𝟎𝐒2𝟎𝟎𝟎𝟎𝐒k𝟎]\mathbf{S}_{\text{community}} = \begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 & \cdots & \mathbf{A}_k & \mathbf{E} \\ \mathbf{S}_1 & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{S}_2 & \cdots & \mathbf{0} & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{S}_k & \mathbf{0} \end{bmatrix}

where: - 𝐀i\mathbf{A}_i = exchange matrix for cell type ii - 𝐒i\mathbf{S}_i = intracellular stoichiometry - 𝐄\mathbf{E} = external medium exchange

Weighted Objective

Cell type fractions weight the objective:

𝐪community=i=1kfi𝐪i\mathbf{q}_{\text{community}} = \sum_{i=1}^{k} f_i \cdot \mathbf{q}_i

where fif_i is the fraction of cell type ii.

Solver: OSQP

METAFLUX uses the OSQP (Operator Splitting Quadratic Program) solver:

Algorithm Settings

settings <- osqp::osqpSettings(
  max_iter = 1000000L,    # Maximum iterations
  eps_abs = 1e-04,         # Absolute tolerance
  eps_rel = 1e-04,         # Relative tolerance
  adaptive_rho_interval = 50,
  verbose = FALSE
)

Convergence

The solver converges when:

𝐒𝐯<ϵ\|\mathbf{S} \cdot \mathbf{v}\|_{\infty} < \epsilon

Typical precision: ϵ105\epsilon \approx 10^{-5}

Computational Complexity

Operation Complexity
MRAS calculation O(ng)O(n \cdot g)
FBA (per sample) O(mn2)O(m \cdot n^2)
Community model O(k2mn2)O(k^2 \cdot m \cdot n^2)

where: - nn = number of reactions - mm = number of metabolites - gg = number of genes - kk = number of cell types

References

  1. Huang Y, et al. (2023). Characterizing cancer metabolism from bulk and single-cell RNA-seq data using METAFlux. Nature Communications, 14, 4883.

  2. Robinson JL, et al. (2020). An atlas of human metabolism. Science Signaling, 13, eaaz1482.

  3. Orth JD, et al. (2010). What is flux balance analysis? Nature Biotechnology, 28, 245-248.

  4. Stellato B, et al. (2020). OSQP: An operator splitting solver for quadratic programs. Mathematical Programming Computation, 12, 637-672.

Author

Zaoqu Liu - Package maintainer and optimization implementation - GitHub: Zaoqu-Liu - ORCID: 0000-0002-0452-742X