Algorithm and Mathematical Framework
Zaoqu Liu
2026-01-26
Source: vignettes/algorithm-details.Rmd
Introduction
TorchDecon implements a deep learning-based approach to cell type deconvolution, originally proposed by Menden et al. (2020) as the Scaden algorithm. This vignette provides a comprehensive overview of the mathematical framework and algorithmic principles underlying TorchDecon.
Problem Formulation
The Deconvolution Problem
Cell type deconvolution aims to estimate the cellular composition of bulk tissue samples. Given a bulk expression profile $\mathbf{x} \in \mathbb{R}^G$ (where $G$ is the number of genes), we seek to estimate the cell type fraction vector $\mathbf{f} \in \mathbb{R}^K$ (where $K$ is the number of cell types) such that:

$$f_k \geq 0 \quad \forall k, \qquad \sum_{k=1}^{K} f_k = 1$$
Traditional Approaches vs. Deep Learning
Traditional deconvolution methods (e.g., CIBERSORT, MuSiC) rely on:
- Signature matrices: Pre-defined gene expression signatures for each cell type
- Linear mixing models: Assumption that bulk expression is a linear combination of cell type signatures
$$\mathbf{x} = \mathbf{S}\mathbf{f} + \boldsymbol{\epsilon}$$

where $\mathbf{S} \in \mathbb{R}^{G \times K}$ is the signature matrix and $\boldsymbol{\epsilon}$ is a noise term.
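To make the linear mixing model concrete, here is a toy numeric illustration. The 4-gene, 2-cell-type signature matrix and the fraction vector below are invented for this sketch, not taken from any real reference:

```python
# Toy illustration of the linear mixing model x = S f (noise omitted).
# All numbers here are made up for illustration.

# Signature matrix S: rows = genes, columns = cell types
S = [
    [10.0, 1.0],   # gene 1: high in cell type A
    [8.0, 2.0],    # gene 2
    [1.0, 9.0],    # gene 3: high in cell type B
    [0.5, 12.0],   # gene 4
]
f = [0.3, 0.7]     # cell type fractions (non-negative, sum to 1)

# Bulk expression as a linear combination of the cell type signatures
x = [sum(S[g][k] * f[k] for k in range(len(f))) for g in range(len(S))]
print(x)
```

Traditional methods invert this relationship: given $\mathbf{x}$ and $\mathbf{S}$, they solve for $\mathbf{f}$, typically via constrained regression.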
TorchDecon's deep learning approach instead learns a non-linear mapping directly from bulk expression to cell type fractions:

$$\hat{\mathbf{f}} = \phi_\theta(\mathbf{x})$$

where $\phi_\theta$ is a neural network with parameters $\theta$.
Algorithmic Pipeline
Overview
┌───────────────────────────────────────────────────────────────────┐
│                        TorchDecon Pipeline                        │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│   │  scRNA-seq   │───▶│     Bulk     │───▶│   Training   │        │
│   │  Reference   │    │  Simulation  │    │     Data     │        │
│   └──────────────┘    └──────────────┘    └──────────────┘        │
│                                                  │                │
│                                                  ▼                │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │
│   │  Cell Type   │◀───│   Trained    │◀───│   Training   │        │
│   │  Fractions   │    │   Ensemble   │    │     Loop     │        │
│   └──────────────┘    └──────────────┘    └──────────────┘        │
│                              ▲                                    │
│                              │                                    │
│                       ┌──────────────┐                            │
│                       │  Real Bulk   │                            │
│                       │     Data     │                            │
│                       └──────────────┘                            │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
Step 1: Bulk RNA-seq Simulation
Mathematical Formulation
For each simulated sample $i = 1, \dots, N$:

1. Generate random fractions: Sample $\mathbf{f} \sim \text{Dirichlet}(\boldsymbol{\alpha})$ or from a uniform distribution, then normalize: $f_k \leftarrow f_k / \sum_{j=1}^{K} f_j$
2. Sample cells: For each cell type $k$, sample $n_k = \lfloor f_k \cdot C \rfloor$ cells (with replacement), where $C$ is the total cells per sample.
3. Aggregate expression: Sum the expression across sampled cells:

$$x_g = \sum_{k=1}^{K} \sum_{c \in S_k} y_{g,c}$$

where $S_k$ is the set of sampled cells from type $k$, and $y_{g,c}$ is the count for gene $g$ in cell $c$.
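The three steps above can be sketched in plain Python. This is a minimal illustration of the simulation idea, not the package's implementation; the function name `simulate_bulk` and all parameter names are invented for this sketch:

```python
import random

random.seed(0)

def simulate_bulk(sc_counts, cell_types, n_cells=500, alpha=1.0):
    """Simulate one pseudo-bulk sample from a scRNA-seq reference.

    sc_counts:  list of per-cell expression vectors (one count per gene)
    cell_types: parallel list of cell type labels
    Returns (bulk_vector, fraction_dict).
    """
    types = sorted(set(cell_types))

    # Step 1: Dirichlet fractions via normalized Gamma draws
    draws = [random.gammavariate(alpha, 1.0) for _ in types]
    total = sum(draws)
    fractions = {t: d / total for t, d in zip(types, draws)}

    n_genes = len(sc_counts[0])
    bulk = [0.0] * n_genes
    for t in types:
        pool = [c for c, lab in zip(sc_counts, cell_types) if lab == t]
        # Step 2: cells to draw for this type (rounding means the realized
        # total may differ slightly from n_cells)
        n_t = round(fractions[t] * n_cells)
        for cell in random.choices(pool, k=n_t):  # with replacement
            # Step 3: aggregate expression across sampled cells
            for g in range(n_genes):
                bulk[g] += cell[g]
    return bulk, fractions

# Tiny invented reference: 4 cells, 3 genes, 2 cell types
sc = [[5, 0, 1], [4, 1, 0], [0, 6, 2], [1, 5, 3]]
labels = ["T", "T", "B", "B"]
bulk, fracs = simulate_bulk(sc, labels, n_cells=100)
```

Repeating this for many random fraction vectors yields the (expression, fraction) pairs used as training data.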
Step 2: Data Preprocessing
Log Transformation
Each bulk sample is transformed as $\tilde{x}_g = \log_2(x_g + 1)$. This transformation:
- Reduces the dynamic range of expression values
- Stabilizes variance
- Makes the data more normally distributed
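A minimal sketch of the transformation (the helper name `log_transform` is illustrative, not part of the package's API):

```python
import math

def log_transform(bulk):
    # log2(x + 1): compresses the dynamic range while keeping zeros at zero
    return [math.log2(x + 1.0) for x in bulk]

print(log_transform([0, 1, 1023]))  # -> [0.0, 1.0, 10.0]
```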
Step 3: Neural Network Architecture
Fully Connected Network
Each model in the ensemble is a fully connected neural network:

$$\mathbf{h}^{(l)} = \sigma\!\left(\mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right), \qquad \mathbf{h}^{(0)} = \tilde{\mathbf{x}}$$

with a final softmax layer producing the predicted fractions $\hat{\mathbf{f}}$, where:
- $\mathbf{h}^{(l)}$ is the hidden representation at layer $l$
- $\mathbf{W}^{(l)}, \mathbf{b}^{(l)}$ are learnable weights and biases
- $\sigma$ is the ReLU activation: $\sigma(z) = \max(0, z)$
Architecture Specifications
| Model | Layer Dimensions | Dropout Rates | Total Parameters* |
|---|---|---|---|
| M256 | G → 256 → 128 → 64 → 32 → K | 0, 0, 0, 0 | ~G×256 + 50K |
| M512 | G → 512 → 256 → 128 → 64 → K | 0, 0.3, 0.2, 0.1 | ~G×512 + 200K |
| M1024 | G → 1024 → 512 → 256 → 128 → K | 0, 0.6, 0.3, 0.1 | ~G×1024 + 800K |
*Approximate; depends on number of genes (G) and cell types (K)
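The forward pass of such a network can be sketched in plain Python. This is a shrunken stand-in with made-up dimensions and random weights (dropout and training omitted), only meant to show the layer structure and the softmax output that yields valid fractions:

```python
import math
import random

random.seed(1)

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    # Numerically stable softmax: non-negative outputs summing to 1
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def dense(v, W, b):
    # W: out_dim x in_dim, b: out_dim
    return [sum(w * x for w, x in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def init(out_dim, in_dim):
    W = [[random.gauss(0, 0.1) for _ in range(in_dim)]
         for _ in range(out_dim)]
    return W, [0.0] * out_dim

# Shrunken stand-in for the M256 column (G = 20 genes, K = 4 cell types)
G, K = 20, 4
dims = [G, 16, 8, K]  # real model: G -> 256 -> 128 -> 64 -> 32 -> K
layers = [init(dims[i + 1], dims[i]) for i in range(len(dims) - 1)]

h = [random.random() for _ in range(G)]  # fake log-expression input
for i, (W, b) in enumerate(layers):
    h = dense(h, W, b)
    if i < len(layers) - 1:
        h = relu(h)  # ReLU on all hidden layers
fractions = softmax(h)  # final softmax: valid cell type fractions
```

The three ensemble members differ only in their layer widths and dropout rates; their predictions are averaged at inference time.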
Theoretical Considerations
Universal Approximation
Deep neural networks are universal function approximators (Hornik, 1991). Given sufficient capacity, TorchDecon can theoretically approximate any continuous mapping from expression space to fraction space.
References
Menden, K., et al.Β (2020). Deep learning-based cell composition analysis from tissue expression profiles. Science Advances, 6(30), eaba2619.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), 251-257.
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. ICLR.
Package Author: Zaoqu Liu
Contact: liuzaoqu@163.com
GitHub: https://github.com/Zaoqu-Liu/TorchDecon