
Introduction

TorchDecon implements a deep learning-based approach for cell type deconvolution, originally proposed by Menden et al. (2020) in their Scaden algorithm. This vignette provides a comprehensive overview of the mathematical framework and algorithmic principles underlying TorchDecon.

Author: Zaoqu Liu

Problem Formulation

The Deconvolution Problem

Cell type deconvolution aims to estimate the cellular composition of bulk tissue samples. Given a bulk expression profile $\mathbf{x} \in \mathbb{R}^G$ (where $G$ is the number of genes), we seek to estimate the cell type fraction vector $\mathbf{f} \in \mathbb{R}^K$ (where $K$ is the number of cell types) such that:

$$\sum_{k=1}^{K} f_k = 1, \qquad f_k \geq 0 \quad \forall k$$

Traditional Approaches vs. Deep Learning

Traditional deconvolution methods (e.g., CIBERSORT, MuSiC) rely on:

  1. Signature matrices: Pre-defined gene expression signatures for each cell type
  2. Linear mixing models: Assumption that bulk expression is a linear combination of cell type signatures

𝐱=π’β‹…πŸ+π›œ\mathbf{x} = \mathbf{S} \cdot \mathbf{f} + \boldsymbol{\epsilon}

where π’βˆˆβ„GΓ—K\mathbf{S} \in \mathbb{R}^{G \times K} is the signature matrix.

TorchDecon’s deep learning approach instead learns a non-linear mapping directly from bulk expression to cell type fractions:

$$\mathbf{f} = \mathcal{F}_\theta(\mathbf{x})$$

where $\mathcal{F}_\theta$ is a neural network with parameters $\theta$.

Algorithmic Pipeline

Overview

┌─────────────────────────────────────────────────────────────────┐
│                    TorchDecon Pipeline                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │  scRNA-seq   │───▶│    Bulk      │───▶│   Training   │       │
│  │  Reference   │    │  Simulation  │    │     Data     │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│                                                 │               │
│                                                 ▼               │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │  Cell Type   │◀───│   Trained    │◀───│   Training   │       │
│  │  Fractions   │    │   Ensemble   │    │    Loop      │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│         ▲                                                       │
│         │                                                       │
│  ┌──────────────┐                                               │
│  │  Real Bulk   │                                               │
│  │    Data      │                                               │
│  └──────────────┘                                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Step 1: Bulk RNA-seq Simulation

Mathematical Formulation

For each simulated sample ss:

  1. Generate random fractions: Sample $\mathbf{f}^{(s)} \sim \text{Dirichlet}(\boldsymbol{\alpha})$, or draw uniform variates and normalize:

     $$f_k^{(s)} = \frac{u_k}{\sum_{j=1}^K u_j}, \quad u_k \sim \text{Uniform}(0, 1)$$

  2. Sample cells: For each cell type $k$, sample $n_k = \lfloor f_k^{(s)} \cdot N \rfloor$ cells (with replacement), where $N$ is the total number of cells per sample.

  3. Aggregate expression: Sum the expression across sampled cells:

     $$x_g^{(s)} = \sum_{k=1}^{K} \sum_{i \in C_k^{(s)}} c_{gi}$$

     where $C_k^{(s)}$ is the set of sampled cells of type $k$, and $c_{gi}$ is the count for gene $g$ in cell $i$.
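The three simulation steps can be sketched in NumPy. The function and argument names here (`simulate_bulk`, `counts`, `cell_types`) are illustrative, not TorchDecon's actual API:

```python
import numpy as np

def simulate_bulk(counts, cell_types, types, n_cells=500, rng=None):
    """Simulate one pseudo-bulk sample from a scRNA-seq count matrix.

    counts: (n_ref_cells, G) array of per-cell counts
    cell_types: length-n_ref_cells array of type labels
    types: ordered list of the K cell type names
    """
    rng = rng or np.random.default_rng()
    # Step 1: random fractions via normalized uniform variates.
    u = rng.uniform(0.0, 1.0, size=len(types))
    f = u / u.sum()
    # Steps 2-3: sample floor(f_k * N) cells per type with replacement, sum counts.
    x = np.zeros(counts.shape[1])
    for k, t in enumerate(types):
        idx = np.flatnonzero(cell_types == t)
        n_k = int(f[k] * n_cells)
        if n_k > 0:
            chosen = rng.choice(idx, size=n_k, replace=True)
            x += counts[chosen].sum(axis=0)
    return x, f
```

Repeating this for many samples yields the paired training data $(\mathbf{x}^{(s)}, \mathbf{f}^{(s)})$; the floor in step 2 means the realized fractions can deviate slightly from the drawn ones for small $N$.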

Sparse Sample Generation

To improve model generalization, TorchDecon generates β€œsparse” samples where some cell types are absent:

$$f_k^{(s)} = \begin{cases} \tilde{f}_k / \sum_{j \in \mathcal{A}} \tilde{f}_j & \text{if } k \in \mathcal{A} \\ 0 & \text{otherwise} \end{cases}$$

where $\mathcal{A} \subset \{1, \ldots, K\}$ is a randomly selected subset of cell types.

Step 2: Data Preprocessing

Log Transformation

$$\tilde{x}_g = \log_2(x_g + 1)$$

This transformation:
- Reduces the dynamic range of expression values
- Stabilizes variance
- Makes the data more normally distributed

Sample-wise Min-Max Normalization

For each sample $s$:

$$\hat{x}_g^{(s)} = \frac{\tilde{x}_g^{(s)} - \min_j \tilde{x}_j^{(s)}}{\max_j \tilde{x}_j^{(s)} - \min_j \tilde{x}_j^{(s)}}$$

This ensures:
- All features lie in $[0, 1]$
- Sample-specific technical variation is mitigated
- Neural network training is more stable
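Both preprocessing steps fit in one short NumPy function; the epsilon in the denominator, added here as a defensive assumption, guards against constant rows:

```python
import numpy as np

def preprocess(X):
    """log2(x + 1) transform, then per-sample min-max scaling to [0, 1].

    X: (n_samples, G) matrix of raw counts; rows are samples.
    """
    X = np.log2(X + 1.0)
    mn = X.min(axis=1, keepdims=True)
    mx = X.max(axis=1, keepdims=True)
    return (X - mn) / (mx - mn + 1e-8)  # epsilon avoids division by zero
```

Note the scaling is per sample (axis 1), not per gene: each bulk profile is normalized against its own range, which is what makes the model less sensitive to sample-level library size and platform effects.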

Gene Filtering

Genes are filtered based on variance across samples:

$$\text{Var}(x_g) = \frac{1}{n-1} \sum_{s=1}^{n} \left(x_g^{(s)} - \bar{x}_g\right)^2 > \tau$$

where $\tau$ is the variance threshold (default: 0.1).
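The filter is a one-liner; `ddof=1` matches the $\frac{1}{n-1}$ (unbiased) estimator above. `filter_genes` is an illustrative name, not necessarily the package's:

```python
import numpy as np

def filter_genes(X, tau=0.1):
    """Keep genes whose across-sample variance exceeds tau (default 0.1).

    X: (n_samples, G) matrix. Returns the filtered matrix and the keep mask.
    """
    var = X.var(axis=0, ddof=1)  # ddof=1 gives the 1/(n-1) estimator
    keep = var > tau
    return X[:, keep], keep
```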

Step 3: Neural Network Architecture

Fully Connected Network

Each model in the ensemble is a fully connected neural network:

$$\mathbf{h}^{(l)} = \sigma\left(\mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right)$$

where:
- $\mathbf{h}^{(l)}$ is the hidden representation at layer $l$
- $\mathbf{W}^{(l)}$ and $\mathbf{b}^{(l)}$ are learnable weights and biases
- $\sigma(\cdot)$ is the ReLU activation, $\sigma(z) = \max(0, z)$

Output Layer (Softmax)

The final layer applies a softmax to ensure the output is a valid probability distribution:

$$f_k = \frac{\exp(z_k)}{\sum_{j=1}^K \exp(z_j)}$$

This guarantees:
- $f_k \in (0, 1)$ for all $k$
- $\sum_k f_k = 1$

Dropout Regularization

During training, dropout randomly zeroes elements:

$$\tilde{h}_i = \begin{cases} h_i / (1-p) & \text{with probability } 1-p \\ 0 & \text{with probability } p \end{cases}$$
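The three building blocks of this section (ReLU layers, softmax output, inverted dropout) combine into a short forward pass. This is a framework-agnostic NumPy sketch of the computation, not TorchDecon's actual PyTorch module:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dropout(h, p, rng, train=True):
    """Inverted dropout: zero with probability p, scale survivors by 1/(1-p)."""
    if not train or p == 0.0:
        return h
    mask = rng.uniform(size=h.shape) >= p
    return h * mask / (1.0 - p)

def forward(x, weights, biases, drop_rates, rng, train=False):
    """Fully connected net: ReLU hidden layers with dropout, softmax output."""
    h = x
    for l, (W, b) in enumerate(zip(weights[:-1], biases[:-1])):
        h = relu(h @ W + b)
        h = dropout(h, drop_rates[l], rng, train)
    z = h @ weights[-1] + biases[-1]  # final linear layer, no ReLU
    return softmax(z)
```

At inference time (`train=False`) dropout is disabled and no rescaling is needed, because the $1/(1-p)$ factor was already applied during training.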

Step 4: Training

Loss Function

Mean Squared Error (MSE) between predicted and true fractions:

$$\mathcal{L}(\theta) = \frac{1}{n \cdot K} \sum_{s=1}^{n} \sum_{k=1}^{K} \left(\hat{f}_k^{(s)} - f_k^{(s)}\right)^2$$

Adam Optimizer

Parameter updates using Adam:

$$\begin{aligned} m_t &= \beta_1 m_{t-1} + (1 - \beta_1) g_t \\ v_t &= \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \\ \hat{m}_t &= m_t / (1 - \beta_1^t) \\ \hat{v}_t &= v_t / (1 - \beta_2^t) \\ \theta_{t+1} &= \theta_t - \eta \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) \end{aligned}$$

Default hyperparameters: $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\eta = 10^{-4}$.
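The MSE loss and the Adam update translate directly into NumPy, term by term. This sketch shows a single update step, not the package's full training loop:

```python
import numpy as np

def mse(f_hat, f_true):
    """Mean squared error averaged over all samples and cell types."""
    return np.mean((f_hat - f_true) ** 2)

def adam_step(theta, g, m, v, t, eta=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update following the five equations above.

    theta: parameters; g: gradient; m, v: first/second moment estimates;
    t: 1-based step count (needed for bias correction).
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)       # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)       # bias-corrected second moment
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note that at $t = 1$ the bias correction makes $\hat{m}_1 = g_1$ and $\hat{v}_1 = g_1^2$, so the very first step has magnitude approximately $\eta$ in each coordinate, regardless of gradient scale.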

Step 5: Ensemble Prediction

The final prediction is the arithmetic mean across the three models:

$$\hat{\mathbf{f}} = \frac{1}{3} \sum_{m \in \{256, 512, 1024\}} \mathcal{F}_{\theta_m}(\mathbf{x})$$

Ensemble benefits:
- Reduced variance in predictions
- Improved robustness to initialization
- Better generalization
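Averaging is the whole of the ensemble step; a sketch, assuming each member is a callable returning a fraction vector (hypothetical interface):

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the fraction predictions of all ensemble members."""
    return np.mean([model(x) for model in models], axis=0)
```

Because each member's softmax output already lies on the probability simplex, the arithmetic mean does too, so no renormalization is needed after averaging.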

Architecture Specifications

Model   Layer Dimensions                 Dropout Rates      Total Parameters*
M256    G → 256 → 128 → 64 → 32 → K     0, 0, 0, 0         ~G×256 + 50K
M512    G → 512 → 256 → 128 → 64 → K    0, 0.3, 0.2, 0.1   ~G×512 + 200K
M1024   G → 1024 → 512 → 256 → 128 → K  0, 0.6, 0.3, 0.1   ~G×1024 + 800K

*Approximate; depends on number of genes (G) and cell types (K)
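The exact counts behind these approximations follow from the dense-layer formula (in × out weights plus out biases per layer). `count_params` is an illustrative helper, with G = 10,000 and K = 10 chosen as example values:

```python
def count_params(dims):
    """Exact parameter count for a stack of dense layers.

    dims: layer widths, e.g. [G, 256, 128, 64, 32, K].
    Each layer contributes in*out weights plus out biases.
    """
    return sum(i * o + o for i, o in zip(dims[:-1], dims[1:]))

# e.g. the M256 variant with G = 10,000 genes and K = 10 cell types
m256 = count_params([10_000, 256, 128, 64, 32, 10])
```

For these example values the first layer dominates (G×256 = 2,560,000 weights), with the remaining layers adding roughly 44K more, consistent with the "~G×256 + 50K" estimate in the table.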

Theoretical Considerations

Universal Approximation

Deep neural networks are universal function approximators (Hornik, 1991). Given sufficient capacity, TorchDecon can theoretically approximate any continuous mapping from expression space to fraction space.

Advantages over Linear Models

  1. Non-linear relationships: Can capture complex gene-cell type associations
  2. Feature learning: Automatically learns relevant features from data
  3. Scalability: Handles high-dimensional data efficiently
  4. No signature matrix: Eliminates bias from pre-defined signatures

Limitations

  1. Training data dependency: Performance depends on quality of simulated training data
  2. Batch effects: May be sensitive to technical differences between reference and target data
  3. Novel cell types: Cannot predict cell types not present in training data

References

  1. Menden, K., et al. (2020). Deep learning-based cell composition analysis from tissue expression profiles. Science Advances, 6(30), eaba2619.

  2. Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), 251-257.

  3. Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. ICLR.


Package Author: Zaoqu Liu
Contact:
GitHub: https://github.com/Zaoqu-Liu/TorchDecon