
📖 Documentation: https://zaoqu-liu.github.io/TorchDecon/

Overview

TorchDecon is an R package implementing deep neural network-based cell type deconvolution for bulk RNA-sequencing data. The package provides a complete computational framework for estimating cell type proportions from heterogeneous tissue samples using single-cell RNA-seq reference data.

This implementation is based on the Scaden algorithm (Menden et al., Science Advances, 2020), rebuilt natively in R using the torch framework (LibTorch C++ backend), eliminating Python dependencies while maintaining full GPU acceleration capabilities.

Methodological Framework

Algorithm Overview

TorchDecon employs an ensemble deep learning approach consisting of three distinct neural network architectures trained on simulated bulk expression profiles:

  1. Data Simulation: Generate artificial bulk RNA-seq samples by computationally aggregating single-cell expression profiles with known cell type proportions
  2. Feature Engineering: Log2 transformation followed by sample-wise min-max normalization; variance-based gene filtering
  3. Model Training: Supervised learning using Mean Squared Error (MSE) loss with Adam optimization (β₁=0.9, β₂=0.999)
  4. Ensemble Prediction: Final cell type fractions computed as the arithmetic mean across three independently trained networks
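As a concrete illustration of the feature-engineering step (step 2), the log2 transformation with sample-wise min-max scaling can be sketched in a few lines of base R. The `log_min_max()` helper below is hypothetical, not part of the TorchDecon API:

```r
# Sketch of step 2: log2-transform, then rescale each sample (column)
# to [0, 1]. Assumes a genes x samples matrix with non-constant columns.
log_min_max <- function(counts) {
  x <- log2(counts + 1)
  apply(x, 2, function(s) (s - min(s)) / (max(s) - min(s)))
}

counts <- matrix(c(0, 10, 100, 5, 50, 500), nrow = 3)
scaled <- log_min_max(counts)
range(scaled)  # each column now spans [0, 1]
```

Per-sample scaling makes profiles comparable regardless of sequencing depth, which is why it is applied column-wise rather than gene-wise.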

Neural Network Architectures

| Model | Architecture | Dropout (per hidden layer) | Description |
|-------|--------------|----------------------------|-------------|
| M256  | 256 → 128 → 64 → 32 → k | None | Baseline model |
| M512  | 512 → 256 → 128 → 64 → k | 0, 0.3, 0.2, 0.1 | Regularized |
| M1024 | 1024 → 512 → 256 → 128 → k | 0, 0.6, 0.3, 0.1 | High capacity |

All architectures use ReLU activations in the hidden layers and a softmax output layer, where k is the number of cell types; the softmax guarantees that predicted fractions are non-negative and sum to one.
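To make the table concrete, here is how the M512 architecture could be written with the torch R package. This is an illustrative sketch: the dropout placement (before each regularized hidden layer) and the module name are assumptions, not TorchDecon's internal code.

```r
library(torch)

# Illustrative M512: 512 -> 256 -> 128 -> 64 -> k with dropout 0, 0.3, 0.2, 0.1.
# Dropout placement is an assumption, not TorchDecon's internal definition.
m512 <- nn_module(
  "M512",
  initialize = function(n_features, k) {
    self$net <- nn_sequential(
      nn_linear(n_features, 512), nn_relu(),           # no dropout on layer 1
      nn_dropout(0.3), nn_linear(512, 256), nn_relu(),
      nn_dropout(0.2), nn_linear(256, 128), nn_relu(),
      nn_dropout(0.1), nn_linear(128, 64), nn_relu(),
      nn_linear(64, k), nn_softmax(dim = 2)            # rows sum to 1
    )
  },
  forward = function(x) self$net(x)
)

model <- m512(n_features = 1000, k = 8)
```

In eval mode, each output row is a length-k probability vector, which is why a softmax head is a natural fit for fraction prediction.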

Installation

From R-universe

install.packages("TorchDecon", repos = "https://zaoqu-liu.r-universe.dev")

From GitHub

# Install devtools if not available
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("Zaoqu-Liu/TorchDecon")

Backend Installation

# Install LibTorch backend (required, execute once)
torch::install_torch()

# For CUDA-enabled GPU acceleration (optional)
torch::install_torch(type = "cuda")

System Requirements

  • R: ≥ 4.0.0
  • Dependencies: torch (≥ 0.9.0), Seurat (≥ 4.0.0), Matrix, data.table
  • Hardware: CPU (default) or NVIDIA GPU with CUDA support
  • Platform: Cross-platform (Windows, macOS, Linux)

Usage

Integrated Workflow

library(TorchDecon)
library(Seurat)

# Load reference scRNA-seq data (Seurat object with cell type annotations)
reference <- readRDS("scrna_reference.rds")

# Load bulk RNA-seq data (genes × samples matrix)
bulk_expr <- as.matrix(read.table("bulk_expression.txt", header = TRUE, row.names = 1))

# Execute complete deconvolution pipeline
result <- RunTorchDecon(
  seurat_object = reference,
  bulk_data = bulk_expr,
  celltype_col = "cell_type",
  n_samples = 2000,           # Number of simulated training samples
  num_steps = 5000,           # Training iterations
  batch_size = 128,
  learning_rate = 1e-4
)

# Extract predicted cell type fractions
cell_fractions <- result$predictions

Modular Workflow

For advanced users requiring granular control:

# Step 1: Simulate bulk samples from scRNA-seq reference
simulation <- SimulateBulk(
  object = reference,
  n_samples = 2000,
  cells_per_sample = 100,
  celltype_col = "cell_type",
  sparse_fraction = 0.5        # Fraction of samples with incomplete cell type coverage
)
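Conceptually, what SimulateBulk() does can be sketched in base R: draw random cell type proportions, sample cells to match them, and sum the sampled expression profiles. The code below is a simplified stand-in, not the function's actual implementation:

```r
# Conceptual sketch of bulk simulation (step 1); not SimulateBulk() itself.
set.seed(42)
n_genes   <- 50
celltype  <- sample(c("T", "B", "Mono"), 300, replace = TRUE)
sc_counts <- matrix(rpois(n_genes * 300, 5), nrow = n_genes)

simulate_one <- function(cells_per_sample = 100) {
  props <- rexp(3)
  props <- props / sum(props)                   # random known proportions
  n_per <- round(props * cells_per_sample)
  idx <- unlist(mapply(function(ct, n) {
    sample(which(celltype == ct), n, replace = TRUE)
  }, c("T", "B", "Mono"), n_per, SIMPLIFY = FALSE))
  list(bulk = rowSums(sc_counts[, idx]), fractions = props)
}

sim <- simulate_one()  # one simulated bulk profile with known fractions
```

Each simulated sample pairs a pseudo-bulk expression vector with its ground-truth fractions, which is exactly what supervised training requires.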

# Step 2: Preprocess training data
processed <- ProcessTrainingData(
  simulation = simulation,
  prediction_data = bulk_expr,
  var_cutoff = 0.1,            # Variance threshold for gene filtering
  scaling = "log_min_max"
)

# Step 3: Initialize ensemble model
ensemble <- CreateTorchDeconEnsemble(
  n_features = processed$n_genes,
  n_classes = length(processed$celltypes),
  device = "auto"              # Automatic GPU detection
)

# Step 4: Train model
ensemble <- TrainModel(
  model = ensemble,
  data = processed,
  num_steps = 5000,
  batch_size = 128,
  learning_rate = 1e-4,
  validation_split = 0.1,
  early_stopping = TRUE,
  patience = 500
)

# Step 5: Predict cell type fractions
predictions <- PredictFractions(
  model = ensemble,
  data = bulk_expr,
  return_all = TRUE            # Return individual model predictions
)
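With return_all = TRUE, the ensemble average (step 4 of the algorithm) is simply the element-wise mean of the individual model predictions. The list structure below is assumed for illustration, not the documented return value:

```r
# Arithmetic mean across the three networks' prediction matrices
# (samples x cell types); list names here are illustrative.
per_model <- list(
  m256  = matrix(c(0.6, 0.4, 0.2, 0.8), nrow = 2, byrow = TRUE),
  m512  = matrix(c(0.5, 0.5, 0.3, 0.7), nrow = 2, byrow = TRUE),
  m1024 = matrix(c(0.7, 0.3, 0.1, 0.9), nrow = 2, byrow = TRUE)
)
average <- Reduce(`+`, per_model) / length(per_model)
rowSums(average)  # each sample's fractions still sum to 1
```

Because every model emits softmax outputs, the mean of valid fraction vectors is itself a valid fraction vector, so no renormalization is needed.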

# Step 6: Model persistence
SaveModel(ensemble, path = "trained_model")
loaded_model <- LoadModel("trained_model")

Performance Evaluation

# Evaluate predictions against ground truth (if available)
metrics <- EvaluatePredictions(
  predictions = predictions$average,
  truth = ground_truth_fractions
)

# Output: RMSE, MAE, Pearson correlation (overall and per cell type)
print(metrics)
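For reference, the reported metrics can be reproduced in base R. This is a simplified overall computation, not EvaluatePredictions() itself (which also breaks results down per cell type):

```r
# Overall RMSE, MAE, and Pearson correlation between flattened
# predicted and true fractions; illustrative only.
pred  <- c(0.6, 0.4, 0.2, 0.8)
truth <- c(0.5, 0.5, 0.3, 0.7)

rmse <- sqrt(mean((pred - truth)^2))  # root mean squared error
mae  <- mean(abs(pred - truth))       # mean absolute error
r    <- cor(pred, truth)              # Pearson correlation

c(RMSE = rmse, MAE = mae, Pearson = r)
```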

Key Features

| Feature | Description |
|---------|-------------|
| Seurat Integration | Native support for Seurat v4/v5 objects; direct extraction of count matrices and cell annotations |
| GPU Acceleration | Automatic CUDA detection; seamless CPU/GPU switching |
| Ensemble Learning | Three architectures for robust, variance-reduced predictions |
| Reproducibility | Seed control for deterministic results |
| Model Persistence | Save/load trained models for deployment |
| Cross-Platform | Pure R implementation without Python dependencies |

Citation

If you use TorchDecon in your research, please cite:

@software{liu2026torchdecon,
  author = {Liu, Zaoqu},
  title = {{TorchDecon}: Deep Learning-Based Cell Type Deconvolution in {R}},
  year = {2026},
  url = {https://github.com/Zaoqu-Liu/TorchDecon},
  note = {R package version 1.0.0}
}

The underlying methodology is described in:

Menden, K., Marouf, M., Oller, S., Dalmia, A., Magruder, D. S., Kloiber, K., Heutink, P., & Bonn, S. (2020). Deep learning-based cell composition analysis from tissue expression profiles. Science Advances, 6(30), eaba2619. https://doi.org/10.1126/sciadv.aba2619

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Zaoqu Liu


TorchDecon: Bridging single-cell and bulk transcriptomics through deep learning