
📖 Documentation: https://zaoqu-liu.github.io/TorchDecon/

Overview

TorchDecon is an R package implementing deep neural network-based cell type deconvolution for bulk RNA-sequencing data. The package provides a complete computational framework for estimating cell type proportions from heterogeneous tissue samples using single-cell RNA-seq reference data.

This implementation is based on the Scaden algorithm (Menden et al., Science Advances, 2020), rebuilt natively in R using the torch framework (LibTorch C++ backend), eliminating Python dependencies while maintaining full GPU acceleration capabilities.

Methodological Framework

Algorithm Overview

TorchDecon employs an ensemble deep learning approach consisting of three distinct neural network architectures trained on simulated bulk expression profiles:

  1. Data Simulation: Generate artificial bulk RNA-seq samples by computationally aggregating single-cell expression profiles with known cell type proportions
  2. Feature Engineering: Log2 transformation followed by sample-wise min-max normalization; variance-based gene filtering
  3. Model Training: Supervised learning using Mean Squared Error (MSE) loss with Adam optimization (β₁=0.9, β₂=0.999)
  4. Ensemble Prediction: Final cell type fractions computed as the arithmetic mean across three independently trained networks
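As a concrete illustration of the feature-engineering step (step 2), the log2 transformation with sample-wise min-max scaling can be sketched in a few lines of base R. The `log_min_max()` helper below is hypothetical, not part of the TorchDecon API:

```r
# Sketch of step 2: log2-transform, then rescale each sample (column)
# to [0, 1]. Assumes a genes x samples matrix with non-constant columns.
log_min_max <- function(counts) {
  x <- log2(counts + 1)
  apply(x, 2, function(s) (s - min(s)) / (max(s) - min(s)))
}

counts <- matrix(c(0, 10, 100, 5, 50, 500), nrow = 3)
scaled <- log_min_max(counts)
range(scaled)  # each column now spans [0, 1]
```

Per-sample scaling makes profiles comparable regardless of sequencing depth, which is why it is applied column-wise rather than gene-wise.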

Neural Network Architectures

| Model | Architecture | Dropout (per hidden layer) | Description |
|-------|--------------|----------------------------|-------------|
| M256  | 256 → 128 → 64 → 32 → k | None | Baseline model |
| M512  | 512 → 256 → 128 → 64 → k | 0, 0.3, 0.2, 0.1 | Regularized |
| M1024 | 1024 → 512 → 256 → 128 → k | 0, 0.6, 0.3, 0.1 | High capacity |

All architectures use ReLU activations in the hidden layers and a softmax output layer, where k is the number of cell types; the softmax guarantees that predicted fractions are non-negative and sum to one.
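To make the table concrete, here is how the M512 architecture could be written with the torch R package. This is an illustrative sketch: the dropout placement (before each regularized hidden layer) and the module name are assumptions, not TorchDecon's internal code.

```r
library(torch)

# Illustrative M512: 512 -> 256 -> 128 -> 64 -> k with dropout 0, 0.3, 0.2, 0.1.
# Dropout placement is an assumption, not TorchDecon's internal definition.
m512 <- nn_module(
  "M512",
  initialize = function(n_features, k) {
    self$net <- nn_sequential(
      nn_linear(n_features, 512), nn_relu(),           # no dropout on layer 1
      nn_dropout(0.3), nn_linear(512, 256), nn_relu(),
      nn_dropout(0.2), nn_linear(256, 128), nn_relu(),
      nn_dropout(0.1), nn_linear(128, 64), nn_relu(),
      nn_linear(64, k), nn_softmax(dim = 2)            # rows sum to 1
    )
  },
  forward = function(x) self$net(x)
)

model <- m512(n_features = 1000, k = 8)
```

In eval mode, each output row is a length-k probability vector, which is why a softmax head is a natural fit for fraction prediction.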

Installation

From R-universe

install.packages("TorchDecon", repos = "https://zaoqu-liu.r-universe.dev")

From GitHub

# Install devtools if not available
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("Zaoqu-Liu/TorchDecon")

Backend Installation

# Install LibTorch backend (required, execute once)
torch::install_torch()

# For CUDA-enabled GPU acceleration (optional)
torch::install_torch(type = "cuda")

System Requirements

  • R: ≥ 4.0.0
  • Dependencies: torch (≥ 0.9.0), Seurat (≥ 4.0.0), Matrix, data.table
  • Hardware: CPU (default) or NVIDIA GPU with CUDA support
  • Platform: Cross-platform (Windows, macOS, Linux)

Usage

Integrated Workflow

library(TorchDecon)
library(Seurat)

# Load reference scRNA-seq data (Seurat object with cell type annotations)
reference <- readRDS("scrna_reference.rds")

# Load bulk RNA-seq data (genes × samples matrix)
bulk_expr <- as.matrix(read.table("bulk_expression.txt", header = TRUE, row.names = 1))

# Execute complete deconvolution pipeline
result <- RunTorchDecon(
  seurat_object = reference,
  bulk_data = bulk_expr,
  celltype_col = "cell_type",
  n_samples = 2000,           # Number of simulated training samples
  num_steps = 5000,           # Training iterations
  batch_size = 128,
  learning_rate = 1e-4
)

# Extract predicted cell type fractions
cell_fractions <- result$predictions

Modular Workflow

For advanced users requiring granular control:

# Step 1: Simulate bulk samples from scRNA-seq reference
simulation <- SimulateBulk(
  object = reference,
  n_samples = 2000,
  cells_per_sample = 100,
  celltype_col = "cell_type",
  sparse_fraction = 0.5        # Fraction of samples with incomplete cell type coverage
)
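Conceptually, what SimulateBulk() does can be sketched in base R: draw random cell type proportions, sample cells to match them, and sum the sampled expression profiles. The code below is a simplified stand-in, not the function's actual implementation:

```r
# Conceptual sketch of bulk simulation (step 1); not SimulateBulk() itself.
set.seed(42)
n_genes   <- 50
celltype  <- sample(c("T", "B", "Mono"), 300, replace = TRUE)
sc_counts <- matrix(rpois(n_genes * 300, 5), nrow = n_genes)

simulate_one <- function(cells_per_sample = 100) {
  props <- rexp(3)
  props <- props / sum(props)                   # random known proportions
  n_per <- round(props * cells_per_sample)
  idx <- unlist(mapply(function(ct, n) {
    sample(which(celltype == ct), n, replace = TRUE)
  }, c("T", "B", "Mono"), n_per, SIMPLIFY = FALSE))
  list(bulk = rowSums(sc_counts[, idx]), fractions = props)
}

sim <- simulate_one()  # one simulated bulk profile with known fractions
```

Each simulated sample pairs a pseudo-bulk expression vector with its ground-truth fractions, which is exactly what supervised training requires.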

# Step 2: Preprocess training data
processed <- ProcessTrainingData(
  simulation = simulation,
  prediction_data = bulk_expr,
  var_cutoff = 0.1,            # Variance threshold for gene filtering
  scaling = "log_min_max"
)

# Step 3: Initialize ensemble model
ensemble <- CreateTorchDeconEnsemble(
  n_features = processed$n_genes,
  n_classes = length(processed$celltypes),
  device = "auto"              # Automatic GPU detection
)

# Step 4: Train model
ensemble <- TrainModel(
  model = ensemble,
  data = processed,
  num_steps = 5000,
  batch_size = 128,
  learning_rate = 1e-4,
  validation_split = 0.1,
  early_stopping = TRUE,
  patience = 500
)

# Step 5: Predict cell type fractions
predictions <- PredictFractions(
  model = ensemble,
  data = bulk_expr,
  return_all = TRUE            # Return individual model predictions
)
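With return_all = TRUE, the ensemble average (step 4 of the algorithm) is simply the element-wise mean of the individual model predictions. The list structure below is assumed for illustration, not the documented return value:

```r
# Arithmetic mean across the three networks' prediction matrices
# (samples x cell types); list names here are illustrative.
per_model <- list(
  m256  = matrix(c(0.6, 0.4, 0.2, 0.8), nrow = 2, byrow = TRUE),
  m512  = matrix(c(0.5, 0.5, 0.3, 0.7), nrow = 2, byrow = TRUE),
  m1024 = matrix(c(0.7, 0.3, 0.1, 0.9), nrow = 2, byrow = TRUE)
)
average <- Reduce(`+`, per_model) / length(per_model)
rowSums(average)  # each sample's fractions still sum to 1
```

Because every model emits softmax outputs, the mean of valid fraction vectors is itself a valid fraction vector, so no renormalization is needed.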

# Step 6: Model persistence
SaveModel(ensemble, path = "trained_model")
loaded_model <- LoadModel("trained_model")

Performance Evaluation

# Evaluate predictions against ground truth (if available)
metrics <- EvaluatePredictions(
  predictions = predictions$average,
  truth = ground_truth_fractions
)

# Output: RMSE, MAE, Pearson correlation (overall and per cell type)
print(metrics)
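For reference, the reported metrics can be reproduced in base R. This is a simplified overall computation, not EvaluatePredictions() itself (which also breaks results down per cell type):

```r
# Overall RMSE, MAE, and Pearson correlation between flattened
# predicted and true fractions; illustrative only.
pred  <- c(0.6, 0.4, 0.2, 0.8)
truth <- c(0.5, 0.5, 0.3, 0.7)

rmse <- sqrt(mean((pred - truth)^2))  # root mean squared error
mae  <- mean(abs(pred - truth))       # mean absolute error
r    <- cor(pred, truth)              # Pearson correlation

c(RMSE = rmse, MAE = mae, Pearson = r)
```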

Key Features

| Feature | Description |
|---------|-------------|
| Seurat Integration | Native support for Seurat v4/v5 objects; direct extraction of count matrices and cell annotations |
| GPU Acceleration | Automatic CUDA detection; seamless CPU/GPU switching |
| Ensemble Learning | Three architectures for robust, variance-reduced predictions |
| Reproducibility | Seed control for deterministic results |
| Model Persistence | Save/load trained models for deployment |
| Cross-Platform | Pure R implementation without Python dependencies |

Citation

If you use TorchDecon in your research, please cite:

@software{liu2026torchdecon,
  author = {Liu, Zaoqu},
  title = {{TorchDecon}: Deep Learning-Based Cell Type Deconvolution in {R}},
  year = {2026},
  url = {https://github.com/Zaoqu-Liu/TorchDecon},
  note = {R package version 1.0.0}
}

The underlying methodology is described in:

Menden, K., Marouf, M., Oller, S., Dalmia, A., Magruder, D. S., Kloiber, K., Heutink, P., & Bonn, S. (2020). Deep learning-based cell composition analysis from tissue expression profiles. Science Advances, 6(30), eaba2619. https://doi.org/10.1126/sciadv.aba2619

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Zaoqu Liu


TorchDecon: Bridging single-cell and bulk transcriptomics through deep learning