📖 Documentation: https://zaoqu-liu.github.io/TorchDecon/
Overview
TorchDecon is an R package implementing deep neural network-based cell type deconvolution for bulk RNA-sequencing data. The package provides a complete computational framework for estimating cell type proportions from heterogeneous tissue samples using single-cell RNA-seq reference data.
This implementation is based on the Scaden algorithm (Menden et al., Science Advances, 2020), rebuilt natively in R using the torch framework (LibTorch C++ backend), eliminating Python dependencies while maintaining full GPU acceleration capabilities.
Methodological Framework
Algorithm Overview
TorchDecon employs an ensemble deep learning approach in which three distinct neural network architectures are trained on simulated bulk expression profiles. The pipeline comprises four stages:
- Data Simulation: Generate artificial bulk RNA-seq samples by computationally aggregating single-cell expression profiles with known cell type proportions
- Feature Engineering: Log2 transformation followed by sample-wise min-max normalization; variance-based gene filtering
- Model Training: Supervised learning using Mean Squared Error (MSE) loss with Adam optimization (β₁=0.9, β₂=0.999)
- Ensemble Prediction: Final cell type fractions computed as the arithmetic mean across three independently trained networks
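The simulation stage is the core idea: because pseudo-bulk samples are built from annotated single cells, every training sample comes with known ground-truth fractions. A toy Python sketch of that stage (illustrative only, not the package's internal code; data shapes and the helper name are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-cell reference: 200 cells x 50 genes, 3 annotated cell types
counts = rng.poisson(5.0, size=(200, 50)).astype(float)
labels = rng.integers(0, 3, size=200)

def simulate_bulk(cells_per_sample=100):
    """Draw random cell-type proportions, sample cells accordingly,
    and sum their expression profiles into one pseudo-bulk sample."""
    props = rng.dirichlet(np.ones(3))                     # ground-truth fractions
    n_per_type = rng.multinomial(cells_per_sample, props)
    picked = np.concatenate([
        rng.choice(np.where(labels == t)[0], size=n, replace=True)
        for t, n in enumerate(n_per_type) if n > 0
    ])
    return counts[picked].sum(axis=0), n_per_type / cells_per_sample

bulk, true_fracs = simulate_bulk()  # one (expression, fractions) training pair
```

Repeating this thousands of times yields the supervised training set consumed by the networks below.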
Neural Network Architectures
| Model | Architecture | Dropout Configuration | Role |
|---|---|---|---|
| M256 | 256 → 128 → 64 → 32 → k | None | Baseline model |
| M512 | 512 → 256 → 128 → 64 → k | 0, 0.3, 0.2, 0.1 | Regularized |
| M1024 | 1024 → 512 → 256 → 128 → k | 0, 0.6, 0.3, 0.1 | High capacity |
All architectures use ReLU activations in the hidden layers and a softmax output layer, where k denotes the number of cell types; the softmax guarantees that predicted fractions are non-negative and sum to 1.
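To make the shared design concrete, here is a minimal NumPy forward pass for the three width configurations, with ensemble averaging at the end (dropout omitted for brevity; weights and sizes are arbitrary, this is not the trained model):

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def make_net(widths):
    """Random weights for one MLP; returns its forward function."""
    ws = [rng.normal(0.0, 0.05, size=(a, b)) for a, b in zip(widths, widths[1:])]
    def forward(x):
        for w in ws[:-1]:
            x = relu(x @ w)             # hidden layers: ReLU
        return softmax(x @ ws[-1])      # output layer: softmax over k types
    return forward

n_genes, k = 400, 5
# The three architectures from the table, differing only in layer widths
ensemble = [make_net([n_genes, w, w // 2, w // 4, w // 8, k])
            for w in (256, 512, 1024)]

x = rng.normal(size=(3, n_genes))       # 3 bulk samples
fractions = np.mean([net(x) for net in ensemble], axis=0)
# Each row is non-negative and sums to 1, i.e. valid cell-type fractions
```

Averaging softmax outputs preserves these properties, which is why the ensemble mean is itself a valid composition estimate.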
Installation
From R-universe (Recommended)
```r
install.packages("TorchDecon", repos = "https://zaoqu-liu.r-universe.dev")
```

From GitHub
```r
# Install devtools if not available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("Zaoqu-Liu/TorchDecon")
```

Backend Installation
```r
# Install LibTorch backend (required, execute once)
torch::install_torch()

# For CUDA-enabled GPU acceleration (optional)
torch::install_torch(type = "cuda")
```

System Requirements
- R: ≥ 4.0.0
- Dependencies: torch (≥ 0.9.0), Seurat (≥ 4.0.0), Matrix, data.table
- Hardware: CPU (default) or NVIDIA GPU with CUDA support
- Platform: Cross-platform (Windows, macOS, Linux)
Usage
Integrated Workflow
```r
library(TorchDecon)
library(Seurat)

# Load reference scRNA-seq data (Seurat object with cell type annotations)
reference <- readRDS("scrna_reference.rds")

# Load bulk RNA-seq data (genes × samples matrix)
bulk_expr <- as.matrix(read.table("bulk_expression.txt", header = TRUE, row.names = 1))

# Execute complete deconvolution pipeline
result <- RunTorchDecon(
  seurat_object = reference,
  bulk_data = bulk_expr,
  celltype_col = "cell_type",
  n_samples = 2000,      # Number of simulated training samples
  num_steps = 5000,      # Training iterations
  batch_size = 128,
  learning_rate = 1e-4
)

# Extract predicted cell type fractions
cell_fractions <- result$predictions
```

Modular Workflow
For advanced users requiring granular control:
```r
# Step 1: Simulate bulk samples from scRNA-seq reference
simulation <- SimulateBulk(
  object = reference,
  n_samples = 2000,
  cells_per_sample = 100,
  celltype_col = "cell_type",
  sparse_fraction = 0.5  # Fraction of samples with incomplete cell type coverage
)
```
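One plausible reading of `sparse_fraction`: for that share of simulated samples, proportions are drawn over a random proper subset of cell types, so the training set also contains samples in which some types are entirely absent. A sketch with a hypothetical helper (the package's actual sampling scheme may differ):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_proportions(n_types, sparse):
    """Dirichlet proportions; if `sparse`, restrict support to a random
    proper subset of cell types so the sample lacks the remaining types."""
    p = rng.dirichlet(np.ones(n_types))
    if sparse:
        keep = rng.choice(n_types, size=rng.integers(1, n_types), replace=False)
        mask = np.zeros(n_types)
        mask[keep] = 1.0
        p = p * mask
        p /= p.sum()   # renormalize over the kept types
    return p

# With sparse_fraction = 0.5, roughly half of the samples are sparse
props = [sample_proportions(5, sparse=rng.random() < 0.5) for _ in range(10)]
```

Training on such samples helps the networks predict near-zero fractions for truly absent cell types instead of smearing mass across all classes.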
```r
# Step 2: Preprocess training data
processed <- ProcessTrainingData(
  simulation = simulation,
  prediction_data = bulk_expr,
  var_cutoff = 0.1,  # Variance threshold for gene filtering
  scaling = "log_min_max"
)
```
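The two preprocessing choices (`var_cutoff`, `scaling = "log_min_max"`) reduce to plain array operations. An illustrative sketch, assuming a samples × genes orientation (not the package's internal code):

```python
import numpy as np

def filter_genes(x, var_cutoff=0.1):
    """Keep only genes whose variance across samples exceeds the cutoff."""
    return x[:, x.var(axis=0) > var_cutoff]

def log_min_max(x):
    """log2(x + 1), then per-sample (row-wise) min-max scaling into [0, 1]."""
    x = np.log2(x + 1.0)
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    return (x - lo) / (hi - lo + 1e-8)   # small epsilon guards constant rows

# Toy matrix: 2 samples x 4 genes; the last gene is constant and gets dropped
x = np.array([[0.0, 3.0, 7.0, 7.0],
              [1.0, 1.0, 15.0, 7.0]])
scaled = log_min_max(filter_genes(x))
```

Per-sample scaling makes the networks insensitive to library-size differences between bulk samples, which is essential when the training data are simulated.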
```r
# Step 3: Initialize ensemble model
ensemble <- CreateTorchDeconEnsemble(
  n_features = processed$n_genes,
  n_classes = length(processed$celltypes),
  device = "auto"  # Automatic GPU detection
)
```
```r
# Step 4: Train model
ensemble <- TrainModel(
  model = ensemble,
  data = processed,
  num_steps = 5000,
  batch_size = 128,
  learning_rate = 1e-4,
  validation_split = 0.1,
  early_stopping = TRUE,
  patience = 500
)
```
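The interplay of `validation_split`, `early_stopping`, and `patience` follows the usual pattern: hold out a validation set, track the best validation loss, and stop once it has not improved for `patience` consecutive steps. A generic sketch (the loss function here is a stand-in, not the package's training loop):

```python
def train_with_early_stopping(num_steps, val_loss, patience=500):
    """Stop when validation loss has not improved for `patience` steps."""
    best, best_step = float("inf"), 0
    for step in range(num_steps):
        loss = val_loss(step)            # stand-in for train step + evaluation
        if loss < best:
            best, best_step = loss, step # new best: reset the patience window
        elif step - best_step >= patience:
            break                        # patience exhausted
    return best, step

# Toy loss: improves until step 300, then plateaus -> training stops at step 800
best, stopped = train_with_early_stopping(5000, lambda s: max(1000 - s, 700))
```

This caps wasted computation when the plateau is reached long before `num_steps`.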
```r
# Step 5: Predict cell type fractions
predictions <- PredictFractions(
  model = ensemble,
  data = bulk_expr,
  return_all = TRUE  # Return individual model predictions
)
```
```r
# Step 6: Model persistence
SaveModel(ensemble, path = "trained_model")
loaded_model <- LoadModel("trained_model")
```

Performance Evaluation
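EvaluatePredictions reports RMSE, MAE, and Pearson correlation between predicted and true fractions. For reference, the overall versions of these metrics can be sketched as (illustrative Python, not the package's implementation; per-cell-type variants apply the same formulas column-wise):

```python
import numpy as np

def deconv_metrics(pred, truth):
    """Overall RMSE, MAE, and Pearson r between predicted and true fractions."""
    pred = np.asarray(pred, dtype=float).ravel()
    truth = np.asarray(truth, dtype=float).ravel()
    err = pred - truth
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),       # penalizes large errors
        "mae": float(np.mean(np.abs(err))),              # robust average error
        "pearson": float(np.corrcoef(pred, truth)[0, 1]) # linear agreement
    }

m = deconv_metrics([0.20, 0.30, 0.50], [0.25, 0.25, 0.50])
```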
```r
# Evaluate predictions against ground truth (if available)
metrics <- EvaluatePredictions(
  predictions = predictions$average,
  truth = ground_truth_fractions
)

# Output: RMSE, MAE, Pearson correlation (overall and per cell type)
print(metrics)
```

Key Features
| Feature | Description |
|---|---|
| Seurat Integration | Native support for Seurat v4/v5 objects; direct extraction of count matrices and cell annotations |
| GPU Acceleration | Automatic CUDA detection; seamless CPU/GPU switching |
| Ensemble Learning | Three architectures for robust, variance-reduced predictions |
| Reproducibility | Seed control for deterministic results |
| Model Persistence | Save/load trained models for deployment |
| Cross-Platform | Pure R implementation without Python dependencies |
Citation
If you use TorchDecon in your research, please cite:
```bibtex
@software{liu2026torchdecon,
  author = {Liu, Zaoqu},
  title  = {{TorchDecon}: Deep Learning-Based Cell Type Deconvolution in {R}},
  year   = {2026},
  url    = {https://github.com/Zaoqu-Liu/TorchDecon},
  note   = {R package version 1.0.0}
}
```

The underlying methodology is described in:
Menden, K., Marouf, M., Oller, S., Dalmia, A., Magruder, D. S., Kloiber, K., Heutink, P., & Bonn, S. (2020). Deep learning-based cell composition analysis from tissue expression profiles. Science Advances, 6(30), eaba2619. https://doi.org/10.1126/sciadv.aba2619
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Zaoqu Liu
- GitHub: @Zaoqu-Liu
- Email: liuzaoqu@163.com
TorchDecon: Bridging single-cell and bulk transcriptomics through deep learning
