Introduction to TorchDecon
Zaoqu Liu
2026-01-26
Source: vignettes/TorchDecon-introduction.Rmd
Overview
TorchDecon is an R package for deep learning-based cell type deconvolution of bulk RNA-seq data. It estimates the proportions of different cell types in bulk tissue samples by training deep neural networks on simulated bulk samples generated from single-cell RNA-seq reference data.
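The core idea can be illustrated in a few lines of base R (a conceptual sketch only; names such as `ref_profiles` are hypothetical and not part of the package API): a pseudo-bulk sample is a mixture of cell-type expression profiles weighted by random proportions, and those proportions become the training labels for the network.

```r
set.seed(42)

# Hypothetical reference: mean expression of 5 genes for 3 cell types
ref_profiles <- matrix(rpois(15, lambda = 10), nrow = 5, ncol = 3,
                       dimnames = list(paste0("gene", 1:5),
                                       c("Tcell", "Bcell", "Myeloid")))

# Draw random proportions that sum to 1 (these are the training labels)
props <- rgamma(3, shape = 1)
props <- props / sum(props)

# The simulated bulk sample is the proportion-weighted mixture of profiles
pseudo_bulk <- ref_profiles %*% props

stopifnot(abs(sum(props) - 1) < 1e-8, nrow(pseudo_bulk) == 5)
```

In practice `SimulateBulk()` draws real cells from the Seurat reference rather than mixing mean profiles, but the label construction follows the same logic.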
Key Features
- Native R implementation: Built on the torch package (LibTorch C++ backend), no Python required
- Seurat integration: Works seamlessly with Seurat objects (v4 and v5)
- GPU acceleration: Automatic CUDA support for faster training
- Ensemble model: Uses three neural networks with different architectures for robust predictions
- Cross-platform: Works on Windows, macOS, and Linux
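For example, you can confirm whether the CUDA backend is visible before training, using the torch package's `cuda_is_available()` (TorchDecon selects the device automatically, so this check is purely informational):

```r
library(torch)

# Report whether a CUDA-capable GPU is visible to the LibTorch backend
if (cuda_is_available()) {
  message("CUDA device found; training will run on the GPU")
} else {
  message("No CUDA device found; training will run on the CPU")
}
```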
Installation
# Install from GitHub
devtools::install_github("Zaoqu-Liu/TorchDecon")
# Install the torch backend (required)
torch::install_torch()
# For GPU support (requires CUDA)
# torch::install_torch(type = "cuda")
Quick Start
The simplest way to use TorchDecon is with the RunTorchDecon() function:
library(TorchDecon)
library(Seurat)
# Load your single-cell reference data
seurat_obj <- readRDS("scRNA_reference.rds")
# Load bulk RNA-seq data for deconvolution
bulk_data <- read.table("bulk_expression.txt", header = TRUE, row.names = 1)
# Run the complete workflow
result <- RunTorchDecon(
seurat_object = seurat_obj,
bulk_data = bulk_data,
celltype_col = "cell_type",
n_samples = 2000,
num_steps = 5000,
seed = 42
)
# View predictions
head(result$predictions)
Step-by-Step Workflow
For more control, you can run each step separately:
Step 1: Simulate Bulk Data
Generate artificial bulk RNA-seq samples from your scRNA-seq reference:
simulation <- SimulateBulk(
object = seurat_obj,
n_samples = 2000,
cells_per_sample = 100,
celltype_col = "cell_type",
sparse_fraction = 0.5, # 50% of simulated samples omit a random subset of cell types
seed = 42
)
print(simulation)
Step 2: Process Training Data
Preprocess the simulated data and find common genes with your prediction data:
processed <- ProcessTrainingData(
simulation = simulation,
prediction_data = bulk_data,
var_cutoff = 0.1,
scaling = "log_min_max"
)
print(processed)
Step 3: Create and Train Model
Create and train the deep neural network ensemble:
# Create ensemble (3 models: m256, m512, m1024)
ensemble <- CreateTorchDeconEnsemble(
n_features = processed$n_genes,
n_classes = length(processed$celltypes)
)
# Train the model
ensemble <- TrainModel(
model = ensemble,
data = processed,
num_steps = 5000,
batch_size = 128,
learning_rate = 0.0001
)
Step 4: Predict Cell Fractions
Use the trained model to predict cell type fractions:
predictions <- PredictFractions(ensemble, bulk_data)
head(predictions)
Step 5: Save and Load Model
Save your trained model for future use:
# Save model
SaveModel(ensemble, "my_trained_model")
# Load model later
loaded_model <- LoadModel("my_trained_model")
# Use loaded model for prediction
new_predictions <- PredictFractions(loaded_model, new_bulk_data)
Model Architecture
TorchDecon uses an ensemble of three deep neural networks:
| Model | Architecture | Dropout |
|---|---|---|
| M256 | 256 → 128 → 64 → 32 → n_classes | None |
| M512 | 512 → 256 → 128 → 64 → n_classes | 0, 0.3, 0.2, 0.1 |
| M1024 | 1024 → 512 → 256 → 128 → n_classes | 0, 0.6, 0.3, 0.1 |
Each network uses:
- ReLU activation functions
- Softmax output layer
- Adam optimizer (β1 = 0.9, β2 = 0.999)
- Mean Squared Error (MSE) loss
The final prediction is the average of all three models.
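As a rough sketch (the package's actual module definitions may differ), an M256-style network can be written with torch's `nn_module`: four fully connected layers with ReLU activations and a softmax over cell types so the outputs behave like fractions.

```r
library(torch)

# Sketch of an M256-style network: n_features -> 256 -> 128 -> 64 -> 32 -> n_classes
m256_sketch <- nn_module(
  "M256Sketch",
  initialize = function(n_features, n_classes) {
    self$fc1 <- nn_linear(n_features, 256)
    self$fc2 <- nn_linear(256, 128)
    self$fc3 <- nn_linear(128, 64)
    self$fc4 <- nn_linear(64, 32)
    self$out <- nn_linear(32, n_classes)
  },
  forward = function(x) {
    x <- nnf_relu(self$fc1(x))
    x <- nnf_relu(self$fc2(x))
    x <- nnf_relu(self$fc3(x))
    x <- nnf_relu(self$fc4(x))
    # Softmax keeps predicted fractions non-negative and summing to 1
    nnf_softmax(self$out(x), dim = 2)
  }
)

# Training with the stated hyperparameters would then pair this with:
# optimizer <- optim_adam(model$parameters, lr = 1e-4, betas = c(0.9, 0.999))
# loss <- nnf_mse_loss(pred, target)
```

M512 and M1024 follow the same pattern with wider layers and the dropout rates listed in the table.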
Evaluation
If you have ground truth data, you can evaluate prediction accuracy:
metrics <- EvaluatePredictions(predictions, true_fractions)
print(metrics$rmse)
print(metrics$correlation)
print(metrics$per_celltype)
Tips for Best Results
- Reference data quality: Use high-quality scRNA-seq data with accurate cell type annotations
- Sample size: More simulated samples (2000-5000) generally improve performance
- Training steps: 5000-10000 steps are usually sufficient
- Gene filtering: The default variance cutoff (0.1) works well for most cases
- GPU acceleration: If available, GPU can speed up training significantly
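If you want to sanity-check the evaluation metrics by hand, the RMSE and overall correlation from the Evaluation section reduce to a few lines of base R (a sketch with toy data; `EvaluatePredictions()` may compute additional per-cell-type summaries):

```r
# Toy predicted vs. true fraction matrices (rows = samples, cols = cell types)
pred <- matrix(c(0.60, 0.30, 0.10,
                 0.20, 0.50, 0.30), nrow = 2, byrow = TRUE)
true <- matrix(c(0.50, 0.40, 0.10,
                 0.25, 0.45, 0.30), nrow = 2, byrow = TRUE)

# Root mean squared error over all entries
rmse <- sqrt(mean((pred - true)^2))

# Overall Pearson correlation between flattened matrices
r <- cor(as.vector(pred), as.vector(true))

stopifnot(rmse < 0.1, r > 0.9)
```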
Citation
If you use TorchDecon in your research, please cite:
Liu Z (2026). TorchDecon: Deep Learning-Based Cell Type Deconvolution Using torch.
R package. https://github.com/Zaoqu-Liu/TorchDecon
This package implements the algorithm described in:
Menden K, et al. (2020). Deep learning-based cell composition analysis from
tissue expression profiles. Science Advances, 6(30), eaba2619.