
Overview

TorchDecon is an R package for deep learning-based cell type deconvolution of bulk RNA-seq data. It estimates the proportions of different cell types in bulk tissue samples by training deep neural networks on simulated bulk samples generated from single-cell RNA-seq reference data.

Key Features

  • Native R implementation: Built on the torch package (LibTorch C++ backend); no Python installation required
  • Seurat integration: Works seamlessly with Seurat objects (v4 and v5)
  • GPU acceleration: Automatic CUDA support for faster training
  • Ensemble model: Uses three neural networks with different architectures for robust predictions
  • Cross-platform: Works on Windows, macOS, and Linux

Installation

# Install from GitHub
devtools::install_github("Zaoqu-Liu/TorchDecon")

# Install the torch backend (required)
torch::install_torch()

# For GPU support (requires CUDA)
# torch::install_torch(type = "cuda")

Quick Start

The simplest way to use TorchDecon is with the RunTorchDecon() function:

library(TorchDecon)
library(Seurat)

# Load your single-cell reference data
seurat_obj <- readRDS("scRNA_reference.rds")

# Load bulk RNA-seq data for deconvolution
bulk_data <- read.table("bulk_expression.txt", header = TRUE, row.names = 1)

# Run the complete workflow
result <- RunTorchDecon(
  seurat_object = seurat_obj,
  bulk_data = bulk_data,
  celltype_col = "cell_type",
  n_samples = 2000,
  num_steps = 5000,
  seed = 42
)

# View predictions
head(result$predictions)

Step-by-Step Workflow

For more control, you can run each step separately:

Step 1: Simulate Bulk Data

Generate artificial bulk RNA-seq samples from your scRNA-seq reference:

simulation <- SimulateBulk(
  object = seurat_obj,
  n_samples = 2000,
  cells_per_sample = 100,
  celltype_col = "cell_type",
  sparse_fraction = 0.5,  # 50% of samples will have missing cell types
  seed = 42
)

print(simulation)
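Conceptually, each simulated bulk sample is built by drawing a random cell-type composition, sampling cells from the reference accordingly, and summing their expression profiles; that composition becomes the training label. A minimal base-R sketch of this idea (the toy matrix, labels, and `simulate_one` helper are illustrative, not part of the TorchDecon API):

```r
# Toy single-cell matrix: genes x cells, with a cell-type label per cell
set.seed(42)
counts <- matrix(rpois(50 * 30, lambda = 5), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:30)))
celltypes <- sample(c("T", "B", "Mono"), 30, replace = TRUE)

# Draw one pseudo-bulk sample: random fractions, then sample cells accordingly
simulate_one <- function(counts, celltypes, cells_per_sample = 10) {
  types <- unique(celltypes)
  fracs <- as.vector(rmultinom(1, cells_per_sample, rep(1, length(types)))) /
    cells_per_sample
  names(fracs) <- types
  picked <- unlist(lapply(types, function(ct) {
    pool <- which(celltypes == ct)
    n_ct <- round(fracs[ct] * cells_per_sample)
    pool[sample.int(length(pool), n_ct, replace = TRUE)]
  }))
  list(expr = rowSums(counts[, picked, drop = FALSE]),  # pseudo-bulk profile
       fractions = fracs)                               # training label
}

sample1 <- simulate_one(counts, celltypes)
```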

Step 2: Process Training Data

Preprocess the simulated data and restrict it to the genes it shares with your prediction data:

processed <- ProcessTrainingData(
  simulation = simulation,
  prediction_data = bulk_data,
  var_cutoff = 0.1,
  scaling = "log_min_max"
)

print(processed)
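The "log_min_max" option suggests a log transform followed by per-sample min-max rescaling to [0, 1]. A plain base-R sketch of that transform (the exact formula TorchDecon uses internally is an assumption; the pseudocount and base here are illustrative):

```r
# Log-transform, then rescale each sample (column) to the [0, 1] range
log_min_max <- function(x) {
  x <- log2(x + 1)                                     # log with pseudocount
  apply(x, 2, function(col) {
    rng <- range(col)
    if (diff(rng) == 0) return(rep(0, length(col)))    # constant column guard
    (col - rng[1]) / diff(rng)
  })
}

mat <- matrix(c(0, 10, 100, 5, 50, 500), nrow = 3)
scaled <- log_min_max(mat)
```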

Step 3: Create and Train Model

Create and train the deep neural network ensemble:

# Create ensemble (3 models: m256, m512, m1024)
ensemble <- CreateTorchDeconEnsemble(
  n_features = processed$n_genes,
  n_classes = length(processed$celltypes)
)

# Train the model
ensemble <- TrainModel(
  model = ensemble,
  data = processed,
  num_steps = 5000,
  batch_size = 128,
  learning_rate = 0.0001
)
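The num_steps / batch_size / learning_rate trio behaves like standard mini-batch gradient descent: each step draws batch_size simulated samples and takes one optimizer update, so num_steps counts updates rather than epochs. A toy base-R illustration with a linear model and MSE loss (this is a conceptual sketch, not TorchDecon's actual training loop):

```r
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)          # 3 input features
w_true <- c(1, -2, 0.5)
y <- x %*% w_true + rnorm(n, sd = 0.1)

w <- rep(0, 3); lr <- 0.01; batch_size <- 32
for (step in 1:2000) {                       # num_steps updates, not epochs
  idx <- sample(n, batch_size)               # one mini-batch per step
  err <- x[idx, ] %*% w - y[idx]
  grad <- t(x[idx, ]) %*% err / batch_size   # gradient of the MSE loss
  w <- w - lr * as.vector(grad)              # gradient descent update
}
```

After enough steps, w approaches w_true; the same logic applies when the model is a neural network and the optimizer is Adam.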

Step 4: Predict Cell Fractions

Use the trained model to predict cell type fractions:

predictions <- PredictFractions(ensemble, bulk_data)
head(predictions)

Step 5: Save and Load Model

Save your trained model for future use:

# Save model
SaveModel(ensemble, "my_trained_model")

# Load model later
loaded_model <- LoadModel("my_trained_model")

# Use loaded model for prediction
new_predictions <- PredictFractions(loaded_model, new_bulk_data)

Model Architecture

TorchDecon uses an ensemble of three deep neural networks:

| Model | Architecture                       | Dropout          |
|-------|------------------------------------|------------------|
| M256  | 256 → 128 → 64 → 32 → n_classes    | None             |
| M512  | 512 → 256 → 128 → 64 → n_classes   | 0, 0.3, 0.2, 0.1 |
| M1024 | 1024 → 512 → 256 → 128 → n_classes | 0, 0.6, 0.3, 0.1 |

Each network uses:

  • ReLU activation functions
  • Softmax output layer
  • Adam optimizer (β1 = 0.9, β2 = 0.999)
  • Mean Squared Error (MSE) loss

The final prediction is the average of all three models.
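Because each network ends in a softmax, every model outputs per-sample fractions that sum to 1, and the element-wise mean of such outputs does too. A base-R sketch of this averaging step (the raw scores below are made up for illustration):

```r
# Softmax turns raw scores into fractions that sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))  # subtract max for numerical stability
  e / sum(e)
}

# Fake outputs from three models for 2 samples x 3 cell types
set.seed(1)
outputs <- lapply(1:3, function(i) {
  t(apply(matrix(rnorm(6), nrow = 2), 1, softmax))
})

# Final prediction: element-wise mean across the three models
prediction <- Reduce(`+`, outputs) / length(outputs)
rowSums(prediction)  # each sample's fractions still sum to 1
```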

Evaluation

If you have ground truth data, you can evaluate prediction accuracy:

metrics <- EvaluatePredictions(predictions, true_fractions)
print(metrics$rmse)
print(metrics$correlation)
print(metrics$per_celltype)
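If you want to sanity-check these metrics, RMSE and Pearson correlation are easy to compute by hand. A base-R sketch on toy matrices (the matrices are illustrative; whether EvaluatePredictions computes per-cell-type RMSE exactly this way is an assumption):

```r
# Toy predicted vs. true fraction matrices (samples x cell types)
pred <- matrix(c(0.60, 0.30, 0.10,
                 0.20, 0.50, 0.30), nrow = 2, byrow = TRUE)
truth <- matrix(c(0.55, 0.35, 0.10,
                  0.25, 0.45, 0.30), nrow = 2, byrow = TRUE)

rmse <- sqrt(mean((pred - truth)^2))            # overall root-mean-square error
r    <- cor(as.vector(pred), as.vector(truth))  # overall Pearson correlation

# Per-cell-type RMSE: one value per column
per_celltype <- sqrt(colMeans((pred - truth)^2))
```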

Tips for Best Results

  1. Reference data quality: Use high-quality scRNA-seq data with accurate cell type annotations
  2. Sample size: More simulated samples (2000-5000) generally improve performance
  3. Training steps: 5000-10000 steps are usually sufficient
  4. Gene filtering: The default variance cutoff (0.1) works well for most cases
  5. GPU acceleration: If available, GPU can speed up training significantly

Citation

If you use TorchDecon in your research, please cite:

Liu Z (2026). TorchDecon: Deep Learning-Based Cell Type Deconvolution Using torch.
R package. https://github.com/Zaoqu-Liu/TorchDecon

This package implements the algorithm described in:

Menden K, et al. (2020). Deep learning-based cell composition analysis from
tissue expression profiles. Science Advances, 6(30), eaba2619.
