📖 Documentation: https://zaoqu-liu.github.io/CellOracleR/

Overview

CellOracleR is a comprehensive R implementation of the CellOracle framework for in silico gene perturbation analysis in single-cell RNA sequencing data. This package enables systematic prediction of cell state transitions following transcription factor (TF) perturbations by integrating gene regulatory network (GRN) inference with single-cell trajectory analysis.

Scientific Background

Understanding how transcription factors regulate cell fate decisions is fundamental to developmental biology and regenerative medicine. CellOracleR leverages the mathematical framework of GRN-based signal propagation to simulate the transcriptomic consequences of TF knockouts or overexpression, enabling researchers to:

  • Predict perturbation outcomes before conducting experiments
  • Identify key regulators of cell fate transitions
  • Dissect regulatory mechanisms underlying cellular differentiation
  • Prioritize targets for functional validation studies

Methodological Framework

The CellOracleR workflow comprises four interconnected analytical modules:

┌────────────────────────────────────────────────────────────────────┐
│                        CellOracleR Pipeline                        │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────────┐  │
│  │   Base GRN   │───▶│ GRN Fitting  │───▶│ Perturbation         │  │
│  │ Construction │    │ (Ridge Reg.) │    │ Simulation           │  │
│  └──────────────┘    └──────────────┘    └──────────────────────┘  │
│        │                    │                      │               │
│        ▼                    ▼                      ▼               │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────────┐  │
│  │ Motif        │    │ Cluster-     │    │ Transition           │  │
│  │ Scanning     │    │ specific GRN │    │ Probability          │  │
│  └──────────────┘    └──────────────┘    └──────────────────────┘  │
│                                                    │               │
│                                                    ▼               │
│                                          ┌──────────────────────┐  │
│                                          │ Cell Fate            │  │
│                                          │ Prediction           │  │
│                                          └──────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘

Installation

# Install from R-universe
install.packages("CellOracleR", repos = "https://zaoqu-liu.r-universe.dev")

From GitHub

# Install development version from GitHub
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
remotes::install_github("Zaoqu-Liu/CellOracleR")

System Requirements

  • R ≥ 4.0.0
  • C++ compiler with C++17 support
  • Dependencies: Seurat (V4/V5), glmnet, igraph, Matrix, R6

For motif analysis functionality, Bioconductor packages are required:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("TFBSTools", "motifmatchr", "JASPAR2020", 
                       "BSgenome", "GenomicRanges"))

Quick Start

Basic Workflow

library(CellOracleR)
library(Seurat)

# 1. Create Oracle object from Seurat
oracle <- create_oracle(
    seurat_obj, 
    cluster_column = "cell_type",
    embedding_name = "umap"
)

# 2. Import TF-target gene regulatory information
oracle$import_TF_data(TFdict = tf_target_dictionary)

# 3. Perform dimensionality reduction and imputation
oracle$perform_PCA(n_components = 50)
oracle$knn_imputation(k = 30)

# 4. Fit cluster-specific GRNs for simulation
oracle$fit_GRN_for_simulation(
    GRN_unit = "cluster",
    alpha = 10
)

# 5. Simulate TF knockout
oracle$simulate_shift(
    perturb_condition = list(GATA1 = 0),  # Knockout GATA1
    n_propagation = 3
)

# 6. Estimate transition probabilities and project the shift onto the embedding
oracle$estimate_transition_prob()
oracle$calculate_embedding_shift()
oracle$calculate_grid_arrows(n_grid = 40)

# 7. Visualize perturbation effects
plot_simulation_flow(oracle)

Network Analysis

# Extract GRN as Links object for network analysis
links <- oracle$get_links(
    alpha = 10,
    bagging_number = 200
)

# Filter to significant regulatory edges
links$filter_links(p = 0.001, threshold_number = 2000)

# Compute network centrality metrics
links$get_network_score()

# Identify hub transcription factors
hubs <- identify_hubs(links, top_n = 20, method = "degree")

# Visualize regulatory network
plot_network_graph(links, cluster = "Progenitor")
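
To make the "degree" criterion concrete, the sketch below ranks genes by degree centrality from a plain edge table in base R. It is only an illustration: the `edges` data frame, its column names, and the toy coefficients are assumptions, and it does not reproduce the internals of `get_network_score()` or `identify_hubs()`.

```r
# Toy edge list (hypothetical): one row per regulator -> target edge.
edges <- data.frame(
    source = c("GATA1", "GATA1", "GATA1", "KLF1", "TAL1", "TAL1"),
    target = c("KLF1", "HBB", "TAL1", "HBB", "GATA1", "KLF1"),
    coef   = c(0.8, 0.4, 0.3, 0.5, 0.6, 0.2)
)

nodes <- sort(unique(c(edges$source, edges$target)))

# Count outgoing and incoming edges per gene; factor() keeps zero counts.
degree_out <- as.integer(table(factor(edges$source, levels = nodes)))
degree_in  <- as.integer(table(factor(edges$target, levels = nodes)))

scores <- data.frame(
    gene = nodes,
    degree_out = degree_out,
    degree_in = degree_in,
    degree_all = degree_out + degree_in
)

# Rank by total degree; ties are broken alphabetically.
scores <- scores[order(-scores$degree_all, scores$gene), ]
rownames(scores) <- NULL
print(head(scores, 2))
```

Out-degree highlights regulators with many targets, while in-degree highlights heavily regulated genes, so which measure marks a "hub" depends on the question being asked.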

Key Features

🧬 GRN Inference

  • Ridge regression with L2 regularization for robust coefficient estimation
  • Bootstrap aggregation (bagging) for variance reduction
  • Cluster-specific or whole-dataset GRN fitting
  • Parallel processing via the future framework
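
The combination of ridge regression and bagging listed above can be illustrated with a minimal base R sketch. It uses the closed-form ridge solution on simulated data rather than glmnet, and the helper names `ridge_fit` and `bagged_ridge` are hypothetical; it is not the package's fitting code.

```r
# Closed-form ridge estimate on centered data: (X'X + alpha * I)^-1 X'y.
ridge_fit <- function(X, y, alpha) {
    Xc <- scale(X, center = TRUE, scale = FALSE)
    yc <- y - mean(y)
    drop(solve(crossprod(Xc) + alpha * diag(ncol(Xc)), crossprod(Xc, yc)))
}

# Refit on bootstrap resamples of cells and summarise each coefficient.
bagged_ridge <- function(X, y, alpha = 10, n_bags = 200) {
    n <- nrow(X)
    coefs <- vapply(seq_len(n_bags), function(b) {
        idx <- sample.int(n, n, replace = TRUE)   # resample cells with replacement
        ridge_fit(X[idx, , drop = FALSE], y[idx], alpha)
    }, numeric(ncol(X)))                          # result: regulators x bags
    data.frame(
        regulator = colnames(X),
        coef_mean = rowMeans(coefs),
        coef_sd   = apply(coefs, 1, sd)
    )
}

# Simulated target gene driven by TF_A (activator) and TF_B (repressor).
set.seed(1)
n_cells <- 300
X <- matrix(rnorm(n_cells * 3), n_cells, 3,
            dimnames = list(NULL, c("TF_A", "TF_B", "TF_C")))
y <- 2 * X[, "TF_A"] - 1 * X[, "TF_B"] + rnorm(n_cells, sd = 0.3)

bag_result <- bagged_ridge(X, y, alpha = 10, n_bags = 50)
print(bag_result)
```

The spread of each coefficient across bags gives a rough measure of edge stability, which is the kind of information a downstream significance filter can draw on.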

📊 Perturbation Simulation

  • Signal propagation through regulatory networks
  • Support for knockouts, overexpression, and partial perturbations
  • Out-of-distribution detection and clipping
  • Efficient C++ backend via RcppArmadillo
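
Signal propagation with clipping can be sketched in a few lines of base R. The example assumes a linear GRN stored as a coefficient matrix `B` (regulators in rows, targets in columns) and a toy three-gene chain; `propagate_shift` is a hypothetical helper, and the clipping rule (unperturbed genes held within their observed range) is a simplifying assumption rather than the package's exact behaviour.

```r
# Hypothetical helper: propagate a TF perturbation through a linear GRN.
# B[i, j] is the assumed coefficient of regulator i on target j.
propagate_shift <- function(expr, B, perturb, n_propagation = 3) {
    fixed <- names(perturb)
    # Initial shift: move each perturbed gene to its requested value.
    delta_input <- matrix(0, nrow(expr), ncol(expr), dimnames = dimnames(expr))
    for (g in fixed) delta_input[, g] <- perturb[[g]] - expr[, g]

    delta <- delta_input
    for (step in seq_len(n_propagation)) {
        delta <- delta %*% B                     # one hop through the network
        delta[, fixed] <- delta_input[, fixed]   # keep perturbed genes pinned
        delta <- pmax(expr + delta, 0) - expr    # expression cannot go negative
    }

    # Clip unperturbed genes to the range observed in the data.
    simulated <- expr + delta
    for (g in setdiff(colnames(expr), fixed)) {
        simulated[, g] <- pmin(pmax(simulated[, g], min(expr[, g])), max(expr[, g]))
    }
    simulated - expr
}

# Toy chain GATA1 -> KLF1 -> HBB with made-up coefficients.
genes <- c("GATA1", "KLF1", "HBB")
expr <- matrix(c(2, 1, 0, 3,    # GATA1
                 3, 2, 1, 4,    # KLF1
                 4, 3, 1, 5),   # HBB
               nrow = 4, dimnames = list(paste0("cell", 1:4), genes))
B <- matrix(0, 3, 3, dimnames = list(genes, genes))
B["GATA1", "KLF1"] <- 0.8
B["KLF1", "HBB"] <- 0.5

delta_1 <- propagate_shift(expr, B, list(GATA1 = 0), n_propagation = 1)
delta_2 <- propagate_shift(expr, B, list(GATA1 = 0), n_propagation = 2)
print(delta_2)
```

With one propagation step the knockout reaches only the direct target KLF1; a second step is needed before the indirect target HBB shifts, which is why the number of propagation steps controls how far downstream a perturbation is felt.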

🔬 Trajectory Analysis

  • Markov chain simulation of cell state transitions
  • Transition probability estimation from expression shifts
  • Pseudotime computation and fate probability analysis
  • Terminal state identification
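
One common way to turn expression shifts into transition probabilities is to correlate each cell's simulated shift with the expression difference to every other cell and pass the correlations through a softmax-style kernel. The base R sketch below follows that idea on three toy cells. `transition_prob` and the `sigma_corr` value are illustrative assumptions, and all cells are treated as neighbours instead of using a k-nearest-neighbour graph.

```r
# Hypothetical helper: row-stochastic transition matrix from expression shifts.
transition_prob <- function(expr, delta, sigma_corr = 0.05) {
    n <- nrow(expr)
    P <- matrix(0, n, n, dimnames = list(rownames(expr), rownames(expr)))
    for (i in seq_len(n)) {
        for (j in setdiff(seq_len(n), i)) {
            # Does the shift of cell i point toward cell j?
            r <- suppressWarnings(cor(delta[i, ], expr[j, ] - expr[i, ]))
            if (is.na(r)) r <- 0   # zero shift or identical cells: no preference
            P[i, j] <- exp(r / sigma_corr)
        }
        P[i, ] <- P[i, ] / sum(P[i, ])   # normalise each row to sum to 1
    }
    P
}

# Three toy cells in a 3-gene space; only cell A has a simulated shift,
# and it points from A toward B.
expr  <- rbind(A = c(1, 2, 3), B = c(2, 2, 2), C = c(0, 2, 4))
delta <- rbind(A = c(0.5, 0, -0.5), B = c(0, 0, 0), C = c(0, 0, 0))
P <- transition_prob(expr, delta)
print(round(P, 3))

# One Markov-chain step starting from cell A.
start <- c(A = 1, B = 0, C = 0)
after_one_step <- drop(start %*% P)
print(round(after_one_step, 3))
```

Repeating the matrix product simulates longer walks, and the states where probability mass accumulates are candidates for terminal states.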

📈 Visualization

  • Vector field plots showing predicted cell movements
  • Network graphs with centrality-based layouts
  • Degree distribution analysis
  • Full ggplot2 integration for publication-quality figures

Seurat Compatibility

CellOracleR is designed for seamless integration with the Seurat ecosystem:

Feature                Seurat V4   Seurat V5
Data import            ✅          ✅
Assay handling         ✅          ✅
Layer access           ✅          ✅
Reduction extraction   ✅          ✅
Metadata integration   ✅          ✅

Performance

CellOracleR achieves high computational efficiency through:

  • Vectorized R operations for data manipulation
  • Rcpp/RcppArmadillo for performance-critical functions
  • Sparse matrix support via the Matrix package
  • Parallel computation using the future framework

Typical runtime for a dataset of 10,000 cells × 3,000 genes:

  • GRN fitting (200 bootstrap iterations): ~5-10 minutes
  • Perturbation simulation: ~30 seconds
  • Transition probability estimation: ~1 minute

Citation

If you use CellOracleR in your research, please cite both the original CellOracle paper and this R implementation:

Original CellOracle:

Kamimoto, K., Stringa, B., Hoffmann, C.M., Jindal, K., Solnica-Krezel, L., & Morris, S.A. (2023). Dissecting cell identity via network inference and in silico gene perturbation. Nature, 614, 742-751. https://doi.org/10.1038/s41586-022-05688-9

CellOracleR (R implementation):

Liu, Z. (2025). CellOracleR: An R implementation of CellOracle for in silico gene perturbation analysis. GitHub repository, https://github.com/Zaoqu-Liu/CellOracleR

License

CellOracleR is released under the Apache License 2.0.

Contact


Deciphering cell fate through computational perturbation