📖 Documentation: https://zaoqu-liu.github.io/CytoSPACER/

Overview

CytoSPACER is an R implementation of CytoSPACE (Vahid et al., Nature Biotechnology, 2023), a computational framework for high-resolution alignment of single-cell transcriptomes to spatial transcriptomics (ST) data. The algorithm formulates cell-to-spot assignment as a linear assignment problem (LAP) and solves it by minimizing a correlation-based cost function using the Jonker-Volgenant algorithm.
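
For reference, a generic statement of the linear assignment problem is sketched below. The symbols are assumptions introduced here for illustration, not notation taken from the package: $n$ is the number of sampled cells and of available spot slots (each spot contributing as many slots as its estimated cell count), $c_{ij}$ is the cost of placing cell $i$ in slot $j$ (for example $1 - r_{ij}$, with $r_{ij}$ the Pearson correlation between the two expression profiles), and $x_{ij}$ indicates whether that placement is chosen.

```latex
\min_{x} \; \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij}\, x_{ij}
\quad \text{subject to} \quad
\sum_{j=1}^{n} x_{ij} = 1 \;\; \forall i, \qquad
\sum_{i=1}^{n} x_{ij} = 1 \;\; \forall j, \qquad
x_{ij} \in \{0, 1\}.
```

The two equality constraints state that every cell is placed exactly once and every slot receives exactly one cell.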

This package provides a native R implementation with high-performance C++ backends via Rcpp, enabling seamless integration with the R/Bioconductor ecosystem and Seurat workflows.

Algorithm

CytoSPACER performs spatial mapping through the following steps:

  1. Cell Type Deconvolution: Estimate cell type fractions per ST spot using reference-based deconvolution (via Seurat’s TransferData)
  2. Cell Count Estimation: Infer the number of cells per spot based on total RNA content
  3. Reference Sampling: Sample single cells from the scRNA-seq reference to match the estimated spatial composition
  4. Cost Matrix Construction: Compute pairwise dissimilarity between single cells and ST spots (Pearson correlation by default; Spearman correlation and Euclidean distance are also supported)
  5. Optimal Assignment: Solve the LAP using the Jonker-Volgenant algorithm to find the globally optimal cell-to-spot mapping
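
As an illustration of step 2, the sketch below scales each spot's total RNA so that the average matches a target number of cells per spot. This is a simplified assumption about how such an estimate could work, not the package's actual estimator; the function name `estimate_cells_per_spot` and the toy counts are hypothetical.

```r
# Hypothetical helper: proportional scaling of total RNA per spot.
# Assumes a genes x spots count matrix; NOT the package's internal method.
estimate_cells_per_spot <- function(st_counts, mean_cells_per_spot = 5) {
  total_rna <- colSums(st_counts)                      # total counts per spot
  scaled <- total_rna / mean(total_rna) * mean_cells_per_spot
  pmax(1L, as.integer(round(scaled)))                  # at least one cell per spot
}

# Toy genes x spots matrix (made-up values)
st_counts <- matrix(c(60, 40, 120, 80, 180, 120), nrow = 2,
                    dimnames = list(c("Gene1", "Gene2"), paste0("Spot_", 1:3)))

cells_per_spot <- estimate_cells_per_spot(st_counts, mean_cells_per_spot = 4)
print(cells_per_spot)
```

Spots with more total RNA receive proportionally more cells, and the floor of one cell keeps every spot available for assignment.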

The Jonker-Volgenant algorithm provides an efficient O(n³) solution with excellent practical performance for dense cost matrices.
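
Steps 4 and 5 can be illustrated on a toy problem with base R. The sketch below builds a cost matrix as one minus the Pearson correlation and then finds the minimum-cost assignment by enumerating all permutations. Brute-force enumeration stands in for the Jonker-Volgenant solver purely for readability and is only feasible for a handful of cells; the data and the helper `all_perms` are hypothetical.

```r
# Toy reference: 4 genes x 3 cells (made-up values, filled column-wise)
sc <- matrix(c(10, 0, 5, 1,
                0, 8, 2, 7,
                3, 3, 9, 0),
             nrow = 4,
             dimnames = list(paste0("Gene", 1:4), paste0("Cell_", 1:3)))

# Toy spatial data: rescaled copies of the cells in shuffled order
st <- 2 * sc[, c(2, 3, 1)] + 1
colnames(st) <- paste0("Spot_", 1:3)

# Cost matrix: rows are cells, columns are spots
cost <- 1 - cor(sc, st, method = "pearson")

# Hypothetical helper: all permutations of a vector (fine for tiny inputs only)
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Evaluate the total cost of every one-to-one mapping and keep the cheapest
perms  <- all_perms(seq_len(ncol(cost)))
totals <- vapply(perms, function(p) sum(cost[cbind(seq_along(p), p)]), numeric(1))
best   <- perms[[which.min(totals)]]

assignment <- setNames(colnames(cost)[best], rownames(cost))
print(assignment)
```

Because each toy spot is an affine copy of one cell, the optimal mapping pairs every cell with its own copy at a total cost of essentially zero. A dedicated LAP solver reaches the same optimum without enumerating all n! permutations.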

Features

Feature              Description
High Performance     C++ implementation of LAP solver and correlation computation
Cross-Platform       Native support for Windows, macOS, and Linux
Parallel Processing  Multi-core support via the future framework
Seurat Integration   Direct compatibility with Seurat v4/v5 objects
Flexible Input       Support for CSV, TSV, sparse MTX, and SpaceRanger output
Multiple Metrics     Pearson correlation, Spearman correlation, Euclidean distance

Installation

install.packages("CytoSPACER", repos = "https://zaoqu-liu.r-universe.dev")

From GitHub

# Install remotes if not available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")

# Install CytoSPACER
remotes::install_github("Zaoqu-Liu/CytoSPACER")

System Requirements

  • R ≥ 4.0.0
  • C++ compiler with C++11 support (for Rcpp)

Dependencies

Core dependencies (automatically installed):

  • Rcpp, data.table, Matrix, future, future.apply, progressr, ggplot2

Optional (for extended functionality):

  • Seurat (≥ 4.0.0): Seurat object integration and cell type fraction estimation
  • viridis: Additional color palettes for visualization

Quick Start

Standard Workflow

library(CytoSPACER)

# Load input data
sc_expr <- read_cytospace_input("scRNA_counts.csv")
st_expr <- read_cytospace_input("ST_counts.csv")
coords <- read.csv("coordinates.csv", row.names = 1)

# Prepare cell type annotations
cell_labels <- read.csv("cell_types.csv", row.names = 1)
cell_types <- setNames(cell_labels$CellType, rownames(cell_labels))

# Run CytoSPACER
results <- run_cytospace(
  sc_data = sc_expr,
  cell_types = cell_types,
  st_data = st_expr,
  coordinates = coords,
  mean_cells_per_spot = 5,
  distance_metric = "pearson",
  seed = 42
)

# Export results
write_cytospace_results(results, output_dir = "cytospace_output/")

# Visualize spatial distribution
plot_cytospace(results, type = "cell_types")

Seurat Integration

library(CytoSPACER)
library(Seurat)

# Load Seurat objects
sc_seurat <- readRDS("scRNA_seurat.rds")
st_seurat <- readRDS("visium_seurat.rds")

# Run analysis directly from Seurat objects
results <- run_cytospace_seurat(
  sc_seurat = sc_seurat,
  st_seurat = st_seurat,
  cell_type_col = "celltype"
)

# Add results to spatial Seurat object
st_seurat <- add_cytospace_to_seurat(st_seurat, results)

# Visualize with Seurat
SpatialDimPlot(st_seurat, group.by = "dominant_celltype_cytospace")

Input Data Format

Expression Matrices

Gene × cell/spot matrices with gene names as row names:

GENES Cell_1 Cell_2 Cell_3
Gene1 10 0 5
Gene2 0 8 2

Cell Type Annotations

CellID CellType
Cell_1 B_cell
Cell_2 T_cell
Cell_3 Macrophage

Spatial Coordinates

SpotID row col
Spot_1 0 0
Spot_2 0 1
Spot_3 1 0
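
The three layouts above can be reproduced and sanity-checked with base R before running the pipeline. The sketch below parses the same toy tables from inline CSV text and verifies that the annotation names match the expression matrix columns. It assumes comma-separated files and uses only base R functions, not the package's own readers.

```r
# Expression matrix: genes in rows, cells in columns
expr_txt <- "GENES,Cell_1,Cell_2,Cell_3
Gene1,10,0,5
Gene2,0,8,2"
sc_expr <- as.matrix(read.csv(text = expr_txt, row.names = 1, check.names = FALSE))

# Cell type annotations as a named character vector (names are cell IDs)
labels_txt <- "CellID,CellType
Cell_1,B_cell
Cell_2,T_cell
Cell_3,Macrophage"
cell_labels <- read.csv(text = labels_txt, row.names = 1)
cell_types <- setNames(as.character(cell_labels$CellType), rownames(cell_labels))

# Spatial coordinates: one row per spot
coords_txt <- "SpotID,row,col
Spot_1,0,0
Spot_2,0,1
Spot_3,1,0"
coords <- read.csv(text = coords_txt, row.names = 1)

# Consistency checks worth running on real inputs as well
stopifnot(setequal(names(cell_types), colnames(sc_expr)))
stopifnot(!anyDuplicated(rownames(sc_expr)))
stopifnot(all(c("row", "col") %in% colnames(coords)))
```

Mismatched cell IDs between the expression matrix and the annotation table are a common source of errors, so checking them up front is cheap insurance.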

Advanced Usage

Distance Metrics

# Pearson correlation (default, recommended for most cases)
results <- run_cytospace(..., distance_metric = "pearson")

# Spearman correlation (robust to outliers)
results <- run_cytospace(..., distance_metric = "spearman")

# Euclidean distance
results <- run_cytospace(..., distance_metric = "euclidean")
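
To make the difference between the metrics concrete, the sketch below computes a cost for a single cell/spot pair under each option using base R. The helper `pair_cost` is hypothetical, and the conversion of correlations to costs as one minus the correlation is an assumption for illustration; the package may define its costs differently.

```r
# Hypothetical helper: cost between two expression profiles under each metric
pair_cost <- function(x, y, metric = c("pearson", "spearman", "euclidean")) {
  metric <- match.arg(metric)
  switch(metric,
    pearson   = 1 - cor(x, y, method = "pearson"),   # linear agreement
    spearman  = 1 - cor(x, y, method = "spearman"),  # rank agreement
    euclidean = sqrt(sum((x - y)^2))                 # absolute distance
  )
}

# Made-up profiles: same ranking, but one extreme value in y
x <- c(1, 2, 3, 4)
y <- c(1, 4, 9, 100)

costs <- c(
  pearson   = pair_cost(x, y, "pearson"),
  spearman  = pair_cost(x, y, "spearman"),
  euclidean = pair_cost(x, y, "euclidean")
)
print(round(costs, 3))
```

The ranks of the two profiles agree perfectly, so the Spearman cost is zero, while the single extreme value raises the Pearson cost and dominates the Euclidean distance.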

Sampling Strategies

# Duplicates method (default): reuse cells when reference is insufficient
results <- run_cytospace(..., sampling_method = "duplicates")

# Synthetic method: generate synthetic cells via gene-wise sampling
results <- run_cytospace(..., sampling_method = "synthetic")
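
The two strategies can be pictured with the following base R sketch. Both helpers are hypothetical and reflect one plausible reading of the descriptions above (sampling with replacement for "duplicates", and drawing each gene's value independently from the cells of a type for "synthetic"); they are not the package's internal code.

```r
set.seed(42)

# "Duplicates": reuse reference cells only when more cells are needed than exist
sample_with_duplicates <- function(cell_ids, n_needed) {
  reuse <- n_needed > length(cell_ids)
  cell_ids[sample.int(length(cell_ids), n_needed, replace = reuse)]
}

# "Synthetic": build one new profile by sampling each gene from a random cell
# (expr is a genes x cells matrix restricted to a single cell type)
make_synthetic_cell <- function(expr) {
  apply(expr, 1, function(g) g[sample.int(length(g), 1)])
}

# Toy reference with three B cells (made-up values)
ids  <- c("Cell_1", "Cell_2", "Cell_3")
expr <- matrix(c(10, 0, 5,
                  0, 8, 2,
                  4, 4, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("Gene", 1:3), ids))

picked <- sample_with_duplicates(ids, n_needed = 5)  # 5 needed, only 3 available
syn    <- make_synthetic_cell(expr)
print(picked)
print(syn)
```

Duplicated cells keep real co-expression patterns at the cost of repeated profiles, whereas gene-wise sampling yields distinct profiles but breaks gene-gene dependencies within a cell.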

Single-Cell Spatial Data

For single-cell resolution platforms (MERFISH, seqFISH, Xenium, CosMx):

results <- run_cytospace(
  ...,
  single_cell = TRUE,
  st_cell_types = spatial_cell_labels  # Optional prior cell type information
)

Parallel Processing

# Automatic parallelization
results <- run_cytospace(..., n_workers = 8)

# Custom future plan
library(future)
plan(multisession, workers = 8)
results <- run_cytospace(...)

Output

CytoSPACER generates the following outputs:

File                               Description
assigned_locations.csv             Cell-to-spot assignments with spatial coordinates
cell_type_assignments_by_spot.csv  Cell type counts per spot
fractional_abundances_by_spot.csv  Cell type proportions per spot
assigned_expression/               Expression matrix for assigned cells
log.txt                            Analysis log with parameters and runtime
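
The per-spot count and proportion tables are both summaries of the cell-level assignments. The sketch below shows that relationship on a toy assignment table; the column names `SpotID` and `CellType` are assumptions for illustration and may not match the headers in the exported files.

```r
# Toy cell-to-spot assignments (hypothetical column names and values)
assigned <- data.frame(
  SpotID   = c("Spot_1", "Spot_1", "Spot_1", "Spot_2", "Spot_2"),
  CellType = c("B_cell", "B_cell", "T_cell", "T_cell", "Macrophage")
)

# Cell type counts per spot (rows are spots, columns are cell types)
counts <- table(assigned$SpotID, assigned$CellType)

# Cell type proportions per spot: each row sums to 1
fractions <- prop.table(counts, margin = 1)

print(counts)
print(round(fractions, 2))
```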

Visualization

# Spatial cell type distribution
plot_cytospace(results, type = "cell_types")

# Add jitter for overlapping points
plot_cytospace(results, type = "cell_types", jitter = 0.3)
  
# Cell counts per spot (faceted)
plot_cytospace(results, type = "by_spot", ncol = 4)

# Cell type composition
plot_composition(results, type = "global")
plot_composition(results, type = "per_spot", top_spots = 20)

# Save publication-quality figures
p <- plot_cytospace(results)
save_cytospace_plot(p, "figures/", formats = c("png", "pdf"), dpi = 300)

Performance Considerations

  • Memory: For large datasets (>50,000 cells), use the chunk_size parameter to control memory usage
  • Speed: Enable parallel processing with n_workers for datasets with >10,000 spots
  • Sparse data: CytoSPACER automatically handles sparse matrices efficiently

# Optimized for large datasets
results <- run_cytospace(
  ...,
  chunk_size = 5000,
  n_workers = max(1, parallel::detectCores() - 1, na.rm = TRUE),
  downsample = TRUE,
  downsample_target = 1500
)
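
The idea behind chunking can be sketched in base R: rather than correlating all cells against all spots in one call, the spots are processed in blocks so that only one block of intermediate results is held at a time. The helper `chunked_cor` is hypothetical and is not how `chunk_size` is necessarily implemented in the package.

```r
# Hypothetical helper: cell-by-spot correlation computed one block of spots at a time
chunked_cor <- function(sc, st, chunk_size = 2) {
  starts <- seq(1, ncol(st), by = chunk_size)
  blocks <- lapply(starts, function(s) {
    idx <- s:min(s + chunk_size - 1, ncol(st))
    cor(sc, st[, idx, drop = FALSE])   # cells x spots-in-this-block
  })
  do.call(cbind, blocks)               # reassemble in the original spot order
}

# Random toy data: 8 genes, 5 cells, 7 spots
set.seed(1)
sc <- matrix(rnorm(40), nrow = 8)
st <- matrix(rnorm(56), nrow = 8)

full    <- cor(sc, st)
chunked <- chunked_cor(sc, st, chunk_size = 3)
print(all.equal(full, chunked))
```

The chunked result matches the single-call result, so the chunk size trades peak memory against the number of calls without affecting the values.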

Citation

If you use CytoSPACER in your research, please cite:

Vahid MR, Brown EL, Steen CB, Zhang W, Jeon HS, Kang M, Gentles AJ, Newman AM. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nature Biotechnology 41, 1543–1548 (2023). https://doi.org/10.1038/s41587-023-01697-9

License

This project is licensed under the MIT License. See LICENSE for details.

Author

Zaoqu Liu

Contributing

Contributions are welcome! Please submit issues and pull requests on GitHub.


CytoSPACER: Bridging single-cell and spatial transcriptomics through optimal transport