CytoSPACER - Single-Cell to Spatial Transcriptomics Mapping

📖 Documentation: https://zaoqu-liu.github.io/CytoSPACER/

Overview

CytoSPACER is an R implementation of CytoSPACE (Vahid et al., Nature Biotechnology, 2023), a computational framework for high-resolution alignment of single-cell transcriptomes to spatial transcriptomics (ST) data. The algorithm formulates cell-to-spot assignment as a linear assignment problem (LAP) and solves it by minimizing a correlation-based cost function using the Jonker-Volgenant algorithm.
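In symbols (our notation, a sketch rather than the paper's exact statement): suppose the estimated cell counts create $n$ cell "slots" across all spots and $n$ cells are sampled from the reference. The assignment then chooses a binary matrix $X$ that matches each cell to exactly one slot:

```latex
\min_{X \in \{0,1\}^{n \times n}} \; \sum_{i=1}^{n} \sum_{j=1}^{n} C_{ij} X_{ij}
\quad \text{s.t.} \quad
\sum_{j=1}^{n} X_{ij} = 1 \;\; \forall i,
\qquad
\sum_{i=1}^{n} X_{ij} = 1 \;\; \forall j
```

Here $C_{ij}$ is the dissimilarity between cell $i$ and the spot that owns slot $j$, for example $C_{ij} = 1 - r_{ij}$ with $r_{ij}$ the Pearson correlation of their expression profiles. That form of the cost is an assumption for illustration; the package may define it differently.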

This package provides a native R implementation with C++ backends (via Rcpp) for the LAP solver and correlation computation, and it integrates directly with the R/Bioconductor ecosystem and Seurat workflows.

Algorithm

CytoSPACER performs spatial mapping through the following steps:

1. Cell Type Deconvolution: Estimate cell type fractions per ST spot using reference-based deconvolution (via Seurat’s TransferData)
2. Cell Count Estimation: Infer the number of cells per spot from its total RNA content
3. Reference Sampling: Sample single cells from the scRNA-seq reference to match the estimated spatial composition
4. Cost Matrix Construction: Compute pairwise dissimilarities between single cells and ST spots (Pearson correlation by default)
5. Optimal Assignment: Solve the LAP with the Jonker-Volgenant algorithm to find the globally optimal cell-to-spot mapping
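The cell count estimation and cost matrix construction steps can be sketched in base R as follows. This is a toy illustration, not CytoSPACER's internal code: the rule "cells per spot proportional to total counts, rescaled to `mean_cells_per_spot`" and the cost `1 - r` are simplifying assumptions, and `estimate_cells_per_spot()` and `build_cost_matrix()` are hypothetical helpers.

```r
# Toy data (made up): genes x cells and genes x spots; matrix() fills column by column.
genes <- paste0("Gene", 1:4)
sc <- matrix(c(10, 0, 5, 1,    # Cell_1
               0, 8, 2, 6,     # Cell_2
               9, 1, 6, 0,     # Cell_3
               1, 7, 1, 5),    # Cell_4
             nrow = 4, dimnames = list(genes, paste0("Cell_", 1:4)))
st <- matrix(c(30, 3, 17, 2,   # Spot_1
               2, 25, 4, 17),  # Spot_2
             nrow = 4, dimnames = list(genes, paste0("Spot_", 1:2)))

# Hypothetical helper (simplified assumption): cells per spot proportional to
# total counts, rescaled so the average is about mean_cells_per_spot, minimum 1.
estimate_cells_per_spot <- function(st, mean_cells_per_spot) {
  total <- colSums(st)
  pmax(1L, as.integer(round(total / mean(total) * mean_cells_per_spot)))
}

# Hypothetical helper: cost = 1 - Pearson r between every cell and every spot,
# with each spot repeated once per estimated cell so that columns become slots.
build_cost_matrix <- function(sc, st, n_cells) {
  r <- cor(sc, st, method = "pearson")        # cells x spots
  slot_spot <- rep(seq_len(ncol(st)), times = n_cells)
  cost <- 1 - r[, slot_spot, drop = FALSE]    # cells x slots
  colnames(cost) <- colnames(st)[slot_spot]
  cost
}

n_cells <- estimate_cells_per_spot(st, mean_cells_per_spot = 2)
cost <- build_cost_matrix(sc, st, n_cells)
```

Here both spots receive two slots, giving a square 4 × 4 cost matrix in which each cell is cheapest to place in the spot whose profile it resembles.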

The Jonker-Volgenant algorithm solves the LAP in O(n³) time in the worst case and performs well in practice on dense cost matrices.
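To make "globally optimal" concrete, the sketch below solves a tiny LAP by exhaustive search over all n! permutations. It is deliberately not Jonker-Volgenant and is only usable as a reference check on toy problems; `solve_lap_bruteforce()` is a hypothetical helper.

```r
# All permutations of v, as a list of vectors (O(n!) - toy sizes only).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

# Exact LAP by brute force: row i (cell) is assigned to column p[i] (slot).
solve_lap_bruteforce <- function(cost) {
  stopifnot(nrow(cost) == ncol(cost))
  n <- nrow(cost)
  best <- NULL
  best_cost <- Inf
  for (p in permutations(seq_len(n))) {
    total <- sum(cost[cbind(seq_len(n), p)])
    if (total < best_cost) {
      best_cost <- total
      best <- p
    }
  }
  list(assignment = best, total_cost = best_cost)
}

cost <- matrix(c(4, 1, 3,
                 2, 0, 5,
                 3, 2, 2), nrow = 3, byrow = TRUE)
res <- solve_lap_bruteforce(cost)
res$assignment   # 2 1 3: row 1 -> col 2, row 2 -> col 1, row 3 -> col 3
res$total_cost   # 5
```

Note that the optimum does not give row 2 its cheapest column (column 2, cost 0); a greedy per-row choice would miss the global minimum, which is why an exact LAP solver is used.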

Features

Feature	Description
High Performance	C++ implementation of LAP solver and correlation computation
Cross-Platform	Native support for Windows, macOS, and Linux
Parallel Processing	Multi-core support via the `future` framework
Seurat Integration	Direct compatibility with Seurat v4/v5 objects
Flexible Input	Support for CSV, TSV, sparse MTX, and SpaceRanger output
Multiple Metrics	Pearson correlation, Spearman correlation, Euclidean distance

Installation

From R-universe (Recommended)

install.packages("CytoSPACER", repos = "https://zaoqu-liu.r-universe.dev")

From GitHub

# Install remotes if not available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")

# Install CytoSPACER
remotes::install_github("Zaoqu-Liu/CytoSPACER")

System Requirements

R ≥ 4.0.0
C++ compiler with C++11 support (for Rcpp)

Dependencies

Core dependencies (installed automatically): Rcpp, data.table, Matrix, future, future.apply, progressr, ggplot2

Optional (for extended functionality):
- Seurat (≥ 4.0.0): Seurat object integration and cell type fraction estimation
- viridis: additional color palettes for visualization

Quick Start

Standard Workflow

library(CytoSPACER)

# Load input data
sc_expr <- read_cytospace_input("scRNA_counts.csv")
st_expr <- read_cytospace_input("ST_counts.csv")
coords <- read.csv("coordinates.csv", row.names = 1)

# Prepare cell type annotations
cell_labels <- read.csv("cell_types.csv", row.names = 1)
cell_types <- setNames(cell_labels$CellType, rownames(cell_labels))

# Run CytoSPACER
results <- run_cytospace(
  sc_data = sc_expr,
  cell_types = cell_types,
  st_data = st_expr,
  coordinates = coords,
  mean_cells_per_spot = 5,
  distance_metric = "pearson",
  seed = 42
)

# Export results
write_cytospace_results(results, output_dir = "cytospace_output/")

# Visualize spatial distribution
plot_cytospace(results, type = "cell_types")

Seurat Integration

library(CytoSPACER)
library(Seurat)

# Load Seurat objects
sc_seurat <- readRDS("scRNA_seurat.rds")
st_seurat <- readRDS("visium_seurat.rds")

# Run analysis directly from Seurat objects
results <- run_cytospace_seurat(
  sc_seurat = sc_seurat,
  st_seurat = st_seurat,
  cell_type_col = "celltype"
)

# Add results to spatial Seurat object
st_seurat <- add_cytospace_to_seurat(st_seurat, results)

# Visualize with Seurat
SpatialDimPlot(st_seurat, group.by = "dominant_celltype_cytospace")

Input Data Format

Expression Matrices

Genes × cells (scRNA-seq) or genes × spots (ST) matrices, with gene names as row names and cell or spot IDs as column names:

GENES	Cell_1	Cell_2	Cell_3	…
Gene1	10	0	5	…
Gene2	0	8	2	…
…	…	…	…	…

Cell Type Annotations

CellID	CellType
Cell_1	B_cell
Cell_2	T_cell
Cell_3	Macrophage

Spatial Coordinates

SpotID	row	col
Spot_1	0	0
Spot_2	0	1
Spot_3	1	0
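A quick way to confirm that inputs line up before a run is sketched below. The toy objects mirror the layouts above; `check_inputs()` is a hypothetical helper, not part of CytoSPACER, and the checks reflect assumptions about what the inputs must share (gene names, cell IDs, spot IDs).

```r
# Toy inputs mirroring the layouts above (IDs and values are made up).
sc_expr <- matrix(c(10, 0, 5,
                    0, 8, 2), nrow = 2, byrow = TRUE,
                  dimnames = list(c("Gene1", "Gene2"), c("Cell_1", "Cell_2", "Cell_3")))
st_expr <- matrix(c(12, 3, 7,
                    1, 9, 4), nrow = 2, byrow = TRUE,
                  dimnames = list(c("Gene1", "Gene2"), c("Spot_1", "Spot_2", "Spot_3")))
cell_types <- c(Cell_1 = "B_cell", Cell_2 = "T_cell", Cell_3 = "Macrophage")
coords <- data.frame(row = c(0, 0, 1), col = c(0, 1, 0),
                     row.names = c("Spot_1", "Spot_2", "Spot_3"))

# Hypothetical helper: basic consistency checks; returns the shared genes.
check_inputs <- function(sc_expr, cell_types, st_expr, coords) {
  shared_genes <- intersect(rownames(sc_expr), rownames(st_expr))
  if (length(shared_genes) == 0) stop("no genes shared between scRNA-seq and ST matrices")
  if (!all(colnames(sc_expr) %in% names(cell_types))) stop("some cells have no cell type label")
  if (!all(c("row", "col") %in% colnames(coords))) stop("coordinates need 'row' and 'col' columns")
  if (!setequal(colnames(st_expr), rownames(coords))) stop("spot IDs differ between ST matrix and coordinates")
  invisible(shared_genes)
}

shared <- check_inputs(sc_expr, cell_types, st_expr, coords)
```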

Advanced Usage

Distance Metrics

# Pearson correlation (default, recommended for most cases)
results <- run_cytospace(..., distance_metric = "pearson")

# Spearman correlation (robust to outliers)
results <- run_cytospace(..., distance_metric = "spearman")

# Euclidean distance
results <- run_cytospace(..., distance_metric = "euclidean")
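The sketch below shows one plausible way each metric could become a cells × spots cost matrix. It is an assumption for illustration: `cost_matrix()` is hypothetical, correlations are converted as `1 - r`, and Euclidean distance is used as-is. CytoSPACER's own definitions may differ.

```r
# Hypothetical helper: cells x spots cost matrix for the chosen metric.
cost_matrix <- function(sc, st, metric = c("pearson", "spearman", "euclidean")) {
  metric <- match.arg(metric)
  if (metric != "euclidean") {
    return(1 - cor(sc, st, method = metric))  # correlation -> cost in [0, 2]
  }
  # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, for all cell/spot pairs at once
  sq <- outer(colSums(sc^2), colSums(st^2), "+") - 2 * crossprod(sc, st)
  d <- sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
  dimnames(d) <- list(colnames(sc), colnames(st))
  d
}

sc <- matrix(c(10, 0, 5, 1,  0, 8, 2, 6,  9, 1, 6, 0), nrow = 4,
             dimnames = list(paste0("Gene", 1:4), paste0("Cell_", 1:3)))
st <- matrix(c(30, 3, 17, 2,  2, 25, 4, 17), nrow = 4,
             dimnames = list(paste0("Gene", 1:4), c("Spot_1", "Spot_2")))

p <- cost_matrix(sc, st, "pearson")
s <- cost_matrix(sc, st, "spearman")
e <- cost_matrix(sc, st, "euclidean")
```

Correlation-based costs ignore overall scale (a spot holding several cells still correlates with one of them), whereas Euclidean distance is sensitive to library size and normalization, which is one reason correlation is the usual default.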

Sampling Strategies

# Duplicates method (default): reuse cells when reference is insufficient
results <- run_cytospace(..., sampling_method = "duplicates")

# Synthetic method: generate synthetic cells via gene-wise sampling
results <- run_cytospace(..., sampling_method = "synthetic")
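A minimal sketch of the idea behind each strategy, assuming that "duplicates" uses every available cell of a type before reusing any, and that "synthetic" builds a cell gene by gene from randomly chosen cells of the same type. Both helpers are hypothetical and are not CytoSPACER's implementation.

```r
# Hypothetical "duplicates" sketch. cell_types: named vector (cell ID -> type);
# target: number of cells needed per type.
sample_duplicates <- function(cell_types, target) {
  unlist(lapply(names(target), function(ct) {
    pool <- names(cell_types)[cell_types == ct]
    n <- target[[ct]]
    if (n <= length(pool)) {
      pool[sample.int(length(pool), n)]            # enough cells: no reuse
    } else {
      extra <- pool[sample.int(length(pool), n - length(pool), replace = TRUE)]
      c(pool, extra)                               # every cell once, then duplicates
    }
  }))
}

# Hypothetical "synthetic" sketch: each gene's value is copied from an
# independently drawn cell of the same type (gene-wise sampling).
make_synthetic_cell <- function(sc, pool) {
  donors <- pool[sample.int(length(pool), nrow(sc), replace = TRUE)]
  setNames(sc[cbind(seq_len(nrow(sc)), match(donors, colnames(sc)))], rownames(sc))
}

set.seed(42)
sc <- matrix(c(10, 0, 5,   # Cell_1
               0, 8, 2,    # Cell_2
               9, 1, 6),   # Cell_3
             nrow = 3, dimnames = list(paste0("Gene", 1:3), paste0("Cell_", 1:3)))
cell_types <- c(Cell_1 = "B_cell", Cell_2 = "T_cell", Cell_3 = "B_cell")

picked <- sample_duplicates(cell_types, c(B_cell = 3L, T_cell = 1L))
b_pool <- names(cell_types)[cell_types == "B_cell"]
synthetic <- make_synthetic_cell(sc, b_pool)
```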

Single-Cell Spatial Data

For single-cell resolution platforms (MERFISH, seqFISH, Xenium, CosMx):

results <- run_cytospace(
  ...,
  single_cell = TRUE,
  st_cell_types = spatial_cell_labels  # Optional prior cell type information
)

Parallel Processing

# Automatic parallelization
results <- run_cytospace(..., n_workers = 8)

# Custom future plan
library(future)
plan(multisession, workers = 8)
results <- run_cytospace(...)

Output

CytoSPACER generates the following outputs:

File	Description
`assigned_locations.csv`	Cell-to-spot assignments with spatial coordinates
`cell_type_assignments_by_spot.csv`	Cell type counts per spot
`fractional_abundances_by_spot.csv`	Cell type proportions per spot
`assigned_expression/`	Expression matrix for assigned cells
`log.txt`	Analysis log with parameters and runtime
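The per-spot tables can be understood as summaries of the cell-to-spot assignments: counts cross-tabulate assigned cells by spot and cell type, and proportions divide each spot's counts by its total. The sketch below reproduces that relationship on made-up data; the column names `CellID`, `CellType`, and `SpotID` are assumptions, not necessarily the headers of `assigned_locations.csv`.

```r
# Hypothetical assignment table (column names are assumptions).
assigned <- data.frame(
  CellID   = c("Cell_1", "Cell_2", "Cell_3", "Cell_4", "Cell_5"),
  CellType = c("B_cell", "T_cell", "B_cell", "Macrophage", "T_cell"),
  SpotID   = c("Spot_1", "Spot_1", "Spot_2", "Spot_2", "Spot_2")
)

# Cell type counts per spot (spots x cell types)
counts <- table(assigned$SpotID, assigned$CellType)

# Cell type proportions per spot: each row sums to 1
fractions <- prop.table(counts, margin = 1)

# Plain data frames, e.g. for writing to CSV
counts_df <- as.data.frame.matrix(counts)
fractions_df <- as.data.frame.matrix(fractions)
```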

Visualization

# Spatial cell type distribution
plot_cytospace(results, type = "cell_types")

# Add jitter for overlapping points
plot_cytospace(results, type = "cell_types", jitter = 0.3)
  
# Cell counts per spot (faceted)
plot_cytospace(results, type = "by_spot", ncol = 4)

# Cell type composition
plot_composition(results, type = "global")
plot_composition(results, type = "per_spot", top_spots = 20)

# Save publication-quality figures
p <- plot_cytospace(results)
save_cytospace_plot(p, "figures/", formats = c("png", "pdf"), dpi = 300)

Performance Considerations

Memory: For large datasets (>50,000 cells), use the chunk_size parameter to limit memory usage
Speed: Enable parallel processing with n_workers for datasets with more than 10,000 spots
Sparse data: CytoSPACER handles sparse matrices automatically and efficiently

# Optimized for large datasets
results <- run_cytospace(
  ...,
  chunk_size = 5000,
  n_workers = max(1, parallel::detectCores() - 1, na.rm = TRUE),
  downsample = TRUE,
  downsample_target = 1500
)
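As a rough illustration of what chunking can buy, the sketch below computes the cells × spots correlation in blocks of `chunk_size` cells, so only one block of cells has to be densified at a time. `chunked_cor()` is a hypothetical helper; how CytoSPACER actually uses `chunk_size` is not shown here and may differ.

```r
# Hypothetical sketch: correlation of cells vs spots, one block of cells at a time.
chunked_cor <- function(sc, st, chunk_size) {
  n <- ncol(sc)
  starts <- seq(1, n, by = chunk_size)
  blocks <- lapply(starts, function(s) {
    idx <- s:min(s + chunk_size - 1, n)
    # as.matrix() would densify only this block if sc were a sparse Matrix
    cor(as.matrix(sc[, idx, drop = FALSE]), st)
  })
  do.call(rbind, blocks)   # cells x spots, identical to cor(sc, st)
}

set.seed(7)
sc <- matrix(rnorm(50 * 7), nrow = 50, dimnames = list(NULL, paste0("Cell_", 1:7)))
st <- matrix(rnorm(50 * 3), nrow = 50, dimnames = list(NULL, paste0("Spot_", 1:3)))
r_chunked <- chunked_cor(sc, st, chunk_size = 3)
```

The result matches the unchunked computation; chunking only bounds the size of intermediate dense copies, not the final cells × spots matrix.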

Citation

If you use CytoSPACER in your research, please cite:

Vahid MR, Brown EL, Steen CB, Zhang W, Jeon HS, Kang M, Buj R, Sahu A, Datta R, Afshari A, Newman AM. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nature Biotechnology 41, 1543–1548 (2023). https://doi.org/10.1038/s41587-023-01697-9

References

CytoSPACE: Vahid et al. (2023). Nature Biotechnology. DOI: 10.1038/s41587-023-01697-9
Jonker-Volgenant Algorithm: Jonker R, Volgenant A. (1987). A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38(4):325–340.
Original Implementation: https://github.com/digitalcytometry/cytospace

License

This project is licensed under the MIT License. See LICENSE for details.

Author

Zaoqu Liu

Contributing

Contributions are welcome! Please submit issues and pull requests on GitHub.

CytoSPACER: Bridging single-cell and spatial transcriptomics through optimal transport