Run CytoSPACE Analysis — run_cytospace • CytoSPACER

Main function for assigning single cells to spatial transcriptomics spots using the CytoSPACE algorithm. This is a complete R implementation of the Python CytoSPACE package.

Usage

run_cytospace(
  sc_data,
  cell_types,
  st_data,
  coordinates,
  cell_type_fractions = NULL,
  cells_per_spot = NULL,
  mean_cells_per_spot = 5,
  single_cell = FALSE,
  st_cell_types = NULL,
  distance_metric = c("pearson", "spearman", "euclidean"),
  sampling_method = c("duplicates", "synthetic"),
  downsample = TRUE,
  downsample_target = 1500,
  sampling_sub_spots = FALSE,
  number_of_selected_spots = 10000,
  number_of_selected_sub_spots = 10000,
  n_workers = 1,
  seed = 1,
  verbose = TRUE
)

Arguments

sc_data: Single-cell RNA-seq expression matrix (genes x cells). Can be a matrix, data.frame, or sparse Matrix.
cell_types: Named character vector of cell type labels. Names should match column names of sc_data.
st_data: Spatial transcriptomics expression matrix (genes x spots).
coordinates: Data.frame with spot coordinates. Must have columns 'row' and 'col' (or 'X' and 'Y'), with rownames matching st_data.
cell_type_fractions: Optional data.frame of cell type fractions per spot. If NULL, will be estimated using Seurat.
cells_per_spot: Optional integer vector of cell counts per spot. If NULL, will be estimated from RNA content.
mean_cells_per_spot: Expected mean cells per spot for estimation. Default is 5 (appropriate for 10x Visium).
single_cell: Logical. If TRUE, treat ST data as single-cell spatial (one cell per spot). Default is FALSE.
st_cell_types: Optional named vector of cell types for single-cell ST. Required when single_cell=TRUE for optimal assignment.
distance_metric: Distance metric for cost matrix. One of "pearson" (default), "spearman", or "euclidean".
sampling_method: Method for handling cell count mismatches. One of "duplicates" (default) or "synthetic" (called "place_holders" in Python).
downsample: Logical. If TRUE, downsample scRNA-seq to equal counts. Default is TRUE.
downsample_target: Target count for downsampling. Default is 1500.
sampling_sub_spots: Logical. If TRUE, use sub-spot sampling for large datasets. Default is FALSE.
number_of_selected_spots: Chunk size for single_cell mode. Default 10000.
number_of_selected_sub_spots: Chunk size for sampling_sub_spots mode. Default is 10000.
n_workers: Number of parallel workers. Default is 1.
seed: Random seed for reproducibility. Default is 1.
verbose: Logical. If TRUE, print progress messages. Default is TRUE.

Value

A list containing:

assigned_locations: Data.frame with cell-to-spot assignments
expression: Expression matrix for assigned cells
cell_type_by_spot: Cell type counts per spot (bulk ST only)
fractional_abundances: Cell type fractions per spot (bulk ST only)
parameters: List of input parameters
log: Character vector of log messages
runtime: Total execution time in seconds

Details

The CytoSPACE algorithm assigns single cells to spatial spots by:

Estimating cell type fractions and cell counts per spot
Sampling single cells to match the ST cell composition
Computing a cost matrix based on expression similarity
Solving the linear assignment problem (LAP) using Jonker-Volgenant

For large datasets, use sampling_sub_spots=TRUE to partition the problem into manageable chunks processed in parallel.

References

Vahid, M.R., et al. (2023). High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nature Biotechnology 41, 1543-1548.

Examples

if (FALSE) { # \dontrun{
# Standard bulk ST analysis
results <- run_cytospace(
  sc_data = sc_expr,
  cell_types = cell_labels,
  st_data = st_expr,
  coordinates = coords,
  mean_cells_per_spot = 5
)

# Single-cell ST with known cell types
results <- run_cytospace(
  sc_data = sc_expr,
  cell_types = cell_labels,
  st_data = st_expr,
  coordinates = coords,
  single_cell = TRUE,
  st_cell_types = st_labels
)

# Large dataset with sub-spot sampling
results <- run_cytospace(
  sc_data = sc_expr,
  cell_types = cell_labels,
  st_data = st_expr,
  coordinates = coords,
  sampling_sub_spots = TRUE,
  number_of_selected_sub_spots = 5000,
  n_workers = 4
)
} # }