Skip to contents

Main function for assigning single cells to spatial transcriptomics spots using the CytoSPACE algorithm. This is a complete R implementation of the Python CytoSPACE package.

Usage

run_cytospace(
  sc_data,
  cell_types,
  st_data,
  coordinates,
  cell_type_fractions = NULL,
  cells_per_spot = NULL,
  mean_cells_per_spot = 5,
  single_cell = FALSE,
  st_cell_types = NULL,
  distance_metric = c("pearson", "spearman", "euclidean"),
  sampling_method = c("duplicates", "synthetic"),
  downsample = TRUE,
  downsample_target = 1500,
  sampling_sub_spots = FALSE,
  number_of_selected_spots = 10000,
  number_of_selected_sub_spots = 10000,
  n_workers = 1,
  seed = 1,
  verbose = TRUE
)

Arguments

sc_data

Single-cell RNA-seq expression matrix (genes x cells). Can be a matrix, data.frame, or sparse Matrix.

cell_types

Named character vector of cell type labels. Names should match column names of sc_data.

st_data

Spatial transcriptomics expression matrix (genes x spots).

coordinates

Data.frame with spot coordinates. Must have columns 'row' and 'col' (or 'X' and 'Y'), with rownames matching st_data.

cell_type_fractions

Optional data.frame of cell type fractions per spot. If NULL, will be estimated using Seurat.

cells_per_spot

Optional integer vector of cell counts per spot. If NULL, will be estimated from RNA content.

mean_cells_per_spot

Expected mean cells per spot for estimation. Default is 5 (appropriate for 10x Visium).

single_cell

Logical. If TRUE, treat ST data as single-cell spatial (one cell per spot). Default is FALSE.

st_cell_types

Optional named vector of cell types for single-cell ST. Required when single_cell=TRUE for optimal assignment.

distance_metric

Distance metric for cost matrix. One of "pearson" (default), "spearman", or "euclidean".

sampling_method

Method for handling cell count mismatches. One of "duplicates" (default) or "synthetic" (called "place_holders" in Python).

downsample

Logical. If TRUE, downsample scRNA-seq to equal counts. Default is TRUE.

downsample_target

Target count for downsampling. Default is 1500.

sampling_sub_spots

Logical. If TRUE, use sub-spot sampling for large datasets. Default is FALSE.

number_of_selected_spots

Chunk size for single_cell mode. Default 10000.

number_of_selected_sub_spots

Chunk size for sampling_sub_spots mode. Default is 10000.

n_workers

Number of parallel workers. Default is 1.

seed

Random seed for reproducibility. Default is 1.

verbose

Logical. If TRUE, print progress messages. Default is TRUE.

Value

A list containing:

assigned_locations

Data.frame with cell-to-spot assignments

expression

Expression matrix for assigned cells

cell_type_by_spot

Cell type counts per spot (bulk ST only)

fractional_abundances

Cell type fractions per spot (bulk ST only)

parameters

List of input parameters

log

Character vector of log messages

runtime

Total execution time in seconds

Details

The CytoSPACE algorithm assigns single cells to spatial spots by:

  1. Estimating cell type fractions and cell counts per spot

  2. Sampling single cells to match the ST cell composition

  3. Computing a cost matrix based on expression similarity

  4. Solving the linear assignment problem (LAP) using Jonker-Volgenant

For large datasets, use sampling_sub_spots=TRUE to partition the problem into manageable chunks processed in parallel.

References

Vahid, M.R., et al. (2023). High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nature Biotechnology 41, 1543-1548.

Examples

if (FALSE) { # \dontrun{
# Standard bulk ST analysis
results <- run_cytospace(
  sc_data = sc_expr,
  cell_types = cell_labels,
  st_data = st_expr,
  coordinates = coords,
  mean_cells_per_spot = 5
)

# Single-cell ST with known cell types
results <- run_cytospace(
  sc_data = sc_expr,
  cell_types = cell_labels,
  st_data = st_expr,
  coordinates = coords,
  single_cell = TRUE,
  st_cell_types = st_labels
)

# Large dataset with sub-spot sampling
results <- run_cytospace(
  sc_data = sc_expr,
  cell_types = cell_labels,
  st_data = st_expr,
  coordinates = coords,
  sampling_sub_spots = TRUE,
  number_of_selected_sub_spots = 5000,
  n_workers = 4
)
} # }