Main function for assigning single cells to spatial transcriptomics spots using the CytoSPACE algorithm. This is a complete R implementation of the Python CytoSPACE package.
Usage
run_cytospace(
sc_data,
cell_types,
st_data,
coordinates,
cell_type_fractions = NULL,
cells_per_spot = NULL,
mean_cells_per_spot = 5,
single_cell = FALSE,
st_cell_types = NULL,
distance_metric = c("pearson", "spearman", "euclidean"),
sampling_method = c("duplicates", "synthetic"),
downsample = TRUE,
downsample_target = 1500,
sampling_sub_spots = FALSE,
number_of_selected_spots = 10000,
number_of_selected_sub_spots = 10000,
n_workers = 1,
seed = 1,
verbose = TRUE
)Arguments
- sc_data
Single-cell RNA-seq expression matrix (genes x cells). Can be a matrix, data.frame, or sparse Matrix.
- cell_types
Named character vector of cell type labels. Names should match column names of
sc_data.- st_data
Spatial transcriptomics expression matrix (genes x spots).
- coordinates
Data.frame with spot coordinates. Must have columns 'row' and 'col' (or 'X' and 'Y'), with rownames matching
st_data.- cell_type_fractions
Optional data.frame of cell type fractions per spot. If NULL, will be estimated using Seurat.
- cells_per_spot
Optional integer vector of cell counts per spot. If NULL, will be estimated from RNA content.
- mean_cells_per_spot
Expected mean cells per spot for estimation. Default is 5 (appropriate for 10x Visium).
- single_cell
Logical. If TRUE, treat ST data as single-cell spatial (one cell per spot). Default is FALSE.
- st_cell_types
Optional named vector of cell types for single-cell ST. Required when single_cell=TRUE for optimal assignment.
- distance_metric
Distance metric for cost matrix. One of "pearson" (default), "spearman", or "euclidean".
- sampling_method
Method for handling cell count mismatches. One of "duplicates" (default) or "synthetic" (called "place_holders" in Python).
- downsample
Logical. If TRUE, downsample scRNA-seq to equal counts. Default is TRUE.
- downsample_target
Target count for downsampling. Default is 1500.
- sampling_sub_spots
Logical. If TRUE, use sub-spot sampling for large datasets. Default is FALSE.
- number_of_selected_spots
Chunk size for single_cell mode. Default 10000.
- number_of_selected_sub_spots
Chunk size for sampling_sub_spots mode. Default is 10000.
- n_workers
Number of parallel workers. Default is 1.
- seed
Random seed for reproducibility. Default is 1.
- verbose
Logical. If TRUE, print progress messages. Default is TRUE.
Value
A list containing:
- assigned_locations
Data.frame with cell-to-spot assignments
- expression
Expression matrix for assigned cells
- cell_type_by_spot
Cell type counts per spot (bulk ST only)
- fractional_abundances
Cell type fractions per spot (bulk ST only)
- parameters
List of input parameters
- log
Character vector of log messages
- runtime
Total execution time in seconds
Details
The CytoSPACE algorithm assigns single cells to spatial spots by:
Estimating cell type fractions and cell counts per spot
Sampling single cells to match the ST cell composition
Computing a cost matrix based on expression similarity
Solving the linear assignment problem (LAP) using Jonker-Volgenant
For large datasets, use sampling_sub_spots=TRUE to partition the
problem into manageable chunks processed in parallel.
References
Vahid, M.R., et al. (2023). High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nature Biotechnology 41, 1543-1548.
Examples
if (FALSE) { # \dontrun{
# Standard bulk ST analysis
results <- run_cytospace(
sc_data = sc_expr,
cell_types = cell_labels,
st_data = st_expr,
coordinates = coords,
mean_cells_per_spot = 5
)
# Single-cell ST with known cell types
results <- run_cytospace(
sc_data = sc_expr,
cell_types = cell_labels,
st_data = st_expr,
coordinates = coords,
single_cell = TRUE,
st_cell_types = st_labels
)
# Large dataset with sub-spot sampling
results <- run_cytospace(
sc_data = sc_expr,
cell_types = cell_labels,
st_data = st_expr,
coordinates = coords,
sampling_sub_spots = TRUE,
number_of_selected_sub_spots = 5000,
n_workers = 4
)
} # }