binSpect: Binary Spatial Enrichment Test for SVG Detection

Detect spatially variable genes using the binSpect approach from Giotto. This method binarizes gene expression and tests for spatial enrichment of high-expressing cells using Fisher's exact test.

Identifies spatially variable genes by:

1. Binarizing gene expression (high/low)
2. Building a spatial neighborhood network
3. Testing whether high-expressing cells neighbor other high-expressing cells more often than expected by chance

Usage

CalSVG_binSpect(
  expr_matrix,
  spatial_coords,
  bin_method = c("kmeans", "rank"),
  rank_percent = 30,
  network_method = c("delaunay", "knn"),
  k = 10L,
  do_fisher_test = TRUE,
  adjust_method = "fdr",
  n_threads = 1L,
  verbose = TRUE
)

Arguments

expr_matrix

Numeric matrix of gene expression values.

Rows: genes
Columns: spatial locations (spots/cells)
Values: normalized expression (e.g., log counts or normalized counts)

spatial_coords

Numeric matrix of spatial coordinates.

Rows: spatial locations (must match columns of expr_matrix)
Columns: x, y (and optionally z) coordinates

bin_method

Character string specifying binarization method.

"kmeans" (default): K-means clustering with k=2. Automatically separates high and low expression groups. Robust to different expression distributions.
"rank": Top percentage by expression rank. More consistent across genes with different distributions. Controlled by rank_percent parameter.

rank_percent

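Numeric (0-100). For bin_method = "rank", the percentage of cells to classify as "high expressing". Default is 30 (top 30%).

Lower values (10-20%): fewer cells are classified as high-expressing.
Higher values (40-50%): more cells are classified as high-expressing.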
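
The two binarization options can be illustrated with a minimal base R sketch. The helper binarize_gene() is hypothetical and is not the package's internal code; in particular, the tie handling and the rounding of the cell count in the rank branch are assumptions.

```r
# Hypothetical helper (not part of the package): binarize one gene's expression.
binarize_gene <- function(x, method = c("kmeans", "rank"), rank_percent = 30) {
  method <- match.arg(method)
  if (method == "kmeans") {
    # Two clusters; the cluster with the larger center is labelled "high".
    km <- stats::kmeans(x, centers = 2, nstart = 10)
    as.integer(km$cluster == which.max(km$centers))
  } else {
    # The top rank_percent of cells by expression rank are labelled "high".
    n_high <- ceiling(length(x) * rank_percent / 100)
    as.integer(rank(-x, ties.method = "first") <= n_high)
  }
}

# Simulated gene with a clear low group (70 cells) and high group (30 cells)
set.seed(1)
x <- c(rnorm(70, mean = 0.5, sd = 0.2), rnorm(30, mean = 3, sd = 0.3))

high_kmeans <- binarize_gene(x, "kmeans")
high_rank   <- binarize_gene(x, "rank", rank_percent = 30)

# Cross-tabulate the two labelings
table(kmeans = high_kmeans, rank = high_rank)
```

On clearly bimodal data like this the two methods agree. They can diverge when the true fraction of high-expressing cells differs from rank_percent.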

network_method

Character string specifying spatial network construction.

"delaunay" (default): Delaunay triangulation
"knn": K-nearest neighbors

k

Integer. Number of neighbors for KNN network. Default is 10.
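
As an illustration of what a KNN spatial network looks like, the sketch below builds an undirected edge list with base R only. The helper knn_edges() is hypothetical; it uses a full distance matrix, so it only suits small inputs, and the package itself may build its network differently (the example at the end of this page requires the RANN package). Delaunay triangulation needs an additional package and is not shown.

```r
# Hypothetical helper: k-nearest-neighbor edge list from a coordinate matrix.
knn_edges <- function(coords, k = 10L) {
  d <- as.matrix(stats::dist(coords))
  diag(d) <- Inf                      # a cell is not its own neighbor
  # For each cell, the indices of its k closest cells (one column per cell)
  nn <- apply(d, 1, function(row) order(row)[seq_len(k)])
  nn <- matrix(nn, nrow = k)          # keeps the matrix shape when k = 1
  edges <- cbind(rep(seq_len(nrow(coords)), each = k), as.vector(nn))
  # Treat edges as undirected: order each pair, then drop duplicates
  edges <- t(apply(edges, 1, sort))
  colnames(edges) <- c("from", "to")
  unique(edges)
}

coords <- as.matrix(expand.grid(x = 1:5, y = 1:5))  # 25 spots on a grid
edges <- knn_edges(coords, k = 4L)
head(edges)
```

Larger k gives denser networks and therefore more neighbor pairs entering the contingency table for each gene.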

do_fisher_test

Logical. Whether to perform Fisher's exact test. Default is TRUE.

TRUE: Returns p-values from Fisher's exact test
FALSE: Returns only odds ratios (faster)

adjust_method

Character string for p-value adjustment. Default is "fdr" (Benjamini-Hochberg). See p.adjust() for options.
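
For reference, this is how the adjustment methods behave in base R's p.adjust(). The p-values below are made up for illustration.

```r
# Illustrative raw p-values (not from real data)
p_raw <- c(0.001, 0.01, 0.02, 0.04, 0.5)

p_fdr  <- p.adjust(p_raw, method = "fdr")
p_bh   <- p.adjust(p_raw, method = "BH")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

identical(p_fdr, p_bh)   # TRUE: "fdr" is an alias for "BH" in p.adjust()
data.frame(p_raw, p_fdr, p_bonf)
```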

n_threads

Integer. Number of parallel threads. Default is 1.

verbose

Logical. Print progress messages. Default is TRUE.

Value

A data.frame with SVG detection results. In the example output at the end of this page, rows are ordered by decreasing score. Columns:

gene: Gene identifier
estimate: Odds ratio from 2x2 contingency table. OR > 1 indicates spatial clustering of high-expressing cells.
p.value: P-value from Fisher's exact test (if requested)
p.adj: Adjusted p-value (method set by adjust_method)
score: Combined score = -log10(p.value) * estimate
high_expr_count: Number of high-expressing cells
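
The score formula can be checked by hand. The sketch below recomputes it for two rows copied from the example output at the end of this page; the data.frame res is built manually here for illustration and is not returned by any call.

```r
# Two rows copied from the printed example output
res <- data.frame(
  gene     = c("gene_19", "gene_2"),
  estimate = c(23.82194, 48.44525),
  p.value  = c(1.186133e-249, 6.066227e-95)
)

# score = -log10(p.value) * estimate
res$score <- -log10(res$p.value) * res$estimate
res
```

Because the score multiplies significance by effect size, a gene with a larger p-value can still outrank one with a smaller p-value if its odds ratio is much larger.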

Details

Method Overview:

binSpect constructs a 2x2 contingency table for each gene based on:

Cell A expression: High (1) or Low (0)
Cell B expression: High (1) or Low (0)

For all pairs of neighboring cells (edges in the spatial network):

              Cell B Low    Cell B High
Cell A Low    n_00          n_01
Cell A High   n_10          n_11

Statistical Test: Fisher's exact test is used to test whether n_11 (both neighbors high) is greater than expected under independence.
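
A minimal sketch of this step on toy data, using base R's table() and fisher.test(). It assumes each edge is counted once, with the "from" cell as Cell A and the "to" cell as Cell B; how the package orients or counts edges internally is not stated here. The use of alternative = "greater" is likewise an assumption, chosen to match the one-sided description above.

```r
# Toy data: 6 cells on a line, binarized expression (1 = high, 0 = low)
high  <- c(1, 1, 1, 0, 0, 0)
edges <- cbind(from = 1:5, to = 2:6)   # each cell neighbors the next one

# Expression state at both ends of every edge
a <- factor(high[edges[, "from"]], levels = 0:1)
b <- factor(high[edges[, "to"]],   levels = 0:1)

# 2x2 contingency table: rows = Cell A, columns = Cell B
tab <- table(A = a, B = b)
tab

# One-sided Fisher's exact test for an odds ratio greater than 1
ft <- fisher.test(tab, alternative = "greater")
ft$p.value
```

With only five edges the test has little power. Real data sets contribute thousands of neighbor pairs per gene.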

Odds Ratio Interpretation:

OR = 1: No spatial pattern
OR > 1: High-expressing cells cluster together (positive spatial pattern)
OR < 1: High-expressing cells avoid each other (negative pattern)
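
To make the interpretation concrete, the sketch below computes the sample odds ratio (n_11 * n_00) / (n_10 * n_01) for a clustered and an alternating pattern on a line of 20 cells. The helper edge_odds_ratio() is hypothetical. Whether the package reports this sample odds ratio or the conditional estimate returned by fisher.test() is not stated here, and the two can differ.

```r
# Hypothetical helper: sample odds ratio from binarized values and an edge list.
edge_odds_ratio <- function(high, edges) {
  a <- factor(high[edges[, 1]], levels = 0:1)
  b <- factor(high[edges[, 2]], levels = 0:1)
  n <- table(a, b)
  unname((n["1", "1"] * n["0", "0"]) / (n["1", "0"] * n["0", "1"]))
}

edges <- cbind(1:19, 2:20)                    # 20 cells on a line

clustered   <- rep(c(1, 0, 1, 0), each = 5)   # blocks of high and low cells
alternating <- rep(c(1, 0), times = 10)       # high and low cells alternate

or_clustered   <- edge_odds_ratio(clustered, edges)    # well above 1
or_alternating <- edge_odds_ratio(alternating, edges)  # 0: no high-high pairs
c(clustered = or_clustered, alternating = or_alternating)
```

If n_10 or n_01 is zero the sample odds ratio is infinite or undefined, so a practical implementation needs some handling for empty cells.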

Advantages:

Fast computation (no covariance matrix inversion)
Robust to outliers through binarization
Interpretable odds ratio statistic

Considerations:

Binarization threshold affects results
K-means may produce unstable splits when a gene's expression is not clearly bimodal
The rank method is more stable, but its threshold is arbitrary

References

Dries, R. et al. (2021) Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biology 22, 78.

Examples

# Load example data
data(example_svg_data)
expr <- example_svg_data$logcounts[1:20, ]
coords <- example_svg_data$spatial_coords

# \donttest{
# Basic usage (requires RANN package)
if (requireNamespace("RANN", quietly = TRUE)) {
    results <- CalSVG_binSpect(expr, coords, 
                               network_method = "knn", k = 10,
                               verbose = FALSE)
    head(results)
}
#>      gene  estimate       p.value high_expr_count         p.adj      score
#> 1  gene_9 520.89588 1.750567e-258              88 1.167044e-257 134264.467
#> 2  gene_7 212.87179 1.750567e-258             308 1.167044e-257  54869.156
#> 3  gene_1 112.50546 1.750567e-258              76 1.167044e-257  28999.051
#> 4 gene_19  23.82194 1.186133e-249             183 5.930663e-249   5929.898
#> 5  gene_2  48.44525  6.066227e-95              44  9.332657e-95   4564.370
#> 6  gene_5  21.93086 1.122568e-208             363 3.741893e-208   4560.517
# }