binSpect: Binary Spatial Enrichment Test for SVG Detection
Source: R/CalSVG_binSpect.R
Detect spatially variable genes using the binSpect approach from Giotto. This method binarizes gene expression and uses Fisher's exact test to check whether high-expressing cells are spatially enriched.
Identifies spatially variable genes by:
1. Binarizing gene expression (high/low)
2. Building a spatial neighborhood network
3. Testing whether high-expressing cells are neighbors of other high-expressing cells more often than expected by chance
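The three steps can be sketched for a single gene in base R. This is a minimal sketch under stated assumptions, not the package's implementation: binspect_one_gene() is a hypothetical helper, the network is a kNN graph built from a full distance matrix (fine for toy data only), each neighbor pair is counted once, and the test is a two-sided stats::fisher.test().

```r
# Hypothetical helper: a base-R sketch of binSpect for ONE gene.
# Assumptions (not taken from CalSVG_binSpect()): k-means binarization,
# a kNN network from Euclidean distances with duplicate pairs removed,
# and a two-sided stats::fisher.test() on the edge table.
binspect_one_gene <- function(x, coords, k = 4) {
  # Step 1: binarize; "high" = the k-means cluster with the larger center
  km <- kmeans(x, centers = 2, nstart = 10)
  high <- as.integer(km$cluster == which.max(km$centers))

  # Step 2: kNN network; position 1 of each ordered row is the cell itself
  d <- as.matrix(dist(coords))
  nn <- t(apply(d, 1, function(row) order(row)[2:(k + 1)]))
  edges <- cbind(rep(seq_len(nrow(nn)), k), as.vector(nn))
  edges <- unique(t(apply(edges, 1, sort)))  # keep each i-j pair once

  # Step 3: count (cell A, cell B) states over edges and test association
  tab <- table(factor(high[edges[, 1]], levels = 0:1),
               factor(high[edges[, 2]], levels = 0:1))
  ft <- fisher.test(tab)
  list(table = tab, estimate = unname(ft$estimate), p.value = ft$p.value)
}

# Toy data: 15 x 15 grid, one gene with a high-expression corner block,
# and the same values shuffled across locations (no spatial pattern)
set.seed(1)
coords <- as.matrix(expand.grid(x = 1:15, y = 1:15))
clustered <- ifelse(coords[, "x"] <= 6 & coords[, "y"] <= 6, 5, 1) +
  rnorm(nrow(coords), sd = 0.3)
shuffled <- sample(clustered)

res_clustered <- binspect_one_gene(clustered, coords)
res_shuffled  <- binspect_one_gene(shuffled, coords)
c(clustered = res_clustered$estimate, shuffled = res_shuffled$estimate)
```

The clustered gene should yield a very large odds ratio and a tiny p-value, while its shuffled copy (identical values, scrambled positions) should give an odds ratio much closer to 1.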
Arguments
- expr_matrix
Numeric matrix of gene expression values.
Rows: genes
Columns: spatial locations (spots/cells)
Values: normalized expression (e.g., log counts or normalized counts)
- spatial_coords
Numeric matrix of spatial coordinates.
Rows: spatial locations (must match columns of expr_matrix)
Columns: x, y (and optionally z) coordinates
- bin_method
Character string specifying binarization method.
"kmeans" (default): K-means clustering with two clusters (k = 2, unrelated to the k argument). Automatically separates high- and low-expression groups. Robust to different expression distributions.
"rank": Top percentage by expression rank. More consistent across genes with different distributions. Controlled by the rank_percent parameter.
- rank_percent
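Numeric (0-100). For bin_method = "rank", the percentage of cells to classify as "high expressing". Default is 30 (top 30%).
Lower values (10-20%): only the most strongly expressing cells are labeled high.
Higher values (40-50%): a larger share of cells is labeled high.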
- network_method
Character string specifying spatial network construction.
"delaunay" (default): Delaunay triangulation
"knn": K-nearest neighbors
- k
Integer. Number of neighbors per cell when network_method = "knn". Default is 10.
- do_fisher_test
Logical. Whether to perform Fisher's exact test. Default is TRUE.
TRUE: Returns p-values from Fisher's exact test
FALSE: Returns only odds ratios (faster)
- adjust_method
Character string for p-value adjustment. Default is "fdr" (Benjamini-Hochberg). See p.adjust() for options.
- n_threads
Integer. Number of parallel threads. Default is 1.
- verbose
Logical. Print progress messages. Default is TRUE.
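The two bin_method rules can be sketched as follows. binarize_kmeans() and binarize_rank() are hypothetical helpers for illustration only; how CalSVG_binSpect() breaks ties or rounds the rank_percent cutoff is an assumption here, not documented behavior.

```r
# Hypothetical helper: "kmeans" rule, 1 = cell in the higher-center cluster
binarize_kmeans <- function(x) {
  km <- kmeans(x, centers = 2, nstart = 10)
  as.integer(km$cluster == which.max(km$centers))
}

# Hypothetical helper: "rank" rule, 1 = cell in the top rank_percent %
# (assumed: ceiling() for the cutoff, ties broken by position)
binarize_rank <- function(x, rank_percent = 30) {
  n_high <- ceiling(length(x) * rank_percent / 100)
  # rank(-x) gives rank 1 to the largest value
  as.integer(rank(-x, ties.method = "first") <= n_high)
}

x <- c(0, 0, 0, 0.1, 0.2, 0.3, 2.5, 2.7, 3.0, 3.1)
binarize_kmeans(x)  # 0 0 0 0 0 0 1 1 1 1
binarize_rank(x)    # 0 0 0 0 0 0 0 1 1 1  (top 30% = 3 of 10 cells)
```

On this toy vector, k-means cuts at the largest gap (four high cells), whereas the rank rule with rank_percent = 30 always marks exactly three of ten cells, wherever the gap falls.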
Value
A data.frame with SVG detection results, sorted by significance/score. Columns:
gene: Gene identifier
estimate: Odds ratio from the 2x2 contingency table. OR > 1 indicates spatial clustering of high-expressing cells.
p.value: P-value from Fisher's exact test (if requested)
p.adj: Adjusted p-value
score: Combined score = -log10(p.value) * estimate
high_expr_count: Number of high-expressing cells
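As an arithmetic check of the score formula, the top row of the example output (gene_9) can be recomputed from its printed estimate and p.value:

```r
# Values copied from the gene_9 row of the example output
estimate <- 520.89588
p_value  <- 1.750567e-258
score    <- -log10(p_value) * estimate
round(score, 1)  # 134264.5, matching the printed 134264.467
```

If a p-value underflows to exactly 0, this formula gives Inf; how CalSVG_binSpect() handles that case is not covered by this page.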
Details
Method Overview:
binSpect constructs a 2x2 contingency table for each gene based on:
Cell A expression: High (1) or Low (0)
Cell B expression: High (1) or Low (0)
For all pairs of neighboring cells (edges in the spatial network):
| | Cell B Low | Cell B High |
|---|---|---|
| Cell A Low | n_00 | n_01 |
| Cell A High | n_10 | n_11 |
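The counting step can be illustrated on a hand-made network of six cells. The binary states and edge list below are made-up toy data, and treating each row of the edge list as one ordered (cell A, cell B) pair is an assumption of this sketch.

```r
# Toy data: binarized state of 6 cells (1 = high, 0 = low)
high  <- c(1, 1, 1, 0, 0, 0)
# Each row is one neighbor pair (cell A, cell B) in the spatial network
edges <- rbind(c(1, 2), c(2, 3), c(1, 3), c(3, 4), c(4, 5), c(5, 6), c(4, 6))
# factor(levels = 0:1) keeps empty cells, so the result is always 2x2
tab <- table(cell_A = factor(high[edges[, 1]], levels = 0:1),
             cell_B = factor(high[edges[, 2]], levels = 0:1))
tab  # n_00 = 3, n_01 = 0, n_10 = 1, n_11 = 3
```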
Statistical Test: Fisher's exact test is used to test whether n_11 (both neighbors high) is greater than expected under independence.
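In R, this test is available as stats::fisher.test(). The sketch below uses made-up counts and shows both the two-sided and the one-sided ("greater") alternative; which one CalSVG_binSpect() uses is not stated on this page.

```r
# Made-up edge counts; rows = cell A (Low, High), columns = cell B (Low, High)
tab <- matrix(c(40, 10, 10, 40), nrow = 2,
              dimnames = list(cell_A = c("Low", "High"),
                              cell_B = c("Low", "High")))
two_sided <- fisher.test(tab)                           # H1: OR != 1
one_sided <- fisher.test(tab, alternative = "greater")  # H1: OR > 1
# fisher.test() reports a conditional maximum-likelihood odds ratio, which
# generally differs slightly from (n_00 * n_11) / (n_01 * n_10) = 16
c(estimate    = unname(two_sided$estimate),
  p_two_sided = two_sided$p.value,
  p_greater   = one_sided$p.value)
```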
Odds Ratio Interpretation:
OR = 1: No spatial pattern
OR > 1: High-expressing cells cluster together (positive spatial pattern)
OR < 1: High-expressing cells avoid each other (negative pattern)
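The three interpretations can be made concrete with the sample odds ratio (n_00 * n_11) / (n_01 * n_10) on made-up tables. sample_or() is a hypothetical helper; the estimate column may instead be a conditional MLE, as returned by fisher.test().

```r
# Hypothetical helper: sample odds ratio from counts c(n_00, n_01, n_10, n_11)
sample_or <- function(n) (n[1] * n[4]) / (n[2] * n[3])
toy_tables <- list(
  clustered = c(40, 10, 10, 40),  # high cells mostly neighbor high cells
  random    = c(25, 25, 25, 25),  # neighbor states independent
  avoiding  = c(10, 40, 40, 10)   # high cells mostly neighbor low cells
)
ors <- sapply(toy_tables, sample_or)
ors  # clustered = 16, random = 1, avoiding = 0.0625
```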
Advantages:
Fast computation (no covariance matrix inversion)
Robust to outliers through binarization
Interpretable odds ratio statistic
Considerations:
Binarization threshold affects results
K-means may produce unstable splits when expression is not clearly bimodal
The rank method is more stable but uses an arbitrary threshold
References
Dries, R. et al. (2021) Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biology.
Examples
# Load example data
data(example_svg_data)
expr <- example_svg_data$logcounts[1:20, ]
coords <- example_svg_data$spatial_coords
# \donttest{
# Basic usage (requires RANN package)
if (requireNamespace("RANN", quietly = TRUE)) {
  results <- CalSVG_binSpect(expr, coords,
                             network_method = "knn", k = 10,
                             verbose = FALSE)
  head(results)
}
#> gene estimate p.value high_expr_count p.adj score
#> 1 gene_9 520.89588 1.750567e-258 88 1.167044e-257 134264.467
#> 2 gene_7 212.87179 1.750567e-258 308 1.167044e-257 54869.156
#> 3 gene_1 112.50546 1.750567e-258 76 1.167044e-257 28999.051
#> 4 gene_19 23.82194 1.186133e-249 183 5.930663e-249 5929.898
#> 5 gene_2 48.44525 6.066227e-95 44 9.332657e-95 4564.370
#> 6 gene_5 21.93086 1.122568e-208 363 3.741893e-208 4560.517
# }