nnSVG: Nearest-Neighbor Gaussian Process SVG Detection

Detect spatially variable genes using nnSVG, a method based on nearest-neighbor Gaussian processes for scalable spatial modeling.

nnSVG uses nearest-neighbor Gaussian processes (NNGP) to model spatial correlation structure in gene expression. It performs likelihood ratio tests comparing spatial vs. non-spatial models to identify SVGs.

Usage

CalSVG_nnSVG(
  expr_matrix,
  spatial_coords,
  X = NULL,
  n_neighbors = 10L,
  order = c("AMMD", "Sum_coords"),
  cov_model = c("exponential", "gaussian", "spherical", "matern"),
  adjust_method = "BH",
  n_threads = 1L,
  verbose = FALSE
)

Arguments

expr_matrix

Numeric matrix of gene expression values.

Rows: genes
Columns: spatial locations (spots/cells)
Values: log-normalized counts (e.g., from scuttle::logNormCounts)

spatial_coords

Numeric matrix of spatial coordinates.

Rows: spatial locations (must match columns of expr_matrix)
Columns: x, y coordinates
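The shape rules for these two inputs are easy to get wrong, so a quick check before calling the function can help. The sketch below uses simulated toy data, and `check_svg_inputs()` is a hypothetical helper written for illustration. It is not part of the package.

```r
# Hypothetical toy inputs: 5 genes x 100 spots on a 10 x 10 grid.
set.seed(1)
coords <- as.matrix(expand.grid(x = 1:10, y = 1:10))
expr <- matrix(rnorm(5 * nrow(coords), mean = 2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))

# Hypothetical helper: check the shape conventions described above.
check_svg_inputs <- function(expr_matrix, spatial_coords) {
  stopifnot(is.matrix(expr_matrix), is.numeric(expr_matrix))
  stopifnot(is.matrix(spatial_coords), is.numeric(spatial_coords))
  if (ncol(spatial_coords) != 2L)
    stop("spatial_coords must have exactly two columns (x, y)")
  if (ncol(expr_matrix) != nrow(spatial_coords))
    stop("ncol(expr_matrix) must equal nrow(spatial_coords)")
  # If both inputs carry spot names, they must be in the same order.
  if (!is.null(colnames(expr_matrix)) && !is.null(rownames(spatial_coords)) &&
      !identical(colnames(expr_matrix), rownames(spatial_coords)))
    stop("spot names of expr_matrix and spatial_coords differ")
  invisible(TRUE)
}

check_svg_inputs(expr, coords)  # passes silently
```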

X

Optional numeric matrix of covariates to regress out.

Rows: spatial locations (same order as spatial_coords)
Columns: covariates (e.g., batch, cell type indicators)

Default is NULL (intercept-only model).
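Categorical covariates such as batch or cell type need to be expanded into numeric indicator columns. A minimal sketch with hypothetical annotations uses `model.matrix()`. It assumes, as in the nnSVG package, that X is used as the full design matrix including an intercept column. Check whether this wrapper expects the intercept column before relying on this.

```r
# Hypothetical spot-level annotations for 100 spots.
set.seed(2)
spot_info <- data.frame(
  batch  = factor(sample(c("A", "B"), 100, replace = TRUE),
                  levels = c("A", "B")),
  domain = factor(sample(c("L1", "L2", "WM"), 100, replace = TRUE),
                  levels = c("L1", "L2", "WM"))
)

# One row per spot; each factor becomes indicator columns relative to its
# first level. model.matrix() adds an "(Intercept)" column by default.
X <- model.matrix(~ batch + domain, data = spot_info)
dim(X)       # 100 4
colnames(X)  # "(Intercept)" "batchB" "domainL2" "domainWM"
```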

n_neighbors

Integer. Number of nearest neighbors for the NNGP model. Default is 10.

5-10: Faster; 10 is a reasonable default for most datasets
15-20: May give slightly better likelihood estimates, but runs slower

Values > 15 rarely improve SVG detection but increase computation time.

order

Character string specifying coordinate ordering scheme.

"AMMD" (default): Approximate Maximum Minimum Distance. Better for most datasets. Requires >= 65 spots.
"Sum_coords": Order by sum of coordinates. Use for very small datasets (< 65 spots).
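The choice between the two schemes can be made from the number of spots, using the 65-spot threshold stated above. `choose_order()` is a hypothetical helper. The second part illustrates what "order by sum of coordinates" means on four toy points.

```r
# Hypothetical helper: pick the ordering scheme from the number of spots.
choose_order <- function(n_spots) {
  if (n_spots >= 65L) "AMMD" else "Sum_coords"
}

choose_order(100)  # "AMMD"
choose_order(40)   # "Sum_coords"

# Sum_coords ordering: sort spots by x + y (ties keep their original order).
toy <- cbind(x = c(3, 1, 2, 0), y = c(0, 2, 2, 1))
ord <- order(rowSums(toy))
ord  # 4 1 2 3
```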

cov_model

Character string specifying the covariance function. Default is "exponential".

"exponential": Most commonly used, computationally stable
"gaussian": Smoother patterns, requires stabilization
"spherical": Finite range correlation
"matern": Flexible smoothness (includes additional nu parameter)
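To make the options concrete, the sketch below writes out the correlation functions C(d; phi) as functions of distance d. The parameterization is an assumption: phi is treated as a decay parameter, as we understand BRISC and spBayes to do, so larger phi means faster decay. Verify the exact forms against the BRISC documentation before relying on them.

```r
# Correlation functions (assumed BRISC/spBayes-style parameterization).
cor_exponential <- function(d, phi) exp(-phi * d)
cor_gaussian    <- function(d, phi) exp(-(phi * d)^2)
cor_spherical   <- function(d, phi) {
  # Exactly zero beyond the finite range 1 / phi.
  ifelse(d < 1 / phi, 1 - 1.5 * phi * d + 0.5 * (phi * d)^3, 0)
}
cor_matern <- function(d, phi, nu) {
  out <- rep(1, length(d))  # C(0) = 1 (limit as d -> 0)
  pos <- d > 0
  x <- phi * d[pos]
  out[pos] <- x^nu * besselK(x, nu) / (2^(nu - 1) * gamma(nu))
  out
}

d <- c(0, 0.25, 1, 2)
round(cor_exponential(d, phi = 1), 3)  # 1.000 0.779 0.368 0.135
round(cor_spherical(d, phi = 1), 3)    # 1.000 0.633 0.000 0.000
# With nu = 0.5 the Matern function reduces to the exponential one.
all.equal(cor_matern(d, phi = 1, nu = 0.5), cor_exponential(d, phi = 1))
```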

adjust_method

Character string for p-value adjustment. Default is "BH" (Benjamini-Hochberg).

n_threads

Integer. Number of parallel threads. Default is 1. Set this to the number of available cores to speed up computation.
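One way to pick a value is to query the machine with `parallel::detectCores()`. The sketch leaves one core free for other work, which is a common choice rather than a requirement of this function. It falls back to 1 because `detectCores()` can return NA.

```r
# Use all but one detected core; fall back to 1 if detection fails.
n_cores <- parallel::detectCores()
n_threads <- if (is.na(n_cores)) 1L else max(1L, as.integer(n_cores) - 1L)
n_threads
```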

verbose

Logical. Print progress messages. Default is FALSE.

Value

A data.frame with SVG detection results. Columns:

gene: Gene identifier
sigma.sq: Spatial variance estimate (sigma^2)
tau.sq: Nonspatial variance estimate (tau^2, nugget)
phi: Range parameter estimate (controls spatial correlation decay)
prop_sv: Proportion of spatial variance = sigma.sq / (sigma.sq + tau.sq)
loglik: Log-likelihood of spatial model
loglik_lm: Log-likelihood of non-spatial model (linear model)
LR_stat: Likelihood ratio test statistic = -2 * (loglik_lm - loglik)
rank: Rank by LR statistic (1 = highest)
p.value: P-value from chi-squared distribution (df = 2)
p.adj: Adjusted p-value
runtime: Computation time per gene (seconds)
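The rank and p.adj columns can be reproduced from LR_stat and p.value, which also shows a typical way to extract SVGs from the result. The four-gene table below is hypothetical, and the 0.05 FDR cutoff is an illustrative choice.

```r
# Hypothetical results for four genes (p.value = chi-squared df = 2 tail of LR_stat).
res <- data.frame(
  gene    = c("g1", "g2", "g3", "g4"),
  LR_stat = c(2.1, 35.0, 0.4, 12.3),
  p.value = c(0.35, 2.5e-08, 0.82, 0.0021)
)
res$rank  <- rank(-res$LR_stat, ties.method = "first")  # 1 = largest LR_stat
res$p.adj <- p.adjust(res$p.value, method = "BH")        # Benjamini-Hochberg

# Genes passing a 5% FDR threshold, strongest first.
svgs <- res[res$p.adj < 0.05, ]
svgs[order(svgs$rank), "gene"]  # "g2" "g4"
```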

Details

Method Overview:

nnSVG models gene expression as a Gaussian process: $$y = X\beta + \omega + \epsilon$$

where:

y = expression vector
X = covariate matrix, beta = coefficients
omega ~ GP(0, sigma^2 * C(phi)) = spatial random effect
epsilon ~ N(0, tau^2) = non-spatial noise
C(phi) = covariance function with range phi
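Simulating one gene from this model shows how the terms fit together. The sketch assumes an exponential covariance of the form sigma^2 * exp(-phi * d). All parameter values are arbitrary, and the spatial effect is drawn with a dense Cholesky factor, which is exactly the O(n^3) step that NNGP avoids.

```r
set.seed(42)
n <- 200
coords <- cbind(x = runif(n), y = runif(n))
X <- cbind(1, rnorm(n))   # intercept + one covariate
beta <- c(2, 0.5)
sigma.sq <- 1; tau.sq <- 0.2; phi <- 5

D <- as.matrix(dist(coords))                # pairwise Euclidean distances
Sigma <- sigma.sq * exp(-phi * D)           # assumed exponential covariance
omega <- drop(t(chol(Sigma)) %*% rnorm(n))  # omega ~ N(0, Sigma)
epsilon <- rnorm(n, sd = sqrt(tau.sq))      # non-spatial noise (nugget)
y <- drop(X %*% beta) + omega + epsilon     # y = X beta + omega + epsilon
```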

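Nearest-Neighbor Approximation: Evaluating the full GP likelihood costs O(n^3) operations for n spots. After the spots are ordered, NNGP conditions each spot only on its k nearest neighbors among the spots earlier in the ordering. This reduces the cost to O(n * k^3), which is linear in n for fixed k.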
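The neighbor sets behind this approximation can be built by brute force. In the sketch below, `nngp_neighbors()` is a hypothetical helper: each spot i gets at most k nearest neighbors among spots 1..(i-1) in the chosen ordering. It takes O(n^2) time to build, whereas real implementations such as BRISC use faster neighbor search.

```r
# Hypothetical helper: NNGP neighbor sets for spots already in their final order.
nngp_neighbors <- function(coords, k) {
  n <- nrow(coords)
  lapply(seq_len(n), function(i) {
    if (i == 1L) return(integer(0))  # first spot has no predecessors
    prev <- seq_len(i - 1L)
    # Euclidean distances from spot i to every earlier spot.
    d <- sqrt(colSums((t(coords[prev, , drop = FALSE]) - coords[i, ])^2))
    prev[order(d)][seq_len(min(k, i - 1L))]
  })
}

toy <- cbind(x = c(0, 1, 0, 1, 2), y = c(0, 0, 1, 1, 0))
toy <- toy[order(rowSums(toy)), ]  # Sum_coords ordering
nb <- nngp_neighbors(toy, k = 2)
sapply(nb, length)  # 0 1 2 2 2
nb[[5]]             # 2 4

# Order-of-magnitude operation counts for n = 10,000 spots, k = 10.
n <- 1e4; k <- 10
c(full_gp = n^3, nngp = n * k^3)  # 1e+12 vs 1e+07
```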

Statistical Test: Likelihood ratio test comparing:

H0 (null): y = X*beta + epsilon (no spatial effect)
H1 (alternative): y = X*beta + omega + epsilon (with spatial effect)

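The LR statistic is compared against a chi-squared distribution with df = 2, one degree of freedom each for sigma.sq and phi. This reference distribution is an asymptotic approximation.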
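The arithmetic of the test is shown below. The null log-likelihood comes from an intercept-only `lm()` fit on simulated data. The spatial-model log-likelihood is a hypothetical value standing in for an NNGP fit, which this sketch does not perform.

```r
set.seed(7)
y <- rnorm(100)
loglik_lm <- as.numeric(logLik(lm(y ~ 1)))  # H0: non-spatial model
loglik    <- loglik_lm + 6                  # hypothetical H1 log-likelihood
LR_stat   <- -2 * (loglik_lm - loglik)      # = 2 * 6 = 12
p.value   <- pchisq(LR_stat, df = 2, lower.tail = FALSE)
p.value   # exp(-6), about 0.00248
```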

Effect Size: Proportion of spatial variance (prop_sv) measures effect size:

prop_sv near 1: Strong spatial pattern
prop_sv near 0: Little spatial structure
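For example, computing prop_sv from the variance estimates of three hypothetical genes gives:

```r
sigma.sq <- c(g1 = 0.9, g2 = 0.05, g3 = 0.4)  # spatial variance
tau.sq   <- c(g1 = 0.1, g2 = 0.95, g3 = 0.4)  # non-spatial (nugget) variance
prop_sv  <- sigma.sq / (sigma.sq + tau.sq)
prop_sv  # g1 0.90 (strong), g2 0.05 (weak), g3 0.50
```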

Computational Notes:

Requires BRISC package for NNGP fitting
O(n) complexity per gene with NNGP approximation
Parallelization over genes provides good speedup
Memory: O(n * k) per gene

References

Weber, L.M. et al. (2023) nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes. Nature Communications.

Datta, A. et al. (2016) Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets. JASA.

Examples

# Load example data
data(example_svg_data)
expr <- example_svg_data$logcounts[1:10, ]  # Small subset
coords <- example_svg_data$spatial_coords

# Basic usage (requires the BRISC package)
if (requireNamespace("BRISC", quietly = TRUE)) {
    results <- CalSVG_nnSVG(expr, coords, verbose = FALSE)
    head(results)
}