Detect spatially variable genes (SVGs) using nnSVG, a method based on nearest-neighbor Gaussian processes for scalable spatial modeling.

nnSVG uses nearest-neighbor Gaussian processes (NNGP) to model spatial correlation structure in gene expression. It performs likelihood ratio tests comparing spatial vs. non-spatial models to identify SVGs.

Usage

CalSVG_nnSVG(
  expr_matrix,
  spatial_coords,
  X = NULL,
  n_neighbors = 10L,
  order = c("AMMD", "Sum_coords"),
  cov_model = c("exponential", "gaussian", "spherical", "matern"),
  adjust_method = "BH",
  n_threads = 1L,
  verbose = FALSE
)

Arguments

expr_matrix

Numeric matrix of gene expression values.

  • Rows: genes

  • Columns: spatial locations (spots/cells)

  • Values: log-normalized counts (e.g., from scran::logNormCounts)

spatial_coords

Numeric matrix of spatial coordinates.

  • Rows: spatial locations (must match columns of expr_matrix)

  • Columns: x, y coordinates
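
The matching requirement between expr_matrix and spatial_coords can be checked before calling the function. The sketch below uses made-up toy inputs, and check_svg_inputs() is a hypothetical helper written for illustration; it is not part of the package.

```r
# Toy inputs: 3 genes x 4 spots with matching spot names (all values made up)
expr <- matrix(rnorm(12), nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("spot", 1:4)))
coords <- cbind(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))
rownames(coords) <- colnames(expr)

# Hypothetical helper (not a package function): stops with an error on mismatch
check_svg_inputs <- function(expr_matrix, spatial_coords) {
  stopifnot(
    is.matrix(expr_matrix), is.numeric(expr_matrix),
    is.matrix(spatial_coords), is.numeric(spatial_coords),
    ncol(spatial_coords) == 2,
    ncol(expr_matrix) == nrow(spatial_coords)
  )
  # If both objects carry names, they should agree in order as well as content
  if (!is.null(colnames(expr_matrix)) && !is.null(rownames(spatial_coords))) {
    stopifnot(identical(colnames(expr_matrix), rownames(spatial_coords)))
  }
  invisible(TRUE)
}

check_svg_inputs(expr, coords)
```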

X

Optional numeric matrix of covariates to regress out.

  • Rows: spatial locations (same order as spatial_coords)

  • Columns: covariates (e.g., batch, cell type indicators)

Default is NULL (intercept-only model).
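
One possible way to build X from a categorical annotation is base R's model.matrix(). The spot annotations below are made up. Dropping the intercept column is an assumption (that the model supplies its own intercept); keep it if the function expects a full design matrix.

```r
# Made-up annotations for six spots
spot_info <- data.frame(
  batch = factor(c("A", "A", "B", "B", "C", "C")),
  row.names = paste0("spot", 1:6)
)

# model.matrix() expands the factor into indicator columns.
# Assumption: the intercept column is dropped because the model adds its own.
X <- model.matrix(~ batch, data = spot_info)[, -1, drop = FALSE]

dim(X)       # 6 rows (spots) by 2 columns
colnames(X)  # "batchB" "batchC"
```

The rows of X must be in the same order as the rows of spatial_coords.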

n_neighbors

Integer. Number of nearest neighbors for NNGP model. Default is 10.

  • 5-10: Faster, captures local patterns

  • 15-20: Better likelihood estimates, slower

Values > 15 rarely improve results but increase computation time.

order

Character string specifying coordinate ordering scheme.

  • "AMMD" (default): Approximate Maximum Minimum Distance. Better for most datasets. Requires >= 65 spots.

  • "Sum_coords": Order by sum of coordinates. Use for very small datasets (< 65 spots).
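
The 65-spot threshold above can be applied programmatically. This is a minimal sketch with made-up coordinates; the variable name ordering is illustrative.

```r
# 40 made-up spots, i.e. below the AMMD threshold
coords <- cbind(x = runif(40), y = runif(40))

# Choose the ordering scheme from the number of spots
ordering <- if (nrow(coords) >= 65) "AMMD" else "Sum_coords"
ordering

# The result would then be passed on, e.g.:
# CalSVG_nnSVG(expr, coords, order = ordering)
```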

cov_model

Character string specifying the covariance function. Default is "exponential".

  • "exponential": Most commonly used, computationally stable

  • "gaussian": Smoother patterns, requires stabilization

  • "spherical": Finite range correlation

  • "matern": Flexible smoothness (includes additional nu parameter)

adjust_method

Character string for p-value adjustment. Default is "BH" (Benjamini-Hochberg).

n_threads

Integer. Number of parallel threads. Default is 1. Set to the number of available cores for faster computation.

verbose

Logical. Print progress messages. Default is FALSE.

Value

A data.frame with SVG detection results. Columns:

  • gene: Gene identifier

  • sigma.sq: Spatial variance estimate (sigma^2)

  • tau.sq: Nonspatial variance estimate (tau^2, nugget)

  • phi: Range parameter estimate (controls spatial correlation decay)

  • prop_sv: Proportion of spatial variance = sigma.sq / (sigma.sq + tau.sq)

  • loglik: Log-likelihood of spatial model

  • loglik_lm: Log-likelihood of non-spatial model (linear model)

  • LR_stat: Likelihood ratio test statistic = -2 * (loglik_lm - loglik)

  • rank: Rank by LR statistic (1 = highest)

  • p.value: P-value from chi-squared distribution (df = 2)

  • p.adj: Adjusted p-value

  • runtime: Computation time per gene (seconds)
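
A typical next step is to keep significant genes and sort them by rank. The sketch below works on a hand-made data.frame that only mimics a few of the columns listed above; all numbers are invented, and the cutoffs (p.adj < 0.05, prop_sv >= 0.1) are arbitrary choices for illustration rather than package recommendations.

```r
# Hand-made stand-in for a result table (values are invented)
results <- data.frame(
  gene    = c("g3", "g2", "g1", "g4"),
  prop_sv = c(0.31, 0.05, 0.62, 0.01),
  rank    = c(2, 3, 1, 4),
  p.adj   = c(8e-06, 0.28, 1e-17, 0.82),
  stringsAsFactors = FALSE
)

# Keep genes that pass both the significance and the effect-size cutoff
svgs <- results[results$p.adj < 0.05 & results$prop_sv >= 0.1, ]

# Sort so the strongest candidate (rank 1) comes first
svgs <- svgs[order(svgs$rank), ]
svgs$gene
```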

Details

Method Overview:

nnSVG models gene expression as a Gaussian process: $$y = X\beta + \omega + \epsilon$$

where:

  • y = expression vector

  • X = covariate matrix, beta = coefficients

  • omega ~ GP(0, sigma^2 * C(phi)) = spatial random effect

  • epsilon ~ N(0, tau^2) = non-spatial noise

  • C(phi) = covariance function with range phi
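
To make the model components concrete, the sketch below simulates one gene from this model in base R. It draws from the full (not nearest-neighbor) GP, which is feasible only for small n. The parameter values are made up, and the exponential parameterization exp(-phi * d) is an assumption about how phi enters the covariance function.

```r
set.seed(1)
n <- 100
coords <- cbind(x = runif(n), y = runif(n))  # made-up spot locations
D <- as.matrix(dist(coords))                 # pairwise Euclidean distances

# Made-up parameter values
sigma.sq <- 1.0   # spatial variance
tau.sq   <- 0.3   # non-spatial (nugget) variance
phi      <- 5     # decay parameter
beta0    <- 2     # intercept (X is a column of ones here)

# Assumed exponential covariance: correlation decays as exp(-phi * distance)
C <- exp(-phi * D)

# omega ~ N(0, sigma.sq * C): chol() returns upper-triangular R with R'R = Sigma,
# so t(R) %*% z has covariance Sigma when z is standard normal
R <- chol(sigma.sq * C)
omega <- drop(crossprod(R, rnorm(n)))

epsilon <- rnorm(n, sd = sqrt(tau.sq))
y <- beta0 + omega + epsilon

# True proportion of spatial variance for these parameters
sigma.sq / (sigma.sq + tau.sq)
```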

Nearest-Neighbor Approximation: Fitting a full GP costs O(n^3) in the number of locations n. NNGP conditions each location on only its k nearest neighbors, reducing the cost to O(n * k^3), which is linear in n for fixed k.
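
As a rough illustration of the savings, the arithmetic below compares the two operation-count orders for an arbitrary example size. These are orders of growth, not measured runtimes, and constant factors are ignored.

```r
n <- 5000  # number of spots (example value)
k <- 10    # number of nearest neighbors (the default n_neighbors)

full_gp <- n^3      # order of a full GP fit
nngp    <- n * k^3  # order of an NNGP fit

full_gp / nngp      # 25000
```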

Statistical Test: Likelihood ratio test comparing:

  • H0 (null): y = X*beta + epsilon (no spatial effect)

  • H1 (alternative): y = X*beta + omega + epsilon (with spatial effect)

Under H0, the LR statistic is compared against a chi-squared distribution with df = 2, because the spatial model adds two parameters (sigma.sq and phi).
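
The test can be traced by hand for a single gene. In the sketch below, the null log-likelihood comes from an intercept-only lm() fit on made-up data, and the spatial log-likelihood is a placeholder value chosen for illustration (in practice it would come from the NNGP fit). Using logLik() on an lm fit is an assumption about how loglik_lm is obtained.

```r
set.seed(42)
y <- rnorm(50)  # made-up expression values for one gene

# H0: intercept-only linear model
fit_lm <- lm(y ~ 1)
loglik_lm <- as.numeric(logLik(fit_lm))

# H1: placeholder log-likelihood, set 6 units above the null for illustration
loglik <- loglik_lm + 6

# Likelihood ratio statistic and chi-squared p-value with df = 2
LR_stat <- -2 * (loglik_lm - loglik)                      # 12
p.value <- pchisq(LR_stat, df = 2, lower.tail = FALSE)    # exp(-6), about 0.0025
```

For df = 2 the chi-squared upper tail has the closed form exp(-LR_stat / 2), which is why the p-value here equals exp(-6).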

Effect Size: Proportion of spatial variance (prop_sv) measures effect size:

  • prop_sv near 1: Strong spatial pattern

  • prop_sv near 0: Little spatial structure

Computational Notes:

  • Requires BRISC package for NNGP fitting

  • O(n) complexity per gene with NNGP approximation

  • Parallelization over genes provides good speedup

  • Memory: O(n * k) per gene

References

Weber, L.M. et al. (2023) nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes. Nature Communications.

Datta, A. et al. (2016) Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets. JASA.

See also

CalSVG, BRISC package documentation

Examples

# Load example data
data(example_svg_data)
expr <- example_svg_data$logcounts[1:10, ]  # Small subset
coords <- example_svg_data$spatial_coords

# Basic usage (requires BRISC package)
if (requireNamespace("BRISC", quietly = TRUE)) {
    results <- CalSVG_nnSVG(expr, coords, verbose = FALSE)
    head(results)
}