MERINGUE: Moran's I-Based Spatially Variable Gene Detection

Detect spatially variable genes using the MERINGUE approach based on Moran's I spatial autocorrelation statistic.

Identifies spatially variable genes by computing Moran's I spatial autocorrelation statistic for each gene. Genes with significant positive spatial autocorrelation (similar expression values clustering together) are identified as SVGs.

Usage

CalSVG_MERINGUE(
  expr_matrix,
  spatial_coords,
  network_method = c("delaunay", "knn"),
  k = 10L,
  filter_dist = NA,
  alternative = c("greater", "less", "two.sided"),
  adjust_method = "BH",
  min_pct_cells = 0.05,
  n_threads = 1L,
  use_cpp = TRUE,
  verbose = TRUE
)

Arguments

expr_matrix

Numeric matrix of gene expression values.

Rows: genes
Columns: spatial locations (spots/cells)
Values: normalized expression (e.g., log-transformed counts)

Row names should be gene identifiers; column names should match row names of spatial_coords.

spatial_coords

Numeric matrix of spatial coordinates.

Rows: spatial locations (must match columns of expr_matrix)
Columns: coordinate dimensions (x, y, and optionally z)
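
The two inputs must describe the same spots in the same order. A minimal sketch of aligning them before the call, using small made-up objects (the toy data and spot names are assumptions for illustration, not part of the package):

```r
# Toy inputs (hypothetical data, for illustration only)
set.seed(1)
expr_matrix <- matrix(rpois(12, lambda = 5), nrow = 3,
                      dimnames = list(paste0("gene", 1:3), paste0("spot", 1:4)))
spatial_coords <- matrix(runif(8), ncol = 2,
                         dimnames = list(paste0("spot", c(3, 1, 4, 2)), c("x", "y")))

# Keep only shared spots and put both objects in the same order
shared <- intersect(colnames(expr_matrix), rownames(spatial_coords))
expr_matrix <- expr_matrix[, shared, drop = FALSE]
spatial_coords <- spatial_coords[shared, , drop = FALSE]

# Fail early if the spot identifiers still disagree
stopifnot(identical(colnames(expr_matrix), rownames(spatial_coords)))
```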

network_method

Character string specifying how to construct the spatial neighborhood network.

"delaunay" (default): Delaunay triangulation. Creates natural neighbors based on geometric triangulation. Good for relatively uniform spatial distributions.
"knn": K-nearest neighbors. Each spot connected to its k nearest neighbors. More robust for irregular distributions.

k

Integer. Number of neighbors for KNN method. Default is 10. Ignored when network_method = "delaunay".

Smaller k (e.g., 5-6): More local patterns, faster computation
Larger k (e.g., 15-20): Broader patterns, smoother results
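
To make the effect of k concrete, here is a base-R sketch of a k-nearest-neighbor adjacency matrix. It is only an illustration: the helper `knn_adjacency` is hypothetical, the symmetrization rule is an assumption, and the package's own KNN network (which the example below suggests relies on the RANN package) may be built differently.

```r
# Base-R sketch of a k-nearest-neighbor adjacency matrix (illustrative only)
knn_adjacency <- function(coords, k) {
  d <- as.matrix(dist(coords))   # pairwise Euclidean distances
  diag(d) <- Inf                 # a spot is never its own neighbor
  n <- nrow(d)
  adj <- matrix(0, n, n, dimnames = dimnames(d))
  for (i in seq_len(n)) {
    nn <- order(d[i, ])[seq_len(k)]  # indices of the k closest spots
    adj[i, nn] <- 1
  }
  # Symmetrize (assumed rule): i and j are neighbors if either lists the other
  pmax(adj, t(adj))
}

set.seed(42)
coords <- cbind(x = runif(50), y = runif(50))
adj_small <- knn_adjacency(coords, k = 5)
adj_large <- knn_adjacency(coords, k = 15)

# Number of undirected edges for each choice of k
c(k5 = sum(adj_small) / 2, k15 = sum(adj_large) / 2)
```

A larger k adds edges to more distant spots, which is why it favors broader patterns and costs more to compute.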

filter_dist

Numeric or NA. Maximum Euclidean distance for neighbors. Pairs with distance > filter_dist are not considered neighbors. Default is NA (no filtering). Useful for:

Removing long-range spurious connections
Focusing on local spatial patterns
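
The distance cutoff can be pictured as masking an adjacency matrix with the pairwise distance matrix. The sketch below starts from a hypothetical fully connected network and uses an arbitrary cutoff of 0.25; it shows the rule stated above, not the package's internal code.

```r
set.seed(7)
coords <- cbind(x = runif(30), y = runif(30))
d <- as.matrix(dist(coords))   # pairwise Euclidean distances

# Start from a hypothetical fully connected network (no self-loops)
adj <- matrix(1, nrow(d), ncol(d))
diag(adj) <- 0

filter_dist <- 0.25            # arbitrary cutoff chosen for this toy example
if (!is.na(filter_dist)) {
  adj[d > filter_dist] <- 0    # drop pairs farther apart than the cutoff
}

# The longest surviving edge is no longer than filter_dist
max(d[adj == 1])
```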

alternative

Character string specifying the alternative hypothesis for the Moran's I test.

"greater" (default): Test for positive autocorrelation (clustering of similar values). Most appropriate for SVG detection.
"less": Test for negative autocorrelation (dissimilar values as neighbors).
"two.sided": Test for any autocorrelation.

adjust_method

Character string specifying p-value adjustment method for multiple testing correction. Passed to p.adjust(). Options include: "BH" (default, Benjamini-Hochberg), "bonferroni", "holm", "hochberg", "hommel", "BY", "fdr", "none".
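
As a quick illustration of what the adjustment does, the following applies base R's p.adjust() to five made-up p-values. The numbers are hypothetical and chosen only to show how the methods differ.

```r
# Hypothetical raw p-values for five genes
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.60)

p_bh   <- p.adjust(p_raw, method = "BH")          # controls the false discovery rate
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # controls the family-wise error rate

data.frame(p_raw, p_bh, p_bonf)
```

Bonferroni-adjusted values are never smaller than the BH-adjusted ones, so "BH" retains at least as many genes at a given threshold.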

min_pct_cells

Numeric (0-1). Minimum fraction of cells that must contribute to the spatial pattern for a gene to be retained as an SVG. Default is 0.05 (5%), which filters out genes driven by only a few outlier cells. Set to 0 to disable this filter.

n_threads

Integer. Number of threads for parallel computation. Default is 1.

For large datasets: Set to number of available cores
Uses R's parallel::mclapply, which relies on forking and therefore cannot run in parallel on Windows
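
One way to pick a value is sketched below. Leaving one core free and falling back to a single thread on Windows are assumptions made for this example, not package recommendations.

```r
# detectCores() can return NA on some platforms, so guard against it
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Forking is unavailable on Windows, so stay single-threaded there;
# elsewhere leave one core free for the rest of the system
n_threads <- if (.Platform$OS.type == "windows") 1L else max(1L, n_cores - 1L)
n_threads
```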

use_cpp

Logical. Whether to use the C++ implementation for faster computation. Default is TRUE. Falls back to the R implementation if the C++ code fails.

verbose

Logical. Whether to print progress messages. Default is TRUE.

Value

A data.frame with SVG detection results, sorted by significance. Columns:

gene: Gene identifier
observed: Observed Moran's I statistic. Values typically fall in [-1, 1]. Positive values indicate clustering; negative values indicate dispersion.
expected: Expected Moran's I under the null hypothesis, -1/(n-1), which approaches 0 as n grows
sd: Standard deviation under null hypothesis
z_score: Standardized test statistic (observed - expected) / sd
p.value: Raw p-value from normal approximation
p.adj: Adjusted p-value (multiple testing corrected)
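
The z_score and p.value columns are related to observed, expected and sd through the standard normal distribution. The sketch below shows that relationship for the three settings of alternative. The helper name `moran_p_value` and the input numbers are hypothetical, and the function mirrors the column definitions above rather than the package source.

```r
moran_p_value <- function(observed, expected, sd,
                          alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  z <- (observed - expected) / sd   # standardized test statistic
  p <- switch(alternative,
    greater   = pnorm(z, lower.tail = FALSE),          # upper tail
    less      = pnorm(z),                              # lower tail
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE)  # both tails
  )
  c(z_score = z, p.value = p)
}

# Hypothetical values for one gene measured at n = 100 spots
moran_p_value(observed = 0.15, expected = -1 / 99, sd = 0.05)
```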

Details

Method Overview:

MERINGUE uses Moran's I, a classic measure of spatial autocorrelation: $$I = \frac{n}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$$

where:

n = number of spatial locations
W = sum of all spatial weights
w_ij = spatial weight between locations i and j
x_i = expression value at location i

Interpretation:

I > 0: Positive autocorrelation (similar values cluster)
I ≈ 0: No spatial autocorrelation (random spatial distribution)
I < 0: Negative autocorrelation (checkerboard pattern)
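
The formula and its interpretation can be checked on a tiny example. The sketch below computes Moran's I directly from the definition, for a chain of six locations with binary adjacency weights. The helper `morans_i` and the toy data are illustrative assumptions, not the package's implementation.

```r
# Moran's I computed directly from the definition
morans_i <- function(x, w) {
  n <- length(x)
  z <- x - mean(x)                               # centered expression values
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# 1-D chain of 6 locations; adjacent locations are neighbors
w <- matrix(0, 6, 6)
w[abs(row(w) - col(w)) == 1] <- 1

clustered   <- c(0, 0, 0, 1, 1, 1)  # similar values sit next to each other
alternating <- c(0, 1, 0, 1, 0, 1)  # checkerboard-like pattern

morans_i(clustered, w)    # positive: 0.6
morans_i(alternating, w)  # negative: -1
```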

Statistical Testing: P-values are computed using normal approximation based on analytical formulas for the expected value and variance of Moran's I under the null hypothesis of complete spatial randomness.
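
For reference, one standard set of analytical moments is the pair derived under the normality assumption (Cliff and Ord), written here with the same symbols as above. The source does not state which variant the function uses (a randomization-based variance is a common alternative), so this is a sketch of the idea and not a statement of the implementation:

$$E[I] = -\frac{1}{n-1}, \qquad \mathrm{Var}[I] = \frac{n^2 S_1 - n S_2 + 3 W^2}{(n^2 - 1)\, W^2} - E[I]^2$$

with

$$S_1 = \frac{1}{2} \sum_i \sum_j (w_{ij} + w_{ji})^2, \qquad S_2 = \sum_i \Big( \sum_j w_{ij} + \sum_j w_{ji} \Big)^2$$

The test statistic is then $$z = \frac{I - E[I]}{\sqrt{\mathrm{Var}[I]}}$$ which is compared against the standard normal distribution.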

Computational Considerations:

Time complexity: O(n^2) for network construction, O(n*m) for testing (n = spots, m = genes)
Memory: O(n^2) for storing spatial weights matrix
For n > 10,000 spots, consider using KNN with small k
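
A back-of-the-envelope calculation shows why the O(n^2) memory term matters at this scale. It assumes the weights are held as a dense matrix of 8-byte doubles, which is an assumption about storage made for this estimate.

```r
n <- 10000   # number of spots
k <- 10      # neighbors per spot

dense_bytes <- n^2 * 8            # one 8-byte double per matrix entry
dense_gib <- dense_bytes / 1024^3

knn_links <- n * k                # directed neighbor links before symmetrizing

c(dense_GiB = dense_gib, knn_links = knn_links, fraction = knn_links / n^2)
```

At 10,000 spots the dense matrix takes about 0.75 GiB, while a KNN network with k = 10 fills only about 0.1% of its entries.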

References

Miller, B.F. et al. (2021) Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities. Genome Research.
Moran, P.A.P. (1950) Notes on Continuous Stochastic Phenomena. Biometrika.
Cliff, A.D. and Ord, J.K. (1981) Spatial Processes: Models & Applications. Pion.

Examples

# Load example data
data(example_svg_data)
expr <- example_svg_data$logcounts[1:20, ]  # Use subset for speed
coords <- example_svg_data$spatial_coords

# Basic usage (the KNN network requires the RANN package)
if (requireNamespace("RANN", quietly = TRUE)) {
    results <- CalSVG_MERINGUE(expr, coords,
                               network_method = "knn", k = 10,
                               verbose = FALSE)
    head(results)

    # Get significant SVGs (adjusted p-value below 0.05)
    sig_genes <- results$gene[results$p.adj < 0.05]
}