MERINGUE: Moran's I based Spatially Variable Gene Detection
Source: R/CalSVG_MERINGUE.R
CalSVG_MERINGUE.Rd
Detect spatially variable genes using the MERINGUE approach based on Moran's I spatial autocorrelation statistic.
Identifies spatially variable genes by computing Moran's I spatial autocorrelation statistic for each gene. Genes with significant positive spatial autocorrelation (similar expression values clustering together) are identified as SVGs.
Arguments
- expr_matrix
Numeric matrix of gene expression values.
Rows: genes
Columns: spatial locations (spots/cells)
Values: normalized expression (e.g., log-transformed counts)
Row names should be gene identifiers; column names should match the row names of spatial_coords.
- spatial_coords
Numeric matrix of spatial coordinates.
Rows: spatial locations (must match columns of expr_matrix)
Columns: coordinate dimensions (x, y, and optionally z)
- network_method
Character string specifying how to construct the spatial neighborhood network.
"delaunay" (default): Delaunay triangulation. Creates natural neighbors based on geometric triangulation. Good for relatively uniform spatial distributions.
"knn": K-nearest neighbors. Each spot is connected to its k nearest neighbors. More robust for irregular distributions.
- k
Integer. Number of neighbors for the KNN method. Default is 10. Ignored when network_method = "delaunay".
Smaller k (e.g., 5-6): More local patterns, faster computation
Larger k (e.g., 15-20): Broader patterns, smoother results
- filter_dist
Numeric or NA. Maximum Euclidean distance for neighbors. Pairs with distance > filter_dist are not considered neighbors. Default is NA (no filtering). Useful for:
Removing long-range spurious connections
Focusing on local spatial patterns
- alternative
Character string specifying the alternative hypothesis for the Moran's I test.
"greater" (default): Test for positive autocorrelation (clustering of similar values). Most appropriate for SVG detection.
"less": Test for negative autocorrelation (dissimilar values as neighbors).
"two.sided": Test for any autocorrelation.
- adjust_method
Character string specifying the p-value adjustment method for multiple testing correction. Passed to p.adjust(). Options include: "BH" (default, Benjamini-Hochberg), "bonferroni", "holm", "hochberg", "hommel", "BY", "fdr", "none".
- min_pct_cells
Numeric (0-1). Minimum fraction of cells that must contribute to the spatial pattern for a gene to be retained as an SVG. Default is 0.05 (5%). Used to filter out genes driven by only a few outlier cells. Set to 0 to disable this filter.
- n_threads
Integer. Number of threads for parallel computation. Default is 1.
For large datasets: Set to number of available cores
Uses R's parallel::mclapply, which relies on forking and therefore cannot run in parallel on Windows
- use_cpp
Logical. Whether to use C++ implementation for faster computation. Default is TRUE. Falls back to R if C++ fails.
- verbose
Logical. Whether to print progress messages. Default is TRUE.
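To make the interaction between network_method = "knn", k and filter_dist concrete, here is a minimal base-R sketch. The helper knn_adjacency is hypothetical and is not the package's buildSpatialNetwork; it assumes a symmetric binary network in which an edge is kept if either spot lists the other among its k nearest neighbors.

```r
# Hypothetical helper (not part of the package): build a symmetric KNN
# adjacency matrix from coordinates, optionally dropping long edges.
knn_adjacency <- function(coords, k = 10, filter_dist = NA) {
  d <- as.matrix(dist(coords))           # pairwise Euclidean distances
  n <- nrow(d)
  diag(d) <- Inf                         # a spot is never its own neighbor
  adj <- matrix(0, n, n, dimnames = dimnames(d))
  for (i in seq_len(n)) {
    nn <- order(d[i, ])[seq_len(min(k, n - 1))]  # the k closest spots to spot i
    adj[i, nn] <- 1
  }
  adj <- pmax(adj, t(adj))               # symmetrize: keep an edge if either end chose it
  if (!is.na(filter_dist)) {
    adj[d > filter_dist] <- 0            # drop pairs farther apart than filter_dist
  }
  adj
}

# Simulated coordinates, for illustration only
set.seed(1)
coords <- cbind(x = runif(50), y = runif(50))
rownames(coords) <- paste0("spot", 1:50)

adj_all   <- knn_adjacency(coords, k = 6)
adj_local <- knn_adjacency(coords, k = 6, filter_dist = 0.15)

sum(adj_all) / 2     # undirected edges without distance filtering
sum(adj_local) / 2   # never more than the unfiltered count
```

Filtering can only remove edges, so isolated spots in sparse regions may end up with fewer than k neighbors.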
Value
A data.frame with SVG detection results, sorted by significance. Columns:
gene: Gene identifier
observed: Observed Moran's I statistic. Typically falls within [-1, 1]. Positive values indicate clustering, negative values indicate dispersion.
expected: Expected Moran's I under the null hypothesis, -1/(n-1)
sd: Standard deviation under the null hypothesis
z_score: Standardized test statistic, (observed - expected) / sd
p.value: Raw p-value from the normal approximation
p.adj: Adjusted p-value (multiple testing corrected)
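As an illustration of how p.adj relates to p.value, the sketch below applies p.adjust() with the default "BH" method to a made-up set of raw p-values (the gene names and numbers are invented for this example) and then sorts by significance, mirroring the layout of the returned data.frame.

```r
# Toy p-values, invented for illustration only
res <- data.frame(
  gene    = paste0("gene", 1:6),
  p.value = c(0.0004, 0.012, 0.03, 0.2, 0.049, 0.8)
)

# Benjamini-Hochberg adjustment across all tested genes
res$p.adj <- p.adjust(res$p.value, method = "BH")

# Sort by significance and keep genes passing the 5% FDR threshold
res <- res[order(res$p.adj, res$p.value), ]
sig_genes <- res$gene[res$p.adj < 0.05]
sig_genes
```

Here gene3 and gene5 have raw p-values below 0.05 but adjusted values above it, so only gene1 and gene2 are retained.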
Details
Method Overview:
MERINGUE uses Moran's I, a classic measure of spatial autocorrelation:
$$I = \frac{n}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$$
where:
n = number of spatial locations
W = sum of all spatial weights
w_ij = spatial weight between locations i and j
x_i = expression value at location i
Interpretation:
I > 0: Positive autocorrelation (similar values cluster)
I ≈ 0: Random spatial distribution (the null expectation is -1/(n-1), which approaches 0 as n grows)
I < 0: Negative autocorrelation (checkerboard pattern)
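The formula and its interpretation can be checked by hand on a tiny example. The function morans_i below is a hypothetical direct transcription of the formula for a single gene, not the package's implementation; the toy network is four spots on a line, each connected to its immediate neighbors with weight 1.

```r
# Hypothetical helper: Moran's I for one expression vector x and weight matrix w
morans_i <- function(x, w) {
  n <- length(x)
  z <- x - mean(x)                       # deviations from the mean
  W <- sum(w)                            # sum of all spatial weights
  (n / W) * sum(w * outer(z, z)) / sum(z^2)
}

# Four spots on a line: 1 - 2 - 3 - 4
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)                            # symmetric binary weights

i_gradient    <- morans_i(c(1, 2, 3, 4), w)  # smooth gradient: 1/3
i_alternating <- morans_i(c(1, 4, 1, 4), w)  # alternating values: -1
```

The gradient gives a positive I because neighbors carry similar values, while the alternating pattern gives I = -1 because every neighbor pair lies on opposite sides of the mean.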
Statistical Testing: P-values are computed using normal approximation based on analytical formulas for the expected value and variance of Moran's I under the null hypothesis of complete spatial randomness.
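A minimal sketch of such a test is shown below. It assumes the variance formula derived under the normality assumption (Cliff and Ord); the package may instead use the randomization variant, so treat the function moran_z_test as a hypothetical illustration rather than a reproduction of moranI_test.

```r
# Hypothetical helper: z-test for Moran's I using the normality-assumption variance
moran_z_test <- function(x, w, alternative = "greater") {
  n  <- length(x)
  z  <- x - mean(x)
  S0 <- sum(w)                                   # total weight
  S1 <- 0.5 * sum((w + t(w))^2)
  S2 <- sum((rowSums(w) + colSums(w))^2)

  observed <- (n / S0) * sum(w * outer(z, z)) / sum(z^2)
  expected <- -1 / (n - 1)
  variance <- (n^2 * S1 - n * S2 + 3 * S0^2) / ((n^2 - 1) * S0^2) - expected^2
  sd_null  <- sqrt(variance)
  z_score  <- (observed - expected) / sd_null

  p <- switch(alternative,
    greater   = pnorm(z_score, lower.tail = FALSE),
    less      = pnorm(z_score),
    two.sided = 2 * pnorm(abs(z_score), lower.tail = FALSE),
    stop("unknown alternative: ", alternative)
  )
  c(observed = observed, expected = expected, sd = sd_null,
    z_score = z_score, p.value = p)
}

# 5 x 5 grid with rook adjacency (spots exactly one unit apart are neighbors)
grid <- expand.grid(x = 1:5, y = 1:5)
w <- (abs(as.matrix(dist(grid)) - 1) < 1e-8) * 1

# Expression following a left-to-right gradient is strongly autocorrelated
res <- moran_z_test(grid$x, w, alternative = "greater")
res
```

For this gradient the observed statistic is 0.75 against a null expectation of -1/24, which yields a very small one-sided p-value.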
Computational Considerations:
Time complexity: O(n^2) for network construction, O(n*m) for testing (n = spots, m = genes)
Memory: O(n^2) for storing spatial weights matrix
For n > 10,000 spots, consider using KNN with small k
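A back-of-the-envelope calculation shows why the O(n^2) memory term matters. The sketch assumes a dense matrix of 8-byte doubles and, for comparison, counts the nonzero entries a symmetrized KNN network could have at most; whether the package stores the weights densely or sparsely is not stated here.

```r
n <- 10000   # number of spots
k <- 10      # neighbors per spot

dense_bytes   <- 8 * n^2      # one 8-byte double per matrix entry
knn_edges_max <- 2 * n * k    # upper bound on nonzero entries after symmetrizing

dense_gb     <- dense_bytes / 1e9     # 0.8 GB for the dense matrix
nonzero_frac <- knn_edges_max / n^2   # at most 0.002 of entries are nonzero
```

With k = 10, at most 0.2% of the entries are nonzero, which is why a small k keeps the network manageable for large n.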
References
Miller, B.F. et al. (2021) Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities. Genome Research.
Moran, P.A.P. (1950) Notes on Continuous Stochastic Phenomena. Biometrika.
Cliff, A.D. and Ord, J.K. (1981) Spatial Processes: Models & Applications. Pion.
See also
CalSVG for unified interface,
buildSpatialNetwork for network construction,
moranI_test for individual gene testing
Examples
# Load example data
data(example_svg_data)
expr <- example_svg_data$logcounts[1:20, ] # Use subset for speed
coords <- example_svg_data$spatial_coords
# \donttest{
# Basic usage (requires RANN package for KNN)
if (requireNamespace("RANN", quietly = TRUE)) {
  results <- CalSVG_MERINGUE(expr, coords,
                             network_method = "knn", k = 10,
                             verbose = FALSE)
  head(results)
  # Get significant SVGs
  sig_genes <- results$gene[results$p.adj < 0.05]
}
# }