Detect spatially variable genes using SPARK-X, a non-parametric method that tests for spatial expression patterns using multiple kernels.
SPARK-X is a scalable non-parametric method for identifying spatially variable genes. It uses variance component score tests with multiple spatial kernels (projection, Gaussian, and cosine) to detect various types of spatial expression patterns.
Usage
CalSVG_SPARKX(
  expr_matrix,
  spatial_coords,
  kernel_option = c("mixture", "single"),
  adjust_method = "BY",
  n_threads = 1L,
  verbose = TRUE
)
Arguments
- expr_matrix
Numeric matrix of gene expression values.
Rows: genes
Columns: spatial locations (spots/cells)
Values: raw counts or normalized counts (NOT log-transformed)
Note: SPARK-X works best with count data, not log-transformed data.
- spatial_coords
Numeric matrix of spatial coordinates.
Rows: spatial locations (must match columns of expr_matrix)
Columns: x, y coordinates
- kernel_option
Character string specifying which kernels to use.
"mixture" (default): Test with all 11 kernels: 1 projection + 5 Gaussian + 5 cosine. Most comprehensive but slower. Recommended for detecting diverse spatial patterns.
"single": Test with projection kernel only. Faster but may miss some pattern types.
- adjust_method
Character string for p-value adjustment. Default is "BY" (Benjamini-Yekutieli), which is more conservative and appropriate when tests may be correlated. Other options: "BH", "bonferroni", "holm", "none".
- n_threads
Integer. Number of parallel threads. Default is 1. Higher values significantly speed up computation for large datasets.
- verbose
Logical. Print progress messages. Default is TRUE.
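The layout requirements above (genes in rows, locations in columns, one coordinate row per location) are easy to get wrong, so it can help to check them before calling the function. The helper below is a minimal sketch: `check_svg_inputs` is a hypothetical name, not part of the package, and the negative-value warning is only a rough heuristic for spotting transformed data.

```r
# Hypothetical helper (not part of the package): checks that the inputs
# follow the layout described in the Arguments section.
check_svg_inputs <- function(expr_matrix, spatial_coords) {
  stopifnot(
    is.matrix(expr_matrix), is.numeric(expr_matrix),
    is.matrix(spatial_coords), is.numeric(spatial_coords),
    ncol(spatial_coords) == 2,                  # x, y coordinates
    ncol(expr_matrix) == nrow(spatial_coords)   # one coordinate row per location
  )
  # Counts should not be negative; negative values hint at scaled or
  # otherwise transformed input (a heuristic, not a definitive test).
  if (any(expr_matrix < 0, na.rm = TRUE)) {
    warning("Negative values found: expression may not be count data.")
  }
  invisible(TRUE)
}

# Toy data: 5 genes measured at 30 locations
set.seed(42)
toy_expr <- matrix(rpois(5 * 30, lambda = 3), nrow = 5,
                   dimnames = list(paste0("gene_", 1:5), NULL))
toy_coords <- cbind(x = runif(30), y = runif(30))
inputs_ok <- check_svg_inputs(toy_expr, toy_coords)
```

If the matrix arrives as locations by genes, `t(expr_matrix)` restores the expected orientation.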
Value
A data.frame with SVG detection results. Columns:
gene: Gene identifier
p.value: Combined p-value across all kernels (ACAT method)
p.adj: Multiple testing adjusted p-value
If kernel_option = "mixture", additional columns for individual kernel statistics and p-values (stat_*, pval_*)
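To see how the choice of adjust_method affects p.adj, the sketch below applies the listed methods to a small made-up p-value vector with base R's `p.adjust`. That the function calls `p.adjust` internally is an assumption; the method names simply match the ones `p.adjust` accepts.

```r
# Made-up p-values for illustration only
pvals <- c(1e-8, 2e-4, 0.003, 0.04, 0.2, 0.7)

# Adjust with each method named in the adjust_method argument
methods <- c("BY", "BH", "bonferroni", "holm", "none")
adj <- sapply(methods, function(m) p.adjust(pvals, method = m))

# One row per p-value, one column per method
print(signif(adj, 3))
```

BY-adjusted values are never smaller than BH-adjusted ones, which is why the default is the more conservative choice.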
Details
Method Overview:
SPARK-X uses a variance component score test framework: $$T_g = \frac{n \cdot y_g^T K y_g}{\|y_g\|^2}$$
where:
y_g = expression vector for gene g
K = spatial kernel matrix (derived from coordinates)
n = number of spatial locations
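The statistic can be evaluated without ever forming the n x n matrix K when K has low rank. The sketch below assumes a projection kernel K = X (X^T X)^{-1} X^T built from centered coordinates X, and it centers y_g first; both choices are assumptions made for illustration and may differ from the package internals.

```r
set.seed(1)
n <- 200
coords <- cbind(x = runif(n), y = runif(n))

# Simulated gene with an expression gradient along x
y_g <- rpois(n, lambda = exp(1 + 2 * coords[, "x"]))

# Assumption: center coordinates and expression before testing
X <- sweep(coords, 2, colMeans(coords))
y_c <- y_g - mean(y_g)

# Explicit form: build K (n x n), only feasible for small n
K <- X %*% solve(crossprod(X)) %*% t(X)
T_explicit <- n * drop(t(y_c) %*% K %*% y_c) / sum(y_c^2)

# Low-rank form: same value, using only 2 x 1 and 2 x 2 objects
Xty <- crossprod(X, y_c)
T_fast <- n * drop(t(Xty) %*% solve(crossprod(X), Xty)) / sum(y_c^2)

all.equal(T_explicit, T_fast)
```

The low-rank form costs time linear in n per gene, which is consistent with the scalability claim made under Advantages.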
Kernel Types:
Projection kernel: Linear kernel based on scaled coordinates. Detects gradients and linear spatial trends.
Gaussian kernels: Multiple bandwidth Gaussian RBF kernels. Detect localized hotspots of different sizes.
Cosine kernels: Multiple frequency periodic kernels. Detect periodic/oscillating spatial patterns.
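One way to obtain the 11 kernels is to transform the coordinates and apply the projection kernel to each transformed set. The sketch below follows that idea; `make_coord_transforms` is a hypothetical helper, and the quantile-based bandwidths and periods are assumptions, since this page does not state the values the package uses.

```r
# Hypothetical helper: returns 1 + 5 + 5 coordinate sets, one per kernel.
make_coord_transforms <- function(coords, probs = seq(0.2, 1, by = 0.2)) {
  coords <- sweep(coords, 2, colMeans(coords))   # center each axis
  out <- list(projection = coords)
  for (i in seq_along(probs)) {
    # Assumption: per-axis scale taken from a quantile of |coordinate|
    bw <- apply(abs(coords), 2, quantile, probs = probs[i])
    # Gaussian transform: emphasizes locations near the center of each axis
    out[[paste0("gaussian_", i)]] <- sapply(seq_len(ncol(coords)), function(j) {
      exp(-coords[, j]^2 / (2 * bw[j]^2))
    })
    # Cosine transform: periodic in each axis with period bw[j]
    out[[paste0("cosine_", i)]] <- sapply(seq_len(ncol(coords)), function(j) {
      cos(2 * pi * coords[, j] / bw[j])
    })
  }
  out
}

set.seed(7)
demo_coords <- cbind(x = runif(50), y = runif(50))
transforms <- make_coord_transforms(demo_coords)
names(transforms)
```

Small bandwidths target small hotspots or short periods, and large ones target broad patterns, which is why several scales are tested per kernel family.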
P-value Computation:
Individual kernel p-values: Davies' method for quadratic forms
Combined p-value: ACAT (Aggregated Cauchy Association Test)
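The ACAT step maps each p-value to a Cauchy variate, averages them, and converts the average back to a p-value. The function below is a minimal sketch of that rule with equal weights by default; `acat_combine` is a hypothetical name, and the small-p approximation 1 / (p * pi) is included because `tan` loses precision for p-values as small as those in the example output.

```r
# Hypothetical helper illustrating the Cauchy combination rule.
acat_combine <- function(pvals, weights = NULL) {
  stopifnot(is.numeric(pvals), all(pvals > 0), all(pvals <= 1))
  if (is.null(weights)) weights <- rep(1 / length(pvals), length(pvals))
  stopifnot(length(weights) == length(pvals))

  # Transform each p-value to a standard Cauchy variate
  tiny <- pvals < 1e-15
  cauchy_terms <- numeric(length(pvals))
  cauchy_terms[!tiny] <- tan((0.5 - pvals[!tiny]) * pi)
  cauchy_terms[tiny] <- 1 / (pvals[tiny] * pi)   # accurate limit for tiny p

  # Weighted sum is again standard Cauchy under the null
  t_stat <- sum(weights * cauchy_terms)
  pcauchy(t_stat, lower.tail = FALSE)
}

# Eleven made-up kernel p-values, one of them strongly significant
kernel_p <- c(1e-12, 0.04, 0.3, 0.5, 0.8, 0.2, 0.6, 0.9, 0.1, 0.7, 0.45)
combined_p <- acat_combine(kernel_p)
```

A single very small p-value dominates the sum, so a gene that is detected by only one kernel still receives a small combined p-value.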
Advantages:
Non-parametric: No distributional assumptions
Scalable: O(n) complexity, handles millions of cells
Multiple kernels: Detects diverse pattern types
Robust: ACAT combination handles correlated tests
Computational Considerations:
mixture option: ~11x slower than single
Memory: O(n) per gene, efficient for large datasets
Parallelization provides near-linear speedup
References
Zhu, J., Sun, S., & Zhou, X. (2021). SPARK-X: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies. Genome Biology.
Examples
# Load example data
data(example_svg_data)
expr <- example_svg_data$counts[1:20, ] # Use counts (not log)
coords <- example_svg_data$spatial_coords
# Fast mode with single kernel (no extra dependencies)
results <- CalSVG_SPARKX(expr, coords,
                         kernel_option = "single",
                         verbose = FALSE)
head(results)
#> gene p.value p.adj stat_linear pval_linear
#> 1 gene_4 2.326245e-170 1.673845e-168 207.02459 2.326245e-170
#> 2 gene_12 6.624465e-154 2.383310e-152 283.48558 6.624465e-154
#> 3 gene_5 4.116361e-153 9.873063e-152 262.55227 4.116361e-153
#> 4 gene_10 1.820755e-149 3.275302e-148 183.99255 1.820755e-149
#> 5 gene_7 5.268820e-105 7.582337e-104 294.45147 5.268820e-105
#> 6 gene_14 1.151904e-51 1.381416e-50 88.57613 1.151904e-51