Generates a simulated spatial transcriptomics dataset containing a mixture of spatially variable genes (SVGs) and non-spatially variable genes. Counts are drawn from a Negative Binomial distribution, which allows for overdispersed count data.
Arguments
- n_spots
Integer. Number of spatial locations. Default is 500.
- n_genes
Integer. Total number of genes. Default is 200.
- n_svg
Integer. Number of spatially variable genes. Default is 50.
- grid_type
Character. Type of spatial layout.
"hexagonal" (default): Visium-like hexagonal grid
"square": Square grid
"random": Random spatial distribution
- pattern_types
Character vector. Types of spatial patterns for SVGs. Any combination of:
"gradient": Linear spatial gradient
"hotspot": Localized expression hotspots
"periodic": Periodic/oscillating patterns
"cluster": Clustered expression
Default is all four types.
- mean_counts
Numeric. Baseline mean expression level (the mean count before any spatial pattern is applied). Default is 50.
- dispersion
Numeric. Dispersion parameter of the Negative Binomial distribution. Smaller values give more overdispersion. Default is 5.
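The three grid_type layouts can be pictured with a small sketch. This is only an illustration: make_grid() and its extent argument are hypothetical and are not part of the package, and the spacing the package actually uses may differ.

```r
# Illustrative only: make_grid() is a hypothetical helper, not the package's code.
make_grid <- function(n_spots, grid_type = c("hexagonal", "square", "random"),
                      extent = 100) {
  grid_type <- match.arg(grid_type)
  if (grid_type == "random") {
    # Uniform random positions inside an extent x extent square
    return(cbind(x = runif(n_spots, 0, extent), y = runif(n_spots, 0, extent)))
  }
  side <- ceiling(sqrt(n_spots))            # spots per row and per column
  idx <- expand.grid(col = seq_len(side) - 1, row = seq_len(side) - 1)
  idx <- idx[seq_len(n_spots), ]            # keep the first n_spots positions
  step <- extent / max(side - 1, 1)         # distance between neighbouring columns
  x <- idx$col * step
  y <- idx$row * step
  if (grid_type == "hexagonal") {
    # Shift odd rows by half a step and compress row spacing by sqrt(3)/2
    x <- x + (idx$row %% 2) * step / 2
    y <- y * sqrt(3) / 2
  }
  cbind(x = x, y = y)
}

set.seed(1)
hex <- make_grid(200, "hexagonal")
sq  <- make_grid(200, "square")
rnd <- make_grid(200, "random")
dim(hex)
```

In the hexagonal case every spot has up to six equidistant neighbours, which is the property that makes the layout resemble a Visium slide.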
Value
A list containing:
counts: Matrix of gene counts (genes × spots)
spatial_coords: Matrix of spatial coordinates (spots × 2)
gene_info: Data frame with gene metadata, including is_svg (TRUE/FALSE) and pattern_type
logcounts: Log-normalized counts (log2(counts + 1))
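To make the orientation of the returned elements concrete, the sketch below builds a toy stand-in with the documented shapes and then subsets it. The object toy is not produced by the package; the gene column and the "none" label for non-SVGs are assumptions made for illustration only.

```r
set.seed(1)
# Toy stand-in with the documented shapes: 4 genes x 6 spots
counts <- matrix(rnbinom(24, mu = 50, size = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("spot", 1:6)))
toy <- list(
  counts = counts,
  spatial_coords = cbind(x = runif(6), y = runif(6)),
  gene_info = data.frame(
    gene = rownames(counts),                                 # assumed column name
    is_svg = c(TRUE, TRUE, FALSE, FALSE),
    pattern_type = c("gradient", "hotspot", "none", "none")  # "none" is assumed
  ),
  logcounts = log2(counts + 1)
)

# Genes are rows, so is_svg indexes the first dimension of counts
svg_counts <- toy$counts[toy$gene_info$is_svg, , drop = FALSE]
dim(svg_counts)

# Spots are columns of counts and rows of spatial_coords
stopifnot(ncol(toy$counts) == nrow(toy$spatial_coords))
```

The same indexing should carry over to a real return value, since counts is genes × spots and gene_info has one row per gene.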
Details
Spatial Patterns:
Gradient: Expression increases linearly along the x-axis
Hotspot: High expression in circular regions
Periodic: Sine-wave pattern along the x-axis
Cluster: Expression in spatially defined clusters
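The four patterns above can be sketched as functions that map spot coordinates to a multiplier between 0 and 1. This is a minimal illustration under stated assumptions, not the package's implementation: the function names, the hotspot centre and radius, the wavelength, the use of kmeans() for clusters, and the effect size of 2 are all hypothetical.

```r
set.seed(1)
coords <- cbind(x = runif(300, 0, 100), y = runif(300, 0, 100))
x <- coords[, "x"]
y <- coords[, "y"]

# Each function returns a value in [0, 1] for every spot (all names are hypothetical)
pattern_gradient <- function(x, y) (x - min(x)) / (max(x) - min(x))

pattern_hotspot <- function(x, y, cx = 50, cy = 50, radius = 20) {
  # 1 inside a circular region, 0 outside
  as.numeric(sqrt((x - cx)^2 + (y - cy)^2) <= radius)
}

pattern_periodic <- function(x, y, wavelength = 25) {
  # Sine wave along the x-axis, rescaled from [-1, 1] to [0, 1]
  (sin(2 * pi * x / wavelength) + 1) / 2
}

pattern_cluster <- function(x, y, k = 3) {
  # Partition spots into k spatial clusters and switch expression on in cluster 1
  cl <- kmeans(cbind(x, y), centers = k)$cluster
  as.numeric(cl == 1)
}

# Modulate a baseline mean of 50 with an assumed effect size of 2
mean_counts <- 50
mu_gradient <- mean_counts * (1 + 2 * pattern_gradient(x, y))
mu_hotspot  <- mean_counts * (1 + 2 * pattern_hotspot(x, y))
mu_periodic <- mean_counts * (1 + 2 * pattern_periodic(x, y))
mu_cluster  <- mean_counts * (1 + 2 * pattern_cluster(x, y))
range(mu_gradient)
```

With this scaling every spot keeps at least the baseline mean, and spots where the pattern is strongest reach three times the baseline.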
Count Distribution: Counts are drawn from a Negative Binomial distribution: $$X \sim \mathrm{NB}(\mu, \phi)$$ where μ is the mean (modulated by the spatial pattern) and φ is the dispersion.
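As a quick check of how the dispersion behaves, the sketch below draws from base R's rnbinom() using its mu/size parameterization. Mapping φ to the size argument is an assumption; it is consistent with the statement that smaller dispersion values give more overdispersion, because the variance under that parameterization is μ + μ²/φ.

```r
set.seed(42)
mu  <- 50   # mean, matching the default mean_counts
phi <- 5    # dispersion, assumed to map to rnbinom()'s size argument

x <- rnbinom(1e5, mu = mu, size = phi)

# Compare sample moments with the theoretical variance mu + mu^2 / phi
nb_summary <- c(sample_mean = mean(x),
                sample_var = var(x),
                theoretical_var = mu + mu^2 / phi)
nb_summary

# A smaller phi inflates the variance at the same mean
x_small_phi <- rnbinom(1e5, mu = mu, size = 1)
var(x_small_phi) > var(x)
```

As φ grows the μ²/φ term vanishes and the distribution approaches a Poisson with variance equal to the mean.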
Examples
# Set seed for reproducibility before calling
set.seed(42)
sim_data <- simulate_spatial_data(n_spots = 200, n_genes = 50, n_svg = 10)
str(sim_data, max.level = 1)
#> List of 5
#>  $ counts        : num [1:50, 1:200] 8 8 0 21 10 13 97 88 38 17 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ logcounts     : num [1:50, 1:200] 3.17 3.17 0 4.46 3.46 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ spatial_coords: num [1:200, 1:2] 0 6.45 12.9 19.35 25.81 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ gene_info     :'data.frame': 50 obs. of 3 variables:
#>  $ params        :List of 7
# \donttest{
# Use with SVG detection (requires RANN)
if (requireNamespace("RANN", quietly = TRUE)) {
  results <- CalSVG_MERINGUE(sim_data$counts, sim_data$spatial_coords,
                             network_method = "knn", k = 10, verbose = FALSE)
}
# }