
Generates a simulated spatial transcriptomics dataset containing a mixture of spatially variable genes (SVGs) and non-spatially variable genes. Counts are drawn from a Negative Binomial distribution, which captures the overdispersion typical of sequencing count data.

Usage

simulate_spatial_data(
  n_spots = 500,
  n_genes = 200,
  n_svg = 50,
  grid_type = c("hexagonal", "square", "random"),
  pattern_types = c("gradient", "hotspot", "periodic", "cluster"),
  mean_counts = 50,
  dispersion = 5
)

Arguments

n_spots

Integer. Number of spatial locations. Default is 500.

n_genes

Integer. Total number of genes. Default is 200.

n_svg

Integer. Number of spatially variable genes. Default is 50.

grid_type

Character. Type of spatial layout.

  • "hexagonal" (default): Visium-like hexagonal grid

  • "square": Square grid

  • "random": Random spatial distribution
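To make the three layouts concrete, here is a minimal sketch of how such coordinates could be generated. make_layout() is a hypothetical helper, not part of this package, and the unit spacing and near-square extent are assumptions; the package's own spacing may differ.

```r
# Hypothetical helper, not part of the package: returns an n_spots x 2
# coordinate matrix for each grid_type option. The unit spacing and the
# near-square extent are illustrative assumptions.
make_layout <- function(n_spots,
                        grid_type = c("hexagonal", "square", "random"),
                        spacing = 1) {
  grid_type <- match.arg(grid_type)
  n_side <- ceiling(sqrt(n_spots))  # spots per row
  if (grid_type == "random") {
    # Uniform scatter over the same square extent
    xy <- cbind(runif(n_spots, 0, n_side * spacing),
                runif(n_spots, 0, n_side * spacing))
  } else {
    idx   <- seq_len(n_spots) - 1
    col_i <- idx %% n_side   # column index within a row
    row_i <- idx %/% n_side  # row index
    if (grid_type == "square") {
      xy <- cbind(col_i * spacing, row_i * spacing)
    } else {
      # Hexagonal: offset odd rows by half a spacing and place rows
      # sqrt(3)/2 apart so all nearest neighbors are equidistant
      xy <- cbind(col_i * spacing + (row_i %% 2) * spacing / 2,
                  row_i * spacing * sqrt(3) / 2)
    }
  }
  colnames(xy) <- c("x", "y")
  xy
}

hex <- make_layout(12, "hexagonal")
sq  <- make_layout(9, "square")
set.seed(1)
rnd <- make_layout(10, "random")
```

The half-spacing row offset is what distinguishes a Visium-like hexagonal grid from a square one: each interior spot has six equidistant neighbors instead of four.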

pattern_types

Character vector. Types of spatial patterns for SVGs. Any combination of:

  • "gradient": Linear spatial gradient

  • "hotspot": Localized expression hotspots

  • "periodic": Periodic/oscillating patterns

  • "cluster": Clustered expression

Default is all four types.

mean_counts

Numeric. Baseline mean expression level (the Negative Binomial mean before spatial modulation). Default is 50.

dispersion

Numeric. Dispersion parameter of the Negative Binomial distribution. Smaller values produce more overdispersion. Default is 5.
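Why smaller values mean more overdispersion: under R's rnbinom(mu =, size =) parameterization, the variance is μ + μ²/size. The sketch below assumes dispersion plays the role of size; that mapping is an assumption, not something this page confirms.

```r
# Variance of a Negative Binomial with mean mu and size phi,
# assuming the simulator uses rnbinom(mu = mu, size = phi)
nb_var <- function(mu, phi) mu + mu^2 / phi

mu <- 50
nb_var(mu, phi = 5)    # 550: strongly overdispersed (the default)
nb_var(mu, phi = 50)   # 100
nb_var(mu, phi = Inf)  # 50: Poisson limit, variance equals mean

# Empirical check with the default-like settings
set.seed(42)
x <- rnbinom(1e5, mu = mu, size = 5)
c(mean = mean(x), var = var(x))  # approximately 50 and 550
```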

Value

A list containing:

  • counts: Matrix of gene counts (genes × spots)

  • spatial_coords: Matrix of spatial coordinates (spots × 2)

  • gene_info: Data.frame with gene metadata including is_svg (TRUE/FALSE) and pattern_type

  • logcounts: Log-transformed counts, log2(counts + 1), with no library-size normalization

  • params: List of the simulation parameters used
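The sketch below shows how the returned elements relate, using a small hypothetical mock object with the same genes × spots layout. The count values, the gene column name and the NA pattern_type for non-SVGs are made up for illustration; inspect gene_info in a real result for its actual columns.

```r
# Hypothetical mock of the returned structure (values are made up)
counts <- matrix(c(8, 21, 10, 0, 97, 88), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("spot", 1:2)))
gene_info <- data.frame(
  gene         = rownames(counts),             # assumed column name
  is_svg       = c(TRUE, FALSE, TRUE),
  pattern_type = c("gradient", NA, "hotspot")  # NA for non-SVG is assumed
)

# logcounts is a plain log2(x + 1) transform of counts
logcounts <- log2(counts + 1)
round(logcounts, 2)

# Keep only the spatially variable genes (rows are genes)
svg_counts <- counts[gene_info$is_svg, , drop = FALSE]
dim(svg_counts)  # 2 genes x 2 spots
```

Because rows of counts and gene_info are aligned by gene, gene_info$is_svg can serve directly as ground truth when benchmarking an SVG detection method.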

Details

Spatial Patterns:

  • Gradient: Expression increases linearly along the x-axis

  • Hotspot: High expression in circular regions

  • Periodic: Sine wave pattern along the x-axis

  • Cluster: Expression in spatially defined clusters
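Each pattern can be thought of as a per-spot multiplier on the baseline mean. The functions below are a hypothetical sketch of the four shapes; the constants (hotspot radius, number of periods, k-means with 4 centers for clusters) are illustrative assumptions, not the package's actual formulas.

```r
# Hypothetical pattern functions, for illustration only: the package's
# exact formulas and constants are not documented on this page.
# Each returns a positive per-spot multiplier for the baseline mean.
pattern_multiplier <- function(coords, type) {
  x  <- coords[, 1]
  y  <- coords[, 2]
  xs <- (x - min(x)) / diff(range(x))  # x rescaled to [0, 1]
  switch(type,
    # Gradient: linear increase along the x-axis
    gradient = 0.2 + 1.8 * xs,
    # Hotspot: one circular region of high expression at the center
    hotspot = {
      cx <- mean(range(x)); cy <- mean(range(y))
      r  <- 0.2 * diff(range(x))
      ifelse((x - cx)^2 + (y - cy)^2 <= r^2, 4, 1)
    },
    # Periodic: two full sine periods along the x-axis
    periodic = 1 + 0.8 * sin(2 * pi * 2 * xs),
    # Cluster: k-means on coordinates defines regions; cluster 1 is "on"
    cluster = {
      km <- stats::kmeans(coords, centers = 4)
      ifelse(km$cluster == 1, 3, 1)
    },
    stop("unknown pattern type: ", type)
  )
}

set.seed(1)
coords <- as.matrix(expand.grid(x = 0:19, y = 0:19))
mults  <- sapply(c("gradient", "hotspot", "periodic", "cluster"),
                 function(p) pattern_multiplier(coords, p))
```

Keeping every multiplier strictly positive matters because the result is used as a Negative Binomial mean, which must be positive.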

Count Distribution: Counts are drawn from a Negative Binomial distribution, $$X \sim \mathrm{NB}(\mu, \phi),$$ where μ is the mean (the baseline mean_counts, modulated by the spatial pattern) and φ is the dispersion parameter (dispersion).
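A minimal generative sketch of this model for one gradient SVG and one non-SVG gene. It assumes φ maps to rnbinom's size argument, so that Var(X) = μ + μ²/φ, and uses an illustrative linear multiplier; it is not the package's implementation.

```r
# Sketch of X ~ NB(mu, phi): one gradient SVG vs. one flat gene.
# Assumes phi is rnbinom's size; the multiplier shape is illustrative.
set.seed(42)
n_spots     <- 400
mean_counts <- 50  # baseline mu
dispersion  <- 5   # phi

x <- runif(n_spots, 0, 100)                          # x coordinate per spot
mu_svg  <- mean_counts * (0.2 + 1.8 * x / 100)       # gradient-modulated mean
mu_null <- rep(mean_counts, n_spots)                 # spatially constant mean

svg_gene  <- rnbinom(n_spots, mu = mu_svg,  size = dispersion)
null_gene <- rnbinom(n_spots, mu = mu_null, size = dispersion)

# The SVG tracks x; the non-SVG shows only noise-level correlation
c(svg  = cor(svg_gene,  x, method = "spearman"),
  null = cor(null_gene, x, method = "spearman"))
```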

Examples

# Set seed for reproducibility before calling
set.seed(42)
sim_data <- simulate_spatial_data(n_spots = 200, n_genes = 50, n_svg = 10)
str(sim_data, max.level = 1)
#> List of 5
#>  $ counts        : num [1:50, 1:200] 8 8 0 21 10 13 97 88 38 17 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ logcounts     : num [1:50, 1:200] 3.17 3.17 0 4.46 3.46 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ spatial_coords: num [1:200, 1:2] 0 6.45 12.9 19.35 25.81 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ gene_info     :'data.frame':	50 obs. of  3 variables:
#>  $ params        :List of 7

# \donttest{
# Use with SVG detection (requires RANN)
if (requireNamespace("RANN", quietly = TRUE)) {
    results <- CalSVG_MERINGUE(sim_data$counts, sim_data$spatial_coords,
                               network_method = "knn", k = 10, verbose = FALSE)
}
# }