
Generate simulated bulk RNA-seq samples from single-cell RNA-seq data stored in a Seurat object. Each artificial bulk sample is built by randomly sampling single cells and aggregating their expression, so the true cell type proportions of every sample are known.

Usage

SimulateBulk(
  object,
  n_samples = 1000L,
  cells_per_sample = 100L,
  celltype_col = NULL,
  assay = NULL,
  unknown_celltypes = NULL,
  sparse_fraction = 0.5,
  min_celltypes = 1L,
  seed = NULL,
  verbose = TRUE,
  n_cores = 1L
)

Arguments

object

A Seurat object containing single-cell RNA-seq data.

n_samples

Integer. Number of bulk samples to simulate. Default is 1000.

cells_per_sample

Integer. Number of cells to aggregate per sample. Default is 100.

celltype_col

Character. Name of the metadata column containing cell type labels. If NULL, the active identities (Idents) are used. Default is NULL.

assay

Character. Name of the assay to use. Default is NULL (uses the default assay).

unknown_celltypes

Character vector. Cell types to merge into a single "Unknown" category. Default is NULL (no merging).

sparse_fraction

Numeric. Fraction of samples that should be "sparse" (missing some cell types). Must be between 0 and 1. Default is 0.5.

min_celltypes

Integer. Minimum number of cell types present in each sparse sample. Default is 1.

seed

Integer. Random seed for reproducibility. Default is NULL.

verbose

Logical. Print progress messages. Default is TRUE.

n_cores

Integer. Number of cores for parallel processing. Default is 1.
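
The effect of unknown_celltypes, sparse_fraction, and min_celltypes can be pictured with a small base R sketch. It shows only one plausible reading of the argument descriptions above: the toy label vector, the rounding rule, and the way the subset size is drawn are all assumptions, not the package's actual internals.

```r
set.seed(42)

# Toy cell type labels standing in for a Seurat metadata column (assumed data).
labels <- c("T cell", "B cell", "Doublet", "NK cell", "Debris", "T cell")

# unknown_celltypes: the listed labels collapse into one "Unknown" category.
unknown_celltypes <- c("Doublet", "Debris")
merged <- ifelse(labels %in% unknown_celltypes, "Unknown", labels)
celltypes <- sort(unique(merged))

# sparse_fraction: share of the n_samples that omit some cell types
# (assumption: the count is rounded to the nearest integer).
n_samples <- 1000L
sparse_fraction <- 0.5
n_sparse <- as.integer(round(n_samples * sparse_fraction))
n_normal <- n_samples - n_sparse

# min_celltypes: lower bound on how many cell types a sparse sample keeps
# (assumption: the size is drawn uniformly and always leaves one type out).
min_celltypes <- 1L
sizes <- seq.int(min_celltypes, length(celltypes) - 1L)
n_keep <- sizes[sample.int(length(sizes), 1L)]
kept <- sample(celltypes, n_keep)
```

Under these assumptions, merged holds the relabelled cells, n_sparse and n_normal split the requested samples, and kept names the cell types present in one sparse sample.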

Value

A list containing:

bulk_counts

Matrix of simulated bulk expression (genes x samples)

cell_fractions

Data frame of true cell type fractions (samples x cell types)

celltypes

Character vector of cell type names

genes

Character vector of gene names

metadata

List of simulation parameters
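
The shape of the returned list can be checked with a few assertions. The sketch below builds a toy object that mimics the documented structure rather than calling SimulateBulk, so the values, the contents of metadata, and the presence of row and column names are assumptions made for illustration.

```r
# Toy object mimicking the documented return structure (not real output).
genes <- c("geneA", "geneB", "geneC")
celltypes <- c("B cell", "T cell")
sim_data <- list(
  bulk_counts = matrix(
    c(10, 0, 5, 3, 8, 1),
    nrow = 3,
    dimnames = list(genes, c("sample1", "sample2"))
  ),
  cell_fractions = data.frame(
    `B cell` = c(0.25, 1),
    `T cell` = c(0.75, 0),
    row.names = c("sample1", "sample2"),
    check.names = FALSE
  ),
  celltypes = celltypes,
  genes = genes,
  metadata = list(n_samples = 2L, cells_per_sample = 100L)  # assumed fields
)

# Genes are the rows of bulk_counts; samples are its columns and the rows
# of cell_fractions; the fractions of every sample should sum to 1.
stopifnot(
  identical(rownames(sim_data$bulk_counts), sim_data$genes),
  ncol(sim_data$bulk_counts) == nrow(sim_data$cell_fractions),
  identical(colnames(sim_data$cell_fractions), sim_data$celltypes),
  isTRUE(all.equal(unname(rowSums(sim_data$cell_fractions)), c(1, 1)))
)
```

The same assertions could be run on a real result to confirm that bulk_counts and cell_fractions line up before they are used downstream.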

Details

The simulation process:

  1. Generate random cell type fractions that sum to 1

  2. Sample cells according to these fractions

  3. Sum expression values across sampled cells

  4. Create both "normal" samples (containing all cell types) and "sparse" samples (containing only a subset of cell types)
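
The steps above can be sketched in base R on a toy count matrix. This is a minimal illustration, not the package's implementation: simulate_one is a hypothetical helper, the fractions are assumed to come from normalised uniform draws, and cells are assumed to be sampled with replacement within each cell type.

```r
set.seed(1)

# Toy single-cell count matrix: 5 genes x 60 cells, three cell types.
n_genes <- 5
celltypes <- c("A", "B", "C")
labels <- rep(celltypes, each = 20)
counts <- matrix(
  rpois(n_genes * length(labels), lambda = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_along(labels)))
)

# Hypothetical helper (not part of the package): simulate one bulk sample.
simulate_one <- function(counts, labels, cells_per_sample,
                         keep = unique(labels)) {
  types <- unique(labels)
  # Step 1: random fractions over the kept cell types, summing to 1
  # (assumption: normalised uniform draws; dropped types get weight 0).
  w <- setNames(numeric(length(types)), types)
  w[keep] <- runif(length(keep))
  target <- w / sum(w)
  # Step 2: convert fractions to integer cell counts, then sample cells
  # with replacement within each cell type.
  n_per_type <- setNames(
    as.vector(rmultinom(1, size = cells_per_sample, prob = target)),
    types
  )
  picked <- unlist(lapply(types, function(ct) {
    pool <- which(labels == ct)
    pool[sample.int(length(pool), n_per_type[[ct]], replace = TRUE)]
  }))
  # Step 3: sum expression values across the sampled cells.
  list(
    bulk = rowSums(counts[, picked, drop = FALSE]),
    fractions = n_per_type / cells_per_sample
  )
}

# Step 4: one "normal" sample (all cell types) and one "sparse" sample
# (a random subset of two cell types).
normal <- simulate_one(counts, labels, cells_per_sample = 100)
sparse <- simulate_one(counts, labels, cells_per_sample = 100,
                       keep = sample(celltypes, 2))
```

In this sketch the recorded fractions are the realised proportions of sampled cells, so they sum to 1 for every sample, and the sparse sample has a fraction of exactly 0 for the cell type that was left out.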

Examples

if (FALSE) { # \dontrun{
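# Basic simulation; seurat_obj is assumed to be an existing Seurat object
# whose active identities are the cell type labels
sim_data <- SimulateBulk(seurat_obj, n_samples = 1000)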

# Custom simulation with specific parameters
sim_data <- SimulateBulk(
  seurat_obj,
  n_samples = 2000,
  cells_per_sample = 200,
  celltype_col = "cell_annotation",
  sparse_fraction = 0.3,
  seed = 42
)
} # }