Generate simulated bulk RNA-seq samples from single-cell RNA-seq data stored in a Seurat object. This function creates artificial bulk samples by randomly sampling and aggregating single cells with known cell type proportions.
Usage
SimulateBulk(
  object,
  n_samples = 1000L,
  cells_per_sample = 100L,
  celltype_col = NULL,
  assay = NULL,
  unknown_celltypes = NULL,
  sparse_fraction = 0.5,
  min_celltypes = 1L,
  seed = NULL,
  verbose = TRUE,
  n_cores = 1L
)
Arguments
- object
A Seurat object containing single-cell RNA-seq data.
- n_samples
Integer. Number of bulk samples to simulate. Default is 1000.
- cells_per_sample
Integer. Number of cells to aggregate per sample. Default is 100.
- celltype_col
Character. Name of metadata column containing cell type labels. If NULL, uses active identity (Idents). Default is NULL.
- assay
Character. Name of assay to use. Default is NULL (uses default assay).
- unknown_celltypes
Character vector. Cell types to merge into "Unknown" category. Default is NULL (no merging).
- sparse_fraction
Numeric. Fraction of samples that should be "sparse" (missing some cell types). Value between 0 and 1. Default is 0.5.
- min_celltypes
Integer. Minimum number of cell types in sparse samples. Default is 1.
- seed
Integer. Random seed for reproducibility. Default is NULL.
- verbose
Logical. Print progress messages. Default is TRUE.
- n_cores
Integer. Number of cores for parallel processing. Default is 1.
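The effect of unknown_celltypes and sparse_fraction can be pictured with a small base R sketch. This is not the package's code: the labels are made up, and both the relabelling rule and the rounding of the sparse sample count are assumptions made for illustration.

```r
# Hypothetical labels; in practice these would come from the metadata column
# named by celltype_col, or from the active identities of the Seurat object.
labels <- c("T cell", "B cell", "Doublet", "T cell", "Low quality", "NK cell")
unknown_celltypes <- c("Doublet", "Low quality")

# Assumed behaviour: every listed label is replaced by "Unknown".
merged <- ifelse(labels %in% unknown_celltypes, "Unknown", labels)
print(table(merged))

# Assumed behaviour: sparse_fraction sets the share of samples that are sparse;
# the remaining samples contain all cell types.
n_samples <- 1000L
sparse_fraction <- 0.5
n_sparse <- as.integer(round(n_samples * sparse_fraction))
n_normal <- n_samples - n_sparse
```

With the defaults shown in Usage, this reading would give an even split between sparse and normal samples.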
Value
A list containing:
- bulk_counts
Matrix of simulated bulk expression (genes x samples)
- cell_fractions
Data frame of true cell type fractions (samples x cell types)
- celltypes
Character vector of cell type names
- genes
Character vector of gene names
- metadata
List of simulation parameters
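A few consistency checks on the returned list can catch orientation mistakes before the data are passed on. The sketch below builds a mock list with the documented fields so that it runs without a Seurat object. All values are made up, the contents of metadata are hypothetical, and the assumption that dimensions line up as shown follows only from the shapes documented above.

```r
set.seed(1)

# Mock object mirroring the documented structure (illustrative values only).
celltypes <- c("A", "B", "C")
genes <- paste0("gene", 1:4)
samples <- c("s1", "s2")

fr <- matrix(runif(6), nrow = 2, dimnames = list(samples, celltypes))
fr <- fr / rowSums(fr)  # each row (sample) sums to 1

sim_data <- list(
  bulk_counts = matrix(rpois(8, 50), nrow = 4, dimnames = list(genes, samples)),
  cell_fractions = as.data.frame(fr),
  celltypes = celltypes,
  genes = genes,
  metadata = list(n_samples = 2L)  # hypothetical field name
)

# Shape checks: genes x samples for counts, samples x cell types for fractions.
stopifnot(
  nrow(sim_data$bulk_counts) == length(sim_data$genes),
  ncol(sim_data$bulk_counts) == nrow(sim_data$cell_fractions),
  ncol(sim_data$cell_fractions) == length(sim_data$celltypes),
  all(abs(rowSums(sim_data$cell_fractions) - 1) < 1e-8)
)
```

The same stopifnot() block could be applied to the output of SimulateBulk(), provided the result follows the layout described in this section.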
Details
The simulation process:
1. Generate random cell type fractions that sum to 1
2. Sample cells according to these fractions
3. Sum expression values across sampled cells
4. Create both "normal" (all cell types) and "sparse" (subset of cell types) samples
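The steps above can be sketched in base R on a toy count matrix. This is a minimal illustration, not the function's actual implementation: the helper simulate_one() and its keep argument are hypothetical, fractions are drawn from normalised uniform weights, cells are sampled with replacement, and the reported fractions are the realised ones. All of these are assumptions.

```r
set.seed(42)

# Toy single-cell data: 5 genes x 60 cells, three hypothetical cell types.
counts <- matrix(rpois(5 * 60, lambda = 3), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:60)))
labels <- rep(c("A", "B", "C"), each = 20)
celltypes <- unique(labels)

# Hypothetical helper: simulate one bulk sample from the cell types in `keep`.
simulate_one <- function(counts, labels, celltypes, cells_per_sample, keep) {
  # Step 1: random fractions over the kept cell types, normalised to sum to 1.
  fractions <- setNames(numeric(length(celltypes)), celltypes)
  w <- runif(length(keep))
  fractions[keep] <- w / sum(w)

  # Step 2: convert fractions to per-type cell counts, then sample cells
  # of each type with replacement.
  n_cells <- as.vector(rmultinom(1, cells_per_sample, fractions))
  idx <- unlist(lapply(seq_along(celltypes), function(i) {
    pool <- which(labels == celltypes[i])
    pool[sample.int(length(pool), n_cells[i], replace = TRUE)]
  }))

  # Step 3: sum expression across the sampled cells.
  list(bulk = rowSums(counts[, idx, drop = FALSE]),
       fractions = setNames(n_cells / cells_per_sample, celltypes))
}

# Step 4: a "normal" sample uses all cell types, a "sparse" one only a subset.
normal <- simulate_one(counts, labels, celltypes, 100L, keep = celltypes)
sparse <- simulate_one(counts, labels, celltypes, 100L,
                       keep = sample(celltypes, 2))
```

In this sketch a sparse sample always has a fraction of exactly 0 for the dropped cell type, which is what makes the true proportions in cell_fractions informative for sparse cases.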
Examples
if (FALSE) { # \dontrun{
# Basic simulation
sim_data <- SimulateBulk(seurat_obj, n_samples = 1000)
# Custom simulation with specific parameters
sim_data <- SimulateBulk(
  seurat_obj,
  n_samples = 2000,
  cells_per_sample = 200,
  celltype_col = "cell_annotation",
  sparse_fraction = 0.3,
  seed = 42
)
} # }