Apply scGate to filter specific cell types in a query dataset
Usage
scGate(
data,
model,
pos.thr = 0.2,
neg.thr = 0.2,
assay = NULL,
slot = "data",
ncores = 1,
BPPARAM = NULL,
seed = 123,
keep.ranks = FALSE,
reduction = c("calculate", "pca", "umap", "harmony"),
min.cells = 30,
nfeatures = 2000,
pca.dim = 30,
param_decay = 0.25,
maxRank = 1500,
output.col.name = "is.pure",
k.param = 30,
smooth.decay = 0.1,
smooth.up.only = FALSE,
genes.blacklist = "default",
return.CellOntology = TRUE,
multi.asNA = FALSE,
additional.signatures = NULL,
save.levels = FALSE,
verbose = FALSE,
progressbar = TRUE
)
Arguments
- data
Seurat object containing a query data set - filtering will be applied to this object
- model
A single scGate model, or a list of scGate models. See Details for this format
- pos.thr
Minimum UCell score value for positive signatures
- neg.thr
Maximum UCell score value for negative signatures
- assay
Seurat assay to use
- slot
Data slot in Seurat object to calculate UCell scores. For Seurat V4: "counts", "data", or "scale.data". For Seurat V5: maps to corresponding layer. Default is "data" (log-normalized).
- ncores
Number of processors for parallel processing
- BPPARAM
A [BiocParallel::bpparam()] object that tells scGate how to parallelize. If provided, it overrides the `ncores` parameter.
- seed
Integer seed for random number generator
- keep.ranks
Store UCell rankings in Seurat object. This will speed up calculations if the same object is applied again with new signatures.
- reduction
Dimensionality reduction to use for knn smoothing. By default, calculates a new reduction based on the given
assay; otherwise you may specify a precalculated dimensionality reduction (e.g. in the case of an integrated dataset after batch-effect correction)
- min.cells
Minimum number of cells to cluster or define cell types
- nfeatures
Number of variable genes for dimensionality reduction
- pca.dim
Number of principal components for dimensionality reduction
- param_decay
Controls decrease in parameter complexity at each iteration, between 0 and 1.
param_decay == 0 gives no decay; increasingly higher param_decay gives increasingly stronger decay
- maxRank
Maximum number of genes that UCell will rank per cell
- output.col.name
Column name with 'pure/impure' annotation
- k.param
Number of nearest neighbors for knn smoothing
- smooth.decay
Decay parameter for knn weights: (1-decay)^n
- smooth.up.only
If TRUE, only let smoothing increase signature scores
- genes.blacklist
Genes blacklisted from variable features. The default loads the list of genes in
scGate::genes.blacklist.default; you may deactivate blacklisting by setting genes.blacklist=NULL
- return.CellOntology
If TRUE, the Cell Ontology name and ID are returned as additional metadata columns when running multiple models.
- multi.asNA
How to label cells that are "Pure" for multiple annotations: "Multi" (FALSE) or NA (TRUE)
- additional.signatures
A list of additional signatures, not included in the model, to be evaluated (e.g. a cycling signature). The scores for this list of signatures will be returned but not used for filtering.
- save.levels
Whether to save in metadata the filtering output for each gating model level
- verbose
Verbose output
- progressbar
Whether to show a progressbar or not
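The knn weighting controlled by smooth.decay and smooth.up.only can be illustrated with a little arithmetic. The sketch below is not scGate's internal code: it assumes neighbors are ordered by increasing distance, that n starts at 0 for the cell itself, and that the smoothed score is a weighted mean. The helper names knn_weights and smooth_score are hypothetical.

```r
# Hypothetical helper: weight of the n-th neighbor, (1 - decay)^n.
# Assumption: n = 0 is the cell itself, n = 1 its nearest neighbor, and so on.
knn_weights <- function(k, decay = 0.1) {
  (1 - decay)^(0:(k - 1))
}

# Hypothetical helper: weighted mean of a cell's score and its neighbors' scores.
# `scores` is ordered from the cell itself to its farthest neighbor.
smooth_score <- function(scores, decay = 0.1, up.only = FALSE) {
  w <- knn_weights(length(scores), decay)
  smoothed <- sum(w * scores) / sum(w)
  # With up.only = TRUE, smoothing may raise but never lower the original score
  if (up.only) max(smoothed, scores[1]) else smoothed
}

w <- knn_weights(k = 5, decay = 0.1)
print(round(w, 4))

# Toy scores: a high-scoring cell surrounded by low-scoring neighbors
scores <- c(0.40, 0.10, 0.10, 0.05, 0.00)
print(smooth_score(scores, decay = 0.1))
print(smooth_score(scores, decay = 0.1, up.only = TRUE))
```

Under these assumptions, a larger smooth.decay concentrates the weight on the closest neighbors, while smooth.decay = 0 weights all k neighbors equally.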
Value
A new metadata column (named by output.col.name, is.pure by default) is added to the query Seurat object, indicating which cells passed the scGate filter.
The active.ident is also set to this variable.
Details
Models for scGate are data frames where each line is a signature for a given filtering level.
A database of models can be downloaded using the function get_scGateDB.
You may directly use the models from the database, or edit one of these models to generate your own custom gating model.
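As a rough illustration of a custom model with more than one filtering level, the sketch below builds a model row by row with gating_model. It assumes that gating_model accepts the model, level and negative arguments, as in the scGate README; the gene signatures are illustrative only and are not taken from the model database.

```r
library(scGate)

# Level 1: keep immune cells (positive signature)
my_scGate_model <- gating_model(name = "immune", signature = c("PTPRC"), level = 1)

# Level 2: among immune cells, keep T cells (positive signature) ...
my_scGate_model <- gating_model(model = my_scGate_model, level = 2,
                                name = "Tcell",
                                signature = c("CD3D", "CD3E", "CD3G", "CD2"))

# ... and remove B cells (negative signature) at the same level
my_scGate_model <- gating_model(model = my_scGate_model, level = 2,
                                name = "Bcell", signature = c("MS4A1"),
                                negative = TRUE)

# Each row is one signature assigned to one filtering level
print(my_scGate_model)
```

The resulting data frame can be passed to scGate through the model argument in the same way as a model from the database.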
Multiple models can also be evaluated at once, by running scGate with a list of models. Gating for each individual model is
returned as metadata, with a consensus annotation stored in scGate_multi metadata field. This allows using scGate as a
multi-class classifier, where only cells that are "Pure" for a single model are assigned a label, cells that are "Pure" for
more than one gating model are labeled as "Multi", and all other cells are annotated as NA.
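The consensus rule described above can be sketched in base R. This is not the scGate implementation: the per-model results are a made-up toy table, and consensus_label is a hypothetical helper whose multi.asNA argument mimics the behavior documented for the scGate argument of the same name.

```r
# Hypothetical per-model gating results for five cells
# (TRUE = "Pure" for that model, FALSE = "Impure")
pure <- data.frame(
  Bcell = c(TRUE,  FALSE, TRUE,  FALSE, FALSE),
  Tcell = c(FALSE, TRUE,  TRUE,  FALSE, TRUE),
  row.names = paste0("cell", 1:5)
)

# Hypothetical helper: one label per cell, following the rule in Details
consensus_label <- function(pure, multi.asNA = FALSE) {
  label <- vapply(seq_len(nrow(pure)), function(i) {
    hits <- colnames(pure)[unlist(pure[i, ])]
    if (length(hits) == 1) {
      hits                      # pure for exactly one model: use its name
    } else if (length(hits) > 1 && !multi.asNA) {
      "Multi"                   # pure for several models
    } else {
      NA_character_             # pure for none (or several, with multi.asNA)
    }
  }, character(1))
  setNames(label, rownames(pure))
}

print(consensus_label(pure))
print(consensus_label(pure, multi.asNA = TRUE))
```

In this toy table, cell3 is "Pure" for both models and is therefore labeled "Multi" (or NA when multi.asNA = TRUE), while cell4 passes neither gate and is NA.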
Examples
# \donttest{
### Test using a small toy set
data(query.seurat)
# Define basic gating model for B cells
my_scGate_model <- gating_model(name = "Bcell", signature = c("MS4A1"))
query.seurat <- scGate(query.seurat, model = my_scGate_model, reduction="pca")
#>
#> ### Detected a total of 28 pure 'Target' cells (9.33% of total)
table(query.seurat$is.pure)
#>
#> Pure Impure
#> 28 272
# }
if (FALSE) { # \dontrun{
### Test with larger datasets
library(Seurat)
testing.datasets <- get_testing_data(version = 'hsa.latest')
seurat_object <- testing.datasets[["JerbyArnon"]]
# Download pre-defined models
models <- get_scGateDB()
seurat_object <- scGate(seurat_object, model=models$human$generic$PanBcell)
DimPlot(seurat_object)
seurat_object_filtered <- subset(seurat_object, subset=is.pure=="Pure")
### Run multiple models at once
models <- get_scGateDB()
model.list <- list("Bcell" = models$human$generic$Bcell,
"Tcell" = models$human$generic$Tcell)
seurat_object <- scGate(seurat_object, model=model.list)
DimPlot(seurat_object, group.by = "scGate_multi")
} # }