Introduction
scGate is an R package for marker-based purification of cell types from heterogeneous single-cell RNA-seq datasets. Unlike reference-based methods, scGate does not require training data or reference gene expression profiles.
Key Features
- Marker-based gating: Define cell populations using positive and negative markers
- Hierarchical models: Build complex gating strategies similar to flow cytometry
- Multi-class classification: Annotate multiple cell types simultaneously
- Cross-platform: Works on Windows, macOS, and Linux
- Integration-friendly: Compatible with Seurat v4/v5 and batch-corrected data
Installation
# From CRAN (stable version)
install.packages("scGate")
# From GitHub (development version)
# remotes::install_github("Zaoqu-Liu/scGate")
Quick Example
Load Example Data
scGate provides a small example dataset for testing:
# Load the built-in example dataset
data(query.seurat)
# Check the data
query.seurat
#> An object of class Seurat
#> 20388 features across 300 samples within 1 assay
#> Active assay: RNA (20388 features, 492 variable features)
#> 2 dimensional reductions calculated: pca, umap
Create a Simple Gating Model
The simplest way to use scGate is with a single marker:
# Create a model for B cells using MS4A1 (CD20) as marker
bcell_model <- gating_model(name = "Bcell", signature = c("MS4A1"))
# View the model structure
bcell_model
#> levels use_as name signature
#> 1 level1 positive Bcell MS4A1
Building More Complex Models
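Before adding more signatures, it helps to see how a model is applied to data. The sketch below runs the single-marker B cell model on the example object. It assumes that `scGate()` takes the object and a `model` argument, as in the multi-class call later in this document, and that the result is written to a metadata column named `is.pure` with the values "Pure" and "Impure" (the default in upstream scGate releases; confirm with `?scGate` for the installed version).

```r
library(scGate)

# Load the example object and rebuild the single-marker model
data(query.seurat)
bcell_model <- gating_model(name = "Bcell", signature = c("MS4A1"))

# Apply the model; cells are labelled in a metadata column
# (assumed here to be "is.pure", the upstream default)
query.seurat <- scGate(query.seurat, model = bcell_model)

# Count cells that passed and failed the gate
table(query.seurat$is.pure)

# Keep only the gated cells for downstream analysis
bcells <- subset(query.seurat, subset = is.pure == "Pure")
```

The same call pattern should work for any model built with `gating_model()`, including the multi-signature models in the following sections.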
Positive and Negative Markers
For more accurate gating, combine positive and negative markers:
# T cell model with positive (CD3D, CD3E) and negative (CD19, MS4A1) markers
tcell_model <- gating_model(name = "Tcell", signature = c("CD3D", "CD3E"))
tcell_model <- gating_model(
model = tcell_model,
name = "notBcell",
signature = c("CD19", "MS4A1"),
positive = FALSE # These are negative markers
)
tcell_model
#> levels use_as name signature
#> 1 level1 positive Tcell CD3D;CD3E
#> 2 level1 negative notBcell CD19;MS4A1
Using Pre-defined Models
scGate provides a database of curated gating models:
# Download the model database
models_db <- get_scGateDB()
# Available models for human
names(models_db$human$generic)
# Use a pre-defined T cell model
tcell_model <- models_db$human$generic$Tcell
Multi-class Classification
scGate can annotate multiple cell types simultaneously:
# Define multiple models
model_list <- list(
"Bcell" = gating_model(name = "Bcell", signature = c("MS4A1", "CD19")),
"Tcell" = gating_model(name = "Tcell", signature = c("CD3D", "CD3E"))
)
# Apply all models at once
query.seurat <- scGate(query.seurat, model = model_list)
# Results are stored in the scGate_multi column
table(query.seurat$scGate_multi)
Key Parameters
| Parameter | Description | Default |
|---|---|---|
| pos.thr | Threshold for positive signatures | 0.2 |
| neg.thr | Threshold for negative signatures | 0.2 |
| reduction | Dimensionality reduction to use | "calculate" |
| k.param | Number of neighbors for smoothing | 30 |
| ncores | Number of cores for parallel processing | 1 |
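To make the two thresholds concrete, here is a toy base-R sketch of a threshold gate on made-up signature scores. It assumes a rule in which a cell passes when at least one positive signature scores above `pos.thr` and no negative signature scores above `neg.thr`. The score values, cell names, and the `gate_cells()` helper are hypothetical, and this is not scGate's internal code, which also smooths scores across `k.param` neighbors before gating.

```r
# Hypothetical per-cell signature scores in [0, 1] (made-up values)
scores <- data.frame(
  Tcell    = c(0.45, 0.30, 0.05, 0.25),
  notBcell = c(0.02, 0.35, 0.01, 0.10),
  row.names = c("cell1", "cell2", "cell3", "cell4")
)

# Hypothetical helper: a cell passes if any positive signature exceeds
# pos.thr and no negative signature exceeds neg.thr
gate_cells <- function(scores, positive, negative,
                       pos.thr = 0.2, neg.thr = 0.2) {
  pos_hit <- apply(scores[, positive, drop = FALSE], 1, max) > pos.thr
  neg_hit <- apply(scores[, negative, drop = FALSE], 1, max) > neg.thr
  ifelse(pos_hit & !neg_hit, "Pure", "Impure")
}

labels <- gate_cells(scores, positive = "Tcell", negative = "notBcell")
print(labels)
```

With the default thresholds, cell1 and cell4 pass the gate in this toy data. Raising `pos.thr` to 0.3 would also exclude cell4, whose positive score is 0.25, which shows how a stricter positive threshold trades recall for purity.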
Session Info
sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS 15.6.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] C
#>
#> time zone: Asia/Shanghai
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.1 SeuratObject_4.1.4 Seurat_4.4.0 scGate_1.7.2
#>
#> loaded via a namespace (and not attached):
#> [1] deldir_2.0-4 pbapply_1.7-4 gridExtra_2.3
#> [4] rlang_1.1.7 magrittr_2.0.4 RcppAnnoy_0.0.23
#> [7] otel_0.2.0 spatstat.geom_3.7-0 matrixStats_1.5.0
#> [10] ggridges_0.5.7 compiler_4.4.0 png_0.1-8
#> [13] systemfonts_1.3.1 vctrs_0.7.1 reshape2_1.4.5
#> [16] stringr_1.6.0 pkgconfig_2.0.3 fastmap_1.2.0
#> [19] promises_1.5.0 rmarkdown_2.30 ragg_1.5.0
#> [22] purrr_1.2.1 xfun_0.56 cachem_1.1.0
#> [25] jsonlite_2.0.0 goftest_1.2-3 later_1.4.5
#> [28] BiocParallel_1.40.2 spatstat.utils_3.2-1 irlba_2.3.5.1
#> [31] parallel_4.4.0 cluster_2.1.8.1 R6_2.6.1
#> [34] ica_1.0-3 spatstat.data_3.1-9 stringi_1.8.7
#> [37] bslib_0.9.0 RColorBrewer_1.1-3 reticulate_1.44.1
#> [40] spatstat.univar_3.1-6 parallelly_1.46.1 lmtest_0.9-40
#> [43] jquerylib_0.1.4 scattermore_1.2 Rcpp_1.1.1
#> [46] knitr_1.51 tensor_1.5.1 future.apply_1.20.1
#> [49] zoo_1.8-15 sctransform_0.4.3 httpuv_1.6.16
#> [52] Matrix_1.7-4 splines_4.4.0 igraph_2.2.1
#> [55] tidyselect_1.2.1 abind_1.4-8 dichromat_2.0-0.1
#> [58] yaml_2.3.12 spatstat.random_3.4-4 spatstat.explore_3.7-0
#> [61] codetools_0.2-20 miniUI_0.1.2 listenv_0.10.0
#> [64] plyr_1.8.9 lattice_0.22-7 tibble_3.3.1
#> [67] withr_3.0.2 shiny_1.12.1 S7_0.2.1
#> [70] ROCR_1.0-12 evaluate_1.0.5 Rtsne_0.17
#> [73] future_1.69.0 desc_1.4.3 survival_3.8-3
#> [76] polyclip_1.10-7 fitdistrplus_1.2-5 pillar_1.11.1
#> [79] KernSmooth_2.23-26 plotly_4.11.0 generics_0.1.4
#> [82] sp_2.2-0 scales_1.4.0 globals_0.18.0
#> [85] xtable_1.8-4 glue_1.8.0 lazyeval_0.2.2
#> [88] tools_4.4.0 BiocNeighbors_2.0.1 data.table_1.18.0
#> [91] RANN_2.6.2 dotCall64_1.2 fs_1.6.6
#> [94] leiden_0.4.3.1 cowplot_1.2.0 grid_4.4.0
#> [97] tidyr_1.3.2 colorspace_2.1-2 nlme_3.1-168
#> [100] patchwork_1.3.2 cli_3.6.5 spatstat.sparse_3.1-0
#> [103] textshaping_1.0.4 spam_2.11-3 viridisLite_0.4.2
#> [106] dplyr_1.1.4 uwot_0.2.4 gtable_0.3.6
#> [109] sass_0.4.10 digest_0.6.39 progressr_0.18.0
#> [112] ggrepel_0.9.6 htmlwidgets_1.6.4 farver_2.1.2
#> [115] htmltools_0.5.9 pkgdown_2.1.3 lifecycle_1.0.5
#> [118] httr_1.4.7 mime_0.13 MASS_7.3-65