Introduction

scGate is an R package for marker-based purification of cell types from heterogeneous single-cell RNA-seq datasets. Unlike reference-based methods, scGate does not require training data or reference gene expression profiles; a gating model is defined directly from marker genes.

Key Features

  • Marker-based gating: Define cell populations using positive and negative markers
  • Hierarchical models: Build complex gating strategies similar to flow cytometry
  • Multi-class classification: Annotate multiple cell types simultaneously
  • Cross-platform: Works on Windows, macOS, and Linux
  • Integration-friendly: Compatible with Seurat v4/v5 and batch-corrected data

Installation

# From CRAN (stable version)
install.packages("scGate")

# From GitHub (development version)
# remotes::install_github("Zaoqu-Liu/scGate")

Quick Example

Load Required Packages
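
No code survives under this heading. Judging from the functions called later (DimPlot(), NoLegend(), ggtitle()) and the packages listed as attached under Session Info, this step presumably loads scGate, Seurat, and ggplot2. Treat the exact list as an assumption rather than the original chunk:

```r
# Assumed setup: the packages whose functions the examples call directly
library(scGate)   # gating_model(), scGate(), get_scGateDB()
library(Seurat)   # DimPlot(), NoLegend()
library(ggplot2)  # ggtitle()
```

Combining plots with p1 + p2 relies on patchwork, which Seurat loads as a dependency.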

Load Example Data

scGate provides a small example dataset for testing:

# Load the built-in example dataset
data(query.seurat)

# Check the data
query.seurat
#> An object of class Seurat 
#> 20388 features across 300 samples within 1 assay 
#> Active assay: RNA (20388 features, 492 variable features)
#>  2 dimensional reductions calculated: pca, umap

Create a Simple Gating Model

The simplest way to use scGate is with a single marker:

# Create a model for B cells using MS4A1 (CD20) as marker
bcell_model <- gating_model(name = "Bcell", signature = c("MS4A1"))

# View the model structure
bcell_model
#>   levels   use_as  name signature
#> 1 level1 positive Bcell     MS4A1

Apply scGate

# Apply the gating model
# Using pre-computed PCA for speed
query.seurat <- scGate(
  data = query.seurat, 
  model = bcell_model,
  reduction = "pca"  # Use existing PCA
)

# Check results
table(query.seurat$is.pure)
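
Once cells are labelled, a common next step is to keep only those that passed the gate. The sketch below repeats the B cell gating so that it runs on its own, then filters with Seurat's subset(). It assumes the is.pure column holds the labels "Pure" and "Impure", as the plotting colors in this guide suggest; the object name bcells is just illustrative.

```r
library(scGate)
library(Seurat)

# Re-create the gated object so this sketch is self-contained
data(query.seurat)
bcell_model <- gating_model(name = "Bcell", signature = c("MS4A1"))
query.seurat <- scGate(query.seurat, model = bcell_model, reduction = "pca")

# Keep only the cells that passed the gate (assumes the label "Pure")
bcells <- subset(query.seurat, subset = is.pure == "Pure")

# Number of cells retained
ncol(bcells)
```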

Visualize Results

# Create visualization
p1 <- DimPlot(query.seurat, group.by = "cell_type", label = TRUE, repel = TRUE) +
  ggtitle("Original Annotation") +
  NoLegend()

p2 <- DimPlot(query.seurat, group.by = "is.pure", 
              cols = c("Pure" = "#00ae60", "Impure" = "gray80")) +
  ggtitle("scGate Result")

p1 + p2

Building More Complex Models

Positive and Negative Markers

For more accurate gating, combine positive and negative markers:

# T cell model with positive (CD3D, CD3E) and negative (CD19, MS4A1) markers
tcell_model <- gating_model(name = "Tcell", signature = c("CD3D", "CD3E"))
tcell_model <- gating_model(
  model = tcell_model, 
  name = "notBcell", 
  signature = c("CD19", "MS4A1"),
  positive = FALSE  # These are negative markers
)

tcell_model
#>   levels   use_as     name  signature
#> 1 level1 positive    Tcell  CD3D;CD3E
#> 2 level1 negative notBcell CD19;MS4A1
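
Both signatures above sit on the same level. The hierarchical gating mentioned under Key Features can be sketched by assigning signatures to successive levels through the level argument of gating_model(). The markers below (PTPRC for immune cells, then MS4A1 and CD19 for B cells) and the name bcell_hier are illustrative choices, not a curated model:

```r
library(scGate)

# Level 1: keep immune cells (PTPRC encodes CD45)
bcell_hier <- gating_model(name = "immune", signature = c("PTPRC"), level = 1)

# Level 2: among the cells passing level 1, keep B cells
bcell_hier <- gating_model(
  model = bcell_hier,
  name = "Bcell",
  signature = c("MS4A1", "CD19"),
  level = 2
)

# One row per signature, with its level recorded in the levels column
bcell_hier
```

Gating in stages like this mirrors flow cytometry: a broad gate first, then a narrower one applied to the cells that passed it.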

Using Pre-defined Models

scGate provides a database of curated gating models:

# Download the model database
models_db <- get_scGateDB()

# Available models for human
names(models_db$human$generic)

# Use a pre-defined T cell model
tcell_model <- models_db$human$generic$Tcell

Multi-class Classification

scGate can annotate multiple cell types simultaneously:

# Define multiple models
model_list <- list(
  "Bcell" = gating_model(name = "Bcell", signature = c("MS4A1", "CD19")),
  "Tcell" = gating_model(name = "Tcell", signature = c("CD3D", "CD3E"))
)

# Apply all models at once
query.seurat <- scGate(query.seurat, model = model_list)

# Results are stored in the scGate_multi metadata column
table(query.seurat$scGate_multi)
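
To see how the multi-class calls line up with existing labels, a cross-tabulation is often enough. This sketch runs on the example dataset and assumes two things: that the metadata has a cell_type column (the one used for plotting earlier in this guide), and that cells matched by no model are left as NA in scGate_multi.

```r
library(scGate)
library(Seurat)

data(query.seurat)

# Same two models as in the multi-class example
model_list <- list(
  "Bcell" = gating_model(name = "Bcell", signature = c("MS4A1", "CD19")),
  "Tcell" = gating_model(name = "Tcell", signature = c("CD3D", "CD3E"))
)
query.seurat <- scGate(query.seurat, model = model_list, reduction = "pca")

# Rows: original labels; columns: scGate calls
# useNA = "ifany" keeps unassigned cells visible instead of dropping them
table(query.seurat$cell_type, query.seurat$scGate_multi, useNA = "ifany")
```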

Key Parameters

Parameter   Description                               Default
pos.thr     Threshold for positive signatures         0.2
neg.thr     Threshold for negative signatures         0.2
reduction   Dimensionality reduction to use           "calculate"
k.param     Number of neighbors for smoothing         30
ncores      Number of cores for parallel processing   1
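
The role of the two thresholds can be illustrated with a toy calculation in base R. This is a simplified sketch, not scGate's implementation: the scores are made up, the strict inequalities are an assumption, and the neighbor smoothing controlled by k.param is left out.

```r
# Made-up per-cell scores for one positive and one negative signature
scores <- data.frame(
  cell = paste0("cell", 1:5),
  pos  = c(0.45, 0.30, 0.10, 0.25, 0.05),
  neg  = c(0.05, 0.35, 0.02, 0.15, 0.40)
)

pos.thr <- 0.2
neg.thr <- 0.2

# A cell passes when its positive score is above pos.thr
# and its negative score is below neg.thr
passes <- scores$pos > pos.thr & scores$neg < neg.thr
scores$is.pure <- ifelse(passes, "Pure", "Impure")

scores
```

Here only cell1 and cell4 pass. Raising pos.thr makes the gate stricter on the positive signature, while raising neg.thr makes it more tolerant of negative-marker expression.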

Session Info

sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS 15.6.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] C
#> 
#> time zone: Asia/Shanghai
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_4.0.1      SeuratObject_4.1.4 Seurat_4.4.0       scGate_1.7.2      
#> 
#> loaded via a namespace (and not attached):
#>   [1] deldir_2.0-4           pbapply_1.7-4          gridExtra_2.3         
#>   [4] rlang_1.1.7            magrittr_2.0.4         RcppAnnoy_0.0.23      
#>   [7] otel_0.2.0             spatstat.geom_3.7-0    matrixStats_1.5.0     
#>  [10] ggridges_0.5.7         compiler_4.4.0         png_0.1-8             
#>  [13] systemfonts_1.3.1      vctrs_0.7.1            reshape2_1.4.5        
#>  [16] stringr_1.6.0          pkgconfig_2.0.3        fastmap_1.2.0         
#>  [19] promises_1.5.0         rmarkdown_2.30         ragg_1.5.0            
#>  [22] purrr_1.2.1            xfun_0.56              cachem_1.1.0          
#>  [25] jsonlite_2.0.0         goftest_1.2-3          later_1.4.5           
#>  [28] BiocParallel_1.40.2    spatstat.utils_3.2-1   irlba_2.3.5.1         
#>  [31] parallel_4.4.0         cluster_2.1.8.1        R6_2.6.1              
#>  [34] ica_1.0-3              spatstat.data_3.1-9    stringi_1.8.7         
#>  [37] bslib_0.9.0            RColorBrewer_1.1-3     reticulate_1.44.1     
#>  [40] spatstat.univar_3.1-6  parallelly_1.46.1      lmtest_0.9-40         
#>  [43] jquerylib_0.1.4        scattermore_1.2        Rcpp_1.1.1            
#>  [46] knitr_1.51             tensor_1.5.1           future.apply_1.20.1   
#>  [49] zoo_1.8-15             sctransform_0.4.3      httpuv_1.6.16         
#>  [52] Matrix_1.7-4           splines_4.4.0          igraph_2.2.1          
#>  [55] tidyselect_1.2.1       abind_1.4-8            dichromat_2.0-0.1     
#>  [58] yaml_2.3.12            spatstat.random_3.4-4  spatstat.explore_3.7-0
#>  [61] codetools_0.2-20       miniUI_0.1.2           listenv_0.10.0        
#>  [64] plyr_1.8.9             lattice_0.22-7         tibble_3.3.1          
#>  [67] withr_3.0.2            shiny_1.12.1           S7_0.2.1              
#>  [70] ROCR_1.0-12            evaluate_1.0.5         Rtsne_0.17            
#>  [73] future_1.69.0          desc_1.4.3             survival_3.8-3        
#>  [76] polyclip_1.10-7        fitdistrplus_1.2-5     pillar_1.11.1         
#>  [79] KernSmooth_2.23-26     plotly_4.11.0          generics_0.1.4        
#>  [82] sp_2.2-0               scales_1.4.0           globals_0.18.0        
#>  [85] xtable_1.8-4           glue_1.8.0             lazyeval_0.2.2        
#>  [88] tools_4.4.0            BiocNeighbors_2.0.1    data.table_1.18.0     
#>  [91] RANN_2.6.2             dotCall64_1.2          fs_1.6.6              
#>  [94] leiden_0.4.3.1         cowplot_1.2.0          grid_4.4.0            
#>  [97] tidyr_1.3.2            colorspace_2.1-2       nlme_3.1-168          
#> [100] patchwork_1.3.2        cli_3.6.5              spatstat.sparse_3.1-0 
#> [103] textshaping_1.0.4      spam_2.11-3            viridisLite_0.4.2     
#> [106] dplyr_1.1.4            uwot_0.2.4             gtable_0.3.6          
#> [109] sass_0.4.10            digest_0.6.39          progressr_0.18.0      
#> [112] ggrepel_0.9.6          htmlwidgets_1.6.4      farver_2.1.2          
#> [115] htmltools_0.5.9        pkgdown_2.1.3          lifecycle_1.0.5       
#> [118] httr_1.4.7             mime_0.13              MASS_7.3-65