A comprehensive framework for evaluating and optimizing single-cell RNA-seq clustering results using self-projection machine learning approaches.

Details

The scClustEval package provides tools for:

  • Clustering Assessment: Evaluate the quality of cell clustering using self-projection with various machine learning classifiers

  • Clustering Optimization: Iteratively merge poorly discriminated clusters to achieve robust cell type identification

  • Visualization: ROC curves, confusion matrices, Sankey diagrams, and comprehensive assessment plots

  • Seurat Integration: Seamless workflow with Seurat objects
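
To make the idea of self-projection concrete, here is a minimal sketch in plain Python. It does not use scClustEval or its API: the function names (`fit_centroids`, `predict`, `self_projection_accuracy`), the nearest-centroid classifier, and the toy data are all assumptions chosen for illustration. The sketch holds out a fraction of cells from each cluster, trains on the remaining cells, and reports how often the held-out cells are assigned back to their own cluster.

```python
import random
from collections import defaultdict


def fit_centroids(X, y):
    """Compute one mean expression vector (centroid) per cluster label."""
    rows_by_label = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_label[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in rows_by_label.items()
    }


def predict(centroids, X):
    """Assign each cell to the cluster with the nearest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(centroids[lab], row)) for row in X]


def self_projection_accuracy(X, y, test_fraction=0.5, seed=0):
    """Hold out part of every cluster, train on the rest, score the hold-out."""
    rng = random.Random(seed)
    idx_by_label = defaultdict(list)
    for i, label in enumerate(y):
        idx_by_label[label].append(i)
    train, test = [], []
    for idx in idx_by_label.values():
        rng.shuffle(idx)
        n_test = max(1, int(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
    predicted = predict(centroids, [X[i] for i in test])
    hits = sum(p == y[i] for p, i in zip(predicted, test))
    return hits / len(test)


# Toy data: two well-separated clusters of four cells, two features each.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = ["A"] * 4 + ["B"] * 4
print(self_projection_accuracy(X, y))  # 1.0: every held-out cell is recovered
```

A clustering that splits one homogeneous population into several labels would score much lower under the same procedure, which is the signal the assessment step looks for.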

The core algorithm works by:

  1. Training a classifier to distinguish between clusters

  2. Evaluating prediction accuracy via cross-validation and hold-out testing

  3. Identifying cluster pairs that are difficult to discriminate

  4. Merging confused clusters and iterating until target accuracy is reached
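
The four steps above can be sketched as a short loop. This is an illustration in plain Python under stated assumptions, not the package's implementation: `assess`, `most_confused_pair`, and `optimize` are hypothetical names, the classifier is a one-feature nearest-centroid rule, evaluation uses a single fixed hold-out split (even-indexed cells train, odd-indexed cells test) rather than cross-validation, and only the single most confused pair is merged per round.

```python
from collections import Counter, defaultdict


def assess(X, labels):
    """Steps 1-2: train on even-indexed cells, test on odd-indexed cells.

    Returns (accuracy, Counter mapping (true, predicted) -> count).
    """
    groups = defaultdict(list)
    for i in range(0, len(X), 2):
        groups[labels[i]].append(X[i])
    centroids = {lab: sum(v) / len(v) for lab, v in groups.items()}
    confusion = Counter()
    test = range(1, len(X), 2)
    for i in test:
        pred = min(centroids, key=lambda lab: abs(centroids[lab] - X[i]))
        confusion[(labels[i], pred)] += 1
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(test), confusion


def most_confused_pair(confusion):
    """Step 3: the unordered cluster pair with the most misclassifications."""
    errors = Counter()
    for (true, pred), n in confusion.items():
        if true != pred:
            errors[tuple(sorted((true, pred)))] += n
    return errors.most_common(1)[0][0] if errors else None


def optimize(X, labels, target=0.95, max_rounds=10):
    """Step 4: merge the most confused pair until the target accuracy is met."""
    labels = list(labels)
    acc, confusion = assess(X, labels)
    for _ in range(max_rounds):
        pair = most_confused_pair(confusion)
        if acc >= target or pair is None:
            break
        keep, drop = pair
        labels = [keep if lab == drop else lab for lab in labels]
        acc, confusion = assess(X, labels)
    return labels, acc


# Toy data: clusters "a" and "b" overlap (an over-split population); "c" is distinct.
X = [0.0, 0.35, 0.4, 0.1, 0.2, 0.5, 0.6, 0.25, 10.0, 10.1, 10.2, 10.3]
labels = ["a", "a", "b", "b", "a", "a", "b", "b", "c", "c", "c", "c"]

merged, accuracy = optimize(X, labels)
print(sorted(set(merged)), accuracy)  # ['a', 'c'] 1.0
```

In this toy run the first assessment misclassifies every held-out "a" and "b" cell, so the two labels are merged; the second assessment then reaches the target and the loop stops with two clusters.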

Main Functions

sc_assessment

Core function for clustering assessment

sc_optimize

Single round of clustering optimization

sc_optimize_all

Full iterative optimization pipeline

RunAssessment

Seurat-style assessment function

RunOptimization

Seurat-style optimization function

Classifiers

The package supports multiple classifiers:

  • LR: Logistic Regression (L1/L2 regularization)

  • RF: Random Forest

  • SVM: Support Vector Machine

  • NB: Naive Bayes

  • DT: Decision Tree

  • XGB: XGBoost (if installed)

References

This package is an R implementation inspired by the SCCAF Python package: https://github.com/SCCAF/sccaf

Miao, Z., et al. (2020). Putative cell type discovery from single-cell gene expression data. Nature Methods.

Author

Zaoqu Liu <liuzaoqu@163.com>