
📖 Documentation: https://zaoqu-liu.github.io/MultiK/

Overview

MultiK is an R package for objective, data-driven determination of the optimal number of clusters in single-cell RNA sequencing (scRNA-seq) data. It addresses one of the most challenging questions in unsupervised clustering analysis: “How many distinct cell populations exist in my dataset?”

Key Features

  • 🎯 Data-driven K selection using consensus clustering
  • 📊 PAC metric for quantifying clustering stability
  • 📈 Statistical validation via SigClust testing
  • Parallel processing for computational efficiency
  • 🔬 Seurat v4/v5 compatible

Installation

# From R-universe (recommended)
install.packages("MultiK", repos = "https://zaoqu-liu.r-universe.dev")

# From GitHub
# install.packages("remotes")
remotes::install_github("Zaoqu-Liu/MultiK")

Quick Start

library(MultiK)
library(Seurat)

# Load example data
data(p3cl)

# Step 1: Run MultiK algorithm
result <- MultiK(p3cl, reps = 100, cores = 4, seed = 42)

# Step 2: Visualize K selection diagnostics
DiagMultiKPlot(result$k, result$consensus)

# Step 3: Get cluster assignments at the optimal K chosen from the Step 2 plots
clusters <- getClusters(p3cl, optK = 3)

# Step 4: Statistical validation
pval <- CalcSigClust(p3cl, clusters$clusters[, 1], nsim = 100)
PlotSigClust(p3cl, clusters$clusters[, 1], pval)

Methodology

Consensus Clustering

MultiK employs a subsampling-based consensus clustering approach:

  1. Subsampling: Randomly sample 80% of cells in each iteration
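  2. Clustering: Apply Seurat clustering to each subsample across a range of resolution parameters
  3. Consensus Matrix: For each K, record how often each pair of cells is assigned to the same cluster
  4. PAC Calculation: Quantify clustering ambiguity with the proportion of ambiguous clustering (PAC)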
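The steps above can be illustrated with a small, self-contained R sketch. It is not MultiK's code: as a labeled assumption, `kmeans()` with a fixed number of centers stands in for a Seurat clustering run (in MultiK, K is whatever a given resolution produces, and runs are grouped by K). The helper names `consensus_matrix()`, `pac()` and `rpac()` are hypothetical. PAC uses the common (0.1, 0.9] ambiguity interval, and the `rpac()` formula is an assumed definition that should be checked against the package source.

```r
# Minimal sketch: kmeans() with fixed centers stands in for a Seurat run
# at one resolution. This is NOT the MultiK implementation.

# Co-clustering frequency from repeated subsampling.
#   x        : cells x features numeric matrix (e.g. PCA embeddings)
#   k        : centers passed to kmeans() (stand-in for a Seurat run)
#   reps     : number of subsampling iterations
#   p_sample : fraction of cells drawn per iteration (80%, as above)
consensus_matrix <- function(x, k, reps = 50, p_sample = 0.8, seed = 1) {
  set.seed(seed)
  n <- nrow(x)
  together <- matrix(0, n, n)  # times each pair landed in the same cluster
  sampled  <- matrix(0, n, n)  # times each pair was drawn together
  for (r in seq_len(reps)) {
    idx <- sample.int(n, size = floor(p_sample * n))
    labels <- kmeans(x[idx, , drop = FALSE], centers = k, nstart = 5)$cluster
    together[idx, idx] <- together[idx, idx] + outer(labels, labels, "==")
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  # Pairs that were never drawn together get a consensus value of 0.
  ifelse(sampled > 0, together / sampled, 0)
}

# PAC: fraction of cell pairs whose consensus value falls in (x1, x2],
# i.e. pairs that are neither consistently together nor consistently apart.
pac <- function(cm, x1 = 0.1, x2 = 0.9) {
  Fn <- ecdf(cm[lower.tri(cm)])
  Fn(x2) - Fn(x1)
}

# rPAC (ASSUMED definition, check the package source): PAC rescaled by
# the share of pairs with a non-zero consensus value.
rpac <- function(cm, x1 = 0.1, x2 = 0.9) {
  v <- cm[lower.tri(cm)]
  pac(cm, x1, x2) / (1 - mean(v == 0))
}

# Usage: two well-separated groups of 50 cells in 2 dimensions.
set.seed(42)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))
cm <- consensus_matrix(x, k = 2)
pac(cm)   # close to 0: the two groups are recovered consistently
```

A PAC near 0 means almost every pair of cells is either always or never co-clustered, which is what a stable K looks like.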

Optimal K Selection

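Candidate optimal K values are selected with a Pareto optimization framework that balances two criteria, keeping each K that no other K outperforms on both:

  • Frequency: How often each K is produced across subsampling runs and resolutions
  • Stability: 1 - rPAC, where rPAC is the relative PAC (lower rPAC indicates more stable clustering)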
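Choosing Pareto-optimal K values amounts to discarding every K that is dominated, meaning another K is at least as good on both criteria and strictly better on one. The R sketch below assumes the per-K frequency and 1 - rPAC values have already been computed; `pareto_front()` is a hypothetical helper, the numbers in the usage are made up, and `DiagMultiKPlot()` may compute and display this differently.

```r
# Pareto-front sketch: keep every K that no other K beats on both criteria.
# Both criteria are "higher is better":
#   freq      : number of runs (subsamples x resolutions) that produced K
#   stability : 1 - rPAC for that K
pareto_front <- function(k, freq, stability) {
  dominated <- vapply(seq_along(k), function(i) {
    any(freq >= freq[i] & stability >= stability[i] &
        (freq > freq[i] | stability > stability[i]))
  }, logical(1))
  k[!dominated]
}

# Usage with made-up numbers (not MultiK output):
k         <- 2:6
freq      <- c(40, 120, 60, 30, 10)
stability <- c(0.95, 0.90, 0.97, 0.80, 0.70)
pareto_front(k, freq, stability)
#> [1] 3 4
```

Here K = 3 is the most frequent and K = 4 the most stable; neither dominates the other, so both remain candidates, while K = 2, 5 and 6 are each beaten on both criteria.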

Statistical Validation

Cluster separability is validated using SigClust, which tests whether two clusters are better separated than would be expected if their cells were drawn from a single Gaussian distribution (the null hypothesis).
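The idea behind the test can be sketched as a Monte Carlo procedure in R. This is a simplified illustration, not the sigclust package and not `CalcSigClust()`: as an assumption, the null Gaussian uses the raw eigenvalues of the sample covariance, whereas SigClust estimates them more carefully (including background noise). The statistic is the cluster index, the within-cluster sum of squares divided by the total sum of squares, and the p-value is the share of null draws whose best 2-means split is at least as tight as the observed one. `cluster_index()` and `sigclust_sketch()` are hypothetical names.

```r
# Simplified Monte Carlo test in the spirit of SigClust (NOT the sigclust
# package and NOT MultiK's CalcSigClust). Null: all cells come from one
# Gaussian whose covariance eigenvalues match the sample covariance.

# Cluster index: within-cluster sum of squares / total sum of squares.
# Smaller values mean tighter, better separated clusters.
cluster_index <- function(x, labels) {
  total <- sum(scale(x, center = TRUE, scale = FALSE)^2)
  within <- sum(vapply(split(seq_len(nrow(x)), labels), function(idx) {
    xi <- x[idx, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi))^2)
  }, numeric(1)))
  within / total
}

sigclust_sketch <- function(x, labels, nsim = 100, seed = 1) {
  set.seed(seed)
  n <- nrow(x)
  p <- ncol(x)
  lambda <- pmax(eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values, 0)
  ci_obs <- cluster_index(x, labels)
  # Null distribution: best 2-means cluster index on Gaussian draws.
  ci_null <- vapply(seq_len(nsim), function(s) {
    z <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(lambda), p)
    km <- kmeans(z, centers = 2, nstart = 5)
    km$tot.withinss / km$totss
  }, numeric(1))
  # Empirical p-value: share of null draws at least as well separated.
  mean(ci_null <= ci_obs)
}

# Usage: two groups shifted by 4 units in each of 4 dimensions.
set.seed(7)
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 4),
           matrix(rnorm(200, mean = 4), ncol = 4))
labels <- rep(1:2, each = 50)
sigclust_sketch(x, labels, nsim = 50)
```

With only `nsim` simulations the smallest reportable p-value is 0, so in practice a larger `nsim` (or a normal approximation to the null distribution) gives finer resolution. The sketch also needs more cells than features, which is why it would typically be run on principal components rather than raw genes.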

Functions

Function          Description
MultiK()          Core consensus clustering algorithm
DiagMultiKPlot()  Diagnostic visualization for K selection
getClusters()     Extract cluster assignments at a specified K
CalcSigClust()    Pairwise statistical significance testing
PlotSigClust()    Visualize cluster hierarchy and significance

Citation

If you use MultiK in your research, please cite:

@software{multik2025,
  author = {Liu, Zaoqu},
  title = {MultiK: Multi-Resolution Consensus Clustering for Single-Cell RNA-seq},
  year = {2025},
  url = {https://github.com/Zaoqu-Liu/MultiK}
}

Author

Zaoqu Liu, PhD

License

MIT License © 2025 Zaoqu Liu