📖 Documentation: https://zaoqu-liu.github.io/MultiK/
Overview
MultiK is an R package for objective determination of optimal cluster numbers in single-cell RNA sequencing (scRNA-seq) data. It addresses one of the most challenging questions in unsupervised clustering analysis: “How many distinct cell populations exist in my dataset?”
Installation
# From R-universe (recommended)
install.packages("MultiK", repos = "https://zaoqu-liu.r-universe.dev")
# From GitHub
# install.packages("remotes")
remotes::install_github("Zaoqu-Liu/MultiK")
Quick Start
library(MultiK)
library(Seurat)
# Load example data
data(p3cl)
# Step 1: Run MultiK algorithm
result <- MultiK(p3cl, reps = 100, cores = 4, seed = 42)
# Step 2: Visualize K selection diagnostics
DiagMultiKPlot(result$k, result$consensus)
# Step 3: Get cluster assignments at optimal K
clusters <- getClusters(p3cl, optK = 3)
# Step 4: Statistical validation
pval <- CalcSigClust(p3cl, clusters$clusters[, 1], nsim = 100)
PlotSigClust(p3cl, clusters$clusters[, 1], pval)
Methodology
Consensus Clustering
MultiK employs a subsampling-based consensus clustering approach:
- Subsampling: Randomly sample 80% of cells in each iteration
- Clustering: Apply Seurat clustering across resolution parameters
- Consensus Matrix: Track co-clustering frequency for each K
- PAC Calculation: Quantify clustering ambiguity
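The four steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not MultiK's implementation: `kmeans()` stands in for Seurat clustering, the helper names `consensus_matrix()` and `pac()` are hypothetical, and the PAC thresholds of 0.1 and 0.9 are assumed values. PAC here means the proportion of item pairs whose co-clustering frequency falls strictly between the two thresholds, so a lower PAC indicates a less ambiguous partition.

```r
# Illustrative sketch only: base-R k-means replaces Seurat clustering,
# and all function names below are hypothetical (not part of MultiK).
set.seed(42)

# Toy data: three well-separated groups of 30 points in 2-D
x <- rbind(
  matrix(rnorm(60, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.3), ncol = 2)
)

# Build a consensus matrix for a given K by repeated subsampling
consensus_matrix <- function(x, k, reps = 50, frac = 0.8) {
  n <- nrow(x)
  together <- matrix(0, n, n)  # times i and j landed in the same cluster
  sampled  <- matrix(0, n, n)  # times i and j were drawn in the same run
  for (r in seq_len(reps)) {
    idx <- sample(n, size = floor(frac * n))  # subsample 80% of items
    cl <- kmeans(x[idx, , drop = FALSE], centers = k, nstart = 5)$cluster
    same <- outer(cl, cl, "==")               # pairwise co-clustering
    together[idx, idx] <- together[idx, idx] + same
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  # Co-clustering frequency; pairs never sampled together default to 0
  together / pmax(sampled, 1)
}

# Proportion of ambiguous clustering: share of pairs with an
# intermediate consensus value (thresholds are assumed defaults)
pac <- function(cm, lower = 0.1, upper = 0.9) {
  v <- cm[upper.tri(cm)]
  mean(v > lower & v < upper)
}

# Compare candidate values of K
ks <- 2:5
pac_by_k <- sapply(ks, function(k) pac(consensus_matrix(x, k)))
names(pac_by_k) <- ks
print(round(pac_by_k, 3))
```

On toy data like this, one would expect the lowest PAC near K = 3, since other values of K force an arbitrary merge or split that varies between subsamples. MultiK itself sweeps Seurat resolution parameters rather than fixing K directly, so this sketch only conveys the general idea.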
Functions
| Function | Description |
|---|---|
| `MultiK()` | Core consensus clustering algorithm |
| `DiagMultiKPlot()` | Diagnostic visualization for K selection |
| `getClusters()` | Extract cluster assignments at specified K |
| `CalcSigClust()` | Pairwise statistical significance testing |
| `PlotSigClust()` | Visualize cluster hierarchy and significance |
Documentation
📖 Full documentation: https://zaoqu-liu.github.io/MultiK