
Core function for evaluating clustering quality using self-projection

Usage

self_projection(
  X,
  labels,
  classifier = "LR",
  penalty = "l1",
  lambda = NULL,
  test_size = 0.5,
  n_per_class = NULL,
  cv = 5,
  seed = 1,
  n_cores = NULL,
  verbose = TRUE
)

Arguments

X

Expression/feature matrix (cells x features). Can be sparse.

labels

Cluster labels for each cell

classifier

Classifier type: "LR", "RF", "SVM", "NB", "DT", "XGB", "RANGER"

penalty

For LR: regularization type "l1", "l2", or "elasticnet"

lambda

For LR: regularization strength. If NULL, the value is selected by cross-validation

test_size

Fraction of data for testing (default: 0.5)

n_per_class

Maximum number of samples per class in the training set. If NULL, the training set size is determined by test_size

cv

Number of cross-validation folds on training set (0 to skip CV)

seed

Random seed for reproducibility

n_cores

Number of cores for parallel processing (NULL = auto-detect)

verbose

Print progress messages

Value

A list (class "scClustEval") containing:

accuracy

Overall accuracy on test set

cv_accuracy

Mean cross-validation accuracy (if cv > 0)

train_accuracy

Accuracy on training set

y_pred

Predicted labels for test set

y_test

True labels for test set

y_prob

Prediction probabilities (matrix)

confusion_matrix

Confusion matrix

r1_normalized

R1-normalized confusion matrix

r2_normalized

R2-normalized confusion matrix

per_class_accuracy

Per-cluster accuracy

classifier

Trained classifier object

classes

Unique class labels

max_r1

Maximum R1 confusion value

max_r2

Maximum R2 confusion value

Details

The self-projection method works by:

  1. Splitting data into training and test sets (stratified by cluster)

  2. Training a classifier on the training set

  3. Evaluating prediction accuracy on the held-out test set

  4. Computing confusion matrices to identify poorly discriminated clusters
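The four steps above can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: a nearest-centroid classifier stands in for the supported classifiers, and every name in the block (`centers`, `test_idx`, `centroids`, and so on) is hypothetical.

```r
# Illustrative sketch only: nearest-centroid stands in for the real classifiers.
set.seed(1)

# Toy data: 3 simulated clusters, 60 cells each, 5 features
n_per_cluster <- 60
centers <- rbind(c(0, 0, 0, 0, 0), c(4, 4, 0, 0, 0), c(0, 0, 4, 4, 0))
labels <- factor(rep(c("A", "B", "C"), each = n_per_cluster))
X <- centers[as.integer(labels), ] +
  matrix(rnorm(3 * n_per_cluster * 5), ncol = 5)

# 1. Stratified split: hold out half of the cells in each cluster
test_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
  sample(idx, size = floor(length(idx) * 0.5))
}))
train_idx <- setdiff(seq_along(labels), test_idx)

# 2. "Train": one centroid per cluster, computed from training cells only
centroids <- t(sapply(levels(labels), function(k) {
  colMeans(X[train_idx[labels[train_idx] == k], , drop = FALSE])
}))

# 3. Predict each test cell as the cluster with the nearest centroid
dists <- sapply(seq_len(nrow(centroids)), function(k) {
  rowSums(sweep(X[test_idx, ], 2, centroids[k, ])^2)
})
y_pred <- factor(rownames(centroids)[apply(dists, 1, which.min)],
                 levels = levels(labels))
y_test <- labels[test_idx]
accuracy <- mean(y_pred == y_test)

# 4. Confusion matrix: rows = true cluster, columns = predicted cluster
confusion <- table(true = y_test, predicted = y_pred)
print(accuracy)
print(confusion)
```

Because the split is stratified, each cluster contributes the same fraction of cells to the test set, so small clusters are still represented when accuracy is measured.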

High accuracy on the held-out test set indicates that the clusters are well separated. Pairs of clusters that the classifier frequently confuses with each other are poorly discriminated and are candidates for merging.
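One way to find the most confused pair from a confusion matrix is sketched below. The matrix is made up for illustration, and the row normalization and symmetrization are assumptions chosen for this example; they are not necessarily the definitions behind `r1_normalized` or `r2_normalized`.

```r
# Hypothetical confusion matrix (rows = true cluster, columns = predicted)
confusion <- matrix(
  c(48,  1,  1,
     2, 30, 18,
     0, 15, 35),
  nrow = 3, byrow = TRUE,
  dimnames = list(true = c("A", "B", "C"), predicted = c("A", "B", "C"))
)

# Row-normalize: fraction of each true cluster assigned to each prediction
row_frac <- confusion / rowSums(confusion)

# Per-cluster accuracy is the diagonal of the row-normalized matrix
per_class <- diag(row_frac)

# Average the two directions of confusion for each pair; keep upper triangle
pair_conf <- (row_frac + t(row_frac)) / 2
diag(pair_conf) <- NA
pair_conf[lower.tri(pair_conf)] <- NA

# Most confused pair of clusters
worst <- which(pair_conf == max(pair_conf, na.rm = TRUE), arr.ind = TRUE)[1, ]
worst_pair <- rownames(confusion)[worst]
print(worst_pair)
```

In this toy matrix, clusters B and C exchange roughly a third of their cells, so they would be the first pair to inspect before deciding whether to merge.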

Examples

if (FALSE) { # \dontrun{
# Basic usage with expression matrix
result <- self_projection(
  X = expression_matrix,
  labels = cluster_assignments,
  classifier = "LR"
)
print(result$accuracy)

# With random forest
result <- self_projection(
  X = expression_matrix,
  labels = cluster_assignments,
  classifier = "RF",
  n_per_class = 100
)
} # }