Core function for evaluating clustering quality using self-projection
Usage
self_projection(
  X,
  labels,
  classifier = "LR",
  penalty = "l1",
  lambda = NULL,
  test_size = 0.5,
  n_per_class = NULL,
  cv = 5,
  seed = 1,
  n_cores = NULL,
  verbose = TRUE
)
Arguments
- X
Expression/feature matrix (cells x features). Can be sparse.
- labels
Cluster labels for each cell
- classifier
Classifier type: "LR", "RF", "SVM", "NB", "DT", "XGB", "RANGER"
- penalty
For LR: regularization type "l1", "l2", or "elasticnet"
- lambda
For LR: regularization strength. If NULL, it is selected by cross-validation
- test_size
Fraction of data for testing (default: 0.5)
- n_per_class
Maximum number of samples per class in the training set. If NULL, the split is determined by test_size
- cv
Number of cross-validation folds on training set (0 to skip CV)
- seed
Random seed for reproducibility
- n_cores
Number of cores for parallel processing (NULL = auto-detect)
- verbose
Whether to print progress messages
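The interaction between test_size and n_per_class can be illustrated with a small base-R sketch. The helper `stratified_train_idx()` below is hypothetical and not part of the package; it assumes that n_per_class acts as an upper bound on the per-cluster training size that test_size would otherwise give, which is one plausible reading of the argument descriptions above.

```r
# Hypothetical helper (not part of the package): returns training-row indices
# for a split that is stratified by cluster label.
stratified_train_idx <- function(labels, test_size = 0.5,
                                 n_per_class = NULL, seed = 1) {
  set.seed(seed)
  idx_by_class <- split(seq_along(labels), as.character(labels))
  train <- lapply(idx_by_class, function(idx) {
    # Default: keep a (1 - test_size) fraction of each cluster for training
    n_train <- floor(length(idx) * (1 - test_size))
    # Assumption: n_per_class caps the number of training cells per cluster
    if (!is.null(n_per_class)) n_train <- min(n_train, n_per_class)
    n_train <- max(n_train, 1L)  # keep at least one training cell per cluster
    idx[sample.int(length(idx), n_train)]
  })
  sort(unlist(train, use.names = FALSE))
}

# Toy labels with unbalanced cluster sizes
labels <- rep(c("A", "B", "C"), times = c(200, 50, 10))
train_idx <- stratified_train_idx(labels, test_size = 0.5, n_per_class = 20)
train_counts <- table(labels[train_idx])
print(train_counts)
```

Under this reading, the cap mainly affects large clusters: cluster A is reduced from 100 to 20 training cells, while the small cluster C keeps the 5 cells that test_size alone would assign.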
Value
A list (class "scClustEval") containing:
- accuracy
Overall accuracy on test set
- cv_accuracy
Mean cross-validation accuracy (if cv > 0)
- train_accuracy
Accuracy on training set
- y_pred
Predicted labels for test set
- y_test
True labels for test set
- y_prob
Prediction probabilities (matrix)
- confusion_matrix
Confusion matrix
- r1_normalized
R1-normalized confusion matrix
- r2_normalized
R2-normalized confusion matrix
- per_class_accuracy
Per-cluster accuracy
- classifier
Trained classifier object
- classes
Unique class labels
- max_r1
Maximum R1 confusion value
- max_r2
Maximum R2 confusion value
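As a rough illustration of how several of these fields relate to one another, the base-R sketch below derives a confusion matrix, the overall accuracy, and per-cluster accuracy from toy vectors standing in for y_test and y_pred. The row/column orientation is an assumption, and the R1 and R2 normalizations are left out because this page does not define them.

```r
# Toy true and predicted labels standing in for y_test and y_pred
y_test <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
y_pred <- c("A", "A", "A", "B", "B", "B", "A", "C", "C", "C")
classes <- sort(unique(y_test))

# Confusion matrix: rows = true cluster, columns = predicted cluster
# (this orientation is an assumption, not taken from the package)
cm <- table(
  true = factor(y_test, levels = classes),
  predicted = factor(y_pred, levels = classes)
)

# Overall accuracy: fraction of test cells on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

# Per-cluster accuracy: correct predictions divided by true cluster size
per_class_accuracy <- diag(cm) / rowSums(cm)

print(cm)
print(accuracy)
print(per_class_accuracy)
```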
Details
The self-projection method works by:
1. Splitting data into training and test sets (stratified by cluster)
2. Training a classifier on the training set
3. Evaluating prediction accuracy on the held-out test set
4. Computing confusion matrices to identify poorly discriminated clusters
High accuracy on the held-out test set indicates that the clusters are well separated. Pairs of clusters that the classifier frequently confuses with each other are candidates for merging.
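The procedure described above can be sketched end to end in base R. This is a minimal illustration, not the package's implementation: a nearest-centroid rule stands in for the real classifiers, the data are simulated, and all variable names are hypothetical. Clusters B and C are simulated close together so that they show up as the most-confused pair.

```r
set.seed(1)

# Simulated data: 3 clusters of 60 cells, 5 features; B and C overlap on purpose
n <- 60
centers <- rbind(A = c(0, 0, 0, 0, 0),
                 B = c(4, 4, 0, 0, 0),
                 C = c(4.5, 4, 0, 0, 0))
classes <- rownames(centers)
labels <- rep(classes, each = n)
X <- centers[labels, ] + matrix(rnorm(length(labels) * 5), ncol = 5)

# 1. Stratified 50/50 split: draw half of each cluster for training
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
  idx[sample.int(length(idx), length(idx) %/% 2)]
}), use.names = FALSE)
test_idx <- setdiff(seq_along(labels), train_idx)

# 2. "Train" the stand-in classifier: one mean profile (centroid) per cluster
centroids <- rowsum(X[train_idx, ], labels[train_idx]) /
  as.vector(table(labels[train_idx]))

# 3. Predict each held-out cell as the cluster with the nearest centroid
sq_dist <- sapply(classes, function(k) {
  rowSums(sweep(X[test_idx, ], 2, centroids[k, ])^2)
})
y_pred <- classes[max.col(-sq_dist, ties.method = "first")]
y_test <- labels[test_idx]

# 4. Confusion matrix (rows = true, columns = predicted) and most-confused pair
cm <- unclass(table(factor(y_test, levels = classes),
                    factor(y_pred, levels = classes)))
accuracy <- sum(diag(cm)) / sum(cm)
confusion <- cm + t(cm)   # count confusions in either direction
diag(confusion) <- 0      # ignore correct predictions
pair_idx <- which(confusion == max(confusion), arr.ind = TRUE)[1, ]
most_confused <- classes[pair_idx]

print(cm)
print(accuracy)
print(most_confused)
```

In this simulation cluster A is classified almost perfectly, while most errors fall between B and C, which is the pattern that would suggest considering a merge.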
Examples
if (FALSE) { # \dontrun{
  # Basic usage with expression matrix
  result <- self_projection(
    X = expression_matrix,
    labels = cluster_assignments,
    classifier = "LR"
  )
  print(result$accuracy)

  # With random forest
  result <- self_projection(
    X = expression_matrix,
    labels = cluster_assignments,
    classifier = "RF",
    n_per_class = 100
  )
} # }