
Iteratively optimize clustering until target accuracy is reached

Usage

sc_optimize_all(
  X,
  labels,
  min_accuracy = 0.9,
  max_rounds = 10,
  classifier = "LR",
  penalty = "l1",
  lambda = NULL,
  test_size = 0.5,
  n_per_class = 100,
  cv = 5,
  n_iter = 3,
  r1_cutoff = 0.5,
  r2_cutoff = 0.05,
  r1_step = 0.01,
  r2_step = 0.001,
  r1_mode = "1",
  use_r1_only = FALSE,
  use_r2_only = FALSE,
  use_distance = FALSE,
  dist_cutoff = 8,
  use_projection = FALSE,
  under_cluster_labels = NULL,
  min_outer_iter = 3,
  seed = 1,
  n_cores = NULL,
  verbose = TRUE
)

Arguments

X

Expression/feature matrix (cells x features)

labels

Initial cluster labels, one per cell (should be over-clustered)

min_accuracy

Target minimum accuracy to achieve (default: 0.9)

max_rounds

Maximum optimization rounds (default: 10)

classifier

Classifier type (default: "LR")

penalty

For LR: regularization type (default: "l1")

lambda

For LR: regularization strength (default: NULL)

test_size

Fraction of samples held out as the test set (default: 0.5)

n_per_class

Maximum number of samples per class (default: 100)

cv

Number of cross-validation folds (default: 5)

n_iter

Number of iterations per round used to build the confusion matrix (default: 3)

r1_cutoff

Initial R1 cutoff (default: 0.5)

r2_cutoff

Initial R2 cutoff (default: 0.05)

r1_step

Step to reduce R1 cutoff each outer iteration (default: 0.01)

r2_step

Step to reduce R2 cutoff each outer iteration (default: 0.001)

r1_mode

R1 normalization mode: "1" or "2" (default: "1")

use_r1_only

Use only R1 for merging (default: FALSE)

use_r2_only

Use only R2 for merging (default: FALSE)

use_distance

Use distance matrix in merging decisions (default: FALSE)

dist_cutoff

Distance cutoff for merging (default: 8.0)

use_projection

Use self-projection labels for subsequent iterations (default: FALSE)

under_cluster_labels

Optional under-clustered labels used as a constraint (default: NULL)

min_outer_iter

Minimum number of outer iterations before convergence is allowed (default: 3)

seed

Random seed (default: 1)

n_cores

Number of cores (default: NULL)

verbose

Print progress messages (default: TRUE)
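
As a rough illustration of the two required inputs, the sketch below builds a toy cells x features matrix and a deliberately over-clustered label vector with base R. The data, the object names (X, labels, true_group) and the use of k-means to over-cluster are all assumptions made for this example. The label type that sc_optimize_all expects is not stated on this page, so a plain integer vector is used here.

```r
set.seed(1)

# Toy data (assumption): 300 cells x 20 features from three shifted Gaussians
n_cells <- 300
n_features <- 20
true_group <- rep(1:3, each = n_cells / 3)

X <- matrix(rnorm(n_cells * n_features), nrow = n_cells, ncol = n_features)
# Shift every row by its group index; the vector recycles down each column
X <- X + true_group
rownames(X) <- paste0("cell", seq_len(n_cells))
colnames(X) <- paste0("feat", seq_len(n_features))

# Over-cluster on purpose: request more clusters than there are true groups
km <- kmeans(X, centers = 9, nstart = 5)
labels <- km$cluster

# One label per row (cell) of X
stopifnot(length(labels) == nrow(X))
table(labels)
```

Objects shaped like these could then be passed as the X and labels arguments.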

Value

A list containing:

final_labels

Final optimized cluster labels

initial_labels

Initial cluster labels

round_history

List of results from each round

accuracy_history

Vector of accuracies per round

n_clusters_history

Vector of cluster counts per round

final_accuracy

Final achieved accuracy

total_rounds

Number of rounds performed

converged

Whether target accuracy was reached
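
The following sketch shows one way the returned list might be inspected. Because the function is not run here, result is a hand-built stand-in that only reuses the documented field names; every value in it is a placeholder rather than real output, and the assumption that accuracy_history and n_clusters_history have one entry per round is not confirmed by this page.

```r
# Stand-in for a returned list; values are placeholders, not real output
result <- list(
  final_labels = c("a", "a", "b", "b", "c"),
  initial_labels = c("1", "2", "3", "4", "5"),
  accuracy_history = c(0.72, 0.85, 0.93),
  n_clusters_history = c(5, 4, 3),
  final_accuracy = 0.93,
  total_rounds = 3,
  converged = TRUE
)

# Per-round summary (assumes one history entry per round)
summary_df <- data.frame(
  round = seq_len(result$total_rounds),
  accuracy = result$accuracy_history,
  n_clusters = result$n_clusters_history
)
print(summary_df)

# Cross-tabulate initial against final labels to see which clusters were merged
merge_table <- table(
  initial = result$initial_labels,
  final = result$final_labels
)
print(merge_table)

# Check the convergence flag before relying on the final labels
if (!result$converged) {
  message("Target accuracy was not reached")
}
```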

Details

The optimization proceeds in two levels:

  1. Outer iterations: Progressively lower the R1/R2 cutoffs

  2. Inner rounds: Merge clusters based on current cutoffs
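
As a rough illustration of the outer level, the sketch below tabulates the cutoffs for the first few outer iterations using the default values from Usage. It assumes a linear schedule in which the first outer iteration uses the initial cutoffs and each later one subtracts one step; the exact update rule is not spelled out on this page.

```r
# Defaults taken from the Usage section
r1_cutoff <- 0.5
r2_cutoff <- 0.05
r1_step <- 0.01
r2_step <- 0.001

# Assumed linear schedule: iteration 1 uses the initial cutoffs
outer_iter <- 1:5
schedule <- data.frame(
  outer_iter = outer_iter,
  r1 = r1_cutoff - (outer_iter - 1) * r1_step,
  r2 = r2_cutoff - (outer_iter - 1) * r2_step
)
print(schedule)
```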

The process continues until:

  • Target accuracy is reached

  • The maximum number of rounds (max_rounds) is reached

  • No more clusters can be merged
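
The three stopping conditions can be sketched with a toy loop. This is not the package's implementation: toy_accuracy, toy_merge, sketch_stopping and floor_k are hypothetical stand-ins, the state is reduced to a cluster count, and the two levels are flattened into a single loop so that only the stopping logic is shown.

```r
# Hypothetical stand-in: accuracy improves as clusters are merged
toy_accuracy <- function(n_clusters) 1 - 0.04 * (n_clusters - 3)

# Hypothetical stand-in: one merge per round until floor_k clusters remain
toy_merge <- function(n_clusters, floor_k) {
  if (n_clusters > floor_k) n_clusters - 1 else n_clusters
}

sketch_stopping <- function(n_clusters, min_accuracy = 0.9,
                            max_rounds = 10, floor_k = 3) {
  accuracy_history <- numeric(0)
  status <- "max_rounds"  # reported if the loop runs out of rounds
  for (round in seq_len(max_rounds)) {
    acc <- toy_accuracy(n_clusters)
    accuracy_history <- c(accuracy_history, acc)
    if (acc >= min_accuracy) {
      status <- "target_reached"
      break
    }
    merged <- toy_merge(n_clusters, floor_k)
    if (merged == n_clusters) {
      status <- "no_merges"  # nothing left to merge
      break
    }
    n_clusters <- merged
  }
  list(
    status = status,
    n_clusters = n_clusters,
    total_rounds = length(accuracy_history),
    accuracy_history = accuracy_history
  )
}

a <- sketch_stopping(9)                  # stops when the target is reached
b <- sketch_stopping(9, max_rounds = 3)  # stops when rounds run out
c_res <- sketch_stopping(9, floor_k = 6) # stops when no merge is possible
c(a$status, b$status, c_res$status)
```

Each call ends for a different reason, mirroring the three bullets above.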

Examples

if (FALSE) { # \dontrun{
# Optimize over-clustered result
result <- sc_optimize_all(
  X = expression_matrix,
  labels = over_clustered_labels,
  min_accuracy = 0.9,
  classifier = "LR"
)

# Get final labels
final_clusters <- result$final_labels
} # }