Stratified train-test split — train_test_split_stratified • scClustEval

Split data into training and test sets while preserving the class proportions of the labels.

Usage

train_test_split_stratified(
  X,
  y,
  test_size = 0.5,
  n_per_class = NULL,
  seed = NULL
)

Arguments

X: Feature matrix (cells x features)
y: Class labels (factor or character vector)
test_size: Fraction of data for testing (default: 0.5)
n_per_class: Maximum number of samples per class in the training set. If NULL (the default), the split is determined by the test_size fraction instead.
seed: Random seed for reproducibility

Value

A list with components:

X_train: Training feature matrix
X_test: Test feature matrix
y_train: Training labels
y_test: Test labels
train_idx: Indices of training samples
test_idx: Indices of test samples
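
Examples

The following is a minimal base R sketch of a stratified split that returns a list with the same component names as documented above. It is not the scClustEval implementation: the helper name stratified_split_sketch is hypothetical, the per-class rounding rule is an assumption, and the n_per_class argument is deliberately left out because its exact behavior is not specified here.

```r
# Hypothetical helper for illustration only; not the scClustEval implementation.
stratified_split_sketch <- function(X, y, test_size = 0.5, seed = NULL) {
  stopifnot(nrow(X) == length(y), test_size > 0, test_size < 1)
  if (!is.null(seed)) set.seed(seed)

  # Group row indices by class label.
  idx_by_class <- split(seq_along(y), as.character(y))

  # Within each class, draw the test rows at random.
  # Assumption: the per-class test count is round(n * test_size).
  test_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_test <- round(length(idx) * test_size)
    idx[sample.int(length(idx), n_test)]
  }), use.names = FALSE)
  test_idx <- sort(test_idx)
  train_idx <- setdiff(seq_along(y), test_idx)

  list(
    X_train = X[train_idx, , drop = FALSE],
    X_test = X[test_idx, , drop = FALSE],
    y_train = y[train_idx],
    y_test = y[test_idx],
    train_idx = train_idx,
    test_idx = test_idx
  )
}

# Toy data: 70 "cells" x 5 features, with imbalanced classes (40 / 20 / 10).
set.seed(1)
X <- matrix(rnorm(70 * 5), nrow = 70, ncol = 5)
y <- factor(rep(c("A", "B", "C"), times = c(40, 20, 10)))

split_res <- stratified_split_sketch(X, y, test_size = 0.3, seed = 42)

# Class counts in each partition keep roughly the same 4:2:1 ratio.
table(split_res$y_train)
table(split_res$y_test)
```

Sampling within each class, rather than across all rows at once, is what keeps rare classes represented in both partitions. Assuming the signature shown under Usage, the equivalent call to the package function would be train_test_split_stratified(X, y, test_size = 0.3, seed = 42).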