Split data into training and test sets while maintaining class proportions

Usage

train_test_split_stratified(
  X,
  y,
  test_size = 0.5,
  n_per_class = NULL,
  seed = NULL
)

Arguments

X

Feature matrix (cells x features)

y

Class labels (factor or character vector), one per row of X

test_size

Fraction of data for testing (default: 0.5)

n_per_class

Maximum number of samples per class in the training set. If NULL, the split is determined by test_size instead

seed

Random seed for reproducibility

Value

A list with components:

X_train

Training feature matrix

X_test

Test feature matrix

y_train

Training labels

y_test

Test labels

train_idx

Indices of training samples

test_idx

Indices of test samples
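
Examples

The following is a minimal base R sketch of the idea behind a stratified split, included to illustrate the arguments and the shape of the returned list. It is not the package's implementation: stratified_split_sketch is a hypothetical stand-in, and its handling of n_per_class (used in place of the test_size fraction and capped at the class size) is an assumption based on the argument description above.

```r
# Hypothetical stand-in for illustration; NOT the package's own code.
stratified_split_sketch <- function(X, y, test_size = 0.5,
                                    n_per_class = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  y <- as.factor(y)

  # Draw training indices separately within each class so that
  # class proportions carry over to both subsets.
  train_idx <- unlist(lapply(split(seq_along(y), y), function(idx) {
    n_train <- round(length(idx) * (1 - test_size))
    # Assumption: when given, n_per_class replaces the fraction-based
    # count and is capped at the number of samples in the class.
    if (!is.null(n_per_class)) n_train <- min(length(idx), n_per_class)
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)

  train_idx <- sort(train_idx)
  test_idx <- setdiff(seq_along(y), train_idx)

  list(
    X_train = X[train_idx, , drop = FALSE],
    X_test = X[test_idx, , drop = FALSE],
    y_train = y[train_idx],
    y_test = y[test_idx],
    train_idx = train_idx,
    test_idx = test_idx
  )
}

# Toy data: 100 cells, 3 features, imbalanced classes (80 A, 20 B).
X <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)
y <- rep(c("A", "B"), times = c(80, 20))

parts <- stratified_split_sketch(X, y, test_size = 0.25, seed = 1)
table(parts$y_train)  # 60 A and 15 B: the 4:1 ratio is preserved
table(parts$y_test)   # 20 A and 5 B
```

Sampling within each class, rather than from the pooled data, is what keeps a rare class from being under-represented in either subset by chance.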