Split data into training and test sets while maintaining class proportions
Usage
train_test_split_stratified(
X,
y,
test_size = 0.5,
n_per_class = NULL,
seed = NULL
)
Arguments
- X
Feature matrix (cells x features)
- y
Class labels (factor or character vector)
- test_size
Fraction of samples in each class held out for testing (default: 0.5)
- n_per_class
Maximum number of samples per class in the training set.
If NULL, the split is determined by test_size instead
- seed
Random seed for reproducibility
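The arguments act per class. Each class is sampled separately, and either a test_size fraction of it is held out or at most n_per_class of its samples go to training. The sketch below shows that index selection under some stated assumptions. Rounding uses R's round(). When n_per_class is set, every remaining sample goes to the test set. stratified_indices is a hypothetical helper, not the package's implementation.

```r
# Hypothetical sketch of stratified index selection; not the package's code.
stratified_indices <- function(y, test_size = 0.5, n_per_class = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  y <- as.factor(y)
  train_idx <- integer(0)
  # split() groups row indices by class so each class is sampled on its own
  for (idx in split(seq_along(y), y)) {
    n <- length(idx)
    # Shuffle within the class; idx[sample.int(n)] avoids sample()'s
    # special case for length-1 vectors
    shuffled <- idx[sample.int(n)]
    n_train <- if (is.null(n_per_class)) {
      n - round(n * test_size)   # keep (1 - test_size) of this class
    } else {
      min(n_per_class, n)        # cap training samples for this class
    }
    train_idx <- c(train_idx, shuffled[seq_len(n_train)])
  }
  train_idx <- sort(train_idx)
  # Assumption: every sample not used for training goes to the test set
  test_idx <- setdiff(seq_along(y), train_idx)
  list(train_idx = train_idx, test_idx = test_idx)
}
```

For example, with y <- rep(c("a", "b"), times = c(10, 4)) and the default test_size = 0.5, the sketch keeps 5 "a" and 2 "b" samples for training. That matches the 10:4 class ratio of the full data.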
Value
A list with components:
- X_train
Training feature matrix
- X_test
Test feature matrix
- y_train
Training labels
- y_test
Test labels
- train_idx
Row indices (into X and y) of training samples
- test_idx
Row indices (into X and y) of test samples
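The returned components are linked through the two index vectors. X_train and y_train are X and y subset by train_idx, and the same holds for the test set. The sketch below assembles that list from a pair of index vectors and checks the training class proportions. It assumes X is a matrix with cells as rows and uses drop = FALSE so that a single selected row stays a matrix. assemble_split is a hypothetical helper.

```r
# Hypothetical helper: build the documented return value from index vectors
assemble_split <- function(X, y, train_idx, test_idx) {
  list(
    X_train = X[train_idx, , drop = FALSE],  # rows are cells
    X_test = X[test_idx, , drop = FALSE],
    y_train = y[train_idx],
    y_test = y[test_idx],
    train_idx = train_idx,
    test_idx = test_idx
  )
}

X <- matrix(rnorm(14 * 3), nrow = 14)  # 14 cells x 3 features
y <- factor(rep(c("a", "b"), times = c(10, 4)))
res <- assemble_split(X, y, train_idx = c(1:5, 11:12), test_idx = c(6:10, 13:14))
# Class proportions match the full data: 5/7 "a" and 2/7 "b"
prop.table(table(res$y_train))
```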