Get ligand importances from a multi-ligand classification model.
Source: R/evaluate_model_ligand_prediction.R
get_multi_ligand_importances trains a classification algorithm chosen by the user to construct one model based on the target gene predictions of all ligands of interest (ligands are treated as features) in order to predict the observed response in a particular dataset. Variable importance scores, which indicate how important each ligand is for predicting the response, are then extracted. Ligands with higher variable importance scores are assumed to be more likely to be true active ligands.
Usage
get_multi_ligand_importances(setting, ligand_target_matrix, ligands_position = "cols", algorithm, cv = TRUE, cv_number = 4, cv_repeats = 2, parallel = FALSE, n_cores = 4, ignore_errors = FALSE, continuous = TRUE, known = TRUE, filter_genes = FALSE)
Arguments
- setting
A list containing the following elements: $name: name of the setting; $from: name(s) of the ligand(s) whose predictive performance needs to be assessed; $response: the observed target response, indicating for each gene whether it was a target or not in the setting of interest; $ligand: NULL or the name of the ligand(s) known to be active in the setting of interest.
- ligand_target_matrix
A matrix of ligand-target probability scores (recommended) or discrete target assignments (not recommended).
- ligands_position
Indicate whether the ligands in the ligand-target matrix are in the rows ("rows") or columns ("cols"). Default: "cols"
- algorithm
The name of the classification algorithm to be applied. It should be supported by the caret package. Examples of algorithms we recommend: with embedded feature selection: "rf", "glm", "fda", "glmnet", "sdwd", "gam", "glmboost"; without: "lda", "naive_bayes", "pls" (because of a bug in the current version of the pls package), "pcaNNet". Please note that not all of these algorithms work when the features (i.e. ligand vectors) are categorical (i.e. discrete class assignments).
- cv
Indicate whether model training and hyperparameter optimization should be done via cross-validation. Default: TRUE. FALSE might be useful for applications that only require variable importance, or when the final model is not expected to be extremely overfit.
- cv_number
The number of folds for the cross-validation scheme. Default: 4. Only relevant when cv == TRUE.
- cv_repeats
The number of repeats during cross-validation. Default: 2. Only relevant when cv == TRUE.
- parallel
Indicate whether model training should be parallelized. Default: FALSE. TRUE is only possible on non-Windows operating systems.
- n_cores
The number of cores used for parallelized model training via cross-validation. Default: 4. Only relevant on non-Windows operating systems.
- ignore_errors
Indicate whether errors raised by caret during model training should be ignored, such that training is retried until a model is trained without raising errors. Default: FALSE.
- continuous
Indicate whether model training and evaluation should be done on class probabilities or on discrete class labels. In case of strong class imbalance, we recommend setting this value to TRUE. Default: TRUE.
- known
Indicate whether the true active ligand for a particular dataset is known or not. Default: TRUE. The true ligand will be extracted from the $ligand slot of the setting.
- filter_genes
Indicate whether the 50 percent of genes that are least variable in ligand-target scores should be removed in order to reduce the training time of the model. Default: FALSE.
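The shape of the main inputs can be sketched in base R as follows. This is only an illustration under stated assumptions: all gene and ligand names are hypothetical, the response is assumed to be a named logical vector with one entry per gene, and the variance filter is one plausible reading of what filter_genes describes, not necessarily how the package implements it.

```r
# Illustrative inputs only: all gene and ligand names below are hypothetical.
genes <- paste0("GENE", 1:6)
ligands <- c("LIGAND_A", "LIGAND_B", "LIGAND_C")

# A setting list with the four elements described for the setting argument.
# Assumption: the response is a named logical vector with one entry per gene.
setting <- list(
  name = "example_setting",
  from = ligands,
  response = setNames(c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE), genes),
  ligand = "LIGAND_A"
)

# A ligand-target matrix with genes in rows and ligands in columns,
# which corresponds to ligands_position = "cols".
set.seed(1)
ligand_target_matrix <- matrix(
  runif(length(genes) * length(ligands)),
  nrow = length(genes),
  dimnames = list(genes, ligands)
)

# The transposed matrix has ligands in rows, i.e. ligands_position = "rows".
ligands_in_rows <- t(ligand_target_matrix)

# Sketch of what filter_genes = TRUE describes: drop the half of the genes
# whose scores vary least across ligands. The package may do this differently.
gene_variance <- apply(ligand_target_matrix, 1, var)
keep <- gene_variance > median(gene_variance)
filtered_matrix <- ligand_target_matrix[keep, , drop = FALSE]
```

With real data, the setting lists and the ligand-target matrix would normally come from the package's own helper functions, as in the Examples section, rather than being built by hand.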
Value
A data.frame containing, for each ligand-dataset combination, feature importance scores that indicate how important the query ligand is for predicting the response in that dataset when prediction is done via a trained classification model with all possible ligands as input. In addition to the importance score(s), the data.frame contains the name of the setting ($setting), the name of the query ligand ($test_ligand) and, if known, the name of the true active ligand ($ligand).
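One way to use such a result is to rank the query ligands by importance within a setting and check where the true active ligand ends up. The sketch below uses a small hand-made data.frame; the column name importance and all values are hypothetical, since the actual importance columns depend on the chosen algorithm.

```r
# Hand-made stand-in for the returned data.frame.
# Assumption: "importance" is a hypothetical column name and the values are invented.
ligand_importances <- data.frame(
  setting = "example_setting",
  test_ligand = c("LIGAND_A", "LIGAND_B", "LIGAND_C"),
  ligand = "LIGAND_A",
  importance = c(0.8, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# Order the query ligands from most to least important and add a rank.
ranked <- ligand_importances[order(-ligand_importances$importance), ]
ranked$rank <- seq_len(nrow(ranked))

# Rank of the true active ligand among all query ligands.
true_ligand_rank <- ranked$rank[ranked$test_ligand == ranked$ligand]
print(ranked)
```

A low rank for the true active ligand would be consistent with the assumption that higher importance scores point to active ligands.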
Examples
if (FALSE) { # \dontrun{
library(nichenetr)
# Convert the first five validation expression settings to the evaluation format
settings <- lapply(expression_settings_validation[1:5], convert_expression_settings_evaluation)
# Build multi-ligand prediction settings that use all ligands as candidate features
settings_ligand_pred <- convert_settings_ligand_prediction(settings, all_ligands = unlist(extract_ligands_from_settings(settings, combination = FALSE)), validation = TRUE, single = FALSE)
# Construct the ligand-target matrix for the ligands in these settings
weighted_networks <- construct_weighted_networks(lr_network, sig_network, gr_network, source_weights_df)
ligands <- extract_ligands_from_settings(settings_ligand_pred, combination = FALSE)
ligand_target_matrix <- construct_ligand_target_matrix(weighted_networks, lr_network, ligands)
# Train one glm model per setting and combine the resulting ligand importances
ligand_importances_glm <- dplyr::bind_rows(lapply(settings_ligand_pred, get_multi_ligand_importances, ligand_target_matrix, algorithm = "glm"))
print(head(ligand_importances_glm))
} # }