Estimate data source weights of data sources of interest based on leave-one-in and leave-one-out characterization performances. — estimate_source_weights_characterization

estimate_source_weights_characterization estimates weights for data sources of interest, using a model trained to predict data source weights from leave-one-in and leave-one-out characterization performances.

Usage

estimate_source_weights_characterization(loi_performances, loo_performances, source_weights_df, sources_oi, random_forest = FALSE)

Arguments

loi_performances: Performances of models in which a particular data source of interest was the only data source in either the ligand-signaling or the gene regulatory network.
loo_performances: Performances of models in which a particular data source of interest was removed from the ligand-signaling or the gene regulatory network before model construction.
source_weights_df: A data frame / tibble containing the weights associated with each individual data source. Sources with higher weights will contribute more to the final model performance (required columns: source, weight). Note that only interactions described by sources included here will be retained during model construction.
sources_oi: The names of the data sources whose weights should be estimated from leave-one-in and leave-one-out performances.
random_forest: Indicate whether the regression of data source weights on leave-one-in and leave-one-out performances should use a random forest model (TRUE) or a linear model (FALSE). Default: FALSE

Value

A list containing two elements: $source_weights_df (the input source_weights_df extended by the estimated source weights for the data sources of interest) and $model (the model object of the regression between leave-one-in and leave-one-out performances and data source weights).
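
The idea behind the linear-model case (random_forest = FALSE) can be sketched on toy data. This is only an illustration under stated assumptions, not the package's implementation: the column names loi_auroc and loo_auroc, the source names, and all numeric values are hypothetical, and the real performance tables contain different columns.

```r
# Toy illustration only: column names, source names, and values are hypothetical.

# Sources with known weights, plus made-up leave-one-in / leave-one-out scores.
known <- data.frame(
  source    = paste0("source_", 1:8),
  loi_auroc = c(0.55, 0.60, 0.62, 0.58, 0.70, 0.66, 0.52, 0.64),
  loo_auroc = c(0.74, 0.72, 0.71, 0.73, 0.68, 0.70, 0.75, 0.71),
  weight    = c(0.20, 0.45, 0.50, 0.35, 0.90, 0.70, 0.10, 0.60)
)

# A source of interest: performances are available, the weight is unknown.
new_source <- data.frame(source = "source_oi", loi_auroc = 0.63, loo_auroc = 0.71)

# Regress the known weights on the two performance measures.
model <- lm(weight ~ loi_auroc + loo_auroc, data = known)

# Predict a weight for the source of interest from its performances.
estimated <- predict(model, newdata = new_source)

# Mimic the documented return shape: extended weights table plus the model.
result <- list(
  source_weights_df = rbind(
    known[, c("source", "weight")],
    data.frame(source = new_source$source, weight = unname(estimated))
  ),
  model = model
)
```

Here result$source_weights_df holds the original rows plus one row for the source of interest, and result$model can be inspected with summary() to see how the two performance measures relate to the weights.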

Examples

if (FALSE) { # \dontrun{
library(dplyr)
settings <- lapply(expression_settings_validation[1:4], convert_expression_settings_evaluation)
weights_settings_loi <- prepare_settings_leave_one_in_characterization(lr_network = lr_network, sig_network = sig_network, gr_network = gr_network, source_weights_df)
weights_settings_loi <- lapply(weights_settings_loi, add_hyperparameters_parameter_settings, lr_sig_hub = 0.25, gr_hub = 0.5, ltf_cutoff = 0, algorithm = "PPR", damping_factor = 0.2, correct_topology = TRUE)
doMC::registerDoMC(cores = 4)
job_characterization_loi <- parallel::mclapply(weights_settings_loi[1:4], evaluate_model, lr_network = lr_network, sig_network = sig_network, gr_network = gr_network, settings, calculate_popularity_bias_target_prediction = FALSE, calculate_popularity_bias_ligand_prediction = FALSE, ncitations, mc.cores = 4)
loi_performances <- process_characterization_target_prediction_average(job_characterization_loi)
weights_settings_loo <- prepare_settings_leave_one_out_characterization(lr_network = lr_network, sig_network = sig_network, gr_network = gr_network, source_weights_df)
weights_settings_loo <- lapply(weights_settings_loo, add_hyperparameters_parameter_settings, lr_sig_hub = 0.25, gr_hub = 0.5, ltf_cutoff = 0, algorithm = "PPR", damping_factor = 0.2, correct_topology = TRUE)
doMC::registerDoMC(cores = 4)
job_characterization_loo <- parallel::mclapply(weights_settings_loo[1:4], evaluate_model, lr_network = lr_network, sig_network = sig_network, gr_network = gr_network, settings, calculate_popularity_bias_target_prediction = FALSE, calculate_popularity_bias_ligand_prediction = FALSE, ncitations, mc.cores = 4)
loo_performances <- process_characterization_target_prediction_average(job_characterization_loo)
sources_oi <- c("kegg_cytokines")
output <- estimate_source_weights_characterization(loi_performances, loo_performances, source_weights_df %>% filter(source != "kegg_cytokines"), sources_oi, random_forest = FALSE)
} # }