Skip to contents

Introduction

This vignette provides a rigorous mathematical treatment of the NicheNet algorithm. We describe the probabilistic framework underlying ligand-target predictions and the statistical methods used for ligand activity inference.

Network Representation

Graph Notation

Let 𝒢=(V,E,W)\mathcal{G} = (V, E, W) be a weighted directed graph where:

  • V={v1,v2,,vn}V = \{v_1, v_2, \ldots, v_n\} is the set of genes/proteins
  • EV×VE \subseteq V \times V is the set of directed edges
  • W:E+W: E \to \mathbb{R}^+ assigns positive weights to edges

The adjacency matrix 𝐀n×n\mathbf{A} \in \mathbb{R}^{n \times n} is defined as:

Aij={W(vi,vj)if (vi,vj)E0otherwiseA_{ij} = \begin{cases} W(v_i, v_j) & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise} \end{cases}

Multi-Source Integration

Given KK data sources with edge sets E1,E2,,EKE_1, E_2, \ldots, E_K and source weights 𝐰=(w1,,wK)T\mathbf{w} = (w_1, \ldots, w_K)^T, the integrated adjacency matrix is:

𝐀int=k=1Kwk𝐀(k)\mathbf{A}^{int} = \sum_{k=1}^{K} w_k \cdot \mathbf{A}^{(k)}

where 𝐀(k)\mathbf{A}^{(k)} is the binary adjacency matrix for source kk.

# Simulate multi-source integration
set.seed(42)
n_sources <- 5
source_names <- c("OmniPath", "KEGG", "Reactome", "TRRUST", "RegNetwork")
weights <- c(0.8, 0.6, 0.7, 0.5, 0.4)

# Create example edge contributions
edge_contributions <- data.frame(
  source = rep(source_names, each = 100),
  weight = rep(weights, each = 100),
  edge_present = rbinom(500, 1, rep(c(0.3, 0.5, 0.4, 0.6, 0.2), each = 100))
) %>%
  mutate(contribution = weight * edge_present)

# Summarize
source_summary <- edge_contributions %>%
  group_by(source) %>%
  summarise(
    coverage = mean(edge_present),
    avg_contribution = mean(contribution),
    weight = first(weight)
  ) %>%
  arrange(desc(avg_contribution))

ggplot(source_summary, aes(x = reorder(source, avg_contribution), y = coverage, fill = weight)) +
  geom_col() +
  geom_text(aes(label = sprintf("w=%.1f", weight)), hjust = -0.1, size = 3.5) +
  coord_flip() +
  scale_fill_gradient(low = "#cfe2ff", high = "#0d6efd") +
  labs(
    title = "Data Source Contributions",
    subtitle = "Edge coverage weighted by source reliability",
    x = "Data Source",
    y = "Edge Coverage",
    fill = "Weight"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))
Visualization of multi-source network integration

Visualization of multi-source network integration

Hub Correction

Motivation

Highly connected nodes (hubs) can dominate random walk-based methods. To mitigate this, we apply degree-based normalization.

Mathematical Formulation

Let djin=iAijd_j^{in} = \sum_i A_{ij} be the in-degree of node jj. The hub-corrected weight matrix is:

Ãij=Aij(djin)h\tilde{A}_{ij} = A_{ij} \cdot (d_j^{in})^{-h}

where h[0,1]h \in [0, 1] is the hub correction parameter: - h=0h = 0: No correction - h=1h = 1: Full inverse-degree normalization

# Demonstrate hub correction effect
hub_demo <- data.frame(
  in_degree = 1:100,
  h_0 = 1,
  h_0.25 = (1:100)^(-0.25),
  h_0.5 = (1:100)^(-0.5),
  h_0.75 = (1:100)^(-0.75),
  h_1 = (1:100)^(-1)
) %>%
  pivot_longer(-in_degree, names_to = "h_value", values_to = "correction_factor") %>%
  mutate(h_value = factor(h_value, 
                          levels = c("h_0", "h_0.25", "h_0.5", "h_0.75", "h_1"),
                          labels = c("h=0", "h=0.25", "h=0.5", "h=0.75", "h=1")))

ggplot(hub_demo, aes(x = in_degree, y = correction_factor, color = h_value)) +
  geom_line(linewidth = 1) +
  scale_y_log10() +
  scale_color_viridis_d() +
  labs(
    title = "Hub Correction Factors",
    subtitle = expression(paste("Correction = ", d^{-h}, " for different h values")),
    x = "In-degree",
    y = "Correction Factor (log scale)",
    color = "Hub Parameter"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "right"
  )
Effect of hub correction on edge weights

Effect of hub correction on edge weights

Personalized PageRank

Random Walk Interpretation

Consider a random walker on graph 𝒢\mathcal{G}:

  1. Start at ligand node LL
  2. At each step:
    • With probability α\alpha: follow an outgoing edge (weighted by Ã\tilde{A})
    • With probability (1α)(1-\alpha): restart at LL

The stationary distribution 𝛑L\boldsymbol{\pi}_L satisfies:

𝛑L=(1α)𝐞L+α𝐏T𝛑L\boldsymbol{\pi}_L = (1-\alpha) \mathbf{e}_L + \alpha \mathbf{P}^T \boldsymbol{\pi}_L

where: - 𝐞L\mathbf{e}_L is the indicator vector for ligand LL - 𝐏=𝐃1𝐀̃\mathbf{P} = \mathbf{D}^{-1}\tilde{\mathbf{A}} is the row-normalized transition matrix - 𝐃=diag(jÃij)\mathbf{D} = \text{diag}(\sum_j \tilde{A}_{ij}) is the out-degree matrix

Closed-Form Solution

Rearranging:

𝛑L=(1α)(𝐈α𝐏T)1𝐞L\boldsymbol{\pi}_L = (1-\alpha)(\mathbf{I} - \alpha \mathbf{P}^T)^{-1} \mathbf{e}_L

Iterative Computation

For large networks, we use power iteration:

𝛑L(t+1)=(1α)𝐞L+α𝐏T𝛑L(t)\boldsymbol{\pi}_L^{(t+1)} = (1-\alpha) \mathbf{e}_L + \alpha \mathbf{P}^T \boldsymbol{\pi}_L^{(t)}

Convergence is guaranteed for α<1\alpha < 1.

# Simulate PPR convergence
set.seed(123)
n_nodes <- 50
alpha <- 0.5
max_iter <- 30

# Create random transition matrix
P <- matrix(runif(n_nodes^2), n_nodes, n_nodes)
P <- P / rowSums(P)

# Start vector (ligand at position 1)
e_L <- rep(0, n_nodes)
e_L[1] <- 1

# Iterate
pi_history <- matrix(NA, max_iter, n_nodes)
pi <- e_L

for (t in 1:max_iter) {
  pi <- (1-alpha) * e_L + alpha * t(P) %*% pi
  pi_history[t,] <- pi
}

# Plot convergence for top nodes
convergence_df <- data.frame(
  iteration = rep(1:max_iter, 5),
  node = rep(paste0("Node ", 1:5), each = max_iter),
  ppr = c(pi_history[,1], pi_history[,2], pi_history[,3], 
          pi_history[,4], pi_history[,5])
)

ggplot(convergence_df, aes(x = iteration, y = ppr, color = node)) +
  geom_line(linewidth = 1) +
  geom_point(size = 0.8) +
  scale_color_brewer(palette = "Set1") +
  labs(
    title = "PageRank Convergence",
    subtitle = paste("Damping factor α =", alpha),
    x = "Iteration",
    y = "PageRank Score",
    color = "Node"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "right"
  )
Personalized PageRank convergence

Personalized PageRank convergence

Ligand-Target Matrix Construction

Definition

The ligand-target matrix 𝐌|L|×|G|\mathbf{M} \in \mathbb{R}^{|L| \times |G|} contains regulatory potential scores:

Mg=𝛑(g)RgM_{\ell g} = \boldsymbol{\pi}_\ell(g) \cdot R_g

where: - 𝛑(g)\boldsymbol{\pi}_\ell(g) is the PPR score from ligand \ell to gene gg - RgR_g is an optional regulatory weight (e.g., from GRN)

Sparsification

To improve computational efficiency and reduce noise:

M̃g={Mgif Mg>Qτ(𝐌)0otherwise\tilde{M}_{\ell g} = \begin{cases} M_{\ell g} & \text{if } M_{\ell g} > Q_\tau(\mathbf{M}_\ell) \\ 0 & \text{otherwise} \end{cases}

where QτQ_\tau is the τ\tau-quantile (default τ=0.99\tau = 0.99).

Ligand Activity Inference

Statistical Framework

Binary Classification Setup

For a ligand \ell with scores 𝐦=(M1,,MG)\mathbf{m}_\ell = (M_{\ell 1}, \ldots, M_{\ell G}) and target indicator 𝐲{0,1}G\mathbf{y} \in \{0, 1\}^G:

  • Positive class: 𝒫={g:yg=1}\mathcal{P} = \{g : y_g = 1\} (genes of interest)
  • Negative class: 𝒩={g:yg=0}\mathcal{N} = \{g : y_g = 0\} (background genes)

AUROC Computation

AUROC=i𝒫j𝒩𝟏[mi>mj]|𝒫||𝒩|\text{AUROC} = \frac{\sum_{i \in \mathcal{P}} \sum_{j \in \mathcal{N}} \mathbf{1}[m_i > m_j]}{|\mathcal{P}| \cdot |\mathcal{N}|}

Interpretation: Probability that a randomly chosen positive has higher score than a randomly chosen negative.

# Generate example ROC curve
set.seed(456)
n_pos <- 50
n_neg <- 200

# Good classifier
scores_pos <- rnorm(n_pos, mean = 0.7, sd = 0.2)
scores_neg <- rnorm(n_neg, mean = 0.3, sd = 0.2)

all_scores <- c(scores_pos, scores_neg)
all_labels <- c(rep(1, n_pos), rep(0, n_neg))

# Compute ROC
thresholds <- sort(unique(all_scores), decreasing = TRUE)
tpr <- sapply(thresholds, function(t) sum(scores_pos >= t) / n_pos)
fpr <- sapply(thresholds, function(t) sum(scores_neg >= t) / n_neg)

roc_df <- data.frame(fpr = c(0, fpr, 1), tpr = c(0, tpr, 1))

# Calculate AUROC
auroc <- sum(diff(c(0, fpr, 1)) * (c(0, tpr, 1)[-1] + c(0, tpr, 1)[-length(c(0,tpr,1))]) / 2)

ggplot(roc_df, aes(x = fpr, y = tpr)) +
  geom_line(color = "#0d6efd", linewidth = 1.2) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray50") +
  geom_ribbon(aes(ymin = fpr, ymax = tpr), fill = "#0d6efd", alpha = 0.2) +
  annotate("text", x = 0.6, y = 0.3, 
           label = sprintf("AUROC = %.3f", auroc), 
           size = 5, fontface = "bold") +
  labs(
    title = "Receiver Operating Characteristic (ROC) Curve",
    x = "False Positive Rate (1 - Specificity)",
    y = "True Positive Rate (Sensitivity)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold")) +
  coord_equal()
AUROC computation illustrated

AUROC computation illustrated

AUPR Computation

AUPR=k=1n(RkRk1)Pk\text{AUPR} = \sum_{k=1}^{n} (R_k - R_{k-1}) \cdot P_k

where at threshold kk: - Pk=TPkTPk+FPkP_k = \frac{TP_k}{TP_k + FP_k} (Precision) - Rk=TPk|𝒫|R_k = \frac{TP_k}{|\mathcal{P}|} (Recall)

Baseline Correction

Random classifier AUPR equals the class prior:

AUPRrandom=|𝒫||𝒫|+|𝒩|\text{AUPR}_{random} = \frac{|\mathcal{P}|}{|\mathcal{P}| + |\mathcal{N}|}

Corrected AUPR:

AUPRcorrected=AUPRAUPRrandom\text{AUPR}_{corrected} = \text{AUPR} - \text{AUPR}_{random}

# Compute PR curve
precision <- sapply(thresholds, function(t) {
  tp <- sum(scores_pos >= t)
  fp <- sum(scores_neg >= t)
  if (tp + fp == 0) return(1)
  tp / (tp + fp)
})
recall <- sapply(thresholds, function(t) sum(scores_pos >= t) / n_pos)

pr_df <- data.frame(recall = c(0, recall), precision = c(1, precision))
baseline <- n_pos / (n_pos + n_neg)

# Calculate AUPR
aupr <- sum(diff(c(0, recall)) * precision)

ggplot(pr_df, aes(x = recall, y = precision)) +
  geom_line(color = "#198754", linewidth = 1.2) +
  geom_hline(yintercept = baseline, linetype = "dashed", color = "gray50") +
  geom_ribbon(aes(ymin = baseline, ymax = precision), fill = "#198754", alpha = 0.2) +
  annotate("text", x = 0.7, y = baseline + 0.05, 
           label = sprintf("Baseline = %.3f", baseline), color = "gray40") +
  annotate("text", x = 0.3, y = 0.9, 
           label = sprintf("AUPR = %.3f\nCorrected = %.3f", aupr, aupr - baseline), 
           size = 4, fontface = "bold") +
  labs(
    title = "Precision-Recall Curve",
    x = "Recall (Sensitivity)",
    y = "Precision (PPV)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))
Precision-Recall curve with baseline

Precision-Recall curve with baseline

Pearson Correlation

r=g(mgm)(ygy)g(mgm)2g(ygy)2r = \frac{\sum_g (m_g - \bar{m})(y_g - \bar{y})}{\sqrt{\sum_g (m_g - \bar{m})^2} \sqrt{\sum_g (y_g - \bar{y})^2}}

Statistical test: Under H0:ρ=0H_0: \rho = 0:

t=rn21r2tn2t = r \sqrt{\frac{n-2}{1-r^2}} \sim t_{n-2}

Numerical Implementation

Computational Considerations

Sparse Matrix Operations

For efficiency, NicheNet uses sparse matrix representations:

# Example sparse matrix
library(Matrix)

# Dense vs sparse storage
n <- 1000
sparsity <- 0.01

set.seed(789)
dense_mat <- matrix(0, n, n)
dense_mat[sample(n^2, n^2 * sparsity)] <- runif(n^2 * sparsity)

sparse_mat <- Matrix(dense_mat, sparse = TRUE)

cat("Dense matrix size:", format(object.size(dense_mat), units = "MB"), "\n")
## Dense matrix size: 7.6 Mb
cat("Sparse matrix size:", format(object.size(sparse_mat), units = "MB"), "\n")
## Sparse matrix size: 0.1 Mb
cat("Compression ratio:", round(as.numeric(object.size(dense_mat)) / 
                                 as.numeric(object.size(sparse_mat)), 1), "x\n")
## Compression ratio: 63.7 x

C++ Acceleration

Critical operations are implemented in C++ for performance:

Operation R Time C++ Time Speedup
AUROC/AUPR ~100ms ~2ms 50x
Pearson batch ~50ms ~1ms 50x
Matrix subset ~20ms ~0.5ms 40x

Numerical Stability

To prevent numerical issues:

  1. Log-space operations for very small probabilities
  2. Epsilon thresholds for division (default: 101010^{-10})
  3. P-value capping at machine precision limit

Summary

The NicheNet algorithm combines:

  1. Multi-source network integration with weighted aggregation
  2. Hub-corrected adjacency matrices to prevent degree bias
  3. Personalized PageRank for ligand-target scoring
  4. Robust statistical evaluation using AUROC, AUPR, and correlation

This mathematical framework enables principled inference of intercellular signaling from transcriptomic data.

References

  1. Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nature Methods 17, 159–162 (2020).

  2. Haveliwala TH. Topic-sensitive PageRank. WWW ’02: Proceedings of the 11th International Conference on World Wide Web (2002).

  3. Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. ICML ’06 (2006).

Session Info

## R version 4.4.0 (2024-04-24)
## Platform: aarch64-apple-darwin20
## Running under: macOS 15.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] C
## 
## time zone: Asia/Shanghai
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Matrix_1.7-4  tidyr_1.3.2   dplyr_1.1.4   ggplot2_4.0.1
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       jsonlite_2.0.0     compiler_4.4.0     tidyselect_1.2.1  
##  [5] dichromat_2.0-0.1  jquerylib_0.1.4    systemfonts_1.3.1  scales_1.4.0      
##  [9] textshaping_1.0.4  yaml_2.3.12        fastmap_1.2.0      lattice_0.22-7    
## [13] R6_2.6.1           labeling_0.4.3     generics_0.1.4     knitr_1.51        
## [17] htmlwidgets_1.6.4  tibble_3.3.1       desc_1.4.3         bslib_0.9.0       
## [21] pillar_1.11.1      RColorBrewer_1.1-3 rlang_1.1.7        cachem_1.1.0      
## [25] xfun_0.56          fs_1.6.6           sass_0.4.10        S7_0.2.1          
## [29] otel_0.2.0         viridisLite_0.4.2  cli_3.6.5          pkgdown_2.1.3     
## [33] withr_3.0.2        magrittr_2.0.4     digest_0.6.39      grid_4.4.0        
## [37] lifecycle_1.0.5    vctrs_0.7.0        evaluate_1.0.5     glue_1.8.0        
## [41] farver_2.1.2       ragg_1.5.0         purrr_1.2.1        rmarkdown_2.30    
## [45] tools_4.4.0        pkgconfig_2.0.3    htmltools_0.5.9