Algorithm & Methodology

Overview

MultiNicheNet is a computational framework designed for differential cell-cell communication (CCC) analysis in multi-sample, multi-condition single-cell RNA sequencing experiments. This document provides a comprehensive overview of the algorithmic foundations and methodological principles underlying MultiNicheNet.

Theoretical Framework

The Cell-Cell Communication Inference Problem

Cell-cell communication (CCC) involves the transmission of biological signals between cells through ligand-receptor (L-R) interactions. In single-cell transcriptomics, we infer potential CCC events by:

Identifying expressed ligands in “sender” cell types
Identifying expressed receptors in “receiver” cell types
Predicting downstream signaling effects

Why Multi-Sample Analysis?

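Traditional cell-level differential expression analysis treats every cell as an independent replicate, even though cells from the same sample are correlated. This leads to several limitations compared with sample-level analysis: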

Comparison of Cell-Level vs Sample-Level Analysis
Aspect	Cell-Level	Sample-Level (MultiNicheNet)
Statistical Unit	Individual cells	Samples/patients
Sample Variability	Ignored	Properly modeled
False Positive Rate	Inflated	Controlled
Complex Designs	Limited	Fully supported
Batch Effects	Problematic	Can be corrected

Core Algorithms

1. Pseudobulk Aggregation

For each cell type $c$ and sample $s$, we aggregate single-cell expression profiles:

$\bar{X}_{g,c,s} = \frac{1}{n_{c,s}} \sum_{i \in \text{cells}(c,s)} X_{g,i}$

where:
- $X_{g,i}$ is the expression of gene $g$ in cell $i$
- $n_{c,s}$ is the number of cells of type $c$ in sample $s$

Benefits:
- Reduces technical noise through averaging
- Enables proper statistical inference at sample level
- Respects experimental design structure
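The aggregation formula can be sketched in a few lines of plain Python. This is an illustration of the averaging step only, not the package's implementation; the function name `pseudobulk_mean` and the toy values are hypothetical.

```python
from collections import defaultdict


def pseudobulk_mean(expr, cell_type, sample):
    """Average each gene over the cells of every (cell type, sample) group.

    expr      : list of per-cell expression vectors (one value per gene)
    cell_type : cell-type label of each cell
    sample    : sample label of each cell
    Returns {(cell_type, sample): [mean expression per gene]}.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, c, s in zip(expr, cell_type, sample):
        key = (c, s)
        if key not in sums:
            sums[key] = [0.0] * len(vec)
        for g, x in enumerate(vec):
            sums[key][g] += x
        counts[key] += 1
    # Divide each per-gene sum by n_{c,s}, the number of cells in the group.
    return {k: [v / counts[k] for v in tot] for k, tot in sums.items()}


# Toy data (hypothetical): 4 cells, 2 genes.
expr = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]
cell_type = ["T", "T", "B", "T"]
sample = ["s1", "s1", "s1", "s2"]
pb = pseudobulk_mean(expr, cell_type, sample)
```

Here `pb[("T", "s1")]` averages the two T cells of sample `s1`, so each sample contributes one profile per cell type to downstream testing.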

2. Differential Expression Analysis

MultiNicheNet employs the muscat framework for differential state analysis. For each gene $g$ in cell type $c$:

$\log_2(\text{CPM}_{g,c,s} + 1) = \beta_0 + \beta_1 \cdot \text{condition}_s + \epsilon_{g,c,s}$

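Statistical Testing:
- Uses negative binomial generalized linear models (edgeR), which are fitted to pseudobulk counts; the linear equation above is a schematic of the design rather than the exact likelihood
- Accounts for library size differences
- Supports complex designs with covariates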
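To make the schematic model concrete, the sketch below computes $\log_2(\text{CPM} + 1)$ for one gene and fits the two-group linear form by ordinary least squares. It is deliberately not the edgeR negative binomial fit; the helper names `log2_cpm` and `fit_two_group` and the toy counts are assumptions made for illustration.

```python
import math


def log2_cpm(counts, library_size):
    """log2(CPM + 1) for one gene across samples."""
    return [math.log2(c / n * 1e6 + 1.0) for c, n in zip(counts, library_size)]


def fit_two_group(y, condition):
    """Least-squares fit of y = b0 + b1 * condition for a 0/1 condition.

    With a binary covariate, b0 is the mean of the reference group and
    b1 is the difference between the two group means.
    """
    ref = [v for v, z in zip(y, condition) if z == 0]
    alt = [v for v, z in zip(y, condition) if z == 1]
    b0 = sum(ref) / len(ref)
    b1 = sum(alt) / len(alt) - b0
    return b0, b1


# Toy pseudobulk counts for one gene in one cell type (hypothetical values).
counts = [10, 20, 80, 160]
library_size = [1_000_000, 2_000_000, 1_000_000, 2_000_000]
condition = [0, 0, 1, 1]

y = log2_cpm(counts, library_size)
b0, b1 = fit_two_group(y, condition)
```

After library-size scaling the two reference samples both sit at 10 CPM and the two treated samples at 80 CPM, so `b1` is the log2 fold change on the CPM + 1 scale.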

Empirical P-value Correction: MultiNicheNet implements an empirical null distribution approach to control for multiple testing across many cell types:

$p_{\text{empirical}} = \frac{\#\{|t_{\text{null}}| \geq |t_{\text{observed}}|\}}{N_{\text{null}}}$
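A minimal sketch of this ratio, assuming the null statistics are already available as a list (how they are generated is outside the scope of this example). The function name `empirical_p_value` and the values are hypothetical.

```python
def empirical_p_value(t_observed, t_null):
    """Fraction of null statistics at least as extreme as the observed one."""
    if not t_null:
        raise ValueError("t_null must contain at least one null statistic")
    extreme = sum(1 for t in t_null if abs(t) >= abs(t_observed))
    return extreme / len(t_null)


# Toy null distribution (hypothetical values).
t_null = [-2.5, -1.0, -0.3, 0.1, 0.8, 1.2, 2.1, 3.0]
p = empirical_p_value(2.0, t_null)
```

Three of the eight null statistics have an absolute value of at least 2.0, so `p` is 3/8. Note that this ratio is exactly zero when no null statistic is as extreme as the observed one; a common adjustment adds 1 to both numerator and denominator.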

3. NicheNet Ligand Activity Inference

MultiNicheNet integrates the NicheNet ligand-target prior knowledge model to infer ligand activities based on downstream gene expression changes.

Ligand-Target Matrix: The ligand-target matrix $\mathbf{W}$ contains regulatory potential scores:

$W_{l,t} = P(\text{gene } t \text{ regulated by ligand } l)$

Activity Score Calculation: For a set of differentially expressed target genes $G_{\text{DE}}$:

$\text{Activity}_l = \text{AUROC}(W_{l,\cdot}, G_{\text{DE}})$

This measures how well ligand $l$'s predicted targets are enriched in the observed DE genes.
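The AUROC in this formula is the probability that a randomly chosen DE gene receives a higher regulatory potential score than a randomly chosen non-DE gene. The sketch below computes it by direct pairwise comparison; it is an illustration with hypothetical scores and gene names, not the NicheNet implementation.

```python
def auroc(scores, is_positive):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical regulatory potential scores of one ligand for six genes.
w_l = {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.3, "E": 0.2, "F": 0.1}
de_genes = {"A", "B", "D"}  # hypothetical DE gene set

genes = sorted(w_l)
activity = auroc([w_l[g] for g in genes], [g in de_genes for g in genes])
```

Eight of the nine DE/non-DE pairs are ordered correctly here, giving an activity of 8/9. A value near 0.5 would mean the ligand's predicted targets are no more enriched among DE genes than expected by chance.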

4. Multi-Criteria Prioritization

The key innovation of MultiNicheNet is integrating multiple biological criteria into a unified prioritization score.

Prioritization Components:

Criterion	Symbol	Description
Ligand DE	$S_{\text{DE}}^L$	Differential expression of ligand
Receptor DE	$S_{\text{DE}}^R$	Differential expression of receptor
Ligand specificity	$S_{\text{spec}}^L$	Cell-type specificity of ligand
Receptor specificity	$S_{\text{spec}}^R$	Cell-type specificity of receptor
Expression fraction	$S_{\text{frac}}$	Fraction of samples expressing L-R
Ligand activity	$S_{\text{act}}$	NicheNet activity score

Unified Score:

$S_{\text{priority}} = \prod_{i} S_i^{w_i}$

where $w_i$ are scenario-specific weights.

Biological Scenarios

MultiNicheNet supports different biological scenarios, each with a pre-defined weight configuration.

Mathematical Details

Expression Fraction Calculation

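For a ligand-receptor pair with ligand $L$ in sender cell type $c_{\text{snd}}$ and receptor $R$ in receiver cell type $c_{\text{rcv}}$, the ligand's expression fraction is the share of samples in which its pseudobulk expression exceeds a threshold:

$F_{L,c_{\text{snd}}} = \frac{\#\{s : \bar{X}_{L,c_{\text{snd}},s} > \theta\}}{N_{\text{samples}}}$

where $s$ indexes samples, as in the pseudobulk definition, and $\theta$ is the expression threshold. The same form applies to the receptor $R$ in $c_{\text{rcv}}$.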
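As a small illustration, the fraction can be computed from a mapping of sample to pseudobulk expression for one gene in one cell type. The function name `expression_fraction`, the threshold, and the values are hypothetical.

```python
def expression_fraction(pseudobulk_by_sample, theta):
    """Share of samples whose pseudobulk expression exceeds theta."""
    if not pseudobulk_by_sample:
        raise ValueError("need at least one sample")
    n_expressing = sum(1 for x in pseudobulk_by_sample.values() if x > theta)
    return n_expressing / len(pseudobulk_by_sample)


# Pseudobulk expression of a ligand in the sender cell type (hypothetical).
ligand_in_sender = {"s1": 2.4, "s2": 0.0, "s3": 1.1, "s4": 0.3}
frac = expression_fraction(ligand_in_sender, theta=0.5)
```

Two of the four samples exceed the threshold, so `frac` is 0.5.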

Cell-Type Specificity Score

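Specificity can be calculated with the Gini coefficient or with an entropy-based measure. The entropy-based form is:

$\text{Specificity}_g = 1 - H(p_{g,1}, p_{g,2}, \ldots, p_{g,C}) / \log(C)$

where $p_{g,c}$ is the proportion of gene $g$ expression in cell type $c$, $H$ is the Shannon entropy, and $C$ is the number of cell types. The score is 0 when expression is uniform across cell types and 1 when it is confined to a single cell type.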
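The entropy-based score can be sketched as follows, assuming non-negative expression values with one entry per cell type. The function name `entropy_specificity` and the inputs are hypothetical.

```python
import math


def entropy_specificity(expression_by_cell_type):
    """1 - H(p) / log(C), where p are the per-cell-type expression proportions."""
    values = list(expression_by_cell_type)
    total = sum(values)
    n_types = len(values)
    if n_types < 2 or total <= 0:
        raise ValueError("need at least two cell types and positive total expression")
    props = [v / total for v in values]
    # Terms with p = 0 contribute nothing to the entropy (0 * log 0 := 0).
    entropy = -sum(p * math.log(p) for p in props if p > 0)
    return 1.0 - entropy / math.log(n_types)


uniform = entropy_specificity([5.0, 5.0, 5.0, 5.0])     # same level everywhere
restricted = entropy_specificity([8.0, 0.0, 0.0, 0.0])  # one cell type only
skewed = entropy_specificity([6.0, 1.0, 1.0, 0.0])      # mostly one cell type
```

The two extreme inputs bracket the scale: uniform expression gives a score of 0, expression restricted to one cell type gives 1, and `skewed` falls in between.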

Prioritization Score Aggregation

The final prioritization score uses quantile normalization followed by a weighted geometric mean:

$S_{\text{final}} = \left(\prod_{i=1}^{K} Q_i^{w_i}\right)^{1/\sum w_i}$

where $Q_i$ is the quantile-normalized score for criterion $i$.
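The aggregation can be sketched under one explicit assumption: quantile normalization is taken here to mean a rank-based rescaling of each criterion to $(0, 1]$ (rank divided by the number of candidates, with ties sharing the average rank). The package may normalize differently; the criterion names, weights, and values below are hypothetical.

```python
import math


def quantile_scores(values):
    """Map raw scores to (0, 1] as rank / n; ties share the average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / n for r in ranks]


def weighted_geometric_mean(q, w):
    """(prod q_i^w_i)^(1 / sum w_i), computed in log space for stability."""
    total_w = sum(w)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    return math.exp(sum(wi * math.log(qi) for qi, wi in zip(q, w)) / total_w)


# Three candidate ligand-receptor pairs scored on two criteria (hypothetical).
criteria = {"ligand_de": [2.0, 0.5, 1.0], "activity": [0.9, 0.6, 0.7]}
weights = {"ligand_de": 1.0, "activity": 2.0}

quantiles = {name: quantile_scores(vals) for name, vals in criteria.items()}
names = list(criteria)
final = [
    weighted_geometric_mean(
        [quantiles[name][p] for name in names],
        [weights[name] for name in names],
    )
    for p in range(3)
]
```

Because the geometric mean multiplies the criteria, a pair that ranks low on any single criterion is pulled down more strongly than it would be under an arithmetic mean.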

Comparison with Other Methods

Performance Considerations

Computational Complexity

Operation	Complexity	Typical Time
Pseudobulk aggregation	O(n cells × m genes)	Seconds
DE analysis	O(k cell types × m genes)	Minutes
Ligand activity	O(l ligands × t targets)	Minutes
Prioritization	O(p pairs × c criteria)	Seconds

Parallelization

MultiNicheNet supports parallel processing for:
- Ligand activity inference across receivers
- Permutation-based empirical p-values
- Multiple contrast calculations

References

MultiNicheNet: Browaeys, R. et al. bioRxiv (2023). https://doi.org/10.1101/2023.06.13.544751
NicheNet: Browaeys, R. et al. Nat Methods 17, 159–162 (2020). https://doi.org/10.1038/s41592-019-0667-5
muscat: Crowell, H.L. et al. Nat Commun 11, 6077 (2020). https://doi.org/10.1038/s41467-020-19894-4
edgeR: Robinson, M.D. et al. Bioinformatics 26, 139–140 (2010). https://doi.org/10.1093/bioinformatics/btp616

Maintained by Zaoqu Liu

2026-01-24