Overview

MultiNicheNet is a computational framework designed for differential cell-cell communication (CCC) analysis in multi-sample, multi-condition single-cell RNA sequencing experiments. This document provides a comprehensive overview of the algorithmic foundations and methodological principles underlying MultiNicheNet.

Theoretical Framework

The Cell-Cell Communication Inference Problem

Cell-cell communication (CCC) involves the transmission of biological signals between cells through ligand-receptor (L-R) interactions. In single-cell transcriptomics, we infer potential CCC events by:

  1. Identifying expressed ligands in “sender” cell types
  2. Identifying expressed receptors in “receiver” cell types
  3. Predicting downstream signaling effects

Why Multi-Sample Analysis?

Traditional cell-level differential expression analysis suffers from several limitations:

Comparison of Cell-Level vs Sample-Level Analysis

| Aspect | Cell-Level | Sample-Level (MultiNicheNet) |
| --- | --- | --- |
| Statistical Unit | Individual cells | Samples/patients |
| Sample Variability | Ignored | Properly modeled |
| False Positive Rate | Inflated | Controlled |
| Complex Designs | Limited | Fully supported |
| Batch Effects | Problematic | Can be corrected |

Core Algorithms

1. Pseudobulk Aggregation

For each cell type $c$ and sample $s$, we aggregate single-cell expression profiles:

$$\bar{X}_{g,c,s} = \frac{1}{n_{c,s}} \sum_{i \in \text{cells}(c,s)} X_{g,i}$$

where:

- $X_{g,i}$ is the expression of gene $g$ in cell $i$
- $n_{c,s}$ is the number of cells of type $c$ in sample $s$

Benefits:

- Reduces technical noise through averaging
- Enables proper statistical inference at sample level
- Respects experimental design structure
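The averaging formula above can be sketched in a few lines of plain Python. This is a minimal illustration on toy data, not the package's implementation; the function name `pseudobulk_mean` and the input layout (one expression vector per cell plus parallel label lists) are assumptions made for the example.

```python
from collections import defaultdict

def pseudobulk_mean(expr, cell_type, sample):
    """Average each gene over the cells of every (cell type, sample) group.

    expr      : list of per-cell expression vectors (one value per gene)
    cell_type : cell-type label of each cell
    sample    : sample label of each cell
    (hypothetical input layout, chosen only for this illustration)
    """
    sums = {}
    counts = defaultdict(int)
    for values, c, s in zip(expr, cell_type, sample):
        key = (c, s)
        if key not in sums:
            sums[key] = [0.0] * len(values)
        for g, x in enumerate(values):
            sums[key][g] += x          # running sum per gene
        counts[key] += 1               # n_{c,s}
    return {key: [total / counts[key] for total in totals]
            for key, totals in sums.items()}

# Toy data: 4 cells, 2 genes
expr = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [2.0, 2.0]]
cell_type = ["T", "T", "B", "T"]
sample = ["s1", "s1", "s1", "s2"]

pb = pseudobulk_mean(expr, cell_type, sample)
```

Each key of `pb` is a (cell type, sample) pair, so downstream statistics treat samples, not cells, as the unit of replication.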

2. Differential Expression Analysis

MultiNicheNet employs the muscat framework for differential state analysis. For each gene $g$ in cell type $c$:

$$\log_2(\text{CPM}_{g,c,s} + 1) = \beta_0 + \beta_1 \cdot \text{condition}_s + \epsilon_{g,c,s}$$

Statistical Testing:

- Uses negative binomial generalized linear models (edgeR)
- Accounts for library size differences
- Supports complex designs with covariates
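To make the model equation concrete, the sketch below computes $\log_2(\text{CPM} + 1)$ for one gene and fits $\beta_0$ and $\beta_1$ by ordinary least squares for a two-group design. This is a deliberate simplification: it is not the negative binomial fit that edgeR performs, and the helper names `log2_cpm` and `fit_two_group` are hypothetical.

```python
import math

def log2_cpm(count, library_size):
    """log2(CPM + 1) for one gene in one pseudobulk sample."""
    cpm = count * 1e6 / library_size
    return math.log2(cpm + 1)

def fit_two_group(y, condition):
    """Least-squares fit of y = b0 + b1 * condition for a 0/1 condition.

    With a binary predictor, b0 is the reference-group mean and
    b1 is the difference between the two group means.
    """
    ref = [v for v, c in zip(y, condition) if c == 0]
    trt = [v for v, c in zip(y, condition) if c == 1]
    b0 = sum(ref) / len(ref)
    b1 = sum(trt) / len(trt) - b0
    return b0, b1

# Toy data: one gene in one cell type, four samples (two per condition)
counts = [1, 3, 7, 15]
library_sizes = [1e6, 1e6, 1e6, 1e6]
condition = [0, 0, 1, 1]

y = [log2_cpm(k, n) for k, n in zip(counts, library_sizes)]
b0, b1 = fit_two_group(y, condition)
```

Here `b1` plays the role of the log2 fold change between conditions; a real analysis would also estimate dispersion and test the coefficient.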

Empirical P-value Correction: MultiNicheNet implements an empirical null distribution approach to control for multiple testing across many cell types:

$$p_{\text{empirical}} = \frac{\#\{|t_{\text{null}}| \geq |t_{\text{observed}}|\}}{N_{\text{null}}}$$
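The empirical p-value formula is a simple counting rule, sketched below. The null statistics are made-up numbers for illustration; how the null distribution is actually generated is not specified here, and the function name `empirical_p` is an assumption.

```python
def empirical_p(t_observed, t_null):
    """Fraction of null statistics at least as extreme as the observed one."""
    hits = sum(1 for t in t_null if abs(t) >= abs(t_observed))
    return hits / len(t_null)

# Toy null distribution of test statistics (illustrative values only)
t_null = [-2.5, -1.0, -0.3, 0.2, 0.8, 1.1, 1.9, 3.0]

# Two null values (-2.5 and 3.0) are at least as extreme as 2.0
p = empirical_p(2.0, t_null)
```

With only eight null values the smallest nonzero p-value is 1/8, which is why a large null sample is needed for fine-grained p-values.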

3. NicheNet Ligand Activity Inference

MultiNicheNet integrates the NicheNet ligand-target prior knowledge model to infer ligand activities based on downstream gene expression changes.

Ligand-Target Matrix: The ligand-target matrix $\mathbf{W}$ contains regulatory potential scores:

$$W_{l,t} = P(\text{gene } t \text{ regulated by ligand } l)$$

Activity Score Calculation: For a set of differentially expressed target genes $G_{\text{DE}}$:

$$\text{Activity}_l = \text{AUROC}(W_{l,\cdot}, G_{\text{DE}})$$

This measures how well ligand $l$'s predicted targets are enriched in the observed DE genes.
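As an illustration of the AUROC-based activity score, the sketch below treats one ligand's regulatory potential scores as a classifier for DE membership and computes the AUROC by comparing every DE gene with every non-DE gene. The gene names, scores, and the function name `auroc` are hypothetical.

```python
def auroc(scores, is_de):
    """AUROC via pairwise comparison: the probability that a DE gene
    receives a higher score than a non-DE gene (ties count as 0.5)."""
    pos = [s for s, d in zip(scores, is_de) if d]
    neg = [s for s, d in zip(scores, is_de) if not d]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy row of the ligand-target matrix for one ligand (illustrative values)
w_l = {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.2, "E": 0.1}
g_de = {"A", "C"}  # differentially expressed genes in the receiver

genes = sorted(w_l)
activity = auroc([w_l[g] for g in genes], [g in g_de for g in genes])
```

An activity of 0.5 corresponds to no enrichment, while values near 1 mean the ligand's highest-scoring targets are mostly DE genes.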

4. Multi-Criteria Prioritization

The key innovation of MultiNicheNet is integrating multiple biological criteria into a unified prioritization score.

Prioritization Components:

| Criterion | Symbol | Description |
| --- | --- | --- |
| Ligand DE | $S_{\text{DE}}^L$ | Differential expression of ligand |
| Receptor DE | $S_{\text{DE}}^R$ | Differential expression of receptor |
| Ligand specificity | $S_{\text{spec}}^L$ | Cell-type specificity of ligand |
| Receptor specificity | $S_{\text{spec}}^R$ | Cell-type specificity of receptor |
| Expression fraction | $S_{\text{frac}}$ | Fraction of samples expressing L-R |
| Ligand activity | $S_{\text{act}}$ | NicheNet activity score |

Unified Score:

$$S_{\text{priority}} = \prod_{i} S_i^{w_i}$$

where $w_i$ are scenario-specific weights.
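The weighted product can be sketched as follows. The criterion names, score values, and weights are invented for the example and do not correspond to any pre-defined scenario.

```python
import math

def priority_score(scores, weights):
    """Weighted product of criterion scores: prod_i S_i ** w_i."""
    return math.prod(scores[name] ** weights[name] for name in scores)

# Hypothetical scores in [0, 1] for one ligand-receptor pair
scores = {"ligand_de": 0.25, "receptor_de": 0.5, "ligand_activity": 0.8}
# Hypothetical weights; a weight of 0 removes a criterion from the product
weights = {"ligand_de": 0.5, "receptor_de": 1.0, "ligand_activity": 0.0}

score = priority_score(scores, weights)  # 0.25**0.5 * 0.5**1 * 0.8**0
```

Because the score is a product, a criterion with a positive weight and a score of 0 drives the whole priority to 0, which is one reason to rescale scores before combining them.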

Biological Scenarios

MultiNicheNet supports different biological scenarios with pre-defined weight configurations.

Mathematical Details

Expression Fraction Calculation

For a ligand-receptor pair in sender cell type $s$ and receiver cell type $r$:

$$F_{L,s} = \frac{\#\{\text{samples where } \bar{X}_{L,s} > \theta\}}{N_{\text{samples}}}$$

where $\theta$ is the expression threshold.
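A minimal sketch of the expression fraction for a ligand in one sender cell type is shown below. The sample names, pseudobulk values, and the threshold of 0.5 are illustrative assumptions, as is the function name `expression_fraction`.

```python
def expression_fraction(pseudobulk_by_sample, threshold):
    """Fraction of samples whose pseudobulk expression exceeds the threshold."""
    expressed = sum(1 for x in pseudobulk_by_sample.values() if x > threshold)
    return expressed / len(pseudobulk_by_sample)

# Toy pseudobulk expression of one ligand in one sender cell type, per sample
ligand_expr = {"s1": 0.0, "s2": 1.2, "s3": 0.4, "s4": 2.5}

frac = expression_fraction(ligand_expr, threshold=0.5)  # s2 and s4 pass
```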

Cell-Type Specificity Score

Specificity is calculated using the Gini coefficient or entropy-based measures:

$$\text{Specificity}_g = 1 - H(p_{g,1}, p_{g,2}, ..., p_{g,C}) / \log(C)$$

where $p_{g,c}$ is the proportion of gene $g$ expression in cell type $c$, $H$ is the Shannon entropy of these proportions, and $C$ is the number of cell types.
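The entropy-based form can be sketched as below, taking $H$ to be the Shannon entropy with natural logarithms. The sketch assumes at least two cell types and a positive total expression; the function name `specificity` is hypothetical.

```python
import math

def specificity(expr_by_cell_type):
    """1 - normalized Shannon entropy of a gene's expression proportions.

    Assumes at least two cell types and a positive total expression.
    """
    total = sum(expr_by_cell_type)
    n_types = len(expr_by_cell_type)
    probs = [x / total for x in expr_by_cell_type]
    # Terms with p == 0 contribute nothing to the entropy
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1 - entropy / math.log(n_types)

uniform = specificity([1.0, 1.0, 1.0, 1.0])   # expressed equally everywhere
exclusive = specificity([5.0, 0.0, 0.0, 0.0]) # expressed in one cell type only
```

The score runs from 0 for a gene expressed equally in all cell types to 1 for a gene expressed in exactly one.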

Prioritization Score Aggregation

The final prioritization score uses quantile normalization followed by geometric mean:

$$S_{\text{final}} = \left(\prod_{i=1}^{K} Q_i^{w_i}\right)^{1/\sum w_i}$$

where $Q_i$ is the quantile-normalized score for criterion $i$.
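The aggregation can be illustrated with a small sketch. It assumes that "quantile normalization" means replacing each raw score by its empirical quantile (rank divided by the number of pairs) within its criterion; the package may use a different scheme. The function names and toy values are hypothetical.

```python
import math

def to_quantiles(values):
    """Map each value to rank / n in (0, 1]; tied values share the highest rank."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]

def weighted_geometric_mean(quantiles, weights):
    """(prod_i Q_i ** w_i) ** (1 / sum_i w_i), computed in log space."""
    log_sum = sum(w * math.log(q) for q, w in zip(quantiles, weights))
    return math.exp(log_sum / sum(weights))

# Toy raw scores for 4 ligand-receptor pairs under 2 criteria
criterion_a = [0.1, 0.5, 0.3, 0.9]
criterion_b = [2.0, 8.0, 4.0, 6.0]
weights = [1.0, 1.0]

q_a = to_quantiles(criterion_a)  # [0.25, 0.75, 0.5, 1.0]
q_b = to_quantiles(criterion_b)  # [0.25, 1.0, 0.5, 0.75]

final = [weighted_geometric_mean([a, b], weights) for a, b in zip(q_a, q_b)]
```

Converting to quantiles first puts criteria measured on different scales (here 0 to 1 versus 2 to 8) on a common footing, and dividing by the weight sum keeps the final score in the same range as the inputs.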

Comparison with Other Methods

Performance Considerations

Computational Complexity

| Operation | Complexity | Typical Time |
| --- | --- | --- |
| Pseudobulk aggregation | O(n cells × m genes) | Seconds |
| DE analysis | O(k cell types × m genes) | Minutes |
| Ligand activity | O(l ligands × t targets) | Minutes |
| Prioritization | O(p pairs × c criteria) | Seconds |

Parallelization

MultiNicheNet supports parallel processing for:

- Ligand activity inference across receivers
- Permutation-based empirical p-values
- Multiple contrast calculations

References

  1. MultiNicheNet: Browaeys, R. et al. bioRxiv (2023). https://doi.org/10.1101/2023.06.13.544751

  2. NicheNet: Browaeys, R. et al. Nat Methods 17, 159–162 (2020). https://doi.org/10.1038/s41592-019-0667-5

  3. muscat: Crowell, H.L. et al. Nat Commun 11, 6077 (2020). https://doi.org/10.1038/s41467-020-19894-4

  4. edgeR: Robinson, M.D. et al. Bioinformatics 26, 139–140 (2010).


Maintained by Zaoqu Liu