Overview
NOVA implements a comprehensive computational framework for inferring cell-to-cell communication networks based on ligand-receptor co-expression patterns. This document details the mathematical foundations and algorithmic implementations underlying NOVA’s analysis pipeline.
Theoretical Background
Cell-Cell Communication
Intercellular communication is a fundamental biological process where cells exchange information through signaling molecules. In the context of transcriptomic analysis, we focus on ligand-receptor (L-R) interactions, where:
- Ligands: Secreted or membrane-bound signaling molecules produced by sending cells
- Receptors: Cell surface proteins on receiving cells that bind to specific ligands
The strength of a potential communication event can be estimated from gene expression data by examining the co-expression of ligand-receptor pairs between cell populations.
Mathematical Framework
1. Gene Expression Statistics
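For a given gene $g$ in cluster $c$, let $N_c$ be the set of cells in the cluster and $x_{g,i}$ the expression of gene $g$ in cell $i$. NOVA computes the detection rate (fraction of expressing cells) and the mean expression:

$$\mathrm{pct}_{g,c} = \frac{|\{ i \in N_c : x_{g,i} > 0 \}|}{|N_c|}, \qquad \mu_{g,c} = \frac{1}{|N_c|} \sum_{i \in N_c} x_{g,i}$$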
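Following the cluster statistics pseudocode in this document, the two per-gene quantities are the fraction of expressing cells and the mean expression. A minimal R sketch for one gene in one cluster is shown here; the expression values are made up, and the names `x`, `pct`, and `mu` are illustrative rather than part of NOVA's API.

```r
# Expression of one gene across the five cells of one cluster (made-up values)
x <- c(0, 2, 0, 4, 4)

# Detection rate: fraction of cells with non-zero expression
pct <- sum(x > 0) / length(x)

# Mean expression across all cells in the cluster, zeros included
mu <- mean(x)

print(c(pct = pct, mean = mu))
```

Here three of five cells express the gene, so the detection rate is 0.6, and the mean expression is 2.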
2. Expression Specificity
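The specificity score quantifies how preferentially a gene is expressed in a given cluster relative to all clusters:

$$S_{g,c} = \frac{\mu_{g,c}}{\sum_{c' \in C} \mu_{g,c'}}$$

where $\mu_{g,c}$ is the mean expression of gene $g$ in cluster $c$ and $C$ is the set of all clusters. This metric ranges from 0 to 1, with higher values indicating more cluster-specific expression.
Properties:
- $\sum_{c \in C} S_{g,c} = 1$ for each gene $g$ that is expressed in at least one cluster
- $S_{g,c} = 0$ if gene $g$ is not expressed in cluster $c$
- $S_{g,c} = 1$ if gene $g$ is only expressed in cluster $c$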
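The specificity properties can be checked on a small example. The sketch below row-normalizes a made-up matrix of per-cluster mean expression, mirroring the row normalization in the cluster statistics pseudocode; the gene names, cluster names, and values are assumptions for illustration only.

```r
# Mean expression per gene (rows) and cluster (columns); values are made up
mean_expr <- matrix(
  c(6, 0, 0,    # GeneA: expressed only in cluster c1
    1, 1, 2,    # GeneB: expressed in all clusters
    0, 0, 0),   # GeneC: not expressed anywhere
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("c1", "c2", "c3"))
)

# Row-normalize; genes with zero total expression keep a specificity of 0
row_sum <- rowSums(mean_expr)
spec <- mean_expr / ifelse(row_sum > 0, row_sum, 1)

print(round(spec, 2))
print(rowSums(spec))
```

GeneA receives a specificity of 1 in c1 and 0 elsewhere, GeneB is split as 0.25, 0.25, 0.5, and the unexpressed GeneC stays at 0 in every cluster, so its row does not sum to 1.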
Algorithmic Implementation
Cluster Statistics Computation
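# Cluster statistics (runnable R version of the algorithm)
# expr: genes x cells matrix; clusters: one cluster label per cell
compute_cluster_stats <- function(expr, clusters) {
  ids <- as.character(sort(unique(clusters)))
  pct <- matrix(0, nrow(expr), length(ids),
                dimnames = list(rownames(expr), ids))
  mean_expr <- pct
  for (c in ids) {
    cells_in_c <- which(clusters == c)
    sub <- expr[, cells_in_c, drop = FALSE]
    pct[, c] <- rowSums(sub > 0) / length(cells_in_c)  # fraction of expressing cells
    mean_expr[, c] <- rowMeans(sub)                    # mean expression
  }
  # Compute specificity (row normalization); unexpressed genes stay at 0
  row_sum <- rowSums(mean_expr)
  spec <- mean_expr / ifelse(row_sum > 0, row_sum, 1)
  list(pct = pct, mean = mean_expr, spec = spec)
}
Edge Computation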
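Before the full loop, the two edge weights used in the edge computation pseudocode in this section can be illustrated for a single ligand-receptor pair. In this sketch, `outer()` forms every sender-target product in one call. The cluster names, statistics, and the 0.1 threshold are made up for illustration and are not NOVA defaults.

```r
# Per-cluster statistics for one ligand and one receptor (made-up values)
clusters <- c("c1", "c2", "c3")
lig_mean <- setNames(c(4.0, 0.5, 0.0), clusters)
lig_pct  <- setNames(c(0.8, 0.05, 0.0), clusters)
rec_mean <- setNames(c(0.0, 3.0, 1.0), clusters)
rec_pct  <- setNames(c(0.0, 0.6, 0.3), clusters)
min_pct  <- 0.1  # hypothetical detection threshold

# Specificity of each gene across clusters (row normalization of the means)
lig_spec <- lig_mean / sum(lig_mean)
rec_spec <- rec_mean / sum(rec_mean)

# Weights for every sender (rows) x target (columns) combination
weight_expr <- outer(lig_mean, rec_mean)
weight_spec <- outer(lig_spec, rec_spec)

# Keep only combinations where ligand and receptor both pass the threshold
keep <- outer(lig_pct > min_pct, rec_pct > min_pct, "&")
weight_expr[!keep] <- 0
weight_spec[!keep] <- 0

print(weight_expr)
print(round(weight_spec, 3))
```

Only the edges from c1 to c2 and from c1 to c3 survive, because c1 is the only cluster where the ligand passes the detection threshold and the receptor is not detected in c1.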
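# Edge computation (runnable R version of the algorithm)
# lr_pairs: data frame with columns ligand and receptor
# lig_stats, rec_stats: lists with pct, mean, and spec matrices (genes x clusters)
# min_pct: minimum detection rate, passed explicitly
compute_edges <- function(lr_pairs, lig_stats, rec_stats, clusters, min_pct) {
  ids <- as.character(unique(clusters))
  edges <- list()
  for (i in seq_len(nrow(lr_pairs))) {
    L <- lr_pairs$ligand[i]
    R <- lr_pairs$receptor[i]
    for (s in ids) {        # sending cluster
      for (t in ids) {      # target cluster
        # Check detection threshold
        if (lig_stats$pct[L, s] > min_pct && rec_stats$pct[R, t] > min_pct) {
          # Compute weights
          weight_expr <- lig_stats$mean[L, s] * rec_stats$mean[R, t]
          weight_spec <- lig_stats$spec[L, s] * rec_stats$spec[R, t]
          if (weight_expr > 0) {
            edges[[length(edges) + 1]] <- data.frame(
              ligand = L, receptor = R, source = s, target = t,
              weight_expr = weight_expr, weight_spec = weight_spec
            )
          }
        }
      }
    }
  }
  do.call(rbind, edges)  # one row per retained edge; NULL if none
}
Differential Analysis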
Ligand-Receptor Database
connectomeDB2020
NOVA utilizes the connectomeDB2020 database, which provides:
| Database | Description | Pairs |
|---|---|---|
| lrc2p | Literature-curated, high-confidence pairs | 2,293 |
| lrc2a | Extended set including predictions | ~15,000 |
Database Structure
library(NOVA)
# Load database
lr_db <- GetLRDatabase("lrc2p")
str(lr_db)
#> Classes 'data.table' and 'data.frame': 2293 obs. of 2 variables:
#> $ ligand : chr "A2M" "AANAT" "AANAT" "ACE" ...
#> $ receptor: chr "LRP1" "MTNR1A" "MTNR1B" "BDKRB2" ...
#> - attr(*, ".internal.selfref")=<externalptr>
head(lr_db)
#> ligand receptor
#> <char> <char>
#> 1: A2M LRP1
#> 2: AANAT MTNR1A
#> 3: AANAT MTNR1B
#> 4: ACE BDKRB2
#> 5: ADAM10 EPHA3
#> 6: ADAM11 ITGA4
Multi-Species Support
Homology Mapping
NOVA supports cross-species analysis through NCBI HomoloGene:
# Get supported species
species_list <- supported_species()
print(species_list)
#> human mouse chimpanzee
#> "9606" "10090" "9598"
#> dog monkey cattle
#> "9615" "9544" "9913"
#> rat chicken frog
#> "10116" "9031" "8364"
#> zebrafish fruitfly mosquito
#> "7955" "7227" "7165"
#> nematode thalecress rice
#> "6239" "3702" "4530"
#> riceblastfungus bakeryeast neurosporacrassa
#> "318829" "4932" "5141"
#> fissionyeast eremotheciumgossypii kluyveromyceslactis
#> "4896" "33169" "28985"
# Convert mouse genes to human
# mouse_genes <- c("Cd4", "Cd8a", "Ptprc")
# human_orthologs <- ConvertGeneSymbols(mouse_genes, from = "mouse", to = "human")
Computational Efficiency
Vectorization Strategy
NOVA employs several optimization strategies:
- Matrix operations: Using `data.table` for efficient data manipulation
- Sparse matrix support: Via the `Matrix` package for memory efficiency
- C++ acceleration: Critical loops implemented in RcppArmadillo
- Parallel processing: Optional parallelization via the `future` package
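The sparse-matrix strategy can be illustrated by computing per-cluster statistics with a single sparse matrix product instead of nested loops. This is a minimal sketch using the `Matrix` package, not NOVA's internal code; the toy data and all variable names are assumptions.

```r
library(Matrix)

# Toy sparse expression matrix: 3 genes x 6 cells (made-up values)
expr <- sparseMatrix(
  i = c(1, 1, 2, 3, 3, 3),
  j = c(1, 2, 4, 1, 5, 6),
  x = c(2, 4, 3, 1, 5, 7),
  dims = c(3, 6),
  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:6))
)
clusters <- factor(c("c1", "c1", "c1", "c2", "c2", "c2"))

# Sparse indicator matrix (clusters x cells): 1 where a cell belongs to a cluster
ind <- fac2sparse(clusters)
n_cells <- as.numeric(table(clusters))

# Per-cluster sums in one sparse product, then divide each column by cluster size
sums <- as.matrix(expr %*% t(ind))
mean_expr <- sweep(sums, 2, n_cells, "/")

# Same idea for the detection rate, using a 0/1 copy of the matrix
detected <- as(expr > 0, "dMatrix")
pct <- sweep(as.matrix(detected %*% t(ind)), 2, n_cells, "/")

print(mean_expr)
print(pct)
```

The indicator-matrix product touches only the non-zero entries of `expr`, which is where the memory and speed benefit of a sparse representation comes from on typical single-cell data.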