Theoretical Foundation

Connectome is built on the premise that cell-cell communication can be inferred from the co-expression patterns of ligand-receptor pairs across distinct cell populations. This document describes the mathematical framework underlying the analysis.

1. Ligand-Receptor Database

FANTOM5 Database

Connectome utilizes the FANTOM5 (Functional Annotation of the Mammalian Genome 5) ligand-receptor database, which provides curated pairs of interacting molecules:

$$\mathcal{P} = \{(L_k, R_k, M_k)\}_{k=1}^{N}$$

Where:
- $L_k$: Ligand gene symbol
- $R_k$: Receptor gene symbol
- $M_k$: Signaling mode/family classification
- $N$: Total number of pairs (~2,557 for human)

Evidence Levels

Pairs are classified by evidence strength:

| Level | Description |
|---|---|
| Literature supported | Experimentally validated interactions |
| Putative | Computationally predicted interactions |

2. Edge Weight Computation

Expression Metrics

For each cell population $i$ with cell set $C_i$ and each gene $g$, where $E_{c,g}$ is the expression of gene $g$ in cell $c$:

Normalized Expression: $$\bar{E}_{i,g} = \frac{1}{|C_i|} \sum_{c \in C_i} E_{c,g}$$

Scaled Expression (Z-score): $$Z_{i,g} = \frac{\bar{E}_{i,g} - \mu_g}{\sigma_g}$$

Percent Expression: $$P_{i,g} = \frac{|\{c \in C_i : E_{c,g} > 0\}|}{|C_i|}$$
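The three metrics above can be sketched in base R. This is only an illustration on a toy matrix, not the package's internal code; the names `expr` and `cluster` are hypothetical, and because the text does not say what $\mu_g$ and $\sigma_g$ are computed over, the sketch assumes they are taken across the population means.

```r
# Toy expression matrix: 6 cells (rows) x 2 genes (columns); values are illustrative
expr <- matrix(
  c(0, 2, 4, 0, 0, 3,    # geneA
    1, 1, 1, 3, 3, 3),   # geneB
  nrow = 6,
  dimnames = list(paste0("cell", 1:6), c("geneA", "geneB"))
)
cluster <- factor(c("T", "T", "T", "B", "B", "B"))  # hypothetical population labels

# Per-population sums, then divide by population size to get the mean (E-bar)
sums <- rowsum(expr, cluster)
n_cells <- as.vector(table(cluster)[rownames(sums)])
mean_expr <- sums / n_cells

# Fraction of cells in each population with non-zero expression (P)
pct_expr <- rowsum((expr > 0) * 1, cluster) / n_cells

# Z-score per gene; ASSUMPTION: mu_g and sigma_g are computed across population means
z_expr <- scale(mean_expr)
```

Each result is a population-by-gene matrix, so `mean_expr["T", "geneA"]` looks up one $\bar{E}_{i,g}$ value directly.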

Edge Weight Functions

Given source population $i$, target population $j$, and ligand-receptor pair $(L, R)$, with $E_{i,L}$ and $E_{j,R}$ the population-level expression of the ligand and the receptor:

Product (Default): $$w_{ij}^{LR} = E_{i,L} \times E_{j,R}$$

Sum: $$w_{ij}^{LR} = E_{i,L} + E_{j,R}$$

Mean: $$w_{ij}^{LR} = \frac{E_{i,L} + E_{j,R}}{2}$$

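The product formulation mirrors mass-action binding, in which complex formation scales with the abundance of both partners: the weight is zero whenever either the ligand or the receptor is not expressed, whereas the sum and mean remain positive if only one partner is present.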
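To make the difference between the three weight functions concrete, here is a minimal base R sketch for a single ligand-receptor pair. The population names and expression values are made up for illustration.

```r
# Hypothetical population-level expression for one ligand-receptor pair
ligand_expr   <- c(Fibroblast = 2.0, Tcell = 0.0)  # E_{i,L}, indexed by source population
receptor_expr <- c(Fibroblast = 0.5, Tcell = 3.0)  # E_{j,R}, indexed by target population

# Each result is a source x target matrix of edge weights
w_product <- outer(ligand_expr, receptor_expr, `*`)
w_sum     <- outer(ligand_expr, receptor_expr, `+`)
w_mean    <- w_sum / 2

w_product["Tcell", "Fibroblast"]  # 0: the T cells express no ligand
w_sum["Tcell", "Fibroblast"]      # 0.5: still positive despite the absent ligand
```

Only the product removes the edge sent from the population that lacks the ligand.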

3. Statistical Testing

Wilcoxon Rank-Sum Test

For each gene $g$ in cluster $i$, we test whether expression differs from background:

$$H_0: \text{median}(E_{C_i,g}) = \text{median}(E_{C_{\backslash i},g})$$

The test statistic: $$W = \sum_{c \in C_i} R_c$$

Where $R_c$ is the rank of cell $c$ in the combined sample.
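As a small check of the rank-sum definition, the sketch below computes $W$ by hand for one gene and compares it with base R's `wilcox.test`. The expression values are illustrative. Note that `wilcox.test` reports the equivalent Mann-Whitney form of the statistic, which is the rank sum minus $n_1(n_1+1)/2$, where $n_1$ is the number of in-cluster cells.

```r
# Illustrative expression of one gene: 4 cells in the cluster, 5 outside it
in_cluster  <- c(3.1, 2.7, 4.0, 3.6)
out_cluster <- c(0.4, 1.2, 2.9, 0.8, 1.5)

# W as defined above: sum of the in-cluster ranks within the pooled sample
pooled_ranks <- rank(c(in_cluster, out_cluster))
W <- sum(pooled_ranks[seq_along(in_cluster)])

# Base R returns the Mann-Whitney form of the same statistic
n1 <- length(in_cluster)
test <- wilcox.test(in_cluster, out_cluster)
mann_whitney_u <- W - n1 * (n1 + 1) / 2  # equals test$statistic
```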

Multiple Testing Correction

Adjusted p-values using Bonferroni correction: $$p_{adj} = \min(p \times m, 1)$$

Where $m$ is the number of tests performed.
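The correction is a one-liner in base R. The sketch below applies the formula directly and compares it with `p.adjust`, assuming $m$ equals the number of p-values supplied; the p-values themselves are illustrative.

```r
p_raw <- c(0.001, 0.02, 0.04, 0.6)  # illustrative raw p-values
m <- length(p_raw)                  # number of tests performed

p_manual  <- pmin(p_raw * m, 1)                       # formula as written above
p_builtin <- p.adjust(p_raw, method = "bonferroni")   # base R equivalent
```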

4. Diagnostic Odds Ratio (DOR)

The DOR quantifies gene specificity for a cell cluster using a 2×2 contingency table:

| | Expressing | Non-expressing |
|---|---|---|
| In cluster | TP | FN |
| Out of cluster | FP | TN |

Standard DOR

$$\text{DOR} = \frac{TP \times TN}{FP \times FN}$$

Haldane-Anscombe Correction

To handle zero counts in the contingency table, we apply the Haldane-Anscombe correction with pseudocount $\epsilon = 0.5$:

$$\text{DOR}_{corrected} = \frac{(TP + \epsilon)(TN + \epsilon)}{(FP + \epsilon)(FN + \epsilon)}$$

Log-transformed for symmetry: $$\log(\text{DOR}) = \log(TP + \epsilon) + \log(TN + \epsilon) - \log(FP + \epsilon) - \log(FN + \epsilon)$$

Interpretation:
- $\log(\text{DOR}) > 0$: Gene is enriched in cluster
- $\log(\text{DOR}) < 0$: Gene is depleted in cluster
- $\log(\text{DOR}) = 0$: No association
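The corrected log-DOR can be sketched as a short base R function. The function name `log_dor` and its arguments are hypothetical and not part of the package API; the example shows a case where the uncorrected ratio would be infinite because $FN = 0$.

```r
# Log diagnostic odds ratio with a Haldane-Anscombe pseudocount (illustrative helper)
log_dor <- function(expressing, in_cluster, eps = 0.5) {
  tp <- sum(expressing & in_cluster)    # in cluster, expressing
  fn <- sum(!expressing & in_cluster)   # in cluster, not expressing
  fp <- sum(expressing & !in_cluster)   # outside cluster, expressing
  tn <- sum(!expressing & !in_cluster)  # outside cluster, not expressing
  log(tp + eps) + log(tn + eps) - log(fp + eps) - log(fn + eps)
}

# Toy data: all 4 cluster cells express the gene, 1 of 6 outside cells does
in_cluster <- rep(c(TRUE, FALSE), times = c(4, 6))
expressing <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

log_dor(expressing, in_cluster)  # log(4.5 * 5.5 / (1.5 * 0.5)) = log(33)
```

With $TP = 4$, $FN = 0$, $FP = 1$, $TN = 5$ the standard DOR divides by zero, while the corrected value is finite and positive, indicating enrichment.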

5. Network Centrality Analysis

Graph Construction

The connectome is represented as a directed weighted graph: $$G = (V, E, w)$$

Where:
- $V$: Cell populations (nodes)
- $E$: Signaling edges
- $w$: Edge weights

Kleinberg’s Hub and Authority Scores

Authority score (receiving importance): $$a_i = \sum_{j \rightarrow i} w_{ji} \cdot h_j$$

Hub score (sending importance): $$h_i = \sum_{i \rightarrow j} w_{ij} \cdot a_j$$

These are computed iteratively until convergence, with $A$ the weighted adjacency matrix ($A_{ij} = w_{ij}$) and $n = |V|$ the number of cell populations:

Initialize: h = a = 1/√n
Repeat until convergence:
    a' = A^T · h
    h' = A · a
    Normalize a' and h'
    a = a', h = h'
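The iteration above can be written as a short base R function. This is a sketch under stated assumptions rather than the package's implementation: the function name `hits_scores`, the tolerance, and the iteration cap are hypothetical, and "Normalize" is taken to mean scaling to unit Euclidean norm, which matches the $1/\sqrt{n}$ initialization.

```r
# Hub and authority scores by power iteration on a weighted adjacency matrix A,
# where A[i, j] is the weight of the edge from population i to population j
hits_scores <- function(A, tol = 1e-10, max_iter = 1000) {
  n <- nrow(A)
  h <- rep(1 / sqrt(n), n)
  a <- h
  for (iter in seq_len(max_iter)) {
    a_new <- as.vector(t(A) %*% h)        # authorities collect from incoming hubs
    h_new <- as.vector(A %*% a)           # hubs collect from outgoing authorities
    a_new <- a_new / sqrt(sum(a_new^2))   # ASSUMPTION: unit Euclidean norm
    h_new <- h_new / sqrt(sum(h_new^2))
    converged <- max(abs(a_new - a), abs(h_new - h)) < tol
    a <- a_new
    h <- h_new
    if (converged) break
  }
  list(hub = setNames(h, rownames(A)), authority = setNames(a, colnames(A)))
}

# Toy connectome with three hypothetical populations
pops <- c("Fib", "Endo", "Tcell")
A <- matrix(c(0, 2, 1,
              0, 0, 3,
              0, 0, 0), nrow = 3, byrow = TRUE, dimnames = list(pops, pops))
scores <- hits_scores(A)
```

In this toy graph the T cells only receive signal, so they get the highest authority score, and the population sending the strongest edge to them gets the highest hub score. The sketch assumes non-negative weights and at least one edge; otherwise the normalization would divide by zero.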

6. Differential Connectivity Analysis

Fold Change Computation

For two conditions (reference and test):

$$\text{LFC}_{ij}^{L} = \log_2\left(\frac{E_{i,L}^{test}}{E_{i,L}^{ref}}\right)$$

$$\text{LFC}_{ij}^{R} = \log_2\left(\frac{E_{j,R}^{test}}{E_{j,R}^{ref}}\right)$$

Perturbation Score

The overall perturbation score combines ligand and receptor changes:

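$$S_{ij}^{LR} = |\text{LFC}_{ij}^{L}| \times |\text{LFC}_{ij}^{R}|$$

Because the score is a product of absolute values, it is large only when both the ligand and the receptor change between conditions, regardless of direction, and it is zero if either one is unchanged.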
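A worked example of the fold changes and the perturbation score for one edge, in base R. The expression values are hypothetical and strictly positive, since the log ratio is undefined when either condition has zero expression.

```r
# Hypothetical mean expression in each condition
ligand_ref   <- 1; ligand_test   <- 4  # ligand L in source population i
receptor_ref <- 2; receptor_test <- 1  # receptor R in target population j

lfc_ligand   <- log2(ligand_test / ligand_ref)      # 2: ligand up four-fold
lfc_receptor <- log2(receptor_test / receptor_ref)  # -1: receptor halved
score <- abs(lfc_ligand) * abs(lfc_receptor)        # 2

# If the receptor were unchanged, the score would be zero
score_unchanged <- abs(lfc_ligand) * abs(log2(receptor_ref / receptor_ref))
```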

7. Implementation Details

Computational Complexity

| Operation | Complexity |
|---|---|
| Expression averaging | $O(n \times g)$ |
| Edge construction | $O(k^2 \times p)$ |
| P-value calculation | $O(k \times g \times n)$ |
| Centrality analysis | $O(k^2 \times p)$ |

Where:
- $n$: Number of cells
- $g$: Number of genes
- $k$: Number of cell populations
- $p$: Number of L-R pairs

Memory Efficiency

Connectome uses:
- data.table for efficient data manipulation
- Sparse matrix operations via the Matrix package
- Pre-allocated vectors to avoid memory fragmentation

References

  1. FANTOM5 Database: Ramilowski, J.A. et al. A draft network of ligand–receptor-mediated multicellular signalling in human. Nat Commun 6, 7866 (2015).

  2. Kleinberg Algorithm: Kleinberg, J.M. Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5), 604–632 (1999).

  3. Haldane-Anscombe Correction: Agresti, A. Categorical Data Analysis. Wiley, 3rd edition (2013).

Session Info

sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS 15.6.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] C
#> 
#> time zone: Asia/Shanghai
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     desc_1.4.3        R6_2.6.1          fastmap_1.2.0    
#>  [5] xfun_0.56         cachem_1.1.0      knitr_1.51        htmltools_0.5.9  
#>  [9] rmarkdown_2.30    lifecycle_1.0.5   cli_3.6.5         sass_0.4.10      
#> [13] pkgdown_2.1.3     textshaping_1.0.4 jquerylib_0.1.4   systemfonts_1.3.1
#> [17] compiler_4.4.0    tools_4.4.0       ragg_1.5.0        bslib_0.9.0      
#> [21] evaluate_1.0.5    yaml_2.3.12       otel_0.2.0        jsonlite_2.0.0   
#> [25] rlang_1.1.7       fs_1.6.6          htmlwidgets_1.6.4