Introduction
hdWGCNA (High-Dimensional Weighted Gene Co-expression Network Analysis) extends the classical WGCNA framework to handle the unique challenges posed by single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics data. This vignette provides a comprehensive overview of the underlying algorithms and mathematical frameworks.
The Challenge of Single-Cell Data
Traditional WGCNA was designed for bulk RNA-seq data, where each sample provides a robust estimate of gene expression. Single-cell data presents unique challenges:
- Sparsity: Many genes have zero counts in individual cells due to dropout
- High dimensionality: Thousands of cells with thousands of genes
- Biological heterogeneity: Multiple cell types with distinct expression programs
- Technical noise: Cell-to-cell variability from technical factors
hdWGCNA addresses these challenges through metacell aggregation and cell-type-specific network construction.
Part 1: Metacell Aggregation
Motivation
The sparsity of single-cell data (a large fraction of zero counts in many datasets) makes direct correlation computation unreliable. Metacells aggregate similar cells to create “pseudo-samples” with more robust expression estimates.
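To make the effect of aggregation concrete, here is a minimal base R sketch on simulated counts. The matrix, the grouping vector `groups`, and the choice to sum (rather than average) counts are assumptions made for illustration and do not reproduce hdWGCNA's own metacell routine.

```r
set.seed(7)

# Toy sparse count matrix: 5 genes x 30 cells (simulated, not real data)
n_genes <- 5
n_cells <- 30
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 0.2),
  nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), paste0("cell", 1:n_cells))
)

# Hypothetical assignment of every 10 consecutive cells to one metacell
groups <- rep(paste0("metacell", 1:3), each = 10)

# rowsum() adds up rows by group, so transpose to cells x genes first
metacells <- t(rowsum(t(counts), group = groups))

# Fraction of zero entries before and after aggregation
zero_fraction_cells <- mean(counts == 0)
zero_fraction_metacells <- mean(metacells == 0)
print(c(cells = zero_fraction_cells, metacells = zero_fraction_metacells))
```

Because a summed entry is zero only when every contributing cell is zero, the zero fraction of the aggregated matrix can never exceed that of the original matrix.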
Algorithm
Step 1: K-Nearest Neighbor (KNN) Graph Construction
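For each cell $i$, we identify its $k$ nearest neighbors in a reduced-dimensional space (typically PCA or UMAP embeddings):
$$\mathrm{KNN}(i) = \{\, j \neq i : d(i, j) \le d_k(i) \,\}$$
where $d(i, j)$ is the Euclidean distance in the embedding space and $d_k(i)$ is the distance from cell $i$ to its $k$-th closest cell.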
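A brute-force version of the neighbor search can be sketched in base R as follows. The helper `knn_indices()` and the toy embedding are hypothetical, and real datasets would normally use an approximate nearest-neighbor library rather than a full distance matrix.

```r
# Toy embedding: 6 cells in a 2-D reduced space (hypothetical coordinates)
embedding <- rbind(
  c(0, 0), c(0, 1), c(1, 0),        # first group of cells
  c(10, 10), c(10, 11), c(11, 10)   # second group of cells
)
rownames(embedding) <- paste0("cell", 1:6)

# knn_indices() is a hypothetical helper, not an hdWGCNA function
knn_indices <- function(embedding, k) {
  d <- as.matrix(dist(embedding, method = "euclidean"))
  diag(d) <- Inf  # a cell is not counted as its own neighbor
  # For each cell, keep the indices of the k smallest distances
  out <- do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
    order(d[i, ])[seq_len(k)]
  }))
  rownames(out) <- rownames(embedding)
  out
}

knn <- knn_indices(embedding, k = 2)
print(knn)
```

Each row of `knn` holds the row indices of that cell's two nearest neighbors, so cells in the first group only point to each other, and likewise for the second group.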
Step 2: Metacell Seed Selection
We use an iterative bootstrapping algorithm to select metacell seeds that maximize coverage while minimizing overlap:
Algorithm: MetacellSeedSelection
Input: Cell set C, KNN graph, max_shared threshold τ, target number of seeds
Output: Metacell seeds S
1. Initialize: S = {}, available = C
2. While |available| > 0 and |S| < target:
   a. Randomly sample seed cell s from available
   b. Get neighbors: N_s = KNN(s) ∪ {s}
   c. For each existing seed t ∈ S:
      - Compute overlap: O_st = |N_s ∩ N_t|
      - If O_st > τ: reject s, goto step 2a
   d. Accept s: S = S ∪ {s}
   e. Update available (optional)
3. Return S
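The pseudocode can be turned into a small base R function. This is a sketch under stated assumptions rather than the hdWGCNA implementation: `select_metacell_seeds()` is a hypothetical name, the KNN matrix is written by hand, and the optional step 2e is implemented by removing every tested candidate from `available`, which guarantees termination.

```r
# Hand-written KNN index matrix for 6 cells with k = 2 (toy input):
# cells 1-3 are mutual neighbors, and so are cells 4-6
knn <- rbind(c(2, 3), c(1, 3), c(1, 2),
             c(5, 6), c(4, 6), c(4, 5))

# select_metacell_seeds() is a hypothetical helper following the pseudocode
select_metacell_seeds <- function(knn, target, max_shared) {
  available <- seq_len(nrow(knn))
  seeds <- integer(0)
  while (length(available) > 0 && length(seeds) < target) {
    # 2a. Sample one candidate (indexing avoids sample()'s behavior
    #     on length-one vectors)
    s <- available[sample.int(length(available), 1)]
    # 2b. Neighborhood of the candidate, including the candidate itself
    n_s <- c(knn[s, ], s)
    # 2c. Overlap with the neighborhood of every accepted seed
    overlaps <- vapply(seeds, function(u) {
      length(intersect(n_s, c(knn[u, ], u)))
    }, integer(1))
    # 2e. Never test the same candidate twice
    available <- setdiff(available, s)
    # 2d. Accept only if no overlap exceeds the threshold
    if (length(overlaps) == 0 || max(overlaps) <= max_shared) {
      seeds <- c(seeds, s)
    }
  }
  seeds
}

set.seed(1)
seeds <- select_metacell_seeds(knn, target = 6, max_shared = 1)
print(seeds)
```

With this toy input, any two cells from the same group share their whole neighborhood, so exactly one seed per group is accepted regardless of the random sampling order.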
Part 2: Weighted Gene Co-expression Network Construction
Correlation Matrix
The gene-gene correlation matrix $S$ is computed from the metacell expression matrix:
$$S_{ij} = \mathrm{cor}(x_i, x_j)$$
where $x_i$ represents the expression profile of gene $i$ across all metacells.
hdWGCNA supports multiple correlation methods:
- Pearson correlation: Standard linear correlation
- Bicor (biweight midcorrelation): Robust to outliers
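As an illustration of the Pearson case, the following base R sketch computes a gene-gene correlation matrix from a simulated metacell expression matrix. The data and gene names are made up; the biweight midcorrelation is not shown here because it is not part of base R (the WGCNA package provides it as `bicor()`).

```r
set.seed(42)

# Toy metacell expression matrix: 20 metacells (rows) x 4 genes (columns)
n_metacells <- 20
base_signal <- rnorm(n_metacells)
expr <- cbind(
  geneA = base_signal + rnorm(n_metacells, sd = 0.1),   # follows the signal
  geneB = base_signal + rnorm(n_metacells, sd = 0.1),   # follows the signal
  geneC = -base_signal + rnorm(n_metacells, sd = 0.1),  # opposes the signal
  geneD = rnorm(n_metacells)                            # unrelated noise
)

# cor() correlates columns, so genes must be in columns
S <- cor(expr, method = "pearson")
print(round(S, 2))
```

The result is a symmetric matrix with ones on the diagonal; `geneA` and `geneB` are strongly positively correlated, while `geneC` is strongly negatively correlated with both.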
Soft Power Thresholding
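The adjacency matrix $A$ is computed using a soft power transformation of the correlation matrix $S$. For an unsigned network:
$$a_{ij} = |S_{ij}|^{\beta}$$
where $\beta \ge 1$ is the soft power threshold; a signed network instead uses $a_{ij} = \left(\frac{1 + S_{ij}}{2}\right)^{\beta}$. This transformation emphasizes strong correlations while shrinking weak ones toward zero rather than discarding them, as a hard threshold would.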
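The effect of the soft power can be checked on a small hand-written correlation matrix. The sketch below assumes the standard WGCNA definitions of unsigned and signed adjacency, and the value `beta <- 6` is only an illustrative choice.

```r
# Hand-written correlation matrix for three genes (toy values)
genes <- paste0("g", 1:3)
S <- matrix(c( 1.0,  0.9,  0.3,
               0.9,  1.0, -0.5,
               0.3, -0.5,  1.0),
            nrow = 3, byrow = TRUE, dimnames = list(genes, genes))

beta <- 6  # illustrative soft power, not a recommended default

# Unsigned adjacency: |S_ij|^beta (sign of the correlation is ignored)
adj_unsigned <- abs(S)^beta

# Signed adjacency: ((1 + S_ij) / 2)^beta (negative correlations map near 0)
adj_signed <- ((1 + S) / 2)^beta

print(round(adj_unsigned, 4))
print(round(adj_signed, 4))
```

The correlation of 0.9 keeps a sizeable adjacency (0.9^6 is about 0.53), whereas 0.3 is reduced to below 0.001, which is the sense in which strong correlations are emphasized.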
Scale-Free Topology Criterion
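The optimal $\beta$ is selected to approximate scale-free network topology. For a scale-free network, the connectivity distribution follows a power law:
$$p(k) \sim k^{-\gamma}$$
where $k$ is the connectivity of a gene and $\gamma > 0$. We assess scale-free topology fit using the coefficient of determination of the log-log linear fit:
$$R^2 = \mathrm{cor}\big(\log_{10} p(k), \log_{10} k\big)^2$$
A value of $R^2$ near 1 indicates good scale-free topology fit; cutoffs around 0.8 are a common convention in the WGCNA literature.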
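A simplified version of the fit can be written in base R. The helper `scale_free_r2()` is hypothetical and only approximates what dedicated WGCNA functions compute: it bins the connectivities, estimates $p(k)$ per bin, and regresses $\log_{10} p(k)$ on $\log_{10} k$. The connectivity vector is constructed by hand so that frequencies fall off with $k$; in practice it would come from the row sums of the adjacency matrix.

```r
# scale_free_r2() is a hypothetical helper, not a WGCNA or hdWGCNA function
scale_free_r2 <- function(k, n_bins = 10) {
  bins <- cut(k, breaks = n_bins)
  mean_k <- as.numeric(tapply(k, bins, mean))   # mean connectivity per bin
  p_k <- as.numeric(table(bins)) / length(k)    # empirical p(k) per bin
  keep <- !is.na(mean_k) & p_k > 0              # drop empty bins
  log_p <- log10(p_k[keep])
  log_k <- log10(mean_k[keep])
  fit <- lm(log_p ~ log_k)
  c(r2 = summary(fit)$r.squared, slope = unname(coef(fit)[2]))
}

# Toy connectivities: many weakly connected genes, few hubs
k <- rep(c(1, 2, 4, 8, 16), times = c(1600, 400, 100, 25, 6))

fit_stats <- scale_free_r2(k)
print(fit_stats)
```

A negative slope together with a high `r2` is what a scale-free-like connectivity distribution looks like under this criterion; repeating the calculation over a grid of soft powers gives the curve used to choose $\beta$.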
# Example: Testing soft powers
library(hdWGCNA)
seurat_obj <- TestSoftPowers(seurat_obj, powers = c(1:10, seq(12, 30, by = 2)))
Part 3: Module Detection
Hierarchical Clustering
Genes are clustered using average linkage hierarchical clustering on the TOM-based dissimilarity matrix, $1 - \mathrm{TOM}$, where TOM is the topological overlap matrix derived from the adjacency matrix.
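The clustering step can be sketched in base R on a toy adjacency matrix. The function `tom_similarity()` is a hypothetical helper implementing the unsigned topological overlap of Zhang and Horvath (2005), and `cutree()` with a fixed number of clusters stands in for the dynamic tree cutting that WGCNA-style pipelines typically use.

```r
# Toy adjacency: genes 1-3 and genes 4-6 form two tightly connected groups
genes <- paste0("g", 1:6)
adj <- matrix(0.05, 6, 6, dimnames = list(genes, genes))
adj[1:3, 1:3] <- 0.8
adj[4:6, 4:6] <- 0.8
diag(adj) <- 1

# tom_similarity() is a hypothetical helper, not a WGCNA function
tom_similarity <- function(adj) {
  a <- adj
  diag(a) <- 0                     # ignore self-connections
  shared <- a %*% a                # sum over u of a_iu * a_uj
  k <- rowSums(a)                  # connectivity of each gene
  k_min <- outer(k, k, pmin)       # min(k_i, k_j) for every pair
  tom <- (shared + a) / (k_min + 1 - a)
  diag(tom) <- 1
  tom
}

# Average linkage hierarchical clustering on 1 - TOM
diss <- 1 - tom_similarity(adj)
tree <- hclust(as.dist(diss), method = "average")

# Simplification: cut into a fixed number of modules
modules <- cutree(tree, k = 2)
print(modules)
```

Genes within a group share most of their neighbors, so their topological overlap is high and the tree separates the two groups cleanly.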
Part 4: Module Eigengenes
Part 5: Transcription Factor Network Analysis
Part 6: Statistical Framework
Module Preservation
Module preservation statistics assess whether modules identified in one dataset are reproducible in another.
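One widely used composite statistic, taken from Langfelder et al. (2011) and given here as an illustration rather than as a statement of what hdWGCNA reports, averages a density-based and a connectivity-based preservation score, each obtained from permutation Z statistics:

```latex
Z_{\mathrm{summary}} = \frac{Z_{\mathrm{density}} + Z_{\mathrm{connectivity}}}{2}
```

In that framework, $Z_{\mathrm{summary}} > 10$ is read as strong evidence of preservation, values between 2 and 10 as weak to moderate evidence, and values below 2 as no evidence.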
Computational Considerations
References
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics (2008).
Morabito S, et al. hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data. Cell Reports Methods (2023).
Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology (2005).
Langfelder P, et al. Is My Network Module Preserved and Reproducible? PLoS Computational Biology (2011).
