hdWGCNA: High-Dimensional Weighted Gene Co-expression Network Analysis
Overview
hdWGCNA is a comprehensive R package designed for weighted gene co-expression network analysis (WGCNA) in high-dimensional transcriptomics data. The package extends the classical WGCNA methodology to handle the unique challenges of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) data.
Background and Motivation
The WGCNA Framework
Weighted Gene Co-expression Network Analysis (WGCNA) is a systems biology method for identifying clusters (modules) of highly correlated genes. Originally developed for bulk expression data (microarrays, and later bulk RNA-seq), WGCNA has become a cornerstone of transcriptomics analysis due to its ability to:
- Identify biologically meaningful gene modules
- Relate modules to external traits and conditions
- Discover hub genes with potential regulatory roles
- Enable cross-dataset module comparison
Challenges in Single-Cell Data
Single-cell RNA-seq data presents unique challenges for co-expression analysis:
| Challenge | Description | hdWGCNA Solution |
|---|---|---|
| Sparsity | >90% zeros in expression matrices | Metacell aggregation |
| Heterogeneity | Multiple cell types in one dataset | Cell-type-specific networks |
| Noise | High technical variability | Robust correlation methods |
| Scale | Thousands to millions of cells | Memory-efficient algorithms |
The hdWGCNA Solution
hdWGCNA introduces the concept of metacells - aggregated expression profiles from groups of similar cells. This approach:
- Reduces data sparsity by averaging expression across similar cells
- Maintains biological heterogeneity by respecting cell type boundaries
- Enables robust correlation estimation
- Dramatically reduces computational burden
Key Features
1. Metacell/Metaspot Aggregation
The metacell algorithm uses k-nearest neighbor graphs to identify and aggregate similar cells while controlling for overlap between metacells.
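The idea can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not the hdWGCNA implementation: the simulated counts, the random embedding, the greedy selection loop, and the names `k` and `max_shared` are all hypothetical stand-ins, and all cells are treated as one group, whereas a real analysis would run the aggregation separately within each cell type.

```r
# Toy illustration of metacell aggregation in base R.
# NOT the hdWGCNA implementation; every name and value here is an assumption.
set.seed(1)

n_cells <- 200
n_genes <- 50
k <- 10          # cells pooled into each metacell (assumed value)
max_shared <- 3  # maximum cells two metacells may share (assumed value)

# Sparse toy count matrix: genes in rows, cells in columns
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 0.1),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Random low-dimensional embedding standing in for PCA-like coordinates
embedding <- matrix(rnorm(n_cells * 5), nrow = n_cells)

# k nearest neighbours of each cell (the cell itself comes first)
d <- as.matrix(dist(embedding))
knn <- t(apply(d, 1, function(row) order(row)[1:k]))

# Greedily keep neighbourhoods that share at most max_shared cells
# with every neighbourhood already kept
kept <- list()
for (i in sample(n_cells)) {
  candidate <- knn[i, ]
  shared <- vapply(kept, function(m) length(intersect(m, candidate)),
                   integer(1))
  if (all(shared <= max_shared)) {
    kept[[length(kept) + 1]] <- candidate
  }
}

# Average expression across the cells of each kept neighbourhood
metacells <- vapply(
  kept,
  function(idx) rowMeans(counts[, idx, drop = FALSE]),
  numeric(n_genes)
)

# Fraction of zero entries before and after aggregation
sparsity_cells <- mean(counts == 0)
sparsity_metacells <- mean(metacells == 0)
```

Comparing `sparsity_cells` with `sparsity_metacells` shows the effect of pooling: a metacell entry is zero only when every pooled cell has a zero for that gene, so the aggregated matrix is far less sparse.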
2. Flexible Network Construction
- Signed networks: Preserve the direction of correlation, so negatively correlated genes are treated as unconnected
- Unsigned networks: Use the absolute correlation, so only correlation strength matters
- Consensus networks: Combine multiple datasets
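The difference between signed and unsigned networks is easiest to see in the standard WGCNA soft-thresholding formulas, where the unsigned adjacency is |cor|^power and the signed adjacency is ((1 + cor) / 2)^power. The base R sketch below applies both to simulated data; the gene names and the soft power of 6 are assumptions chosen for illustration, and the code does not call hdWGCNA itself.

```r
# Signed vs. unsigned soft-threshold adjacency on toy data (base R only).
# Gene names and soft_power are illustrative assumptions.
set.seed(42)

n_samples <- 100
base <- rnorm(n_samples)

expr <- cbind(
  geneA = base + rnorm(n_samples, sd = 0.3),
  geneB = base + rnorm(n_samples, sd = 0.3),   # positively correlated with geneA
  geneC = -base + rnorm(n_samples, sd = 0.3),  # negatively correlated with geneA
  geneD = rnorm(n_samples)                     # unrelated to the others
)

soft_power <- 6  # assumed value; normally chosen from a scale-free fit

r <- cor(expr)

# Unsigned: strong negative correlations count as strong connections
adj_unsigned <- abs(r)^soft_power

# Signed: negative correlations are pushed towards zero adjacency
adj_signed <- ((1 + r) / 2)^soft_power

round(adj_unsigned["geneA", ], 3)
round(adj_signed["geneA", ], 3)
```

In the unsigned network geneA and geneC end up strongly connected despite moving in opposite directions, while in the signed network their adjacency is close to zero.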
3. Comprehensive Module Analysis
- Module Eigengenes: Summarize entire module expression in a single value per cell
- Hub Gene Identification: Find central regulators with high intramodular connectivity
- Module Preservation: Assess reproducibility across datasets
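As a rough illustration of the first two ideas, a module eigengene can be computed as the first principal component of the scaled module expression matrix, and hub genes can be ranked by their correlation with that eigengene (kME, a common proxy for intramodular connectivity in WGCNA). The sketch below uses simulated data in base R; the variable names are hypothetical and the code is not a substitute for the package's own functions.

```r
# Module eigengene and eigengene-based connectivity (kME) on toy data.
# Base R only; names and simulation settings are illustrative assumptions.
set.seed(7)

n_cells <- 150
n_genes <- 20

# One shared signal drives the module; genes differ in how strongly they follow it
signal <- rnorm(n_cells)
loadings <- seq(1, 0.1, length.out = n_genes)
expr <- sapply(loadings, function(w) w * signal + rnorm(n_cells, sd = 0.5))
colnames(expr) <- paste0("gene", seq_len(n_genes))

# Eigengene: first left singular vector of the scaled cells-by-genes matrix
scaled <- scale(expr)
eigengene <- svd(scaled)$u[, 1]

# Orient the eigengene so it tracks average module expression
if (cor(eigengene, rowMeans(scaled)) < 0) {
  eigengene <- -eigengene
}

# kME: correlation of each gene with the module eigengene
kME <- cor(expr, eigengene)[, 1]

# Candidate hub genes: the five genes with the highest kME
hub_genes <- names(sort(kME, decreasing = TRUE))[1:5]
```

Here `eigengene` holds one value per cell, and `hub_genes` lists the genes whose expression follows the module-wide pattern most closely.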
Package Architecture
hdWGCNA Architecture
├── Data Setup
│ ├── SetupForWGCNA()
│ ├── SelectNetworkGenes()
│ └── FindMajorIsoforms()
│
├── Metacell Construction
│ ├── MetacellsByGroups()
│ ├── MetaspotsByGroups()
│ └── NormalizeMetacells()
│
├── Network Analysis
│ ├── TestSoftPowers()
│ ├── ConstructNetwork()
│ ├── ModuleEigengenes()
│ └── ModuleConnectivity()
│
├── Biological Context
│ ├── RunEnrichr()
│ ├── ModuleTraitCorrelation()
│ ├── FindDMEs()
│ └── ModulePreservation()
│
└── TF Networks
    ├── MotifScan()
    ├── ConstructTFNetwork()
    └── RegulonScores()
Integration with Seurat
hdWGCNA is designed for seamless integration with the Seurat ecosystem. All hdWGCNA data is stored within the Seurat object’s @misc slot, ensuring:
- Easy data management and sharing
- Compatibility with standard Seurat workflows
- Support for both Seurat v4 and v5
Comparison with Other Methods
| Feature | hdWGCNA | WGCNA | Monocle | SCENIC |
|---|---|---|---|---|
| Single-cell support | ✓ | Limited | ✓ | ✓ |
| Spatial transcriptomics | ✓ | ✗ | ✗ | ✗ |
| Module detection | ✓ | ✓ | ✗ | ✓ |
| TF network inference | ✓ | ✗ | ✗ | ✓ |
| Seurat integration | ✓ | ✗ | ✗ | ✗ |
| Module preservation | ✓ | ✓ | ✗ | ✗ |
Citation
If you use hdWGCNA in your research, please cite:
Morabito S, Reese F, Rahimzadeh N, Miyoshi E, Swarup V. hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data. Cell Reports Methods (2023). DOI: 10.1016/j.crmeth.2023.100498
For TF network analysis:
Childs JE, Morabito S, et al. Relapse to cocaine seeking is regulated by medial habenula NR4A2/NURR1 in mice. Cell Reports (2024). DOI: 10.1016/j.celrep.2024.113956
Getting Help
- Documentation: https://zaoqu-liu.github.io/hdWGCNA/
- GitHub Issues: https://github.com/Zaoqu-Liu/hdWGCNA/issues
- R-universe: https://zaoqu-liu.r-universe.dev/hdWGCNA
What’s Next?
Ready to get started? Check out:
- Quick Start Guide - Get running in 10 minutes
- Algorithm Overview - Understand the mathematics
- Single-cell Tutorial - Comprehensive walkthrough
Session Information
## R version 4.4.0 (2024-04-24)
## Platform: aarch64-apple-darwin20
## Running under: macOS 15.6.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] C
##
## time zone: Asia/Shanghai
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.39 desc_1.4.3 R6_2.6.1 fastmap_1.2.0
## [5] xfun_0.56 cachem_1.1.0 knitr_1.51 htmltools_0.5.9
## [9] rmarkdown_2.30 lifecycle_1.0.5 cli_3.6.5 sass_0.4.10
## [13] pkgdown_2.1.3 textshaping_1.0.4 jquerylib_0.1.4 systemfonts_1.3.1
## [17] compiler_4.4.0 tools_4.4.0 ragg_1.5.0 bslib_0.9.0
## [21] evaluate_1.0.5 yaml_2.3.12 otel_0.2.0 jsonlite_2.0.0
## [25] rlang_1.1.7 fs_1.6.6 htmlwidgets_1.6.4
