hdWGCNA: High-Dimensional Weighted Gene Co-expression Network Analysis

Overview

hdWGCNA is a comprehensive R package designed for weighted gene co-expression network analysis (WGCNA) in high-dimensional transcriptomics data. The package extends the classical WGCNA methodology to handle the unique challenges of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) data.

Background and Motivation

The WGCNA Framework

Weighted Gene Co-expression Network Analysis (WGCNA) is a systems biology method for identifying clusters (modules) of highly correlated genes. Originally developed for bulk expression data (microarrays, and later bulk RNA-seq), WGCNA has become a cornerstone of transcriptomics analysis due to its ability to:

  • Identify biologically meaningful gene modules
  • Relate modules to external traits and conditions
  • Discover hub genes with potential regulatory roles
  • Enable cross-dataset module comparison

Challenges in Single-Cell Data

Single-cell RNA-seq data presents unique challenges for co-expression analysis:

Challenge     | Description                          | hdWGCNA Solution
Sparsity      | >90% zeros in expression matrices    | Metacell aggregation
Heterogeneity | Multiple cell types in one dataset   | Cell-type-specific networks
Noise         | High technical variability           | Robust correlation methods
Scale         | Thousands of cells                   | Memory-efficient algorithms

The hdWGCNA Solution

hdWGCNA introduces the concept of metacells: aggregated expression profiles built from groups of similar cells. This approach:

  1. Reduces data sparsity by averaging expression across similar cells
  2. Maintains biological heterogeneity by respecting cell type boundaries
  3. Enables robust correlation estimation
  4. Dramatically reduces computational burden

Key Features

1. Metacell/Metaspot Aggregation

The metacell algorithm uses k-nearest neighbor graphs to identify and aggregate similar cells while controlling for overlap between metacells.
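
The idea can be illustrated with a small base R sketch. This is a conceptual simplification, not hdWGCNA's implementation: the simulated count matrix, the PCA embedding, the greedy overlap filter, and the values of `k` and `max_shared` are all assumptions chosen for illustration. It builds a k-nearest neighbor neighborhood for each cell, keeps only neighborhoods that share few cells with those already kept, and averages expression within each one.

```r
set.seed(1)
n_genes <- 50
n_cells <- 200
# Simulated sparse counts (genes x cells); illustrative data only
counts <- matrix(rpois(n_genes * n_cells, lambda = 0.1), nrow = n_genes)

# Low-dimensional embedding of the cells (first 5 principal components)
emb <- prcomp(t(counts))$x[, 1:5]

# k nearest neighbours of every cell (Euclidean distance, the cell itself included)
k <- 10
d <- as.matrix(dist(emb))
knn <- t(apply(d, 1, function(row) order(row)[1:k]))

# Greedily keep neighbourhoods that share at most `max_shared` cells
# with every neighbourhood already kept
max_shared <- 3
kept <- list()
for (i in sample(n_cells)) {
  cand <- knn[i, ]
  overlaps <- vapply(kept, function(m) length(intersect(m, cand)), integer(1))
  if (all(overlaps <= max_shared)) kept[[length(kept) + 1]] <- cand
}

# Average expression within each kept neighbourhood (genes x metacells)
metacells <- sapply(kept, function(m) rowMeans(counts[, m, drop = FALSE]))

# Fraction of zero entries before and after aggregation
sparsity_cells <- mean(counts == 0)
sparsity_meta  <- mean(metacells == 0)
```

Comparing `sparsity_cells` with `sparsity_meta` shows the effect of aggregation: a metacell entry is zero only when every member cell has a zero for that gene.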

2. Flexible Network Construction

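  • Signed networks: Keep the sign of the correlation, so strongly negatively correlated genes are treated as unconnected
  • Unsigned networks: Use the absolute correlation, so positive and negative correlations of equal strength are treated alike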
  • Consensus networks: Combine multiple datasets
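
The practical difference between signed and unsigned networks comes from how a correlation is converted into a connection strength. The sketch below uses the standard WGCNA soft-threshold formulas; the helper name `adjacency` and the power of 6 are assumptions for illustration, not hdWGCNA defaults.

```r
# Soft-threshold adjacency for a correlation value r
#   signed:   ((1 + r) / 2)^power  -> negative correlations map close to 0
#   unsigned: |r|^power            -> the sign is discarded
adjacency <- function(r, power = 6, type = c("signed", "unsigned")) {
  type <- match.arg(type)
  if (type == "signed") ((1 + r) / 2)^power else abs(r)^power
}

r <- c(strong_pos = 0.9, strong_neg = -0.9)
signed_adj   <- adjacency(r, type = "signed")
unsigned_adj <- adjacency(r, type = "unsigned")
```

In the unsigned case both gene pairs receive the same adjacency, while in the signed case the negatively correlated pair is effectively disconnected.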

3. Comprehensive Module Analysis

Module Eigengenes: Summarize entire module expression in a single value per cell

Hub Gene Identification: Find central regulators with high intramodular connectivity

Module Preservation: Assess reproducibility across datasets
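
A minimal base R sketch of the first two ideas follows. It assumes a simulated six-gene module and uses eigengene-based connectivity (kME) as the connectivity measure; the gene names, loadings, and the choice of two hub genes are hypothetical, and the code is a conceptual illustration rather than what the package runs internally.

```r
set.seed(2)
n_samples <- 100
signal <- rnorm(n_samples)  # shared latent signal driving the module

# Simulated module: 6 genes with decreasing dependence on the signal
loadings <- c(0.95, 0.9, 0.8, 0.6, 0.4, 0.2)
expr <- sapply(loadings, function(w) w * signal + sqrt(1 - w^2) * rnorm(n_samples))
colnames(expr) <- paste0("gene", seq_along(loadings))

# Module eigengene: first principal component of the scaled expression,
# giving one summary value per sample (or per cell)
pc1 <- prcomp(expr, center = TRUE, scale. = TRUE)$x[, 1]
# Orient the eigengene so it correlates positively with average module expression
if (cor(pc1, rowMeans(scale(expr))) < 0) pc1 <- -pc1

# kME: correlation of each gene with the module eigengene
kME <- cor(expr, pc1)[, 1]

# Candidate hub genes: the genes with the highest kME
hub_genes <- names(sort(kME, decreasing = TRUE))[1:2]
```

Genes that follow the shared signal most closely receive the highest kME, which is why ranking by kME is a common way to nominate hub genes.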

4. Biological Interpretation

  • Gene set enrichment via Enrichr
  • Module-trait correlation analysis
  • Differential module eigengene testing
  • Integration with protein-protein interaction networks

5. Transcription Factor Networks

hdWGCNA now includes functionality for inferring transcription factor (TF) regulatory networks using XGBoost-based modeling.

Package Architecture

hdWGCNA Architecture
├── Data Setup
│   ├── SetupForWGCNA()
│   ├── SelectNetworkGenes()
│   └── FindMajorIsoforms()
│
├── Metacell Construction
│   ├── MetacellsByGroups()
│   ├── MetaspotsByGroups()
│   └── NormalizeMetacells()
│
├── Network Analysis
│   ├── TestSoftPowers()
│   ├── ConstructNetwork()
│   ├── ModuleEigengenes()
│   └── ModuleConnectivity()
│
├── Biological Context
│   ├── RunEnrichr()
│   ├── ModuleTraitCorrelation()
│   ├── FindDMEs()
│   └── ModulePreservation()
│
└── TF Networks
    ├── MotifScan()
    ├── ConstructTFNetwork()
    └── RegulonScores()

Integration with Seurat

hdWGCNA is designed for seamless integration with the Seurat ecosystem. All hdWGCNA data is stored within the Seurat object’s @misc slot, ensuring:

  • Easy data management and sharing
  • Compatibility with standard Seurat workflows
  • Support for both Seurat v4 and v5

Comparison with Other Methods

Feature hdWGCNA WGCNA Monocle SCENIC
Single-cell support Limited
Spatial transcriptomics
Module detection
TF network inference
Seurat integration
Module preservation

Citation

If you use hdWGCNA in your research, please cite:

Morabito S, Reese F, Rahimzadeh N, Miyoshi E, Swarup V. hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data. Cell Reports Methods (2023). DOI: 10.1016/j.crmeth.2023.100498

For TF network analysis:

Childs JE, Morabito S, et al. Relapse to cocaine seeking is regulated by medial habenula NR4A2/NURR1 in mice. Cell Reports (2024). DOI: 10.1016/j.celrep.2024.113956

What’s Next?

Ready to get started? Check out:

  1. Quick Start Guide - Get running in 10 minutes
  2. Algorithm Overview - Understand the mathematics
  3. Single-cell Tutorial - Comprehensive walkthrough

Session Information

## R version 4.4.0 (2024-04-24)
## Platform: aarch64-apple-darwin20
## Running under: macOS 15.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] C
## 
## time zone: Asia/Shanghai
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.39     desc_1.4.3        R6_2.6.1          fastmap_1.2.0    
##  [5] xfun_0.56         cachem_1.1.0      knitr_1.51        htmltools_0.5.9  
##  [9] rmarkdown_2.30    lifecycle_1.0.5   cli_3.6.5         sass_0.4.10      
## [13] pkgdown_2.1.3     textshaping_1.0.4 jquerylib_0.1.4   systemfonts_1.3.1
## [17] compiler_4.4.0    tools_4.4.0       ragg_1.5.0        bslib_0.9.0      
## [21] evaluate_1.0.5    yaml_2.3.12       otel_0.2.0        jsonlite_2.0.0   
## [25] rlang_1.1.7       fs_1.6.6          htmlwidgets_1.6.4