📖 Documentation: https://zaoqu-liu.github.io/darwin/
darwin is an R package for automatic marker gene selection using multi-objective evolutionary optimization. It implements the NSGA-II algorithm to identify Pareto-optimal gene subsets for bulk RNA-seq deconvolution.
✨ Features
- Multi-objective optimization using NSGA-II algorithm
- High performance with C++ implementations via RcppArmadillo
- Flexible input: Supports matrices, data.frames, Seurat V4/V5, and SingleCellExperiment objects
- Multiple objectives: Correlation, distance, condition number, and custom functions
- Built-in deconvolution: NNLS, NuSVR, and linear regression methods
- Parallel computing support for large-scale problems
- Cross-platform: Works on Windows, macOS, and Linux
📦 Installation
From R-universe (Recommended)
install.packages("darwin", repos = "https://zaoqu-liu.r-universe.dev")
From GitHub
# Install remotes if needed
install.packages("remotes")
# Install darwin
remotes::install_github("Zaoqu-Liu/darwin")
🚀 Quick Start
library(darwin)
# Create reference expression matrix (cell types × genes)
set.seed(42)
reference <- matrix(abs(rnorm(500)), nrow = 5, ncol = 100)
rownames(reference) <- paste0("CellType", 1:5)
colnames(reference) <- paste0("Gene", 1:100)
# Initialize darwin
dw <- darwin(reference)
# Run optimization
dw$optimize(
  ngen = 100,                                 # Number of generations
  objectives = c("correlation", "distance"),  # Objectives to optimize
  weights = c(-1, 1)                          # Minimize correlation, maximize distance
)
# Visualize Pareto front
dw$plot()
# Select optimal solution
dw$select(weights = c(-1, 1))
# Get selected genes
genes <- dw$get_genes()
print(genes)
# Perform deconvolution
bulk <- matrix(abs(rnorm(300)), nrow = 3, ncol = 100)
colnames(bulk) <- colnames(reference)
result <- dw$deconvolve(bulk, method = "nnls")
print(result$proportions)
📚 Documentation
- Getting Started - Basic usage tutorial
- Algorithm Theory - NSGA-II and multi-objective optimization
- Visualization Guide - Plotting and analysis
- Bulk Deconvolution - Complete deconvolution workflow
- Advanced Usage - Custom objectives and parallel computing
- Performance Benchmarks - Scaling and optimization tips
- Function Reference - Complete API documentation
📊 Supported Objective Functions
| Objective | Direction | Description |
|---|---|---|
| correlation | Minimize | Total pairwise correlation between cell types |
| distance | Maximize | Total pairwise Euclidean distance |
| condition | Minimize | Condition number of reference matrix |
| Custom | User-defined | Any function returning a scalar |
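
To make the objectives in the table concrete, here is a minimal base-R sketch of how each one could be scored for a candidate gene subset. It is an illustration under stated assumptions, not darwin's internal C++ code: the function names (`total_correlation`, `total_distance`, `condition_number`, `custom_objective`) are hypothetical, taking the absolute value of the correlations is an assumption, and how darwin expects custom functions to be passed in is not shown here.

```r
# Illustrative objective functions for a cell types x genes reference matrix.
# Hypothetical helpers written in base R; not part of the darwin API.

# Sum of absolute pairwise correlations between cell types (lower is better).
total_correlation <- function(ref) {
  cors <- cor(t(ref))                # cell type x cell type correlation matrix
  sum(abs(cors[upper.tri(cors)]))    # count each pair once
}

# Sum of pairwise Euclidean distances between cell types (higher is better).
total_distance <- function(ref) {
  sum(dist(ref))                     # dist() works on rows, i.e. cell types
}

# Ratio of largest to smallest singular value (lower is better).
condition_number <- function(ref) {
  d <- svd(ref)$d
  max(d) / min(d)
}

# A custom objective only needs to return a single number.
custom_objective <- function(ref) {
  mean(apply(ref, 2, var))           # e.g. mean per-gene variance across cell types
}

set.seed(1)
ref <- matrix(abs(rnorm(500)), nrow = 5, ncol = 100)
subset_idx <- sample(ncol(ref), 20)  # a candidate subset of 20 genes
sub <- ref[, subset_idx, drop = FALSE]

scores <- c(
  correlation = total_correlation(sub),
  distance    = total_distance(sub),
  condition   = condition_number(sub),
  custom      = custom_objective(sub)
)
print(scores)
```

Each candidate gene subset gets one such score vector, and the optimizer compares subsets on these vectors rather than on a single combined number.
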
🔬 Methods
darwin uses NSGA-II (Non-dominated Sorting Genetic Algorithm II) for multi-objective optimization. Its main components are:
- Non-dominated sorting: Solutions ranked by Pareto dominance
- Crowding distance: Maintains diversity in the Pareto front
- Tournament selection: Balances exploitation and exploration
- Genetic operators: Crossover and mutation for solution evolution
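
The first two components above can be sketched in a few lines of base R. This is a textbook-style illustration of non-dominated sorting and crowding distance, assuming every objective has already been converted to minimization (for example by negating a maximized objective). The helper names `dominates`, `pareto_rank`, and `crowding_distance` are hypothetical and are not darwin functions.

```r
# Sketch of two NSGA-II building blocks; all objectives are minimized.

# TRUE if solution a is no worse than b everywhere and strictly better somewhere.
dominates <- function(a, b) all(a <= b) && any(a < b)

# Assign each row of f (solutions x objectives) to a Pareto front.
# Rank 1 is the non-dominated set, rank 2 the next layer, and so on.
pareto_rank <- function(f) {
  rank <- integer(nrow(f))
  remaining <- seq_len(nrow(f))
  current <- 1L
  while (length(remaining) > 0) {
    is_front <- vapply(remaining, function(i) {
      !any(vapply(remaining, function(j) dominates(f[j, ], f[i, ]), logical(1)))
    }, logical(1))
    rank[remaining[is_front]] <- current
    remaining <- remaining[!is_front]
    current <- current + 1L
  }
  rank
}

# Crowding distance within one front: boundary solutions get Inf,
# interior ones the sum of normalized gaps between their neighbors.
crowding_distance <- function(f) {
  n <- nrow(f)
  if (n <= 2) return(rep(Inf, n))
  d <- numeric(n)
  for (m in seq_len(ncol(f))) {
    ord <- order(f[, m])
    span <- f[ord[n], m] - f[ord[1], m]
    d[ord[c(1, n)]] <- Inf
    if (span > 0) {
      inner <- ord[2:(n - 1)]
      d[inner] <- d[inner] + (f[ord[3:n], m] - f[ord[1:(n - 2)], m]) / span
    }
  }
  d
}

# Five toy solutions scored on two minimized objectives.
f <- rbind(c(1, 5), c(2, 3), c(3, 1), c(3, 4), c(4, 4))
ranks <- pareto_rank(f)
cd <- crowding_distance(f[ranks == 1, , drop = FALSE])
print(ranks)
print(cd)
```

In NSGA-II, solutions with a lower rank are preferred, and ties within a front are broken in favor of the larger crowding distance, which is what keeps the Pareto front spread out.
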
📖 Citation
If you use darwin in your research, please cite:
@software{darwin,
  author = {Liu, Zaoqu},
  title = {darwin: Multi-Objective Gene Selection for Bulk Deconvolution},
  year = {2024},
  url = {https://github.com/Zaoqu-Liu/darwin}
}
The algorithm is based on:
Aliee, H., & Theis, F. J. (2021). AutoGeneS: Automatic gene selection using multi-objective optimization for RNA-seq deconvolution. Cell Systems, 12(7), 706-715.e4.
📄 License
MIT © Zaoqu Liu