📖 Documentation: https://zaoqu-liu.github.io/CellProgramMapper/
Overview
CellProgramMapper is a high-performance R package for projecting single-cell transcriptomic data onto reference gene expression programs (GEPs). The package implements non-negative matrix factorization (NMF)-based methods for systematic characterization of cellular transcriptional states.
Methodology
Mathematical Framework
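Given a query expression matrix $X \in \mathbb{R}^{n \times p}$ (n cells × p genes) and a reference spectra matrix $H \in \mathbb{R}^{k \times p}$ (k programs × p genes), CellProgramMapper estimates the usage matrix $W \in \mathbb{R}^{n \times k}$ by solving:
$$\hat{W} = \underset{W \ge 0}{\arg\min} \; \lVert X - WH \rVert_F^2$$
For each cell i, this decomposes into independent Non-Negative Least Squares (NNLS) subproblems:
$$\hat{w}_i = \underset{w \ge 0}{\arg\min} \; \lVert x_i - H^\top w \rVert_2^2$$
where $x_i$ and $\hat{w}_i$ are row i of $X$ and $\hat{W}$, written as column vectors.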
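The row-wise decomposition can be checked numerically. The following base R sketch uses small random matrices (all variable names are illustrative and not part of the package) to show that the squared Frobenius residual over all cells equals the sum of the per-cell squared residuals, which is why each cell can be solved independently.

```r
set.seed(1)
n <- 5; p <- 20; k <- 3
H <- matrix(runif(k * p), nrow = k)   # k programs x p genes
W <- matrix(runif(n * k), nrow = n)   # n cells x k programs (any non-negative W)
X <- matrix(runif(n * p), nrow = n)   # n cells x p genes

# Objective over all cells: squared Frobenius norm of the residual
full_obj <- sum((X - W %*% H)^2)

# Per-cell objectives: squared Euclidean norm of x_i - t(H) %*% w_i
per_cell_obj <- vapply(seq_len(n), function(i) {
  sum((X[i, ] - drop(crossprod(H, W[i, ])))^2)
}, numeric(1))

# The two quantities agree up to floating-point error
all.equal(full_obj, sum(per_cell_obj))
```

Because the objective is a plain sum over cells and the non-negativity constraint applies to each row separately, minimizing each term on its own minimizes the total.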
Implementation
Two NNLS solvers are provided:
| Method | Algorithm | Reference |
|---|---|---|
| Coordinate Descent | Sequential coordinate-wise optimization | Franc et al. (2005) |
| Active Set | Lawson-Hanson algorithm | Lawson & Hanson (1974) |
The coordinate descent method is generally faster for typical problem sizes, while the active set method is guaranteed to terminate after a finite number of iterations.
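To make the coordinate descent idea concrete, here is a minimal base R sketch of a sequential coordinate-wise NNLS solver for a single cell. It is an illustration only: the function name `nnls_cd`, its arguments, and its stopping rule are assumptions, and the package's actual solvers are implemented in C++ and may differ in detail.

```r
# Sketch: solve min_{w >= 0} || x - t(H) %*% w ||^2 for one cell
# by cycling over coordinates and clipping each update at zero.
nnls_cd <- function(H, x, max_iter = 2000, tol = 1e-10) {
  A <- tcrossprod(H)        # k x k Gram matrix H %*% t(H)
  b <- drop(H %*% x)        # length-k vector H %*% x
  k <- nrow(H)
  w <- numeric(k)           # start from w = 0
  grad <- -b                # gradient (A %*% w - b) at w = 0
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (j in seq_len(k)) {
      if (A[j, j] <= 0) next            # all-zero program: leave usage at 0
      w_new <- max(0, w[j] - grad[j] / A[j, j])
      delta <- w_new - w[j]
      if (delta != 0) {
        grad <- grad + delta * A[, j]   # keep the gradient in sync
        w[j] <- w_new
        max_change <- max(max_change, abs(delta))
      }
    }
    if (max_change < tol) break         # no coordinate moved appreciably
  }
  w
}

# Toy check: a cell built exactly from known non-negative usages
set.seed(42)
H <- matrix(runif(3 * 20), nrow = 3)    # 3 programs x 20 genes
w_true <- c(0.7, 0, 0.3)
x <- drop(crossprod(H, w_true))         # t(H) %*% w_true
w_hat <- nnls_cd(H, x)
```

Each inner step minimizes the objective exactly along one coordinate and then projects onto the non-negative half-line, so only one column of the Gram matrix is touched per update.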
Installation
From R-universe (Recommended)
install.packages("CellProgramMapper",
                 repos = "https://zaoqu-liu.r-universe.dev")
From GitHub
# install.packages("remotes")
remotes::install_github("Zaoqu-Liu/CellProgramMapper")
Quick Start
library(CellProgramMapper)
# Map cells to reference gene expression programs
result <- CellProgramMapper(
  query = seurat_obj,      # Seurat object, matrix, or file path
  reference = "TCAT.V1",   # Pre-built reference or custom file
  method = "cd",           # "cd" (coordinate descent) or "active_set"
  verbose = TRUE
)
# Access results
usage_matrix <- result$usage_norm # Normalized usage (rows sum to 1)
scores <- result$scores # Computed add-on scores
# Integration with Seurat
seurat_obj <- add_results_to_seurat(seurat_obj, result)
Available References
# List pre-built references
available_references()
Building Custom References
Construct consensus GEPs from multiple cNMF analyses:
consensus <- BuildConsensusReference(
  cnmf_paths = c("path/to/cnmf1", "path/to/cnmf2"),
  ks = c(10, 15),
  density_thresholds = c(0.1, 0.1),
  output_dir = "./consensus_output",
  corr_thresh = 0.5
)
Performance
CellProgramMapper is optimized for computational efficiency:
- C++ Backend: Core NNLS solvers implemented in C++ via RcppArmadillo
- Sparse Matrix Support: Native handling of sparse matrices
- Parallel Processing: Optional parallelization via the future framework
- Batch Processing: Memory-efficient processing of large datasets
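As a rough illustration of the batch processing idea, the base R sketch below splits the cells (rows) of a matrix into fixed-size chunks, applies a per-batch function, and stacks the results. The helper `process_in_batches` and its arguments are hypothetical and are not part of the package API; the package's internal batching may work differently.

```r
# Hypothetical helper: apply `fun` to row-wise chunks of X and stack the results.
process_in_batches <- function(X, batch_size, fun) {
  idx <- seq_len(nrow(X))
  # Group row indices into consecutive chunks of at most `batch_size` rows
  batches <- unname(split(idx, ceiling(idx / batch_size)))
  out <- lapply(batches, function(rows) fun(X[rows, , drop = FALSE]))
  do.call(rbind, out)
}

# Toy usage: 10 cells x 4 genes, processed 3 cells at a time
X <- matrix(seq_len(40), nrow = 10)
res <- process_in_batches(X, batch_size = 3, fun = function(M) M * 2)
```

Only one chunk is passed to `fun` at a time, which keeps peak memory bounded by the batch size. Since the per-cell problems are independent, replacing `lapply` with a parallel equivalent such as `future.apply::future_lapply` would be one way to distribute batches across workers, though whether the package does this internally is not stated here.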
Output Structure
The CellProgramMapper() function returns a CellProgramMapperResult object containing:
| Field | Description |
|---|---|
| usage | Raw usage matrix (cells × programs) |
| usage_norm | Normalized usage matrix (rows sum to 1) |
| scores | Computed add-on scores |
| overlap_genes | Genes used for mapping |
| n_cells | Number of cells processed |
| n_programs | Number of programs |
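The relationship between several of these fields can be sketched in base R. The example below uses made-up gene names and usage values, and assumes that overlap genes are the intersection of reference and query genes and that normalization divides each row by its sum; the package may handle details such as all-zero rows differently.

```r
# Illustrative gene sets (not taken from any real reference)
query_genes     <- c("CD3E", "CD4", "CD8A", "GZMB", "ACTB")
reference_genes <- c("CD4", "CD8A", "GZMB", "FOXP3")

# Genes shared by query and reference, in reference order
overlap_genes <- intersect(reference_genes, query_genes)

# Toy raw usage matrix: 3 cells x 3 programs
usage <- matrix(c(2, 1, 1,
                  0, 3, 1,
                  0, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("cell", 1:3), paste0("GEP", 1:3)))

# Row-normalize so each cell's usages sum to 1
row_totals <- rowSums(usage)
row_totals[row_totals == 0] <- 1   # assumption: leave all-zero rows at zero
usage_norm <- usage / row_totals

n_cells    <- nrow(usage)
n_programs <- ncol(usage)
```

Dividing a matrix by a vector of length `nrow(usage)` recycles the vector down the columns, so every entry in row i is divided by that row's total.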
References
- Lawson CL, Hanson RJ (1974). Solving Least Squares Problems. Prentice-Hall.
- Franc V, Hlavac V, Navara M (2005). Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem. CAIP 2005.
- Lee DD, Seung HS (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401:788-791.
Contact
- Author: Zaoqu Liu
- Email: liuzaoqu@163.com
- GitHub: https://github.com/Zaoqu-Liu/CellProgramMapper