
📖 Documentation: https://zaoqu-liu.github.io/CellProgramMapper/

Overview

CellProgramMapper is a high-performance R package for projecting single-cell transcriptomic data onto reference gene expression programs (GEPs). The package implements non-negative matrix factorization (NMF)-based methods for systematic characterization of cellular transcriptional states.

Methodology

Mathematical Framework

Given a query expression matrix X ∈ ℝ^{n×p} (n cells × p genes) and a reference spectra matrix H ∈ ℝ^{k×p} (k programs × p genes), CellProgramMapper estimates the usage matrix W ∈ ℝ^{n×k} by solving:

$$\min_{W \geq 0} \|X - WH\|_F^2$$

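Because the objective is a sum over the rows of X, it decomposes for each cell i into an independent Non-Negative Least Squares (NNLS) subproblem, where x_i and w_i denote the i-th rows of X and W written as column vectors: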

$$\min_{w_i \geq 0} \|x_i - H^\top w_i\|_2^2$$
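The step from the matrix problem to the per-cell subproblems can be written out explicitly. The squared Frobenius norm is a sum over rows, and the constraint W ≥ 0 acts on each row separately, so minimizing the total is the same as minimizing each term on its own. With x_i and w_i the i-th rows of X and W taken as column vectors:

```latex
\|X - WH\|_F^2 = \sum_{i=1}^{n} \|x_i - H^\top w_i\|_2^2
\quad\Longrightarrow\quad
\min_{W \geq 0} \|X - WH\|_F^2 = \sum_{i=1}^{n} \min_{w_i \geq 0} \|x_i - H^\top w_i\|_2^2
```

This independence is what allows cells to be processed in batches or in parallel without changing the result.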

Implementation

Two NNLS solvers are provided:

Method              Algorithm                                Reference
Coordinate Descent  Sequential coordinate-wise optimization  Franc et al. (2005)
Active Set          Lawson-Hanson algorithm                  Lawson & Hanson (1974)

The coordinate descent method is generally faster for typical problem sizes, while the active set method provides guaranteed finite convergence.
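To make the coordinate descent idea concrete, here is a minimal base-R sketch of a sequential coordinate-wise NNLS solver for a single cell. It is an illustration only: `nnls_cd` is a hypothetical name, not a function exported by the package, and the package's C++ solver may use a different update order, stopping rule, or tolerance.

```r
# Sketch of sequential coordinate-wise NNLS for one cell (base R only).
# Solves min_{w >= 0} || x - t(H) %*% w ||_2^2 for a k x p spectra matrix H.
# NOTE: nnls_cd is a hypothetical helper, not part of CellProgramMapper.
nnls_cd <- function(H, x, max_iter = 1000L, tol = 1e-10) {
  G <- H %*% t(H)            # k x k Gram matrix
  b <- as.vector(H %*% x)    # length-k vector H x
  k <- nrow(H)
  w <- numeric(k)            # start from w = 0
  grad <- -b                 # gradient of half the objective at w = 0: G w - b
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (j in seq_len(k)) {
      if (G[j, j] <= 0) next                     # skip all-zero programs
      w_new <- max(0, w[j] - grad[j] / G[j, j])  # 1-D minimizer, clipped at 0
      delta <- w_new - w[j]
      if (delta != 0) {
        grad <- grad + delta * G[, j]            # keep the gradient in sync
        w[j] <- w_new
        max_change <- max(max_change, abs(delta))
      }
    }
    if (max_change < tol) break                  # no coordinate moved: stop
  }
  w
}

# Toy check: recover known non-negative usages from noiseless data
set.seed(1)
H <- matrix(runif(3 * 20), nrow = 3)   # 3 programs x 20 genes
w_true <- c(2, 0, 0.5)
x <- as.vector(t(H) %*% w_true)        # one cell's expression profile
w_hat <- nnls_cd(H, x)
```

Each inner step minimizes the objective along one coordinate and clips the result at zero, so every iterate stays feasible. The Gram matrix G depends only on the reference, so it can be computed once and reused for every cell.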

Preprocessing

Input data undergoes standardization by scaling each gene by its population standard deviation (without centering):

$$x'_j = \frac{x_j}{\sigma_j}, \quad \sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2}$$

This matches the preprocessing in sklearn.preprocessing.scale(X, with_mean=False).
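The scaling step can be sketched in base R as follows. `scale_genes` is a hypothetical helper written for illustration, not the package's internal function, and leaving zero-variance genes unscaled is an assumption chosen to mirror how scikit-learn treats constant features.

```r
# Scale each gene (column) by its population SD, without centering.
# NOTE: scale_genes is a hypothetical helper, not part of CellProgramMapper.
scale_genes <- function(X) {
  n <- nrow(X)
  mu <- colMeans(X)
  sigma <- sqrt(colSums(sweep(X, 2, mu)^2) / n)  # population SD (divide by n)
  sigma[sigma == 0] <- 1                         # assumption: constant genes stay as-is
  sweep(X, 2, sigma, "/")                        # divide each column by its SD
}

# 4 cells x 3 genes (matrix() fills column by column)
X <- matrix(c(1, 2, 3, 4,
              0, 0, 2, 2,
              5, 5, 5, 5), nrow = 4)
X_scaled <- scale_genes(X)
```

Note that base R's `sd()` divides by n - 1, so `scale(X, center = FALSE, scale = apply(X, 2, sd))` would not reproduce the population-SD formula above.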

Installation

From R-universe

install.packages("CellProgramMapper",
                 repos = "https://zaoqu-liu.r-universe.dev")

From GitHub

# install.packages("remotes")
remotes::install_github("Zaoqu-Liu/CellProgramMapper")

Dependencies

Required:

  • R (≥ 4.0.0)
  • Rcpp, RcppArmadillo
  • Matrix, data.table
  • curl, yaml, rappdirs
  • future, future.apply

Optional:

  • Seurat/SeuratObject (for Seurat integration)
  • hdf5r, anndata (for h5ad file support)

Quick Start

library(CellProgramMapper)

# Map cells to reference gene expression programs
result <- CellProgramMapper(
    query = seurat_obj,        # Seurat object, matrix, or file path
    reference = "TCAT.V1",     # Pre-built reference or custom file
    method = "cd",             # "cd" (coordinate descent) or "active_set"
    verbose = TRUE
)

# Access results
usage_matrix <- result$usage_norm   # Normalized usage (rows sum to 1)
scores <- result$scores             # Computed add-on scores

# Integration with Seurat
seurat_obj <- add_results_to_seurat(seurat_obj, result)

Available References

# List pre-built references
available_references()

Building Custom References

Construct consensus GEPs from multiple cNMF analyses:

consensus <- BuildConsensusReference(
    cnmf_paths = c("path/to/cnmf1", "path/to/cnmf2"),
    ks = c(10, 15),
    density_thresholds = c(0.1, 0.1),
    output_dir = "./consensus_output",
    corr_thresh = 0.5
)

Performance

CellProgramMapper is optimized for computational efficiency:

  • C++ Backend: Core NNLS solvers implemented in C++ via RcppArmadillo
  • Sparse Matrix Support: Native handling of sparse matrices
  • Parallel Processing: Optional parallelization via future framework
  • Batch Processing: Memory-efficient processing of large datasets

Output Structure

The CellProgramMapper() function returns a CellProgramMapperResult object containing:

Field          Description
usage          Raw usage matrix (cells × programs)
usage_norm     Normalized usage matrix (rows sum to 1)
scores         Computed add-on scores
overlap_genes  Genes used for mapping
n_cells        Number of cells processed
n_programs     Number of programs
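The relationship between the raw and normalized usage matrices can be illustrated with a small base-R sketch. It assumes that normalization simply divides each cell's usages by their row total, which is the most direct way to make rows sum to 1. The toy matrix and the guard for all-zero rows are illustrative and not taken from the package.

```r
# Toy raw usage matrix: 2 cells x 3 programs (values are made up)
usage <- matrix(c(2, 0, 2,
                  1, 3, 0), nrow = 2, byrow = TRUE,
                dimnames = list(c("cell1", "cell2"),
                                c("GEP1", "GEP2", "GEP3")))

row_totals <- rowSums(usage)
row_totals[row_totals == 0] <- 1   # assumption: avoid 0/0 for cells with no usage

# A vector of length nrow(usage) is recycled down the columns,
# so every entry in row i is divided by row_totals[i]
usage_norm <- usage / row_totals
```

After normalization each row can be read as the fraction of a cell's total usage attributed to each program.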

Documentation

Detailed documentation and tutorials are available at: https://zaoqu-liu.github.io/CellProgramMapper/

References

  1. Lawson CL, Hanson RJ (1974). Solving Least Squares Problems. Prentice-Hall.
  2. Franc V, Hlavac V, Navara M (2005). Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem. CAIP 2005.
  3. Lee DD, Seung HS (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401:788-791.

Citation

If you use CellProgramMapper in your research, please cite:

@software{CellProgramMapper,
  author = {Liu, Zaoqu},
  title = {CellProgramMapper: Projection of Single-Cell Data onto Reference Gene Expression Programs},
  year = {2026},
  url = {https://github.com/Zaoqu-Liu/CellProgramMapper}
}

License

MIT License © 2026 Zaoqu Liu

Contact