NNLS Solver Implementation Details

Overview

CellProgramMapper implements two algorithms for solving the Non-Negative Least Squares (NNLS) problem:

Coordinate Descent (default, faster)
Active Set Method (Lawson-Hanson algorithm)

Both solvers are implemented in C++ using RcppArmadillo for optimal performance.

The NNLS Problem

Given matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ and vector $\mathbf{b} \in \mathbb{R}^m$ , NNLS solves:

$\min_{\mathbf{x} \geq 0} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2$

Active Set Method (Lawson-Hanson)

Algorithm Description

The active set method partitions variables into two sets:

P (Positive set): Variables that can be positive
Z (Zero set): Variables constrained to zero

Algorithm: Lawson-Hanson NNLS
Input: A (m x n matrix), b (m-vector)
Output: x (n-vector) with x >= 0

1. Initialize: x = 0, P = empty, Z = {1,...,n}
2. Compute gradient: w = A'(b - Ax)
3. While Z is not empty and max(w[Z]) > 0:
   a. Move index of max(w[Z]) from Z to P
   b. Inner loop:
      - Solve unconstrained LS on P: z[P] = argmin ||A[:,P]z - b||^2
      - While any z[P] <= 0:
        * Compute step alpha to boundary
        * Update x = x + alpha(z - x)
        * Move zero indices from P to Z
   c. Update gradient: w = A'(b - Ax)
4. Return x

Convergence Properties

Finite termination: Guaranteed to terminate in at most $\binom{n+m}{n}$ iterations
Exact solution: Returns the globally optimal solution
Numerical stability: Uses QR decomposition for least squares subproblems

Implementation

// Simplified C++ implementation (actual code in src/nnls_solver.cpp)
arma::vec nnls_single(const arma::mat& A, const arma::vec& b, 
                      int max_iter, double tol) {
    int n = A.n_cols;
    arma::vec x = arma::zeros(n);
    arma::vec w = A.t() * b;  // Initial gradient
    
    arma::uvec P_set, Z_set = arma::regspace<arma::uvec>(0, n-1);
    
    while (Z_set.n_elem > 0 && iter < max_iter) {
        // Find index with max positive gradient in Z
        arma::uword t = Z_set(w.elem(Z_set).index_max());
        if (w(t) <= tol) break;
        
        // Move t from Z to P
        P_set = join_cols(P_set, {t});
        Z_set.shed_row(find_idx);
        
        // Inner loop: solve unconstrained subproblem
        while (true) {
            arma::vec z_P = solve(A.cols(P_set), b);
            if (all(z_P > 0)) {
                x.elem(P_set) = z_P;
                break;
            }
            // Handle negative values...
        }
        w = A.t() * (b - A * x);
    }
    return x;
}

Coordinate Descent Method

Algorithm Description

Coordinate descent optimizes one variable at a time while keeping others fixed.

For NNLS, we work with the normal equations: $\mathbf{Q}\mathbf{x} = \mathbf{c}$

where $\mathbf{Q} = \mathbf{A}^\top\mathbf{A}$ and $\mathbf{c} = \mathbf{A}^\top\mathbf{b}$ .

Algorithm: Coordinate Descent NNLS
Input: Q = A'A, c = A'b
Output: x (n-vector) with x >= 0

1. Initialize: x = 0, g = -c (gradient at x=0)
2. Repeat until convergence:
   For j = 1 to n:
     a. Compute update: delta = -g[j] / Q[j,j]
     b. Project: x_new = max(0, x[j] + delta)
     c. Update gradient: g = g + (x_new - x[j]) * Q[:,j]
     d. x[j] = x_new
3. Return x

Advantages

Simplicity: Easy to implement and understand
Efficiency: Each iteration is $O(n)$ for gradient update
Precomputation: $\mathbf{Q} = \mathbf{A}^\top\mathbf{A}$ computed once for batch processing

Implementation

// Simplified C++ implementation
arma::vec nnls_coord_descent(const arma::mat& AtA, const arma::vec& Atb,
                             int max_iter, double tol) {
    int n = AtA.n_cols;
    arma::vec x = arma::zeros(n);
    arma::vec grad = -Atb;  // Gradient at x=0
    
    for (int iter = 0; iter < max_iter; iter++) {
        double max_change = 0;
        
        for (int j = 0; j < n; j++) {
            if (AtA(j,j) < 1e-12) continue;  // Skip zero diagonal
            
            double old_xj = x(j);
            double new_xj = std::max(0.0, old_xj - grad(j) / AtA(j,j));
            
            if (new_xj != old_xj) {
                double delta = new_xj - old_xj;
                x(j) = new_xj;
                grad += delta * AtA.col(j);  // Efficient gradient update
                max_change = std::max(max_change, std::abs(delta));
            }
        }
        
        if (max_change < tol) break;
    }
    return x;
}

Comparison

Aspect	Active Set	Coordinate Descent
Time per iteration	O(k^3)	O(k^2)
Iterations to converge	Usually fewer	Usually more
Memory	O(p*k)	O(k^2)
Convergence guarantee	Finite	Linear
Best for	Small k	Large k

Benchmark

library(CellProgramMapper)
#> CellProgramMapper v1.0.0
#> Map single cells to reference gene expression programs
#> GitHub: https://github.com/Zaoqu-Liu/CellProgramMapper

# Benchmark function using CellProgramMapper
benchmark_solver <- function(n_cells, n_genes, n_programs, method) {
  set.seed(42)
  X <- matrix(abs(rnorm(n_cells * n_genes)), nrow = n_cells)
  colnames(X) <- paste0("Gene", 1:n_genes)
  H <- matrix(abs(rnorm(n_programs * n_genes)), nrow = n_programs)
  colnames(H) <- paste0("Gene", 1:n_genes)
  rownames(H) <- paste0("GEP", 1:n_programs)
  
  start_time <- Sys.time()
  result <- CellProgramMapper(
    query = X,
    reference = H,
    method = method,
    verbose = FALSE
  )
  end_time <- Sys.time()
  
  as.numeric(difftime(end_time, start_time, units = "secs"))
}

# Run benchmarks
n_cells_vec <- c(100, 500, 1000)
n_genes <- 200
n_programs <- 10

results <- data.frame()
for (n_cells in n_cells_vec) {
  for (method in c("cd", "active_set")) {
    time_taken <- benchmark_solver(n_cells, n_genes, n_programs, method)
    results <- rbind(results, data.frame(
      n_cells = n_cells,
      method = method,
      time = time_taken
    ))
  }
}
#> Warning: Query data does not appear to be integer counts. For best results,
#> provide raw UMI/read counts.
#> Warning: Query data does not appear to be integer counts. For best results,
#> provide raw UMI/read counts.
#> Warning: Query data does not appear to be integer counts. For best results,
#> provide raw UMI/read counts.
#> Warning: Query data does not appear to be integer counts. For best results,
#> provide raw UMI/read counts.
#> Warning: Query data does not appear to be integer counts. For best results,
#> provide raw UMI/read counts.
#> Warning: Query data does not appear to be integer counts. For best results,
#> provide raw UMI/read counts.

# Plot
par(mar = c(4, 4, 2, 1))
cd_times <- results$time[results$method == "cd"]
as_times <- results$time[results$method == "active_set"]

barplot(rbind(cd_times, as_times), beside = TRUE,
        names.arg = n_cells_vec,
        col = c("#1976d2", "#388e3c"),
        xlab = "Number of cells",
        ylab = "Time (seconds)",
        main = "NNLS Solver Performance")
legend("topleft", legend = c("Coordinate Descent", "Active Set"),
       fill = c("#1976d2", "#388e3c"), bty = "n")

Solver performance comparison

Choosing a Solver

Use Coordinate Descent (default) when:

Processing large numbers of cells (>1000)
Number of programs is moderate (5-50)
Speed is the priority

Use Active Set when:

High numerical precision is required
Number of programs is small (<10)
Guaranteed finite convergence is important

# Use coordinate descent (default)
result <- CellProgramMapper(query, reference, method = "cd")

# Use active set method
result <- CellProgramMapper(query, reference, method = "active_set")

Numerical Considerations

Tolerance Parameter

The tol parameter controls convergence:

Smaller values: more iterations, higher precision
Larger values: fewer iterations, faster but less precise
Default: 1e-8 (good balance)

Maximum Iterations

The max_iter parameter prevents infinite loops:

Default: 1000 (sufficient for most problems)
Increase for very large programs or ill-conditioned problems

Division by Zero Protection

Both solvers include protection against division by zero:

if (AtA(j,j) < 1e-12) continue;  // Skip near-zero diagonal

Session Info

sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS 15.6.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] C
#> 
#> time zone: Asia/Shanghai
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] CellProgramMapper_1.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.5           knitr_1.51          rlang_1.1.7        
#>  [4] xfun_0.56           otel_0.2.0          textshaping_1.0.4  
#>  [7] data.table_1.18.0   jsonlite_2.0.0      future.apply_1.20.1
#> [10] listenv_0.10.0      htmltools_0.5.9     ragg_1.5.0         
#> [13] sass_0.4.10         rappdirs_0.3.4      rmarkdown_2.30     
#> [16] grid_4.4.0          evaluate_1.0.5      jquerylib_0.1.4    
#> [19] fastmap_1.2.0       yaml_2.3.12         lifecycle_1.0.5    
#> [22] compiler_4.4.0      codetools_0.2-20    fs_1.6.6           
#> [25] Rcpp_1.1.1          htmlwidgets_1.6.4   future_1.69.0      
#> [28] systemfonts_1.3.1   lattice_0.22-7      digest_0.6.39      
#> [31] R6_2.6.1            parallelly_1.46.1   parallel_4.4.0     
#> [34] curl_7.0.0          bslib_0.9.0         Matrix_1.7-4       
#> [37] tools_4.4.0         globals_0.18.0      pkgdown_2.2.0      
#> [40] cachem_1.1.0        desc_1.4.3

Zaoqu Liu

2026-01-26

Overview

The NNLS Problem

Active Set Method (Lawson-Hanson)

Algorithm Description

Convergence Properties

Implementation

Coordinate Descent Method

Algorithm Description

Advantages

Implementation

Comparison

Benchmark

Choosing a Solver

Numerical Considerations

Tolerance Parameter

Maximum Iterations

Division by Zero Protection

Session Info