
Normalize confusion matrix relative to correctly classified cells

Usage

normalize_confmat_r1(cmat, mode = "1")

Arguments

cmat

Confusion matrix (from calc_confusion_matrix). Rows represent true labels, columns represent predicted labels.

mode

Normalization mode: "1" (default, as in SCCAF) or "2"

Value

Symmetric matrix of pairwise R1-normalized confusion values, with zeros on the diagonal. Values are non-negative and have no upper bound: an entry exceeds 1 when the misclassified count for a pair is larger than the correctly classified count it is divided by.

Details

R1 normalization measures the confusion rate between cluster pairs relative to the number of correctly classified cells.

For each pair (i, j), compute: $$R1(i,j) = \max\left(\frac{C_{ij}}{C_{jj}}, \frac{C_{ji}}{C_{ii}}\right)$$

where:

  • \(C_{ij}\) = cells truly in cluster i but predicted as cluster j

  • \(C_{jj}\) = cells truly in cluster j and correctly predicted (diagonal)

  • \(C_{ji}\) and \(C_{ii}\) are defined in the same way with i and j swapped

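The ratio \(C_{ij}/C_{jj}\) is the number of cells from cluster i misclassified as j, expressed relative to the number of correctly classified cells in j. Taking the maximum over both directions makes R1 symmetric in i and j, so a high R1 value indicates substantial confusion between the cluster pair in at least one direction.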
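
The formula above can be sketched in a few lines of base R. This is only an illustration of the computation as described here, not the package's implementation: `r1_sketch` is a hypothetical helper name, it covers only the formula shown above (not mode "2"), and it assumes a square matrix with non-zero diagonal entries.

```r
# Hypothetical helper (not part of the package): base-R sketch of the R1 formula.
r1_sketch <- function(cmat) {
  stopifnot(is.matrix(cmat), nrow(cmat) == ncol(cmat))
  d <- diag(cmat)                   # correctly classified cells per cluster
  # ratio[i, j] = C_ij / C_jj: divide every column j by its diagonal entry.
  # Assumption: no diagonal entry is zero (otherwise Inf/NaN would appear).
  ratio <- sweep(cmat, 2, d, "/")
  # R1(i, j) = max(C_ij / C_jj, C_ji / C_ii): element-wise max with the transpose
  r1 <- pmax(ratio, t(ratio))
  diag(r1) <- 0                     # self-pairs are set to 0
  r1
}

# Same matrix as in the Examples section (rows = true, columns = predicted)
cmat <- matrix(c(90, 5, 5, 3, 85, 12, 2, 10, 88), nrow = 3)
rownames(cmat) <- colnames(cmat) <- c("A", "B", "C")
r1 <- r1_sketch(cmat)
```

Under these assumptions, `r1` should agree with the values that `normalize_confmat_r1(cmat)` prints in the Examples section, for instance `r1["B", "C"]` equals 12/85.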

References

Miao, Z., et al. (2020). Putative cell type discovery from single-cell gene expression data. Nature Methods.

Examples

# matrix() fills column by column, so 90, 5, 5 is the "A" column (cells predicted as A)
cmat <- matrix(c(90, 5, 5, 3, 85, 12, 2, 10, 88), nrow = 3)
rownames(cmat) <- colnames(cmat) <- c("A", "B", "C")
normalize_confmat_r1(cmat)
#>            A          B          C
#> A 0.00000000 0.05555556 0.05555556
#> B 0.05555556 0.00000000 0.14117647
#> C 0.05555556 0.14117647 0.00000000
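
As a check on the output above, the B/C entry can be reproduced by hand. Because `matrix()` fills column by column and rows are true labels, the example matrix has \(C_{BC} = 10\), \(C_{CB} = 12\), \(C_{BB} = 85\) and \(C_{CC} = 88\), so:

$$R1(B,C) = \max\left(\frac{C_{BC}}{C_{CC}}, \frac{C_{CB}}{C_{BB}}\right) = \max\left(\frac{10}{88}, \frac{12}{85}\right) \approx \max(0.1136,\ 0.1412) = 0.1412$$

The same arithmetic gives \(\max(3/85,\ 5/90) = 5/90 \approx 0.0556\) for the A/B entry.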