Normalize confusion matrix relative to total cell count

Usage

normalize_confmat_r2(cmat)

Arguments

cmat

Confusion matrix (from calc_confusion_matrix). Rows represent true labels, columns represent predicted labels.

Value

Symmetric matrix of pairwise R2-normalized confusion values. Each off-diagonal entry lies between 0 and 1 and gives the fraction of all cells misclassified between that pair of clusters. Diagonal entries are 0 (see Examples).

Details

R2 normalization measures the overall impact of confusion between cluster pairs on the entire dataset.

For each pair (i, j), compute: $$R2(i,j) = \frac{C_{ij} + C_{ji}}{N}$$

where:

  • \(C_{ij}\) = cells truly in cluster i but predicted as cluster j

  • \(C_{ji}\) = cells truly in cluster j but predicted as cluster i

  • \(N\) = total number of cells in the test set

This gives the fraction of all cells that are confused between clusters i and j. Because \(C_{ij} + C_{ji}\) can never exceed \(N\) for two distinct clusters, R2 values always lie between 0 and 1, unlike R1 values.
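The formula above can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: `r2_sketch` is a hypothetical name, and setting the diagonal to zero is an assumption based on the output shown in Examples.

```r
# Hypothetical helper illustrating the R2 formula; not the package's own code.
r2_sketch <- function(cmat) {
  stopifnot(is.matrix(cmat), nrow(cmat) == ncol(cmat))
  n_total <- sum(cmat)               # N: total number of cells
  r2 <- (cmat + t(cmat)) / n_total   # (C_ij + C_ji) / N for every pair
  diag(r2) <- 0                      # assumption: self-pairs are reported as 0
  r2
}

cmat <- matrix(c(90, 5, 5, 3, 85, 12, 2, 10, 88), nrow = 3)
rownames(cmat) <- colnames(cmat) <- c("A", "B", "C")
r2_sketch(cmat)
```

Adding the matrix to its transpose handles all pairs at once and guarantees a symmetric result, so no explicit loop over (i, j) is needed.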

References

Miao, Z., et al. (2020). Putative cell type discovery from single-cell gene expression data. Nature Methods.

Examples

cmat <- matrix(c(90, 5, 5, 3, 85, 12, 2, 10, 88), nrow = 3)
rownames(cmat) <- colnames(cmat) <- c("A", "B", "C")
normalize_confmat_r2(cmat)
#>            A          B          C
#> A 0.00000000 0.02666667 0.02333333
#> B 0.02666667 0.00000000 0.07333333
#> C 0.02333333 0.07333333 0.00000000
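A single entry of this output can be checked by hand in base R, as a sanity check on the formula. Note that `matrix()` fills column by column, so here `cmat["A", "B"]` is 3 and `cmat["B", "A"]` is 5. The variable names `n_total` and `r2_ab` are illustrative only.

```r
cmat <- matrix(c(90, 5, 5, 3, 85, 12, 2, 10, 88), nrow = 3)
rownames(cmat) <- colnames(cmat) <- c("A", "B", "C")

n_total <- sum(cmat)  # total number of cells: 300
# Cells confused between A and B in either direction, as a share of all cells
r2_ab <- (cmat["A", "B"] + cmat["B", "A"]) / n_total
r2_ab
#> [1] 0.02666667
```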