A Scalable Framework for Copy Number Variation Inference
from Single-Cell and Spatial Transcriptomics Data
Overview
fastCNV is an R package designed for efficient and accurate inference of copy number variations (CNVs) from transcriptomic data. It addresses the computational challenges associated with CNV detection in large-scale single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) datasets, including 10X Genomics Visium and Visium HD platforms.
The algorithm employs a sliding window approach across the genome, integrating expression profiles from genomically ordered genes to infer chromosomal amplifications and deletions. By leveraging reference cell populations (e.g., non-malignant cells), fastCNV effectively distinguishes tumor-associated CNV signals from technical noise.
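The idea can be illustrated with a small base-R sketch. This is a toy model under stated assumptions, not the fastCNV implementation: the simulated matrix, the window of 51 genes, and the helper `smooth_genes()` are all hypothetical and chosen only to make the example self-contained.

```r
# Toy illustration only: this is NOT the fastCNV implementation.
set.seed(1)

n_genes <- 300   # genes, assumed already sorted by genomic position
n_cells <- 6     # cells 1-3 act as reference, cells 4-6 as "tumor"

# Simulated log-expression: genes in rows, cells in columns
expr <- matrix(rnorm(n_genes * n_cells, mean = 0, sd = 0.5),
               nrow = n_genes, ncol = n_cells)

# Simulate a gain covering genes 101-200 in the tumor cells
expr[101:200, 4:6] <- expr[101:200, 4:6] + 1

# Centered moving average along the gene axis (window of k genes).
# Positions too close to either end have no full window and become NA.
smooth_genes <- function(x, k = 51) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}
smoothed <- apply(expr, 2, smooth_genes)

# Subtract the per-gene mean of the reference cells
ref_cells <- 1:3
cnv_score <- smoothed - rowMeans(smoothed[, ref_cells])

# Mean score of tumor cells inside vs. outside the simulated gain
inside  <- mean(cnv_score[130:170, 4:6])
outside <- mean(cnv_score[230:270, 4:6])
```

In this toy setting `inside` lands near the simulated shift of 1 while `outside` stays near 0, because averaging over neighboring genes suppresses gene-level noise and reference centering removes the shared baseline.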
Key Capabilities
- Computational Efficiency: Optimized for large-scale datasets (>100,000 cells)
- Multi-Platform Support: Compatible with scRNA-seq, Visium, and Visium HD data
- Reference-Based Normalization: Robust CNV inference using normal cell populations
- Clonal Architecture Analysis: Hierarchical clustering and phylogenetic tree reconstruction
- Seurat Integration: Seamless compatibility with Seurat 4.x and 5.x workflows
Installation
From R-universe (Recommended)
install.packages("fastCNV", repos = "https://zaoqu-liu.r-universe.dev")

From GitHub
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("Zaoqu-Liu/fastCNV")

Methodology
The CNV inference pipeline consists of the following steps:
- Gene Ordering: Genes are ordered by their genomic coordinates (chromosome and position)
- Expression Smoothing: A sliding window approach aggregates expression across neighboring genes
- Reference Normalization: Expression values are centered using reference (non-malignant) cells
- Score Computation: CNV scores represent relative deviations from the reference baseline
- Thresholding: Quantile-based filtering removes background noise
- Clustering: Hierarchical clustering identifies distinct CNV subpopulations
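The last two steps can be sketched on simulated scores. The quantile cut-offs (25% and 75%), the Ward linkage, and the choice of two clusters are assumptions made for this example; the source does not state which values fastCNV uses.

```r
# Toy illustration only: thresholds and clustering settings are assumptions.
set.seed(2)

n_cells   <- 40
n_windows <- 60

# Simulated CNV scores: cells in rows, genomic windows in columns
scores <- matrix(rnorm(n_cells * n_windows, sd = 0.05),
                 nrow = n_cells, ncol = n_windows)
scores[1:20, 11:30]  <- scores[1:20, 11:30] + 0.5   # clone A: gain
scores[21:40, 41:55] <- scores[21:40, 41:55] - 0.5  # clone B: loss

# Quantile-based filtering: set values inside the central band to zero
bounds <- quantile(scores, probs = c(0.25, 0.75))
denoised <- scores
denoised[scores > bounds[1] & scores < bounds[2]] <- 0

# Hierarchical clustering of cells on the denoised profiles
tree <- hclust(dist(denoised), method = "ward.D2")
clusters <- cutree(tree, k = 2)
table(clusters)
```

With these simulated clones, the two clusters coincide with cells 1-20 and cells 21-40. On real data the number of clusters is not known in advance and would need to be chosen from the dendrogram or by another criterion.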
Usage
Basic Analysis
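Before running the analysis, it can help to confirm that the labels passed to referenceLabel actually occur in the referenceVar metadata column. The sketch below uses a plain data frame as a stand-in for the Seurat metadata; `check_reference()` is a hypothetical helper written for this example and is not part of fastCNV.

```r
# Stand-in for the metadata of a Seurat object (hypothetical values)
meta <- data.frame(
  cell_type = c("Normal_epithelial", "Fibroblast", "Tumor", "Tumor", "T_cell"),
  row.names = paste0("cell", 1:5)
)

# Hypothetical helper, not part of fastCNV
check_reference <- function(meta, referenceVar, referenceLabel) {
  if (!referenceVar %in% colnames(meta)) {
    stop("Column '", referenceVar, "' not found in metadata")
  }
  present <- unique(as.character(meta[[referenceVar]]))
  missing_labels <- setdiff(referenceLabel, present)
  if (length(missing_labels) > 0) {
    stop("Labels not found in '", referenceVar, "': ",
         paste(missing_labels, collapse = ", "))
  }
  # Number of cells that would serve as the reference
  sum(meta[[referenceVar]] %in% referenceLabel)
}

n_ref <- check_reference(meta, "cell_type",
                         c("Normal_epithelial", "Fibroblast"))
```

For a real Seurat object, the metadata data frame is available as `seurat_object@meta.data` and can be passed in place of `meta`.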
library(fastCNV)

# Perform CNV analysis on a Seurat object
result <- fastCNV(
  seuratObj = seurat_object,
  sampleName = "Sample1",
  referenceVar = "cell_type",
  referenceLabel = c("Normal_epithelial", "Fibroblast"),
  prepareCounts = TRUE,
  getCNVPerChromosomeArm = TRUE,
  getCNVClusters = TRUE,
  doPlot = TRUE
)

# Inspect CNV clusters
table(result$cnv_clusters)

Visium HD Data
For high-resolution spatial transcriptomics:
result <- fastCNV_10XHD(
  seuratObjHD = visium_hd_object,
  sampleName = "Visium_HD_Sample",
  referenceVar = "region_annotation",
  referenceLabel = c("Normal_tissue"),
  doPlot = TRUE
)

Supported Data Types
| Platform | Function | Resolution |
|---|---|---|
| scRNA-seq | fastCNV() | Single-cell |
| 10X Visium | fastCNV() | 55 µm spots |
| 10X Visium HD | fastCNV_10XHD() | 8/16 µm bins |
Core Functions
| Function | Description |
|---|---|
| fastCNV() | Main CNV inference pipeline |
| fastCNV_10XHD() | Specialized pipeline for Visium HD |
| CNVCalling() | Core CNV score computation |
| CNVCluster() | Hierarchical clustering of CNV profiles |
| CNVClassification() | Cell classification based on CNV patterns |
| CNVTree() | Phylogenetic tree construction from CNV profiles |
| plotCNVResults() | Heatmap visualization of CNV landscapes |
| plotCNVTree() | Phylogenetic tree visualization |
Performance Benchmarks
Typical runtime on a standard workstation (8 cores, 32 GB RAM):
| Dataset Size | Platform | Runtime |
|---|---|---|
| ~5,000 cells | scRNA-seq | ~1 min |
| ~20,000 cells | scRNA-seq | ~5 min |
| ~50,000 cells | Visium | ~15 min |
| ~200,000 bins | Visium HD (16 µm) | ~40 min |
Frequently Asked Questions
Q: Is the mouse genome supported?
A: The current version is optimized for the human genome (GRCh38/hg38). Support for mouse (mm10/mm39) is planned for future releases.
Q: Can CNV analysis be performed without reference cells?
A: While technically possible, reference cells significantly improve accuracy by providing a baseline for normalization. We strongly recommend including normal/non-malignant cells as references.
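A small numeric example shows why the baseline matters. It assumes that a reference-free run would center each region on the mean of all cells; that is one plausible fallback used here for illustration, not a description of what fastCNV does internally.

```r
# Toy illustration: 8 of 10 cells carry a gain of +1 in one region
gain <- c(rep(0, 2), rep(1, 8))   # cells 1-2 are normal, cells 3-10 are tumor
ref_cells <- 1:2

# Centering on the reference cells keeps the full signal
score_ref <- gain - mean(gain[ref_cells])

# Centering on all cells (assumed reference-free baseline)
score_global <- gain - mean(gain)

score_ref[3]     # 1: the tumor cell keeps its full gain
score_global[3]  # 0.2: the gain is strongly attenuated
score_global[1]  # -0.8: a normal cell now looks like it carries a loss
```

When tumor cells dominate the sample, a baseline computed from all cells absorbs most of the tumor signal, which is the situation an explicit set of normal cells avoids.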
Q: How should I select the window size parameter?
A: The default value (windowSize = 150) is suitable for most applications. Smaller values (e.g., 100) increase resolution but may introduce noise; larger values (e.g., 200) provide smoother profiles at the cost of resolution, so short alterations may be diluted.
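The trade-off can be seen with a generic moving average in base R. This sketch uses simulated data and a hypothetical helper `moving_average()`; it illustrates the general behavior of window smoothing and is not fastCNV's own smoothing code.

```r
set.seed(3)

# Centered moving average over k neighboring positions (NA near the ends)
moving_average <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

window_sizes <- c(100, 150, 200)

# 1) Noise: residual standard deviation of a noise-only series after smoothing
noise <- rnorm(200000, sd = 1)
residual_sd <- sapply(window_sizes, function(k) {
  sd(moving_average(noise, k), na.rm = TRUE)
})
names(residual_sd) <- window_sizes

# 2) Resolution: peak height of a short gain (50 genes, amplitude 1)
signal <- rep(0, 2000)
signal[1001:1050] <- 1
peak <- sapply(window_sizes, function(k) {
  max(moving_average(signal, k), na.rm = TRUE)
})
names(peak) <- window_sizes

residual_sd  # decreases as the window grows
peak         # 50 / k: 0.5, about 0.33, 0.25
```

Larger windows leave less residual noise, but an alteration shorter than the window is averaged with its neutral neighbors and shrinks in proportion to its length divided by the window size.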
Citation
If fastCNV contributes to your research, please cite:
Cabrejas G, Groeneveld C, et al. fastCNV: Fast and accurate inference of copy number variations from single-cell RNA sequencing data. bioRxiv 2025.
Authors and Contributors
- Zaoqu Liu (Maintainer) – liuzaoqu@163.com
- Gadea Cabrejas (Original Author) – gadea.cabrejas-saiz@u-paris.fr
- Clarice Groeneveld (Original Author) – clarice.groeneveld@inserm.fr
Links
- 📖 Documentation: https://zaoqu-liu.github.io/fastCNV/
- 📦 R-universe: https://zaoqu-liu.r-universe.dev/fastCNV
- 💻 GitHub: https://github.com/Zaoqu-Liu/fastCNV
- 🔬 Original Development: https://github.com/must-bioinfo/fastCNV
