fastCNV

A Scalable Framework for Copy Number Variation Inference
from Single-Cell and Spatial Transcriptomics Data


Overview

fastCNV is an R package designed for efficient and accurate inference of copy number variations (CNVs) from transcriptomic data. It addresses the computational challenges associated with CNV detection in large-scale single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) datasets, including 10X Genomics Visium and Visium HD platforms.

The algorithm employs a sliding window approach across the genome, integrating expression profiles from genomically ordered genes to infer chromosomal amplifications and deletions. By leveraging reference cell populations (e.g., non-malignant cells), fastCNV effectively distinguishes tumor-associated CNV signals from technical noise.

Key Capabilities

Computational Efficiency: Optimized for large-scale datasets (>100,000 cells)
Multi-Platform Support: Compatible with scRNA-seq, Visium, and Visium HD data
Reference-Based Normalization: Robust CNV inference using normal cell populations
Clonal Architecture Analysis: Hierarchical clustering and phylogenetic tree reconstruction
Seurat Integration: Seamless compatibility with Seurat 4.x and 5.x workflows

Installation

From R-universe (Recommended)

install.packages("fastCNV", repos = "https://zaoqu-liu.r-universe.dev")

From GitHub

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("Zaoqu-Liu/fastCNV")

Methodology

The CNV inference pipeline consists of the following steps:

1. Gene Ordering: Genes are ordered by their genomic coordinates (chromosome and position)
2. Expression Smoothing: A sliding window approach aggregates expression across neighboring genes
3. Reference Normalization: Expression values are centered using reference (non-malignant) cells
4. Score Computation: CNV scores represent relative deviations from the reference baseline
5. Thresholding: Quantile-based filtering removes background noise
6. Clustering: Hierarchical clustering identifies distinct CNV subpopulations
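
The steps above can be sketched end to end in base R. This is a minimal illustration on synthetic data, not fastCNV's internal implementation. The 25-gene window, the 1%/99% quantile cutoffs, the Ward linkage, and the helper name `smooth_window` are all assumptions chosen for the toy example.

```r
# Toy illustration only: synthetic data and base R, not fastCNV's own code.
set.seed(1)
n_genes <- 300
n_ref   <- 20   # reference (non-malignant) cells
n_tumor <- 20   # tumor cells

# 1. Gene ordering: hypothetical annotation, shuffled, then sorted by position
genes <- data.frame(
  gene  = paste0("g", seq_len(n_genes)),
  chrom = rep(1:3, each = 100),
  start = rep(seq(1e5, by = 1e5, length.out = 100), times = 3)
)
genes <- genes[sample(n_genes), ]
genes <- genes[order(genes$chrom, genes$start), ]

# Synthetic log-expression (genes x cells); tumor cells gain chromosome 2
expr <- matrix(
  rnorm(n_genes * (n_ref + n_tumor), sd = 0.5), nrow = n_genes,
  dimnames = list(genes$gene,
                  c(paste0("ref", seq_len(n_ref)), paste0("tum", seq_len(n_tumor))))
)
is_ref <- grepl("^ref", colnames(expr))
chr2   <- genes$chrom == 2
expr[chr2, !is_ref] <- expr[chr2, !is_ref] + 0.5

# 2. Expression smoothing: centred moving average, truncated at chromosome ends
smooth_window <- function(x, k) {
  half <- k %/% 2
  n <- length(x)
  vapply(seq_len(n),
         function(i) mean(x[max(1, i - half):min(n, i + half)]),
         numeric(1))
}
smoothed <- expr
for (chr in unique(genes$chrom)) {
  idx <- which(genes$chrom == chr)
  smoothed[idx, ] <- apply(expr[idx, , drop = FALSE], 2, smooth_window, k = 25)
}

# 3-4. Reference normalization and scores: deviation from the reference mean
baseline <- rowMeans(smoothed[, is_ref, drop = FALSE])
scores   <- smoothed - baseline

# 5. Thresholding (assumed scheme): zero out values inside the central
#    1%-99% range observed in the reference cells
cutoffs <- quantile(scores[, is_ref], probs = c(0.01, 0.99))
scores[scores > cutoffs[1] & scores < cutoffs[2]] <- 0

# 6. Clustering: hierarchical clustering of per-cell CNV profiles
hc       <- hclust(dist(t(scores)), method = "ward.D2")
clusters <- cutree(hc, k = 2)
print(table(clusters, ifelse(is_ref, "reference", "tumor")))
```

Smoothing is done per chromosome so that a window never averages across a chromosome boundary. Whether fastCNV handles boundaries this way is not stated here.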

Usage

Basic Analysis

library(fastCNV)

# Perform CNV analysis on a Seurat object
result <- fastCNV(
  seuratObj = seurat_object,
  sampleName = "Sample1",
  referenceVar = "cell_type",
  referenceLabel = c("Normal_epithelial", "Fibroblast"),
  prepareCounts = TRUE,
  getCNVPerChromosomeArm = TRUE,
  getCNVClusters = TRUE,
  doPlot = TRUE
)

# Inspect CNV clusters
table(result$cnv_clusters)
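
The `getCNVPerChromosomeArm = TRUE` option in the call above suggests a per-arm summary of the CNV scores. This README does not say how that summary is computed, so the sketch below shows one plausible reading: averaging window-level scores within each chromosome arm for each cell. The score matrix, the arm labels, and the variable names are hypothetical.

```r
# Toy illustration only: hypothetical scores and arm labels, base R.
set.seed(2)

# CNV score matrix: 6 genomic windows (rows) x 4 cells (columns)
scores <- matrix(
  rnorm(24, sd = 0.05), nrow = 6,
  dimnames = list(paste0("w", 1:6), paste0("cell", 1:4))
)
arm <- c("1p", "1p", "1q", "1q", "2p", "2p")      # assumed arm of each window
scores[arm == "1q", c("cell3", "cell4")] <- 0.4   # simulated 1q gain in two cells

# Mean score per arm and cell: sum the windows of each arm, then divide by
# the number of windows in that arm (rowsum() and table() sort arms alike)
arm_scores <- rowsum(scores, group = arm) / as.vector(table(arm))
print(round(arm_scores, 2))
```

An arm-level table of this shape (arms by cells) is easier to scan for whole-arm gains and losses than the full window-level matrix.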

Multi-Sample Analysis

For cohort-level studies with multiple samples:

result <- fastCNV(
  seuratObj = list(sample1, sample2, sample3),
  sampleName = c("Patient_A", "Patient_B", "Patient_C"),
  referenceVar = "cell_annotation",
  referenceLabel = c("Normal_cells"),
  pooledReference = TRUE
)
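
To illustrate why a pooled reference can help, the sketch below compares a baseline built from one sample's own reference cells with a baseline built from the reference cells of all samples combined. It assumes that `pooledReference = TRUE` means the latter, and it uses synthetic matrices rather than Seurat objects. `make_sample` and the other names are hypothetical.

```r
# Toy illustration only: synthetic matrices stand in for real samples.
set.seed(3)

# Build a genes x cells matrix whose column names carry the cell annotation
make_sample <- function(n_ref, n_tumor, n_genes = 50) {
  m <- matrix(rnorm(n_genes * (n_ref + n_tumor)), nrow = n_genes)
  colnames(m) <- c(rep("Normal_cells", n_ref), rep("Tumor", n_tumor))
  m
}
samples <- list(
  Patient_A = make_sample(n_ref = 3,  n_tumor = 30),   # very few reference cells
  Patient_B = make_sample(n_ref = 40, n_tumor = 30)
)
get_ref <- function(m) m[, colnames(m) == "Normal_cells", drop = FALSE]

# Per-sample baseline: each sample uses only its own reference cells
own_baseline <- lapply(samples, function(m) rowMeans(get_ref(m)))

# Pooled baseline: reference cells from every sample are combined first
pooled_ref      <- do.call(cbind, lapply(samples, get_ref))
pooled_baseline <- rowMeans(pooled_ref)

# Centre every sample on the shared baseline
centred <- lapply(samples, function(m) m - pooled_baseline)

# The true baseline is 0 for every gene here, so the spread of the estimated
# baseline across genes reflects sampling noise only
cat("Patient_A own baseline sd:", sd(own_baseline$Patient_A), "\n")
cat("Pooled baseline sd:       ", sd(pooled_baseline), "\n")
```

Pooling stabilizes the baseline for samples with few normal cells, but it also assumes the samples are comparable enough (for example in tissue and chemistry) to share one reference.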

Visium HD Data

For high-resolution spatial transcriptomics:

result <- fastCNV_10XHD(
  seuratObjHD = visium_hd_object,
  sampleName = "Visium_HD_Sample",
  referenceVar = "region_annotation",
  referenceLabel = c("Normal_tissue"),
  doPlot = TRUE
)

Supported Data Types

Platform	Function	Resolution
scRNA-seq	`fastCNV()`	Single-cell
10X Visium	`fastCNV()`	55 µm spots
10X Visium HD	`fastCNV_10XHD()`	8/16 µm bins

Core Functions

Function	Description
`fastCNV()`	Main CNV inference pipeline
`fastCNV_10XHD()`	Specialized pipeline for Visium HD
`CNVCalling()`	Core CNV score computation
`CNVCluster()`	Hierarchical clustering of CNV profiles
`CNVClassification()`	Cell classification based on CNV patterns
`CNVTree()`	Phylogenetic tree construction from CNV profiles
`plotCNVResults()`	Heatmap visualization of CNV landscapes
`plotCNVTree()`	Phylogenetic tree visualization

Performance Benchmarks

Typical runtime on a standard workstation (8 cores, 32 GB RAM):

Dataset Size	Platform	Runtime
~5,000 cells	scRNA-seq	~1 min
~20,000 cells	scRNA-seq	~5 min
~50,000 spots	Visium	~15 min
~200,000 bins	Visium HD (16 µm)	~40 min
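
Runtimes depend on hardware and data, so it can be useful to measure them locally. The sketch below wraps any call in `system.time()` and returns both the result and the elapsed seconds. `time_call` is a hypothetical helper, and the hierarchical clustering call is only a stand-in workload so that the example runs without fastCNV or a Seurat object.

```r
# Hypothetical helper: evaluate an expression once, keeping result and timing
time_call <- function(expr) {
  # `expr` is evaluated lazily inside system.time(), so it runs exactly once
  elapsed <- system.time(value <- expr)[["elapsed"]]
  list(value = value, seconds = elapsed)
}

# Stand-in workload: cluster 2,000 random profiles of length 50
set.seed(6)
timing <- time_call(hclust(dist(matrix(rnorm(2000 * 50), nrow = 2000))))
cat("Elapsed seconds:", timing$seconds, "\n")

# With fastCNV loaded, the same wrapper could be applied to a real run, e.g.
# timing <- time_call(fastCNV(seuratObj = seurat_object, sampleName = "Sample1"))
```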

Frequently Asked Questions

Q: Is the mouse genome supported?
A: The current version is optimized for the human genome (GRCh38/hg38). Support for mouse (mm10/mm39) is planned for future releases.

Q: Can CNV analysis be performed without reference cells?
A: While technically possible, reference cells significantly improve accuracy by providing a baseline for normalization. We strongly recommend including normal/non-malignant cells as references.
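
A small simulation shows why the baseline matters. When most cells carry the same gain, centering on the mean of all cells absorbs most of the signal, whereas centering on reference cells preserves it. The all-cell mean is used here only as a simple stand-in for a reference-free baseline; how fastCNV behaves without references is not described in this README.

```r
# Toy illustration only: synthetic data, base R.
set.seed(4)
n_genes <- 100
normal  <- matrix(rnorm(n_genes * 10, sd = 0.1), nrow = n_genes)  # 10 normal cells
tumor   <- matrix(rnorm(n_genes * 90, sd = 0.1), nrow = n_genes)  # 90 tumor cells
gain    <- 41:60
tumor[gain, ] <- tumor[gain, ] + 0.5     # simulated gain shared by all tumor cells

expr   <- cbind(normal, tumor)
is_ref <- rep(c(TRUE, FALSE), times = c(10, 90))

# Baseline from reference cells versus baseline from all cells
with_ref    <- expr - rowMeans(expr[, is_ref])
without_ref <- expr - rowMeans(expr)

# Average score of tumor cells inside the gained region
cat("With reference baseline:", mean(with_ref[gain, !is_ref]), "\n")
cat("With all-cell baseline: ", mean(without_ref[gain, !is_ref]), "\n")
```

In this setup the true shift is 0.5. Because 90% of the cells carry it, the all-cell baseline sits close to the tumor level and the recovered score shrinks to roughly a tenth of the true value.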

Q: How should I select the window size parameter?
A: The default value (windowSize = 150) is suitable for most applications. Smaller values (e.g., 100) increase resolution but may introduce noise; larger values (e.g., 200) provide smoother profiles at the cost of resolution, so short alterations can be diluted.
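
The trade-off can be seen on a synthetic signal with a plain moving average. This is a sketch, assuming the window is counted in genes and applied as a centred mean; fastCNV's actual smoothing may differ, and `moving_average` and `summarise_window` are hypothetical helpers.

```r
# Toy illustration only: one synthetic profile, base R.
set.seed(5)
n_genes <- 3000
gain    <- 301:380                      # an 80-gene simulated gain
truth   <- numeric(n_genes)
truth[gain] <- 1
signal  <- truth + rnorm(n_genes, sd = 0.5)

# Centred moving average; positions without a full window become NA
moving_average <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# For one window size: noise level far from the gain, and peak height within it
summarise_window <- function(k) {
  sm <- moving_average(signal, k)
  data.frame(
    windowSize    = k,
    background_sd = sd(sm[1000:2800]),
    peak_in_gain  = max(sm[gain])
  )
}
res <- do.call(rbind, lapply(c(100, 150, 200), summarise_window))
print(res)
```

Larger windows lower the background noise, but they also flatten an alteration that is shorter than the window, because the window then always includes unaltered genes.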

Citation

If fastCNV contributes to your research, please cite:

Cabrejas G, Groeneveld C, et al. fastCNV: Fast and accurate inference of copy number variations from single-cell RNA sequencing data. bioRxiv 2025.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).

Authors and Contributors

Zaoqu Liu (Maintainer) – liuzaoqu@163.com
Gadea Cabrejas (Original Author) – gadea.cabrejas-saiz@u-paris.fr
Clarice Groeneveld (Original Author) – clarice.groeneveld@inserm.fr