
A Scalable Framework for Copy Number Variation Inference
from Single-Cell and Spatial Transcriptomics Data

Overview

fastCNV is an R package designed for efficient and accurate inference of copy number variations (CNVs) from transcriptomic data. It addresses the computational challenges associated with CNV detection in large-scale single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) datasets, including 10X Genomics Visium and Visium HD platforms.

The algorithm applies a sliding window along the genome. Genes are ordered by genomic position, and expression is averaged over neighboring genes to reveal chromosomal amplifications and deletions. By centering these smoothed profiles on reference cell populations (e.g., non-malignant cells), fastCNV separates tumor-associated CNV signals from baseline gene-expression variation and technical noise.
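The base-R sketch below makes the sliding-window and reference-centering idea concrete on simulated data. It is not fastCNV's implementation. The toy matrix, the 51-gene window and the plain moving average are all assumptions chosen for illustration. A real analysis would also smooth each chromosome separately rather than treating the genes as one continuous axis.

```r
# Minimal sketch (not fastCNV code): smooth genomically ordered expression with
# a centered moving average, then subtract the mean profile of reference cells.
set.seed(1)

n_genes <- 300
n_cells <- 40
window  <- 51   # illustrative window size (assumption)

# Toy log-expression matrix: genes (rows, already in genomic order) x cells
expr <- matrix(rnorm(n_genes * n_cells, sd = 0.5), nrow = n_genes)
is_reference <- seq_len(n_cells) <= 20   # first 20 cells play the "normal" role
gain_genes   <- 101:200                  # simulated amplified segment in tumor cells
expr[gain_genes, !is_reference] <- expr[gain_genes, !is_reference] + 1

# Centered moving average along the gene axis; positions where the window
# does not fit (the first and last 25 genes here) become NA
smooth_column <- function(x, w) {
  as.numeric(stats::filter(x, rep(1 / w, w), sides = 2))
}
smoothed <- apply(expr, 2, smooth_column, w = window)

# Reference baseline: per-gene mean of the smoothed reference cells
baseline <- rowMeans(smoothed[, is_reference])

# CNV-like scores: deviation of every cell from the reference baseline
# (the length-n_genes baseline is recycled down each column)
cnv_scores <- smoothed - baseline

# Tumor cells should sit near +1 inside the gain and near 0 outside it
round(c(inside_gain  = mean(cnv_scores[130:170, !is_reference]),
        outside_gain = mean(cnv_scores[30:70, !is_reference])), 2)
```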

Key Capabilities

  • Computational Efficiency: Optimized for large-scale datasets (>100,000 cells)
  • Multi-Platform Support: Compatible with scRNA-seq, Visium, and Visium HD data
  • Reference-Based Normalization: CNV scores are computed relative to user-specified normal (non-malignant) cell populations
  • Clonal Architecture Analysis: Hierarchical clustering and phylogenetic tree reconstruction
  • Seurat Integration: Seamless compatibility with Seurat 4.x and 5.x workflows

Installation

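From R-universe

# CRAN is listed as a second repository so that dependencies can be resolved
install.packages(
  "fastCNV",
  repos = c("https://zaoqu-liu.r-universe.dev", "https://cloud.r-project.org")
)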

From GitHub

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("Zaoqu-Liu/fastCNV")

Methodology

The CNV inference pipeline consists of the following steps:

  1. Gene Ordering: Genes are ordered by their genomic coordinates (chromosome and position)
  2. Expression Smoothing: A sliding window approach aggregates expression across neighboring genes
  3. Reference Normalization: Expression values are centered using reference (non-malignant) cells
  4. Score Computation: CNV scores represent relative deviations from the reference baseline
  5. Thresholding: Quantile-based filtering removes background noise
  6. Clustering: Hierarchical clustering identifies distinct CNV subpopulations
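Steps 5 and 6 can be illustrated with a small simulated CNV score matrix. This is a minimal sketch, not fastCNV's code. The 0.75 quantile, Ward linkage and the choice of two clusters are illustrative assumptions rather than package defaults.

```r
# Minimal sketch of quantile thresholding (step 5) and hierarchical clustering
# (step 6). Quantile level, linkage and k are assumptions, not fastCNV defaults.
set.seed(2)

n_regions <- 200
n_cells   <- 60

# Toy CNV scores (regions x cells): background noise plus two simulated subclones
scores  <- matrix(rnorm(n_regions * n_cells, sd = 0.05), nrow = n_regions)
clone_a <- 1:30
clone_b <- 31:60
scores[20:60,   clone_a] <- scores[20:60,   clone_a] + 0.4   # gain in clone A
scores[120:170, clone_b] <- scores[120:170, clone_b] - 0.4   # loss in clone B

# Step 5: set scores whose magnitude falls below a quantile of |scores| to zero
cutoff      <- quantile(abs(scores), probs = 0.75)
thresholded <- scores
thresholded[abs(thresholded) < cutoff] <- 0

# Step 6: hierarchical clustering of cells (columns) on their thresholded profiles
tree         <- hclust(dist(t(thresholded)), method = "ward.D2")
cnv_clusters <- cutree(tree, k = 2)

# Cross-tabulate recovered clusters against the simulated clone labels
table(cnv_clusters, clone = rep(c("A", "B"), each = 30))
```

Thresholding before clustering keeps cells from being grouped by background fluctuations. Only the retained gains and losses contribute to the cell-to-cell distances.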

Usage

Basic Analysis

library(fastCNV)

# Perform CNV analysis on a Seurat object
result <- fastCNV(
  seuratObj = seurat_object,
  sampleName = "Sample1",
  referenceVar = "cell_type",
  referenceLabel = c("Normal_epithelial", "Fibroblast"),
  prepareCounts = TRUE,
  getCNVPerChromosomeArm = TRUE,
  getCNVClusters = TRUE,
  doPlot = TRUE
)

# Inspect CNV clusters
table(result$cnv_clusters)

Multi-Sample Analysis

For cohort-level studies with multiple samples:

result <- fastCNV(
  seuratObj = list(sample1, sample2, sample3),
  sampleName = c("Patient_A", "Patient_B", "Patient_C"),
  referenceVar = "cell_annotation",
  referenceLabel = c("Normal_cells"),
  pooledReference = TRUE
)
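The README does not spell out what `pooledReference = TRUE` does. The name suggests that reference cells from all samples are combined into one shared baseline. The base-R sketch below illustrates that assumed behavior on simulated data. It is not fastCNV code, and the per-sample offsets and cell counts are hypothetical.

```r
# Hypothetical illustration of a pooled reference: reference cells from every
# sample are combined into one baseline instead of one baseline per sample.
set.seed(3)

n_genes <- 100
make_sample <- function(n_cells, shift) {
  # Toy smoothed-expression matrix (genes x cells) with a sample-level offset
  matrix(rnorm(n_genes * n_cells, mean = shift, sd = 0.3), nrow = n_genes)
}

samples <- list(
  Patient_A = make_sample(50, shift = 0.0),
  Patient_B = make_sample(50, shift = 0.2),
  Patient_C = make_sample(50, shift = -0.1)
)
# Logical flags marking reference columns; Patient_C has only 5 reference cells
ref_flags <- list(
  Patient_A = seq_len(50) <= 20,
  Patient_B = seq_len(50) <= 20,
  Patient_C = seq_len(50) <= 5
)

# Per-sample baselines, shown for contrast: one per-gene mean per sample
per_sample_baseline <- Map(function(m, r) rowMeans(m[, r, drop = FALSE]),
                           samples, ref_flags)

# Pooled baseline: bind reference cells from all samples, then average per gene
pooled_ref <- do.call(cbind, Map(function(m, r) m[, r, drop = FALSE],
                                 samples, ref_flags))
pooled_baseline <- rowMeans(pooled_ref)

# Center every sample on the same pooled baseline
centered <- lapply(samples, function(m) m - pooled_baseline)
sapply(centered, mean)
```

In this toy setting, pooling gives Patient_C a baseline built from 45 reference cells instead of 5. However, a shared baseline does not remove sample-level offsets, so Patient_B's simulated +0.2 shift is only partly cancelled.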

Visium HD Data

For high-resolution spatial transcriptomics:

result <- fastCNV_10XHD(
  seuratObjHD = visium_hd_object,
  sampleName = "Visium_HD_Sample",
  referenceVar = "region_annotation",
  referenceLabel = c("Normal_tissue"),
  doPlot = TRUE
)

Supported Data Types

Platform         Function           Resolution
scRNA-seq        fastCNV()          Single cell
10X Visium       fastCNV()          55 µm spots
10X Visium HD    fastCNV_10XHD()    8 µm or 16 µm bins

Core Functions

Function               Description
fastCNV()              Main CNV inference pipeline
fastCNV_10XHD()        Specialized pipeline for Visium HD
CNVCalling()           Core CNV score computation
CNVCluster()           Hierarchical clustering of CNV profiles
CNVClassification()    Cell classification based on CNV patterns
CNVTree()              Phylogenetic tree construction from CNV profiles
plotCNVResults()       Heatmap visualization of CNV landscapes
plotCNVTree()          Phylogenetic tree visualization
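The README does not describe how `CNVTree()` reconstructs its tree. The base-R sketch below shows one general way to relate CNV clusters. It summarizes each cluster by a mean profile and joins the profiles with average-linkage (UPGMA-style) clustering. The simulated profiles, the Euclidean distance and the linkage are assumptions, not a description of `CNVTree()`.

```r
# Hypothetical sketch: average linkage (UPGMA-style) on per-cluster mean CNV
# profiles. Not the CNVTree() algorithm; data and method are assumptions.
set.seed(4)

n_regions <- 50
trunk <- c(rep(0.3, 20), rep(0, 30))               # gain shared by all tumor clones
loss  <- c(rep(0, 25), rep(-0.3, 10), rep(0, 15))  # extra loss in clones 2 and 3
gain2 <- c(rep(0, 40), rep(0.3, 5), rep(0, 5))     # extra small gain in clone 3

profiles <- rbind(
  normal  = rep(0, n_regions),
  clone_1 = trunk,
  clone_2 = trunk + loss,
  clone_3 = trunk + loss + gain2
)
# Small measurement noise on top of the simulated mean profiles
profiles <- profiles + matrix(rnorm(length(profiles), sd = 0.01),
                              nrow = nrow(profiles))

tree <- hclust(dist(profiles), method = "average")

# With k = 2 the normal profile separates from all tumor clones;
# with k = 3 the closely related clone_2 and clone_3 stay together
cutree(tree, k = 2:3)
```

Calling `plot(tree)` draws the resulting dendrogram with base graphics.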

Performance Benchmarks

Typical runtime on a standard workstation (8 cores, 32 GB RAM):

Dataset size      Platform             Runtime
~5,000 cells      scRNA-seq            ~1 min
~20,000 cells     scRNA-seq            ~5 min
~50,000 cells     Visium               ~15 min
~200,000 bins     Visium HD (16 µm)    ~40 min

Frequently Asked Questions

Q: Is the mouse genome supported?
A: The current version is optimized for the human genome (GRCh38/hg38). Support for mouse (mm10/mm39) is planned for future releases.

Q: Can CNV analysis be performed without reference cells?
A: While technically possible, reference cells significantly improve accuracy by providing a baseline for normalization. We strongly recommend including normal/non-malignant cells as references.
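When most cells in a sample are malignant, a baseline computed from all cells absorbs the tumor's own CNV signal. The toy arithmetic below shows how much of that signal survives with and without a normal reference. It assumes one noise-free genomic region, 80% tumor cells and a single shared gain. It is a simplified illustration, not a description of how fastCNV behaves without references.

```r
# Simplified arithmetic (not fastCNV code): one genomic region, no noise.
n_cells  <- 100
is_tumor <- seq_len(n_cells) <= 80      # 80 tumor cells, 20 normal cells
region   <- ifelse(is_tumor, 1, 0)      # relative expression: +1 gain in tumor cells

baseline_all <- mean(region)              # no reference: baseline from all cells (0.8)
baseline_ref <- mean(region[!is_tumor])   # normal cells as reference (0)

score_without_ref <- 1 - baseline_all     # apparent gain in a tumor cell (0.2)
score_with_ref    <- 1 - baseline_ref     # apparent gain in a tumor cell (1)
c(without_reference = score_without_ref, with_reference = score_with_ref)
```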

Q: How should I select the window size parameter?
A: The default value (windowSize = 150) is suitable for most applications. Smaller values (e.g., 100) increase resolution but may introduce noise; larger values (e.g., 200) provide smoother profiles.
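The window-size trade-off can be seen with a plain centered moving average. This is an assumption, since fastCNV's smoothing may differ in detail. The sketch smooths unit-variance noise and a noise-free 120-gene gain separately at the window sizes mentioned above. Because the moving average is linear, each effect can be measured on its own.

```r
# Toy illustration of the window-size trade-off with a plain centered moving
# average (assumption: fastCNV's smoothing may differ in detail).
set.seed(5)

smooth <- function(x, w) as.numeric(stats::filter(x, rep(1 / w, w), sides = 2))

noise  <- rnorm(50000)        # per-gene noise only, unit variance
signal <- rep(0, 2000)
signal[1001:1120] <- 1        # a noise-free 120-gene gain

res <- t(sapply(c(100, 150, 200), function(w) {
  c(window      = w,
    noise_sd    = sd(smooth(noise, w), na.rm = TRUE),  # roughly 1 / sqrt(w)
    center_gain = smooth(signal, w)[1060])             # height at the gain's center
}))
# center_gain column: 1, 0.8 and 0.6 for windows 100, 150 and 200
print(round(res, 3))
```

Larger windows lower the residual noise but flatten any segment shorter than the window. In this example, a 200-gene window recovers only 60% of the 120-gene gain.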


Citation

If fastCNV contributes to your research, please cite:

Cabrejas G, Groeneveld C, et al. fastCNV: Fast and accurate inference of copy number variations from single-cell RNA sequencing data. bioRxiv 2025.


License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).


Authors and Contributors