Preprocess the single-cell raw data using functions in the Seurat package
Source: R/scPAS.R
run_Seurat.Rd
This function provides a simplified version of the Seurat analysis pipeline for single-cell RNA-seq data. It contains the following steps:
Create a Seurat object from raw data.
Normalize the count data present in a given assay.
Identify the variable features.
Scale and center features in the dataset.
Run a PCA dimensionality reduction.
Construct a Shared Nearest Neighbor (SNN) graph for a given dataset.
Identify clusters of cells using a shared nearest neighbor (SNN) modularity-optimization-based clustering algorithm.
Run t-distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction on selected features.
Run the Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction technique.
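The steps above map roughly onto the following Seurat calls. This is a minimal sketch rather than the package's actual source: the wrapper name `sketch_pipeline` is hypothetical, and the exact arguments that `run_Seurat` passes to each Seurat function internally are an assumption.

```r
# Hypothetical wrapper mirroring the documented steps; NOT the scPAS source.
# The Seurat package is only needed when the function is actually called.
sketch_pipeline <- function(counts, dims = 1:10, resolution = 0.6) {
  # 1. Create a Seurat object from raw counts.
  obj <- Seurat::CreateSeuratObject(counts = counts, project = "Single_Cell")
  # 2. Normalize the count data.
  obj <- Seurat::NormalizeData(obj, normalization.method = "LogNormalize",
                               scale.factor = 10000)
  # 3. Identify variable features.
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst")
  # 4. Scale and center features.
  obj <- Seurat::ScaleData(obj)
  # 5. PCA on the variable features.
  obj <- Seurat::RunPCA(obj, features = Seurat::VariableFeatures(obj))
  # 6-7. SNN graph construction and clustering.
  obj <- Seurat::FindNeighbors(obj, dims = dims)
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  # 8-9. t-SNE and UMAP embeddings.
  obj <- Seurat::RunTSNE(obj, dims = dims)
  obj <- Seurat::RunUMAP(obj, dims = dims)
  obj
}
```

In this sketch a single `dims` argument is shared by the neighbor graph, t-SNE, and UMAP steps for brevity, whereas `run_Seurat` exposes a separate argument for each.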
Usage
run_Seurat(
  counts,
  project = "Single_Cell",
  min.cells = 400,
  min.features = 200,
  meta.data = NULL,
  normalization.method = "LogNormalize",
  scale.factor = 10000,
  selection.method = "vst",
  resolution = 0.6,
  dims_Neighbors = 1:10,
  dims_TSNE = 1:10,
  dims_UMAP = 1:10,
  verbose = TRUE
)
Arguments
- counts
A matrix-like object with unnormalized data with cells as columns and features as rows.
- project
Project name for the Seurat object.
- min.cells
Include features detected in at least this many cells. Will subset the counts matrix as well. To reintroduce excluded features, create a new object with a lower cutoff.
- min.features
Include cells where at least this many features are detected.
- meta.data
Metadata for the single-cell data.
- normalization.method
Method for normalization.
LogNormalize: Feature counts for each cell are divided by the total counts for that cell and multiplied by the scale.factor. This is then natural-log transformed using log1p.
CLR: Applies a centered log ratio transformation.
RC: Relative counts. Feature counts for each cell are divided by the total counts for that cell and multiplied by the scale.factor. No log-transformation is applied. For counts per million (CPM) set scale.factor = 1e6.
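The arithmetic behind LogNormalize and RC, as described above, can be illustrated in base R on a toy matrix. This is a sketch of the described formulas only, not Seurat's implementation, and the variable names are made up for the example. CLR is not shown.

```r
# Toy count matrix: 3 features (rows) x 2 cells (columns).
counts <- matrix(c(1, 3, 6,
                   0, 5, 5),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("cell1", "cell2")))
scale.factor <- 10000

# RC: divide each cell (column) by its total count, then multiply by
# scale.factor. No log transformation is applied.
rc <- sweep(counts, 2, colSums(counts), "/") * scale.factor

# LogNormalize: the same relative counts followed by a natural-log
# transform via log1p, i.e. log(1 + x).
log_norm <- log1p(rc)

# After RC, every cell sums to scale.factor.
colSums(rc)
```

Because `log1p(0)` is 0, features with zero counts stay at zero after LogNormalize.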
- scale.factor
Sets the scale factor for cell-level normalization.
- selection.method
How to choose top variable features. Choose one of:
vst: First, fits a line to the relationship of log(variance) and log(mean) using local polynomial regression (loess). Then standardizes the feature values using the observed mean and expected variance (given by the fitted line). Feature variance is then calculated on the standardized values after clipping to a maximum (see clip.max parameter).
mean.var.plot (mvp): First, uses a function to calculate average expression (mean.function) and dispersion (dispersion.function) for each feature. Next, divides features into num.bin (default 20) bins based on their average expression, and calculates z-scores for dispersion within each bin. The purpose of this is to identify variable features while controlling for the strong relationship between variability and average expression.
dispersion (disp): Selects the genes with the highest dispersion values.
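As a rough illustration of the dispersion idea, the following base R sketch ranks features by their variance-to-mean ratio and keeps the top ones. This is a simplification under stated assumptions: the dispersion and mean functions Seurat actually uses, and the scale they operate on, may differ, and all names and data here are made up for the example.

```r
# Toy expression matrix: 4 features (rows) x 5 cells (columns).
expr <- rbind(
  flat     = c(2, 2, 2, 2, 2),
  mild     = c(1, 2, 3, 2, 2),
  variable = c(0, 0, 10, 0, 0),
  moderate = c(1, 3, 1, 3, 2)
)

# Dispersion taken here as the variance-to-mean ratio per feature
# (an assumption made for illustration).
feature_mean <- rowMeans(expr)
feature_var  <- apply(expr, 1, var)
dispersion   <- feature_var / feature_mean

# Keep the two features with the highest dispersion.
top_features <- names(sort(dispersion, decreasing = TRUE))[1:2]
top_features
```

All four toy features share the same mean, so the ranking reflects variance alone. With real data, mean and variance are strongly related, which is the relationship the vst and mvp methods are described as controlling for.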
- resolution
Value of the resolution parameter. Use a value above (below) 1.0 to obtain a larger (smaller) number of communities.
- dims_Neighbors
Dimensions of the reduction to use as input for constructing the SNN graph.
- dims_TSNE
Which dimensions to use as input features for t-SNE.
- dims_UMAP
Which dimensions to use as input features for UMAP.
- verbose
Whether to print output.
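A call might look like the following sketch. The simulated count matrix is made up for illustration, and the package name `scPAS` is an assumption inferred from the source file name. The call is guarded so that it only runs when that package is installed. `min.cells` is lowered from its default here purely as an illustrative choice for a small simulated dataset.

```r
# Simulated raw counts: 1000 features (rows) x 500 cells (columns).
# These values are random and carry no biological meaning.
set.seed(1)
counts <- matrix(rpois(1000 * 500, lambda = 1), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000),
                                 paste0("cell", 1:500)))

# Only attempt the call when the package (name assumed) is available.
if (requireNamespace("scPAS", quietly = TRUE)) {
  sc_dataset <- scPAS::run_Seurat(
    counts,
    project = "Single_Cell",
    min.cells = 3,            # illustrative choice for this toy dataset
    min.features = 200,
    normalization.method = "LogNormalize",
    selection.method = "vst",
    resolution = 0.6,
    dims_Neighbors = 1:10,
    dims_TSNE = 1:10,
    dims_UMAP = 1:10,
    verbose = FALSE
  )
}
```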