Preprocess the single-cell raw data using functions in the Seurat package

This function provides a simplified version of the Seurat analysis pipeline for single-cell RNA-seq data. The pipeline consists of the following steps:

Create a Seurat object from raw data.
Normalize the count data present in a given assay.
Identify the variable features.
Scale and center the features in the dataset.
Run a PCA dimensionality reduction.
Construct a shared nearest neighbor (SNN) graph for the dataset.
Identify clusters of cells using an SNN modularity-optimization-based clustering algorithm.
Run t-distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction on the selected features.
Run the Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction.
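
The steps above map onto standard Seurat calls roughly as shown below. This is a minimal sketch, not the source of run_Seurat: the name pipeline_sketch is hypothetical, a single dims argument stands in for the three separate dims_* arguments, and it assumes the Seurat package is installed when the function is called.

```r
# Hypothetical helper that mirrors the nine steps listed above.
# Seurat is only needed when the function is called, not when it is defined.
pipeline_sketch <- function(counts, min.cells = 400, min.features = 200,
                            resolution = 0.6, dims = 1:10, verbose = TRUE) {
  # 1. Create the Seurat object, filtering features and cells
  obj <- Seurat::CreateSeuratObject(counts = counts, project = "Single_Cell",
                                    min.cells = min.cells,
                                    min.features = min.features)
  # 2. Normalize, 3. find variable features, 4. scale and center
  obj <- Seurat::NormalizeData(obj, normalization.method = "LogNormalize",
                               scale.factor = 10000, verbose = verbose)
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst",
                                      verbose = verbose)
  obj <- Seurat::ScaleData(obj, verbose = verbose)
  # 5. PCA on the variable features
  obj <- Seurat::RunPCA(obj, features = Seurat::VariableFeatures(obj),
                        verbose = verbose)
  # 6. SNN graph, 7. clustering
  obj <- Seurat::FindNeighbors(obj, dims = dims, verbose = verbose)
  obj <- Seurat::FindClusters(obj, resolution = resolution, verbose = verbose)
  # 8. t-SNE, 9. UMAP
  obj <- Seurat::RunTSNE(obj, dims = dims)
  obj <- Seurat::RunUMAP(obj, dims = dims, verbose = verbose)
  obj
}
```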

Usage

run_Seurat(
  counts,
  project = "Single_Cell",
  min.cells = 400,
  min.features = 200,
  meta.data = NULL,
  normalization.method = "LogNormalize",
  scale.factor = 10000,
  selection.method = "vst",
  resolution = 0.6,
  dims_Neighbors = 1:10,
  dims_TSNE = 1:10,
  dims_UMAP = 1:10,
  verbose = TRUE
)

Arguments

counts

A matrix-like object containing unnormalized counts, with cells as columns and features as rows.

project

Project name for the Seurat object.

min.cells

Include features detected in at least this many cells. Will subset the counts matrix as well. To reintroduce excluded features, create a new object with a lower cutoff.

min.features

Include cells where at least this many features are detected.
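
The effect of the two cutoffs can be illustrated in base R on a toy matrix. This sketch assumes the order used by Seurat's object constructor (cells are filtered by min.features first, then features by min.cells); the matrix and the small cutoff values are invented for illustration.

```r
# Toy count matrix: 4 features (rows) x 5 cells (columns)
counts <- matrix(c(1, 0, 2, 0, 3,
                   0, 0, 1, 0, 0,
                   4, 1, 1, 2, 1,
                   0, 0, 5, 0, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("g", 1:4), paste0("c", 1:5)))

min.features <- 2  # keep cells with at least 2 detected features
min.cells <- 3     # keep features detected in at least 3 cells

# Step 1: drop cells with too few detected (non-zero) features
features_per_cell <- colSums(counts > 0)
filtered <- counts[, features_per_cell >= min.features, drop = FALSE]

# Step 2: drop features detected in too few of the remaining cells
cells_per_feature <- rowSums(filtered > 0)
filtered <- filtered[cells_per_feature >= min.cells, , drop = FALSE]

print(filtered)
```

With the default min.cells = 400, a dataset with only a few hundred cells would lose most of its features, so the cutoff may need lowering for small datasets.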

meta.data

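Additional cell-level metadata to add to the Seurat object. Should be a data frame whose rows are cells (row names matching the column names of counts) and whose columns are metadata fields.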
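
As an illustration, a metadata table could be laid out as below. This is a sketch that assumes the Seurat convention that row names of the data frame match the column names of counts; the column names sample and condition are invented.

```r
# Toy counts with three cells; only the column names matter here
counts <- matrix(0, nrow = 2, ncol = 3,
                 dimnames = list(c("g1", "g2"), c("cell1", "cell2", "cell3")))

# One row per cell, row names equal to the cell names in counts
meta <- data.frame(
  sample = c("S1", "S1", "S2"),                  # hypothetical field
  condition = c("control", "control", "treated"), # hypothetical field
  row.names = colnames(counts)
)

# Sanity check before passing meta as meta.data
stopifnot(identical(rownames(meta), colnames(counts)))
```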

normalization.method

Method for normalization.

LogNormalize: Feature counts for each cell are divided by the total counts for that cell and multiplied by the scale.factor. This is then natural-log transformed using log1p.
CLR: Applies a centered log ratio transformation.
RC: Relative counts. Feature counts for each cell are divided by the total counts for that cell and multiplied by the scale.factor. No log-transformation is applied. For counts per million (CPM) set scale.factor = 1e6.
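
The RC and LogNormalize definitions above can be reproduced in base R on a toy matrix. This is a sketch of the arithmetic only, not Seurat's implementation; the matrix is invented for illustration.

```r
# Toy count matrix: 3 features (rows) x 2 cells (columns), filled column-wise
counts <- matrix(c(1, 0, 3,
                   2, 2, 0),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("cell1", "cell2")))
scale.factor <- 10000

# RC: divide each cell (column) by its total counts, multiply by scale.factor
rc <- sweep(counts, 2, colSums(counts), "/") * scale.factor

# LogNormalize: the same relative counts, then log1p (natural log of 1 + x)
lognorm <- log1p(rc)

# CPM is RC with scale.factor = 1e6
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
```

After RC normalization every cell sums to scale.factor, which is what makes cells with different sequencing depths comparable.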

scale.factor

Sets the scale factor for cell-level normalization.

selection.method

How to choose the top variable features. Choose one of:

vst: First, fits a line to the relationship of log(variance) and log(mean) using local polynomial regression (loess). Then standardizes the feature values using the observed mean and expected variance (given by the fitted line). Feature variance is then calculated on the standardized values after clipping to a maximum (see clip.max parameter).
mean.var.plot (mvp): First, uses a function to calculate average expression (mean.function) and dispersion (dispersion.function) for each feature. Next, divides features into num.bin (default 20) bins based on their average expression, and calculates z-scores for dispersion within each bin. The purpose of this is to identify variable features while controlling for the strong relationship between variability and average expression.
dispersion (disp): Selects the genes with the highest dispersion values.
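
The vst description above can be sketched in base R on simulated counts. This is an illustration under stated assumptions, not Seurat's code: base-10 logs, a loess span of 0.3, and a clip value of sqrt(number of cells) are choices made for this sketch, and the simulated data are invented.

```r
set.seed(1)
n_genes <- 200
n_cells <- 100

# Simulated counts: genes (rows) with increasing mean expression
mu <- exp(seq(log(0.5), log(50), length.out = n_genes))
counts <- t(sapply(mu, function(m) rnbinom(n_cells, mu = m, size = 2)))

gene_mean <- rowMeans(counts)
gene_var <- apply(counts, 1, var)
keep <- gene_var > 0  # log of zero variance is undefined

# 1. Fit the log(variance) ~ log(mean) trend with local polynomial regression
df <- data.frame(log_mean = log10(gene_mean[keep]),
                 log_var = log10(gene_var[keep]))
fit <- loess(log_var ~ log_mean, data = df, span = 0.3)

# 2. Standardize with the observed mean and the expected (fitted) variance
expected_sd <- sqrt(10^fitted(fit))
z <- (counts[keep, , drop = FALSE] - gene_mean[keep]) / expected_sd

# 3. Clip to a maximum, then compute the variance of the standardized values
clip.max <- sqrt(n_cells)
z <- pmin(z, clip.max)
std_var <- rowSums(z^2) / (n_cells - 1)

# Indices (within the kept genes) of the 10 most variable features
top <- order(std_var, decreasing = TRUE)[1:10]
```

Genes whose variance sits well above the fitted trend get a standardized variance greater than 1, so ranking by std_var favors genes that are more variable than expected for their expression level.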

resolution

Value of the resolution parameter. Use a value above 1.0 to obtain a larger number of communities, or a value below 1.0 to obtain a smaller number.

dims_Neighbors

Dimensions of the PCA reduction to use as input when constructing the SNN graph.

dims_TSNE

Which dimensions to use as input features for t-SNE.

dims_UMAP

Which dimensions to use as input features for UMAP.

verbose

Whether to print progress output.

Value

A Seurat object containing the cell-cell similarity network and the t-SNE and UMAP representations.
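
The pieces of the returned object can be inspected along the following lines. This is a sketch: the helper name inspect_result is hypothetical, and it assumes the reductions are stored under Seurat's default names "tsne" and "umap" and that the Seurat package is installed when the function is called.

```r
# Hypothetical helper summarizing what the returned Seurat object holds
inspect_result <- function(obj) {
  list(
    # Number of cells assigned to each cluster
    clusters = table(Seurat::Idents(obj)),
    # Names of the stored neighbor graphs (the cell-cell similarity network)
    graphs = names(obj@graphs),
    # First few cells of each low-dimensional embedding
    tsne = head(Seurat::Embeddings(obj, reduction = "tsne")),
    umap = head(Seurat::Embeddings(obj, reduction = "umap"))
  )
}
```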