scPAS: A tool for identifying Phenotype-Associated cell Subpopulations from single-cell sequencing data by integrating bulk data
Source: R/scPAS.R
Usage
scPAS(
bulk_dataset,
sc_dataset,
phenotype,
assay = "RNA",
tag = NULL,
nfeature = NULL,
do_imputation = TRUE,
imputation_method = c("KNN", "ALRA"),
alpha = NULL,
network_class = c("SC", "bulk"),
independent = TRUE,
family = c("gaussian", "binomial", "cox"),
permutation_times = 2000,
FDR.threshold = 0.05,
n_cores = 1
)
Arguments
- bulk_dataset
Matrix. Bulk expression matrix of related disease. Each row represents a gene and each column represents a sample. The input expression values are continuous, such as microarray fluorescent units in logarithmic scale, RNA-seq log-CPMs, log-RPKMs or log-TPMs.
- sc_dataset
Matrix or Seurat object. Single-cell RNA-seq expression matrix of related disease. Each row represents a gene and each column represents a cell. A Seurat object that contains the preprocessed data and constructed network is preferred. Otherwise, the raw count expression matrix is processed using Seurat's default parameters and a cell-cell similarity network is constructed from the input matrix. See run_Seurat for details.
- phenotype
Phenotype annotation of each bulk sample. It can be a continuous dependent variable, a binary group indicator vector, or clinical survival data:
Continuous dependent variable. Should be a quantitative vector for family = "gaussian".
Binary group indicator vector. Should be either a 0-1 encoded vector or a factor with two levels for family = "binomial".
Clinical survival data. Should be a two-column matrix with columns named 'time' and 'status'. The latter is a binary variable, with '1' indicating an event (e.g. recurrence of cancer or death) and '0' indicating right-censored. The function Surv() in package survival produces such a matrix.
- assay
Name of Assay to get.
- tag
Names for each phenotypic group. Used for logistic regressions only.
- nfeature
Numeric. Number of features to select as top variable features in sc_dataset. The top variable features are intersected with the features of bulk_dataset. Default is NULL, in which case all features are used.
- do_imputation
Logical. Whether to perform imputation on single-cell data (default: TRUE).
- imputation_method
Character. Name of the imputation method to use, either "KNN" or "ALRA".
- alpha
Numeric. Parameter used to balance the effect of the l1 norm and the network-based penalties. It can be a number or a searching vector. If alpha = NULL, a default searching vector is used. The range of alpha is [0, 1]. A larger alpha lays more emphasis on the l1 norm.
- network_class
The source of the feature-feature similarity network. By default this is set to "SC"; the other option is "bulk".
- independent
Logical. The background distribution of risk scores is constructed independently of each cell.
- family
Character. Response type for the regression model. It depends on the type of the given phenotype and can be family = "gaussian" for linear regression, family = "binomial" for classification, or family = "cox" for Cox regression.
- permutation_times
Integer. Number of permutation iterations for statistical significance testing (default: 2000). Higher values increase accuracy but also computation time. Recommended: 1000-5000. For faster testing, use 500-1000.
- FDR.threshold
Numeric. FDR value threshold for identifying phenotype-associated cells. The default is 0.05.
- n_cores
Integer. Number of CPU cores to use for parallel permutation test (default: 1 for sequential processing). Setting n_cores > 1 enables parallel computing which can significantly speed up the analysis (2-4x faster with 4 cores). Requires 'future' and 'future.apply' packages.
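As an illustration of the input formats described for bulk_dataset and phenotype, the following is a minimal sketch in base R using simulated data. The object names (counts, bulk_log_cpm, pheno_gaussian, pheno_binomial, pheno_cox), the group names in tag, and the log2(CPM + 1) transform are assumptions made for this example and are not part of scPAS. Phenotype entries are assumed to follow the column order of the bulk matrix.

```r
set.seed(1)

# Simulated raw counts: 5 genes (rows) x 6 bulk samples (columns).
counts <- matrix(rpois(30, lambda = 50), nrow = 5,
                 dimnames = list(paste0("Gene", 1:5), paste0("S", 1:6)))

# Convert to log-CPM, one of the continuous scales accepted for bulk data.
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
bulk_log_cpm <- log2(cpm + 1)

# family = "gaussian": one quantitative value per bulk sample.
pheno_gaussian <- rnorm(ncol(bulk_log_cpm))

# family = "binomial": 0-1 encoded group indicator.
# The group names in 'tag' are hypothetical.
pheno_binomial <- c(0, 0, 0, 1, 1, 1)
tag <- c("Normal", "Tumor")

# family = "cox": two-column matrix with columns 'time' and 'status'
# (status 1 = event, 0 = right-censored).
pheno_cox <- cbind(time = c(12, 30, 7, 45, 22, 60),
                   status = c(1, 0, 1, 0, 1, 0))
```

Any one of these phenotype objects could then be supplied as phenotype together with the matching family value.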
Value
This function returns a Seurat object with the following components added:
- scPAS_para
A list containing the final model parameters, added to misc.
- PAS result
A data frame containing risk scores (scPAS_RS), normalized risk scores (scPAS_NRS), p-values (scPAS_Pvalue), adjusted p-values (scPAS_FDR), and cell classification labels (scPAS), added to the cell metadata.
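To illustrate how permutation_times and FDR.threshold relate to per-cell p-values and adjusted p-values, here is a generic sketch in base R. It is not the scPAS implementation: the simulated scores, the empirical p-value formula, and the Benjamini-Hochberg adjustment via p.adjust() are assumptions chosen for this example.

```r
set.seed(42)
n_cells <- 200
permutation_times <- 500   # a small value, in the range suggested for faster testing
FDR.threshold <- 0.05

# Simulated observed scores: the first 20 cells carry a strong signal.
observed <- c(rnorm(20, mean = 4), rnorm(n_cells - 20))

# Simulated background: one row per cell, one column per permutation.
background <- matrix(rnorm(n_cells * permutation_times), nrow = n_cells)

# Two-sided empirical p-value per cell. The +1 terms keep p above zero,
# so the smallest attainable value is 1 / (permutation_times + 1).
exceed <- rowSums(abs(background) >= abs(observed))
p_value <- (exceed + 1) / (permutation_times + 1)

# Multiple-testing adjustment, then thresholding at FDR.threshold.
fdr <- p.adjust(p_value, method = "BH")
significant <- fdr <= FDR.threshold
table(significant)
```

With 500 permutations the smallest attainable p-value in this sketch is 1/501, which is one reason a larger number of iterations gives finer resolution at the cost of computation time.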