Automatically detects the species of a gene list from the capitalization pattern of its gene symbols. Computes the proportion of symbols that follow the human or the mouse naming convention and uses it as a confidence score to distinguish human from mouse genes.

Usage

detect_species(genes, confidence_threshold = 0.7)

Arguments

genes

Character vector. Gene symbols to analyze

confidence_threshold

Numeric. Minimum confidence score (0-1) to return a species determination (default: 0.7)

Value

List with elements:

  • species: "Homo_sapiens", "Mus_musculus", or "unknown"

  • confidence: Confidence score (0-1)

  • method: Detection method used

  • patterns: List of pattern statistics

Details

**Detection Logic**:

  • Human genes: ALL UPPERCASE (TGFB1, VEGFA, CD8A)

  • Mouse genes: First letter uppercase, rest lowercase (Tgfb1, Vegfa, Cd8a)

Analyzes up to 100 genes and calculates the proportion matching each pattern. The species is determined if the confidence exceeds the threshold (default 70%).
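The logic above can be sketched in a few lines of base R. This is a minimal illustration, not the package's source: the function name `sketch_detect_species`, the `max_genes` argument, the exact regular expressions, and the choice of `>=` for the threshold comparison are all assumptions made for the example.

```r
# Hypothetical re-implementation for illustration only; the real
# detect_species() may differ in its patterns and tie-breaking.
sketch_detect_species <- function(genes, confidence_threshold = 0.7,
                                  max_genes = 100) {
  # Drop missing/empty symbols and keep at most max_genes of them
  genes <- genes[!is.na(genes) & nzchar(genes)]
  genes <- head(unique(genes), max_genes)
  if (length(genes) == 0) {
    return(list(species = "unknown", confidence = 0))
  }

  # Human pattern: all uppercase letters, digits, or hyphens (e.g. TGFB1)
  is_human <- grepl("^[A-Z][A-Z0-9-]*$", genes, perl = TRUE)
  # Mouse pattern: first letter uppercase, rest lowercase (e.g. Tgfb1);
  # requiring a lowercase letter keeps one-letter symbols out of this class
  is_mouse <- grepl("^[A-Z][a-z0-9-]*$", genes, perl = TRUE) &
    grepl("[a-z]", genes, perl = TRUE)

  prop_human <- mean(is_human)
  prop_mouse <- mean(is_mouse)
  confidence <- max(prop_human, prop_mouse)

  # Assumption: a confidence equal to the threshold still counts
  species <- if (confidence < confidence_threshold) {
    "unknown"
  } else if (prop_human >= prop_mouse) {
    "Homo_sapiens"
  } else {
    "Mus_musculus"
  }

  list(
    species = species,
    confidence = confidence,
    patterns = list(prop_human = prop_human, prop_mouse = prop_mouse)
  )
}
```

With a mixed input of two uppercase and two title-case symbols, both proportions are 0.5, which falls below the default threshold, so the sketch reports "unknown".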

**Marker Gene Validation** (future enhancement): Detection could additionally check for species-specific marker genes such as:

  • Human-specific: HBA1, HBB (hemoglobin)

  • Mouse-specific: Gm genes (predicted genes)
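Since marker validation is described only as a possible enhancement, the following is a speculative sketch of what such a check might look like. The helper name `sketch_marker_hits` is hypothetical, the marker list is limited to the two human genes named above, and the assumption that mouse predicted genes look like `Gm` followed by digits is the example's own.

```r
# Hypothetical helper; not part of the documented function.
sketch_marker_hits <- function(genes) {
  human_markers <- c("HBA1", "HBB")  # hemoglobin genes named above
  list(
    # Count of exact matches to the human marker symbols
    human = sum(genes %in% human_markers),
    # Count of symbols shaped like mouse predicted genes (assumed: Gm + digits)
    mouse = sum(grepl("^Gm[0-9]+$", genes, perl = TRUE))
  )
}

sketch_marker_hits(c("HBB", "TGFB1", "Gm12345"))
```

Counts like these could serve as supporting evidence alongside the capitalization-based confidence score rather than as a replacement for it.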

Examples

if (FALSE) { # \dontrun{
# Detect human genes
detect_species(c("TGFB1", "VEGFA", "CTNNB1"))
# Returns (abbreviated): list(species = "Homo_sapiens", confidence = 1.0, ...)

# Detect mouse genes
detect_species(c("Tgfb1", "Vegfa", "Ctnnb1"))
# Returns (abbreviated): list(species = "Mus_musculus", confidence = 1.0, ...)

# Mixed or ambiguous
detect_species(c("TGFB1", "Vegfa", "CD8A", "Ctnnb1"))
# Returns (abbreviated): list(species = "unknown", confidence = 0.5, ...)
} # }