Automatically detects the species of a gene list from its gene naming patterns. Compares the proportion of symbols that follow the human and the mouse capitalization conventions to distinguish human from mouse genes.
Value
List with elements:
species: "Homo_sapiens", "Mus_musculus", or "unknown"
confidence: Confidence score (0-1)
method: Detection method used
patterns: List of pattern statistics
Details
**Detection Logic**:
Human genes: ALL UPPERCASE (TGFB1, VEGFA, CD8A)
Mouse genes: First letter uppercase, rest lowercase (Tgfb1, Vegfa, Cd8a)
Analyzes up to 100 genes and calculates the proportion matching each pattern. A species is assigned only if the confidence exceeds the threshold (default 70%); otherwise the result is "unknown".
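The proportion-based logic described above can be sketched roughly as follows. This is an illustrative re-implementation, not the package's own code: the function name `sketch_detect_species`, its arguments `threshold` and `max_genes`, the regular expressions, and the `method` label are all assumptions made for the example.

```r
# Hypothetical sketch of capitalization-based species detection.
# Names, arguments, and regexes are assumptions, not the package's API.
sketch_detect_species <- function(genes, threshold = 0.7, max_genes = 100) {
  # Drop missing/empty symbols and limit the sample size
  genes <- unique(genes[!is.na(genes) & nzchar(genes)])
  genes <- head(genes, max_genes)
  if (length(genes) == 0) {
    return(list(species = "unknown", confidence = 0,
                method = "naming_pattern", patterns = list()))
  }

  # Human-style: uppercase first letter, no lowercase letters (TGFB1)
  human_like <- grepl("^[A-Z][A-Z0-9.-]*$", genes, perl = TRUE)
  # Mouse-style: uppercase first letter, remaining letters lowercase (Tgfb1)
  mouse_like <- grepl("^[A-Z][a-z0-9.-]*$", genes, perl = TRUE) &
    grepl("[a-z]", genes, perl = TRUE)

  p_human <- mean(human_like)
  p_mouse <- mean(mouse_like)
  confidence <- max(p_human, p_mouse)

  # Assign a species only when the winning proportion exceeds the threshold
  species <- "unknown"
  if (confidence > threshold) {
    species <- if (p_human > p_mouse) "Homo_sapiens" else "Mus_musculus"
  }

  list(species = species, confidence = confidence,
       method = "naming_pattern",
       patterns = list(human = p_human, mouse = p_mouse))
}
```

In this sketch, the confidence is simply the larger of the two proportions, so an evenly mixed list yields 0.5 and falls below the default threshold.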
**Marker Gene Validation** (future enhancement): Detection could additionally check for species-specific marker genes such as:
Human-specific: HBA1, HBB (hemoglobin)
Mouse-specific: Gm genes (predicted genes)
Examples
if (FALSE) { # \dontrun{
# Detect human genes
detect_species(c("TGFB1", "VEGFA", "CTNNB1"))
# Returns: list(species = "Homo_sapiens", confidence = 1.0)
# Detect mouse genes
detect_species(c("Tgfb1", "Vegfa", "Ctnnb1"))
# Returns: list(species = "Mus_musculus", confidence = 1.0)
# Mixed or ambiguous
detect_species(c("TGFB1", "Vegfa", "CD8A", "Ctnnb1"))
# Returns: list(species = "unknown", confidence = 0.5)
} # }