Convert Genes Between Species Using BioMart
Source: R/species_conversion.R
Converts gene symbols between species using Ensembl BioMart ortholog mappings. Provides accurate, biologically-validated homolog mappings rather than simple name transformation.
Usage
convert_species_biomart(
  genes,
  from_species,
  to_species = "Homo_sapiens",
  ensembl_version = 103,
  mirror = NULL,
  cache = TRUE,
  max_tries = 5
)
Arguments
- genes
Character vector. Gene symbols to convert
- from_species
Character. Source species:
"Homo_sapiens" (human)
"Mus_musculus" (mouse)
- to_species
Character. Target species (default: "Homo_sapiens")
- ensembl_version
Character or numeric. Ensembl version (default: 103). Using a fixed version ensures reproducibility. Use "current_release" for latest version.
- mirror
Character or NULL. Ensembl mirror for faster access:
"www": Main server (Europe)
"uswest": US West Coast
"useast": US East Coast
"asia": Asia
- cache
Logical. Cache BioMart results for faster repeated queries (default: TRUE)
- max_tries
Integer. Maximum retry attempts for network operations (default: 5)
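The retry behaviour controlled by max_tries can be pictured with a small base R loop. This is only a sketch under stated assumptions: with_retries, wait_seconds, and flaky_query are hypothetical names introduced for illustration, and the package's own retry logic may differ.

```r
# Hypothetical helper (not part of the package): call fn() up to max_tries
# times and return the first successful result.
with_retries <- function(fn, max_tries = 5, wait_seconds = 0) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$error
    # Pause between attempts, but not after the final one
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(last_error))
}

# Simulated flaky query standing in for a BioMart request:
# it fails twice, then succeeds.
calls <- 0
flaky_query <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated network timeout")
  "query result"
}

retry_value <- with_retries(flaky_query, max_tries = 5)
```

A real network call would usually use a non-zero wait between attempts so that a temporarily unavailable server has time to recover.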
Value
List with elements:
mapping: data.frame with columns from_gene, to_gene
unmapped: character vector of genes without orthologs
stats: list with mapping statistics (n_input, n_mapped, mapping_rate, etc.)
cache_key: cache identifier (if cache=TRUE)
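To make the shape of the return value concrete, the sketch below builds a mock list with the same element names from hand-written data. It assumes that mapping_rate is the number of mapped input genes divided by the number of input genes; the package may compute it differently, and "NotARealGene" is a made-up placeholder symbol.

```r
# Mock data for illustration; a real result would come from BioMart.
input_genes <- c("Tgfb1", "Vegfa", "Ctnnb1", "NotARealGene")
mapping <- data.frame(
  from_gene = c("Tgfb1", "Vegfa", "Ctnnb1"),
  to_gene   = c("TGFB1", "VEGFA", "CTNNB1"),
  stringsAsFactors = FALSE
)

# Genes that never appear in the mapping are reported as unmapped
unmapped <- setdiff(input_genes, mapping$from_gene)

# Assumed definition: proportion of distinct input genes with a mapping
n_input  <- length(unique(input_genes))
n_mapped <- length(unique(mapping$from_gene))
stats <- list(
  n_input      = n_input,
  n_mapped     = n_mapped,
  mapping_rate = n_mapped / n_input
)

result_sketch <- list(mapping = mapping, unmapped = unmapped, stats = stats)
str(result_sketch)
```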
Details
**Ortholog Mapping**: Uses Ensembl's "associated_gene_name" attribute, which provides the primary ortholog symbol. For mouse→human conversion, this maps, for example:
Tgfb1 → TGFB1
Vegfa → VEGFA
Ctnnb1 → CTNNB1
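The examples above happen to differ only in letter case, which can make upper-casing look sufficient. The sketch below shows a case where it is not: the mouse gene Trp53 corresponds to human TP53. The lookup vector is hand-written and only stands in for a BioMart ortholog table; it is not output from this function.

```r
mouse_genes <- c("Tgfb1", "Vegfa", "Trp53")

# Naive name transformation: upper-case the mouse symbol
naive <- toupper(mouse_genes)

# Hand-written lookup standing in for an ortholog table (illustrative only)
ortholog_table <- c(Tgfb1 = "TGFB1", Vegfa = "VEGFA", Trp53 = "TP53")
looked_up <- unname(ortholog_table[mouse_genes])

comparison <- data.frame(
  mouse    = mouse_genes,
  naive    = naive,
  ortholog = looked_up,
  agrees   = naive == looked_up,
  stringsAsFactors = FALSE
)
print(comparison)
```

The row for Trp53 is the one where the two approaches disagree, since upper-casing yields "TRP53" rather than "TP53".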
**One-to-Many Mappings**: Some genes have multiple orthologs (e.g., Tgfb1 might map to TGFB1, TGFB2, TGFB3). By default, all mappings are returned. Downstream functions handle aggregation.
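Because all mappings are returned, a caller may want to inspect or reduce one-to-many rows before further analysis. The following sketch shows two possible treatments in base R on a mock mapping table. The gene names are placeholders, and neither option is claimed to be what the package's downstream functions do.

```r
# Mock mapping table with placeholder gene names
mapping <- data.frame(
  from_gene = c("GeneA", "GeneA", "GeneB", "GeneC"),
  to_gene   = c("ORTH1", "ORTH2", "ORTH3", "ORTH4"),
  stringsAsFactors = FALSE
)

# Source genes that map to more than one target
counts <- table(mapping$from_gene)
multi_mapped <- names(counts)[counts > 1]

# Option 1: keep only genes with exactly one ortholog
one_to_one <- mapping[!mapping$from_gene %in% multi_mapped, ]

# Option 2: collapse all orthologs into one row per source gene
collapsed <- aggregate(
  to_gene ~ from_gene,
  data = mapping,
  FUN = function(x) paste(x, collapse = ";")
)
```

Option 1 discards ambiguous genes, while Option 2 keeps them at the cost of a compound to_gene value that later steps must split or interpret.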
**Caching**: When cache=TRUE, results are stored using R.cache with a key based on:
Gene set (hashed)
Source and target species
Ensembl version
Caching speeds up repeated analyses because a query with the same key is read from the cache instead of being sent to BioMart again.
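The idea of a key built from the gene set, the species pair, and the Ensembl version can be sketched without R.cache. The code below swaps R.cache for a plain in-memory environment and hashes the gene set with tools::md5sum on a temporary file. make_cache_key and cached_lookup are hypothetical helpers, and sorting the genes so that input order does not change the key is an assumption, not documented behaviour.

```r
# Hypothetical helper: derive a key from the query parameters
make_cache_key <- function(genes, from_species, to_species, ensembl_version) {
  gene_file <- tempfile()
  on.exit(unlink(gene_file))
  # Assumption: sort and de-duplicate so input order does not change the key
  writeLines(sort(unique(genes)), gene_file)
  gene_hash <- unname(tools::md5sum(gene_file))
  paste(gene_hash, from_species, to_species, ensembl_version, sep = "|")
}

# In-memory stand-in for R.cache
cache_env <- new.env()

# Hypothetical helper: run compute() only when the key is not cached yet
cached_lookup <- function(key, compute) {
  if (!exists(key, envir = cache_env, inherits = FALSE)) {
    assign(key, compute(), envir = cache_env)
  }
  get(key, envir = cache_env, inherits = FALSE)
}

key_a <- make_cache_key(c("Vegfa", "Tgfb1"), "Mus_musculus", "Homo_sapiens", 103)
key_b <- make_cache_key(c("Tgfb1", "Vegfa"), "Mus_musculus", "Homo_sapiens", 103)
key_c <- make_cache_key(c("Tgfb1", "Vegfa"), "Mus_musculus", "Homo_sapiens", 104)

# Count how often the (simulated) expensive query actually runs
n_computed <- 0
slow_query <- function() {
  n_computed <<- n_computed + 1
  "mock mapping"
}
first  <- cached_lookup(key_a, slow_query)
second <- cached_lookup(key_b, slow_query)
```

Including the Ensembl version in the key means that changing ensembl_version triggers a fresh query rather than reusing results from a different release.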
**Network Requirements**: Requires an internet connection to query Ensembl BioMart the first time a given query is run. Queries typically take 10-30 seconds depending on gene count and network speed.
Examples
if (FALSE) { # \dontrun{
  # Convert mouse genes to human
  result <- convert_species_biomart(
    genes = c("Tgfb1", "Vegfa", "Ctnnb1"),
    from_species = "Mus_musculus",
    to_species = "Homo_sapiens"
  )

  # Check mapping
  result$mapping
  #   from_gene  to_gene
  #   Tgfb1      TGFB1
  #   Vegfa      VEGFA
  #   Ctnnb1     CTNNB1

  # Check statistics
  result$stats$mapping_rate  # Proportion successfully mapped

  # Unmapped genes
  result$unmapped
} # }