Converts gene symbols between species using Ensembl BioMart ortholog mappings. The result is based on Ensembl's curated homolog assignments rather than on a simple name transformation such as changing letter case.

Usage

convert_species_biomart(
  genes,
  from_species,
  to_species = "Homo_sapiens",
  ensembl_version = 103,
  mirror = NULL,
  cache = TRUE,
  max_tries = 5
)

Arguments

genes

Character vector. Gene symbols to convert

from_species

Character. Source species:

  • "Homo_sapiens" (human)

  • "Mus_musculus" (mouse)

to_species

Character. Target species (default: "Homo_sapiens")

ensembl_version

Character or numeric. Ensembl version (default: 103). Using a fixed version ensures reproducibility. Use "current_release" for the latest version.

mirror

Character or NULL. Ensembl mirror for faster access:

  • "www": Main server (Europe)

  • "uswest": US West Coast

  • "useast": US East Coast

  • "asia": Asia

cache

Logical. Cache BioMart results for faster repeated queries (default: TRUE)

max_tries

Integer. Maximum retry attempts for network operations (default: 5)

Value

List with elements:

  • mapping: data.frame with columns from_gene, to_gene

  • unmapped: character vector of genes without orthologs

  • stats: list with mapping statistics (n_input, n_mapped, mapping_rate, etc.)

  • cache_key: cache identifier (if cache=TRUE)
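
To make the shape of the return value concrete, here is a minimal sketch that builds a comparable list by hand, without querying BioMart. The gene "NotARealGene" is a placeholder, and the way `n_mapped` and `mapping_rate` are computed here is an assumption (distinct input genes with at least one ortholog, divided by the number of distinct inputs); the function itself may define them differently.

```r
# Hand-built stand-in for a returned list (no BioMart query is made here).
genes <- c("Tgfb1", "Vegfa", "Ctnnb1", "NotARealGene")  # last one is a placeholder

mapping <- data.frame(
  from_gene = c("Tgfb1", "Vegfa", "Ctnnb1"),
  to_gene   = c("TGFB1", "VEGFA", "CTNNB1"),
  stringsAsFactors = FALSE
)

# Genes with no row in the mapping table are reported as unmapped.
unmapped <- setdiff(genes, mapping$from_gene)

# Assumed definitions: n_mapped counts distinct input genes with at least
# one ortholog; mapping_rate is n_mapped / n_input.
n_input  <- length(unique(genes))
n_mapped <- length(unique(mapping$from_gene))
stats <- list(
  n_input      = n_input,
  n_mapped     = n_mapped,
  mapping_rate = n_mapped / n_input
)

result <- list(mapping = mapping, unmapped = unmapped, stats = stats)
str(result, max.level = 1)
```

The `cache_key` element is omitted because its format is not described here.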

Details

**Ortholog Mapping**: Uses Ensembl's "associated_gene_name" attribute, which provides the primary ortholog symbol. For mouse→human conversion, this maps, for example:

  • Tgfb1 → TGFB1

  • Vegfa → VEGFA

  • Ctnnb1 → CTNNB1
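
The three genes above happen to differ from their human orthologs only in letter case, so a short sketch may help show why a lookup is still needed. It compares `toupper()` with a small hand-written table that stands in for BioMart output. The table is an assumption for illustration; mouse Trp53 is included because its human ortholog symbol is TP53, not TRP53.

```r
# Hypothetical lookup table standing in for BioMart output.
ortholog_table <- c(Tgfb1 = "TGFB1", Vegfa = "VEGFA", Trp53 = "TP53")

mouse_genes <- c("Tgfb1", "Vegfa", "Trp53")

by_case   <- toupper(mouse_genes)                 # simple name transformation
by_lookup <- unname(ortholog_table[mouse_genes])  # table-based mapping

comparison <- data.frame(
  mouse     = mouse_genes,
  by_case   = by_case,
  by_lookup = by_lookup,
  agree     = by_case == by_lookup
)
print(comparison)
```

Only the Trp53 row disagrees, which is the kind of case an ortholog lookup is meant to handle.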

**One-to-Many Mappings**: Some genes have multiple orthologs (as a hypothetical illustration, Tgfb1 might map to TGFB1, TGFB2, and TGFB3). By default, all mappings are returned, and downstream functions handle aggregation.
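
If you want to inspect one-to-many cases yourself before passing results downstream, something like the following sketch could be used. It runs on a mock `mapping` data.frame with placeholder gene names, and the "keep only one-to-one genes" policy is just one possible choice, not necessarily what the downstream functions do.

```r
# Mock mapping with a one-to-many case (gene names are placeholders).
mapping <- data.frame(
  from_gene = c("GeneA", "GeneB", "GeneB", "GeneC"),
  to_gene   = c("ORTH_A", "ORTH_B1", "ORTH_B2", "ORTH_C"),
  stringsAsFactors = FALSE
)

# Count targets per source gene and flag those with more than one.
n_targets   <- table(mapping$from_gene)
one_to_many <- names(n_targets)[as.vector(n_targets) > 1]

# One possible policy: keep only source genes with exactly one ortholog.
one_to_one <- mapping[!mapping$from_gene %in% one_to_many, ]

print(one_to_many)
print(one_to_one)
```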

**Caching**: When cache=TRUE, results are stored using R.cache with a key based on:

  • Gene set (hashed)

  • Source and target species

  • Ensembl version

A repeated call with the same key can reuse the stored result instead of querying BioMart again, which speeds up repeated analyses considerably.
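
The idea of a key built from the gene set, species, and version can be sketched in base R as follows. This is not the package's implementation: it uses an in-memory environment instead of R.cache, a plain concatenated string instead of a hash, and the helper names `make_key`, `cached_lookup`, and `slow_query` are hypothetical. Sorting and de-duplicating the genes so that order does not affect the key is also an assumption.

```r
# In-memory cache for this session only (R.cache persists to disk instead).
cache_env <- new.env(parent = emptyenv())

make_key <- function(genes, from_species, to_species, ensembl_version) {
  # Sort and de-duplicate so gene order does not change the key (assumption).
  gene_part <- paste(sort(unique(genes)), collapse = ",")
  paste(gene_part, from_species, to_species, ensembl_version, sep = "|")
}

cached_lookup <- function(key, compute) {
  if (!exists(key, envir = cache_env, inherits = FALSE)) {
    assign(key, compute(), envir = cache_env)  # cache miss: compute and store
  }
  get(key, envir = cache_env, inherits = FALSE)
}

# Stand-in for an expensive query; counts how often it actually runs.
n_calls <- 0
slow_query <- function() {
  n_calls <<- n_calls + 1
  "result"
}

key <- make_key(c("Vegfa", "Tgfb1"), "Mus_musculus", "Homo_sapiens", 103)
first  <- cached_lookup(key, slow_query)
second <- cached_lookup(key, slow_query)  # served from the cache
print(n_calls)
```

Changing any component of the key, such as the Ensembl version, produces a different key and therefore a fresh computation.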

**Network Requirements**: An internet connection is required to query Ensembl BioMart the first time a given conversion is run. Queries typically take 10-30 seconds, depending on gene count and network speed.
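
Because network queries can fail transiently, the `max_tries` argument bounds how many attempts are made. A generic retry loop of that kind might look like the sketch below. The helper `with_retries` and the simulated `flaky` function are hypothetical and only illustrate the pattern; the package's actual retry logic is not shown here.

```r
with_retries <- function(fun, max_tries = 5, wait = 0) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of aborting the loop.
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    if (attempt < max_tries) Sys.sleep(wait)  # pause before the next attempt
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(last_error))
}

# Simulated network call that fails twice, then succeeds.
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("simulated timeout")
  "ok"
}

value <- with_retries(flaky, max_tries = 5)
print(value)
print(attempts)
```

With `max_tries = 2` the same simulated call would exhaust its attempts and raise an error carrying the last failure message.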

Examples

if (FALSE) { # \dontrun{
# Convert mouse genes to human
result <- convert_species_biomart(
  genes = c("Tgfb1", "Vegfa", "Ctnnb1"),
  from_species = "Mus_musculus",
  to_species = "Homo_sapiens"
)

# Check mapping
result$mapping
#   from_gene to_gene
#   Tgfb1     TGFB1
#   Vegfa     VEGFA
#   Ctnnb1    CTNNB1

# Check statistics
result$stats$mapping_rate  # Proportion successfully mapped

# Unmapped genes
result$unmapped
} # }