Type: | Package |
Title: | Convert Gene IDs Between Each Other and Fetch Annotations from Biomart |
Version: | 0.1.10 |
Date: | 2025-02-06 |
Author: | Vidal Fey [aut, cre], Henrik Edgren [aut] |
Maintainer: | Vidal Fey <vidal.fey@gmail.com> |
Description: | Gene Symbols or Ensembl Gene IDs are converted using the Bimap interface in 'AnnotationDbi' in convertId2() but that function is only provided as fallback mechanism for the most common use cases in data analysis. The main function in the package is convert.bm() which queries BioMart using the full capacity of the API provided through the 'biomaRt' package. Presets and defaults are provided for convenience but all "marts", "filters" and "attributes" can be set by the user. Function convert.alias() converts Gene Symbols to Aliases and vice versa and function likely_symbol() attempts to determine the most likely current Gene Symbol. |
Depends: | AnnotationDbi |
Imports: | org.Hs.eg.db, org.Mm.eg.db, plyr, stringr, biomaRt, stats, xml2, utils, rappdirs, assertthat, methods, httr, BiocFileCache |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-02-06 12:16:49 UTC; fsvife |
Repository: | CRAN |
Date/Publication: | 2025-02-06 12:40:01 UTC |
Convert Gene IDs Between Each Other and Fetch Annotations from Biomart
Description
Gene Symbols or Ensembl Gene IDs are converted using the Bimap interface in 'AnnotationDbi' in convertId2(), but that function is only provided as a fallback mechanism for the most common use cases in data analysis. The main function in the package is convert.bm(), which queries Biomart using the full capacity of the API provided through the 'biomaRt' package. Presets and defaults are provided for convenience, but all "marts", "filters" and "attributes" can be set by the user. convert.alias() converts Gene Symbols to Aliases and vice versa, and likely_symbol() attempts to determine the most likely current Gene Symbol.
Details
Package: | convertid |
Type: | Package |
Initial version: | 0.1-0 |
Created: | 2021-08-18 |
License: | GPL-3 |
LazyLoad: | yes |
Author(s)
Vidal Fey <vidal.fey@gmail.com>
Maintainer: Vidal Fey <vidal.fey@gmail.com>
Add values to cache
Description
Add values to cache
Usage
.addToCache(bfc, result, hash)
Arguments
bfc |
Object of class BiocFileCache, created by a call to BiocFileCache::BiocFileCache() |
result |
character; name of the file written to cache |
hash |
unique hash representing a query. |
Unexported functions Test if a path exists and is writable
Description
.cache.writable() uses file.access() to test whether a given location exists and is writable by the user.
Usage
.cache.writable(path)
Arguments
path |
( |
Value
TRUE if both conditions are met, FALSE if not.
See Also
Examples
## Not run: .cache.writable(rappdirs::user_cache_dir())
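The check described above can be sketched in base R. This is a minimal illustration of combining two file.access() calls, not the package's internal code; the helper name is_writable_dir() is hypothetical.

```r
# Hypothetical helper; mirrors the documented behaviour, not the package source.
is_writable_dir <- function(path) {
  # file.access() returns 0 on success and -1 on failure.
  # mode = 0 tests existence, mode = 2 tests write permission.
  exists_ok <- file.access(path, mode = 0) == 0
  write_ok <- file.access(path, mode = 2) == 0
  unname(exists_ok && write_ok)
}

is_writable_dir(tempdir())
is_writable_dir(file.path(tempdir(), "no-such-directory"))
```

The session's temporary directory should pass both tests, while a path that does not exist fails the existence test.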
Check whether value in cache exists
Description
Check whether value in cache exists
Usage
.checkInCache(bfc, hash, verbose = FALSE)
Arguments
bfc |
Object of class BiocFileCache, created by a call to BiocFileCache::BiocFileCache() |
hash |
unique hash representing a query. |
verbose |
logical; should additional verbose output be printed? Not currently used. |
Value
TRUE if a record with the requested hash already exists in the file cache, FALSE otherwise.
Unexported functions Create a file cache directory at a given location.
Description
.create.cache() attempts to create a cache directory at a given path. Typically, that path is specific to the package from within which the function is called. The default settings refer to the file cache framework in the biomaRt package.
Usage
.create.cache(cache.path = rappdirs::user_cache_dir("biomaRt"))
Arguments
cache.path |
( |
Value
TRUE if the location was successfully set up, FALSE if not.
See Also
Examples
## Not run: .create.cache(rappdirs::user_cache_dir("biomaRt"))
Unexported functions Test and retrieve Ensembl-specific CURL SSL configuration.
Description
.get.Ensembl_config() tests and gets the CURL options used with "^https://.*ensembl.org" URLs. The function is a modified version of .getEnsemblSSL from the biomaRt package.
Usage
.get.Ensembl_config(use.cache = TRUE)
Arguments
use.cache |
( |
Value
An R object of class request listing the current CURL options.
See Also
Examples
## Not run: .get.Ensembl_config()
Unexported functions Get httr configuration, i.e., current CURL options for data fetching functions.
Description
.get.httr_config() retrieves the current CURL options and, in particular, tests and gets the options used with "^https://.*ensembl.org" URLs. The code was partly copied from listMarts().
Usage
.get.httr_config(
httr_config,
host = "https://www.ensembl.org",
use.cache = TRUE
)
Arguments
httr_config |
( |
host |
( |
use.cache |
( |
Value
An R object of class request listing the current CURL options.
See Also
Examples
## Not run: .get.httr_config()
Read values from cache
Description
Read values from cache
Usage
.readFromCache(bfc, hash)
Arguments
bfc |
Object of class BiocFileCache, created by a call to BiocFileCache::BiocFileCache() |
hash |
unique hash representing a query. |
Unexported functions Set the location for the biomaRt cache
Description
.setCacheLocation() attempts to set the cache location used by the functions in the biomaRt package and defined in the BIOMART_CACHE environment variable.
If that variable is set and the defined location exists and is writable, nothing is done.
If the system default cache location exists and is writable, a sub-folder app is used (and created if necessary).
If neither of the above works, a new path is constructed from cache.dir and the app folder, and an attempt is made to create it.
If all of the above fail, the function attempts to create file.path(tempdir(), app). If that fails, too, an exception is thrown.
Usage
.setCacheLocation(cache.dir = rappdirs::user_cache_dir(), app = "biomaRt")
Arguments
cache.dir |
( |
app |
( |
Value
The value of the BIOMART_CACHE environment variable, i.e., the cache location.
See Also
Examples
## Not run: .setCacheLocation()
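The fallback order described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the helper names dir_is_writable() and pick_cache_location() are hypothetical, the second and third documented steps are collapsed into a single attempt on file.path(cache.dir, app), and cache.dir has no default here so that the sketch does not depend on rappdirs.

```r
# Hypothetical helpers illustrating the documented fallback order;
# not the package's internal code.
dir_is_writable <- function(path) {
  # file.access() returns 0 on success; mode = 2 tests write permission.
  dir.exists(path) && file.access(path, mode = 2) == 0
}

pick_cache_location <- function(cache.dir, app = "biomaRt") {
  # 1. Keep an existing BIOMART_CACHE setting if it is usable.
  current <- Sys.getenv("BIOMART_CACHE", unset = "")
  if (nzchar(current) && dir_is_writable(current)) {
    return(current)
  }
  # 2. Try <cache.dir>/<app>, then <tempdir()>/<app>, creating each if needed.
  candidates <- c(file.path(cache.dir, app), file.path(tempdir(), app))
  for (path in candidates) {
    if (!dir.exists(path)) {
      dir.create(path, recursive = TRUE, showWarnings = FALSE)
    }
    if (dir_is_writable(path)) {
      Sys.setenv(BIOMART_CACHE = path)
      return(path)
    }
  }
  # 3. Nothing worked: signal an error, as the documentation describes.
  stop("could not set up a writable cache location")
}

# Example with a throw-away base directory instead of the real user cache.
demo_base <- file.path(tempdir(), "cache-demo")
pick_cache_location(demo_base)
```

Note that the sketch changes BIOMART_CACHE for the running session only when no usable setting exists already.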
Convert Symbols to Aliases and Vice Versa.
Description
convert.alias() attempts to find all possible symbol-alias combinations for a given gene symbol, i.e., it assumes the input ID to be either an Alias or a Symbol and performs multiple queries to find all possible counterparts. The input IDs are converted to title and upper case before querying and all possibilities are tested. There are species presets for Human and Mouse annotations.
Usage
convert.alias(id, species = c("Human", "Mouse"), db = NULL)
Arguments
id |
( |
species |
( |
db |
( |
Value
A data.frame with two columns:
'SYMBOL' | The official gene symbol. |
'ALIAS' | All possible aliases. |
See Also
Examples
convert.alias("TRPV4")
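The case conversion mentioned in the description can be illustrated in base R. This is only a sketch of what "title and upper case" variants of an ID look like; the helper case_variants() is hypothetical and the package may derive the variants differently (the Imports field lists stringr, for instance).

```r
# Hypothetical helper: build the upper-case and title-case variants of one ID.
case_variants <- function(id) {
  upper <- toupper(id)
  # Title case: first character upper case, remainder lower case.
  title <- paste0(toupper(substr(id, 1, 1)), tolower(substring(id, 2)))
  c(upper = upper, title = title)
}

case_variants("trpv4")
#   upper   title
# "TRPV4" "Trpv4"
```

Upper case matches the usual style of human gene symbols and title case that of mouse symbols, which is presumably why both forms are tried.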
Retrieve Additional Annotations from Biomart
Description
convert.bm() is a wrapper for get.bm(), which in turn makes use of getBM() from the biomaRt package. It takes a matrix or data frame with the IDs to be converted in one column or as row names as input and returns a data frame with additional annotations after cleaning the fetched annotations and merging them with the input data frame.
Usage
convert.bm(
dat,
id = "ID",
biom.data.set = c("human", "mouse"),
biom.mart = c("ensembl", "mouse", "snp", "funcgen", "plants"),
host = "https://www.ensembl.org",
biom.filter = "ensembl_gene_id",
biom.attributes = c("ensembl_gene_id", "hgnc_symbol", "description"),
biom.cache = rappdirs::user_cache_dir("biomaRt"),
use.cache = TRUE,
sym.col = "hgnc_symbol",
rm.dups = FALSE,
verbose = FALSE
)
Arguments
dat |
|
id |
|
biom.data.set |
|
biom.mart |
|
host |
|
biom.filter |
|
biom.attributes |
|
biom.cache |
|
use.cache |
( |
sym.col |
|
rm.dups |
|
verbose |
( |
Details
A wrapper around get.bm().
Value
A data frame with the retrieved information.
Author(s)
Vidal Fey
See Also
Examples
## Not run:
dat <- data.frame(ID=c("ENSG00000111199", "ENSG00000134121", "ENSG00000176102", "ENSG00000171611"))
bm <- convert.bm(dat)
bm
## End(Not run)
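The "clean and merge" step from the description can be sketched offline with base R. The snippet below is an illustration under stated assumptions, not the package's code: the fetched table stands in for a Biomart result, its symbols are placeholders rather than real annotations, and the value column is made-up input data.

```r
# Input data: IDs in column "ID" plus one illustrative measurement column.
dat <- data.frame(
  ID = c("ENSG00000111199", "ENSG00000134121", "ENSG00000176102"),
  value = c(1.5, -0.3, 0.8),
  stringsAsFactors = FALSE
)

# Stand-in for a query result; symbols are placeholders, and one row is
# duplicated to mimic redundant records.
fetched <- data.frame(
  ensembl_gene_id = c("ENSG00000134121", "ENSG00000111199", "ENSG00000111199"),
  hgnc_symbol = c("PLACEHOLDER_B", "PLACEHOLDER_A", "PLACEHOLDER_A"),
  stringsAsFactors = FALSE
)

# Clean: drop exact duplicate rows.
fetched <- unique(fetched)

# Merge: keep every input row, even those without a fetched annotation.
merged <- merge(dat, fetched, by.x = "ID", by.y = "ensembl_gene_id",
                all.x = TRUE, sort = FALSE)
merged
```

Using all.x = TRUE keeps input rows that received no annotation; their hgnc_symbol is NA.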
Convert Gene Symbols to Ensembl Gene IDs or vice versa
Description
convertId2() uses the Bimap interface in AnnotationDbi to extract information from annotation packages. The function is limited to Human and Mouse annotations and is provided only as a fallback mechanism for the most common use cases in data analysis. Please use the Biomart interface function convert.bm() for more flexibility.
Usage
convertId2(id, species = c("Human", "Mouse"))
Arguments
id |
( |
species |
( |
Value
A named character vector where the input IDs are the names and the query results the values.
See Also
Examples
convertId2("ENSG00000111199")
convertId2("TRPV4")
Make a Query to Biomart.
Description
get.bm() is a user-friendly wrapper for getBM() from the biomaRt package with default settings for Human and Mouse. It sets all needed variables and performs the query.
Usage
get.bm(
values,
biom.data.set = c("human", "mouse"),
biom.mart = c("ensembl", "mouse", "snp", "funcgen", "plants"),
host = "https://www.ensembl.org",
biom.filter = "ensembl_gene_id",
biom.attributes = c("ensembl_gene_id", "hgnc_symbol", "description"),
biom.cache = rappdirs::user_cache_dir("biomaRt"),
use.cache = TRUE,
verbose = FALSE
)
Arguments
values |
|
biom.data.set |
|
biom.mart |
|
host |
|
biom.filter |
|
biom.attributes |
|
biom.cache |
|
use.cache |
( |
verbose |
( |
Value
A data frame with the retrieved information.
Author(s)
Vidal Fey
See Also
Examples
## Not run:
val <- c("ENSG00000111199", "ENSG00000134121", "ENSG00000176102", "ENSG00000171611")
bm <- get.bm(val)
bm
## End(Not run)
Retrieve Symbol Aliases and Previous symbols to determine a likely current symbol
Description
likely_symbol() downloads the latest version of the HGNC gene symbol database as a text file and queries it to obtain symbol aliases, previous symbols and all symbols currently in use. Optionally assuming the input ID to be an Alias, a Symbol or a Previous Symbol, it performs multiple queries and compares the results of all possible combinations to determine a likely current Symbol.
Usage
likely_symbol(
syms,
alias_sym = TRUE,
prev_sym = TRUE,
orgnsm = "human",
hgnc = NULL,
hgnc_url = NULL,
output = c("likely", "symbols", "all"),
verbose = TRUE
)
Arguments
syms |
( |
alias_sym |
( |
prev_sym |
( |
orgnsm |
( |
hgnc |
( |
hgnc_url |
( |
output |
( |
verbose |
( |
Details
Please note that the algorithm is very slow for large input vectors.
Value
A data.frame with the following columns, depending on the output setting.
output="likely":
'likely_symbol' | |
'input_symbol' | |
output="symbols":
'current_symbols' | |
'likely_symbol' | |
'input_symbol' | |
'all_symbols' | |
output="all":
'orig_input' | |
'organism' | |
'current_symbols' | |
'likely_symbol' | |
'input_symbol' | |
'all_symbols' | |
Note
Only fully implemented for Human for now.
Examples
## Not run:
likely_symbol(c("ABCC4", "ACPP", "KIAA1524"))
## End(Not run)
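The idea of resolving an input against current symbols, aliases and previous symbols can be sketched with a toy table. This is a much-simplified illustration, not the algorithm used by likely_symbol(): the table layout (columns symbol, alias_symbol and prev_symbol with "|"-separated entries) is an assumption, all symbols are placeholders, and the helper lookup_current() is hypothetical.

```r
# Toy lookup table with placeholder symbols (assumed layout, not real data).
hgnc_toy <- data.frame(
  symbol = c("GENE1", "GENE2"),
  alias_symbol = c("G1A|G1B", ""),
  prev_symbol = c("", "OLD2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the current symbols whose record mentions `sym`
# as symbol, alias (optional) or previous symbol (optional).
lookup_current <- function(sym, tab, alias_sym = TRUE, prev_sym = TRUE) {
  in_field <- function(field) {
    entries <- strsplit(field, "|", fixed = TRUE)
    vapply(entries, function(v) sym %in% v, logical(1))
  }
  hit <- tab$symbol == sym
  if (alias_sym) hit <- hit | in_field(tab$alias_symbol)
  if (prev_sym) hit <- hit | in_field(tab$prev_symbol)
  tab$symbol[hit]
}

lookup_current("OLD2", hgnc_toy)                    # "GENE2"
lookup_current("G1B", hgnc_toy)                     # "GENE1"
lookup_current("G1B", hgnc_toy, alias_sym = FALSE)  # character(0)
```

Looping such a lookup over every input and every candidate is one reason a search of this kind gets slow for large input vectors.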
Convenience Function to Convert Ensembl Gene IDs to Gene Symbols
Description
todisp2() uses Biomart by employing get.bm() to retrieve Gene Symbols for a set of Ensembl Gene IDs. It is mainly meant as a fast way to convert IDs in standard gene expression analysis output to Symbols, e.g., for visualisation, which is why the input ID type is hard-coded to ENSG IDs. If Biomart is not available, the function can fall back to convertId2() or to a user-provided data frame with corresponding ENSG IDs and Symbols.
Usage
todisp2(ensg, lab = NULL, biomart = TRUE, verbose = FALSE)
Arguments
ensg |
( |
lab |
( |
biomart |
( |
verbose |
( |
Value
A character vector of Gene Symbols.
See Also
Examples
## Not run:
val <- c("ENSG00000111199", "ENSG00000134121", "ENSG00000176102", "ENSG00000171611")
sym <- todisp2(val)
sym
## End(Not run)
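The fallback to a user-provided data frame can be pictured as a simple table lookup. The sketch below is an assumption-laden illustration, not the behaviour of todisp2() itself: the column names ensg and symbol, the helper to_symbols(), and the choice to keep the original ID where no symbol is found are all hypothetical, and the symbols are placeholders.

```r
# Hypothetical mapping table; column names and symbols are placeholders.
lab <- data.frame(
  ensg = c("ENSG00000111199", "ENSG00000134121"),
  symbol = c("PLACEHOLDER_A", "PLACEHOLDER_B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up each ID and keep the ID itself when the
# table has no (non-empty) symbol for it.
to_symbols <- function(ensg, lab) {
  sym <- lab$symbol[match(ensg, lab$ensg)]
  missing_sym <- is.na(sym) | sym == ""
  sym[missing_sym] <- ensg[missing_sym]
  sym
}

to_symbols(c("ENSG00000134121", "ENSG00000176102"), lab)
# "PLACEHOLDER_B" "ENSG00000176102"
```

match() preserves the order and length of the input, so the result can be used directly as axis or row labels.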