Type: | Package |
Title: | Metabolite Set Enrichment Analysis for Loadings |
Version: | 2.2.1 |
Date: | 2025-07-05 |
Description: | Computing metabolite set enrichment analysis (MSEA) (Yamamoto, H. et al. (2014) <doi:10.1186/1471-2105-15-51>), single sample enrichment analysis (SSEA) (Yamamoto, H. (2023) <doi:10.51094/jxiv.262>) and over-representation analysis (ORA) that accounts for undetected metabolites (Yamamoto, H. (2024) <doi:10.51094/jxiv.954>). |
License: | LGPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | XML |
Depends: | loadings |
NeedsCompilation: | no |
Packaged: | 2025-07-05 12:24:25 UTC; yamamoto |
Author: | Hiroyuki Yamamoto [aut, cre] |
Maintainer: | Hiroyuki Yamamoto <h.yama2396@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-07-05 12:40:02 UTC |
Convert metabolite set / csv to list
Description
This function converts a user-defined metabolite set stored in a CSV file into a list.
Usage
csv2list(filepath)
Arguments
filepath |
file path of metabolite set (csv file) |
Details
The first row of the CSV file is a header containing "metabolite set name" and "metabolite IDs". The first column must contain metabolite IDs and the second column must contain the metabolite set name.
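As a rough illustration of the layout described above, the following base R sketch builds a small two-column table in memory and groups it into a named list. The header names, set names and IDs are hypothetical, and split() is used only as a stand-in; this is not necessarily how csv2list is implemented.

```r
# Hypothetical CSV content: column 1 = metabolite ID, column 2 = set name
csv_text <- "metabolite_id,set_name
C00022,Glycolysis
C00031,Glycolysis
C00022,TCA cycle
C00158,TCA cycle"

tab <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Group the metabolite IDs (column 1) by metabolite set name (column 2)
set_list <- split(tab[[1]], tab[[2]])
str(set_list)
```

A metabolite that belongs to several sets simply appears on several rows, once per set.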
Value
list of metabolite set name and metabolite IDs
Author(s)
Hiroyuki Yamamoto
Examples
# ---------------------------
# Convert csv file to list
# ---------------------------
# filepath <- "C:/pathway.csv" # filepath of csv file
# N <- csv2list(filepath) # convert csv file to list
Metabolome data from a fasting mouse study
Description
Example data derived from a fasting mouse liver metabolomics study, originally published by Yamamoto et al. and distributed in the loadings package as fasting.
This version is reformatted and annotated to reproduce the analysis setup used in MetaboAnalyst.
It is intended for demonstrating ORA and MSEA functions in the mseapca package.
Usage
data(fasting_mseapca)
Format
A named list with the following components:
- SIG: Character vector of significant metabolites (e.g., p < 0.05).
- DET: Character vector of all detected metabolites.
- pathway: Named list of metabolite sets (pathways). Each element is a character vector of metabolite IDs belonging to that pathway.
Details
The dataset is intended for vignette examples:
- SIG: Significant metabolites (p < 0.05) obtained from statistical analysis of a fasting vs control comparison.
- DET: Background list of all detected metabolites.
- pathway: Pathway definitions from MetaboAnalyst, reused for mouse data.
The object can be used directly with functions such as msea_ora, ora_det, and ora_bino.
Source
Derived from the fasting dataset in the loadings package.
Original reference: Yamamoto H., Fujimori T., Sato H., Ishikawa G., Kami K., Ohashi Y. (2014). "Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis". BMC Bioinformatics, 15(1):51.
Examples
## Load data
data(fasting_mseapca)
SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway
## Simple ORA on detected metabolites
res <- ora_det(SIG, DET, M)
head(res$`Result of MSEA(ORA)`)
Save compound set as XML file
Description
This function saves a compound set in list format as an XML file.
Usage
list2xml(filepath, M)
Arguments
filepath |
filepath of XML file to save |
M |
list format of compound set and compound names |
Details
This function is used to store a compound set. The saved XML file can be read using the read_pathway function.
Value
filepath of saved XML file
Author(s)
Hiroyuki Yamamoto
Examples
## Not run:
data(pathway)
M <- pathway$fasting
xml_file <- "pathway_fasting.xml"
N <- list2xml(xml_file, M)
# XML::saveXML(N,filepath)
## End(Not run)
Wrapper function for over-representation analysis (ORA)
Description
msea_ora is a wrapper function that calls ORA implementations, allowing the user to choose how undetected metabolites are handled.
- "det": ORA using detected metabolites only as background (ora_det).
- "all": ORA using all metabolites appearing in the pathway list as background (ora_all).
- "est_naive", "est_weighted", "est_shrink": ORA adjusted for undetected metabolites by the "naive", "weighted", or "shrinkage" estimator (all via ora_est).
Usage
msea_ora(SIG, DET, M, option = "det", lambda=NULL)
Arguments
SIG |
Character vector of significant metabolites |
DET |
Character vector of detected metabolites. Required for all options except |
M |
A named list, where each element is a metabolite set (e.g., pathway) containing character vectors of metabolites. |
option |
One of |
lambda |
Shrinkage parameter used only when |
Value
A list with:
Result of MSEA(ORA) |
Matrix of p-values and q-values |
significant metabolites |
List of significant metabolites per set |
Contingency tables |
A list of 2×2 contingency tables used in Fisher's exact tests. |
Author(s)
Hiroyuki Yamamoto
References
Yamamoto H, Fujimori T, Sato H, Ishikawa G, Kami K, Ohashi Y, Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis. BMC Bioinformatics, (2014) 15(1):51.
Yamamoto H. Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites. Jxiv, (2024).
Examples
# Example1 : Metabolome data
data(fasting)
data(pathway)
# pca and pca loading
pca <- prcomp(fasting$X, scale=TRUE)
pca <- pca_loading(pca)
# all detected metabolites
metabolites <- colnames(fasting$X)
# statistically significant negatively correlated metabolites in PC1 loading
SIG <- metabolites[pca$loading$R[,1] < 0 & pca$loading$p.value[,1] < 0.05]
DET <- metabolites
# Fix for multiple annotations
DET[DET == "UDP-glucose ; UDP-galactose"] <- "UDP-glucose"
DET[DET == "Isonicotinamide ; Nicotinamide"] <- "Nicotinamide"
DET[DET == "1-Methylhistidine ; 3-Methylhistidine"] <- "3-Methylhistidine"
# metabolite set list
M <- pathway$fasting
# MSEA by over representation analysis
B <- msea_ora(SIG, DET, M)
B$`Result of MSEA(ORA)`
## Example2 : Proteome data
data(covid19)
data(pathway)
X <- covid19$X$proteomics
Y <- covid19$Y
D <- covid19$D
tau <- covid19$tau
protein_name <- colnames(X)
# pls-rog and pls-rog loading
plsrog <- pls_rog(X,Y,D)
plsrog <- plsrog_loading(plsrog)
# statistically significant proteins
index_prot <- which(plsrog$loading$R[,1]>0 & plsrog$loading$p.value[,1]<0.05)
sig_prot <- protein_name[index_prot]
# protein set list
M <- pathway$covid19$proteomics
# MSEA by over representation analysis
B <- msea_ora(sig_prot, protein_name, M)
B$`Result of MSEA(ORA)`
## Example3: Metabolome data
data(fasting_mseapca)
SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway
# Perform ORA using detected metabolites only
B <- msea_ora(SIG, DET, M)
B$`Result of MSEA(ORA)`
Wrapper function for Over-Representation Analysis with p-value range estimation
Description
This function performs over-representation analysis (ORA) to assess metabolite set enrichment while considering uncertainty due to undetected metabolites. It wraps different methods for estimating a p-value range, including full enumeration and binomial resampling.
Usage
msea_ora_range(SIG, DET = NULL, M,
option = "ora_full",
probs = c(0.025, 0.975),
nsim = 1000,
lambda = 5)
Arguments
SIG |
A character vector of statistically significant metabolites. |
DET |
A character vector of all detected metabolites. Required for all methods except |
M |
A named list of metabolite sets, each containing a character vector of metabolites. |
option |
Method to use for estimating the p-value range. One of |
probs |
Numeric vector of quantile probabilities for binomial simulation (e.g., |
nsim |
Number of simulations for binomial-based estimation. Ignored for |
lambda |
Shrinkage parameter for |
Details
This wrapper function allows switching between multiple ORA implementations that estimate the uncertainty due to undetected metabolites. The ora_full method uses exhaustive enumeration of all possible detection patterns, while the other methods use binomial resampling with different estimation strategies (naive, weighted, or shrinkage-based).
Value
A list containing a matrix with p-value range results for each metabolite set. Columns include lower, median, and upper p-values.
Author(s)
Hiroyuki Yamamoto
Examples
# Example: Metabolome data
data(fasting_mseapca)
SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway
# Estimate the p-value range by binomial resampling (naive estimator)
B <- msea_ora_range(SIG, DET, M, option = "bino_naive", nsim=10)
B$`Range of p-values for each pathway`
MSEA by Subramanian et al.
Description
This function performs metabolite set enrichment analysis implemented in the same fashion as gene set enrichment analysis (Subramanian et al. 2005). In this function, a permutation procedure is performed for a metabolite set rather than class label. This procedure corresponds to a "gene set" of permutation type in GSEA-P software (Subramanian et al. 2007). A leading-edge subset analysis is also undertaken following the standard GSEA procedure.
Usage
msea_sub(M, D, y, maxiter = 1000)
Arguments
M |
list of metabolite set name and metabolite IDs |
D |
data.frame(metabolite ID, data matrix) |
y |
response variable (e.g. PC score) |
maxiter |
maximum number of iterations in random permutation (default=1000) |
Value
list of normalized enrichment scores, p-values and q-values for each metabolite set, and the results of the leading-edge subset analysis
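The set-permutation idea described above can be sketched in base R as follows. This is a simplified illustration, not the msea_sub implementation: it uses an unweighted running-sum statistic, ranks metabolites by their correlation with the response, and omits normalization, q-values and the leading-edge analysis. The data, the names m1..m8 and the set membership are all hypothetical.

```r
set.seed(1)

# Toy data: 6 samples x 8 metabolites (hypothetical names m1..m8)
X <- matrix(rnorm(48), nrow = 6, dimnames = list(NULL, paste0("m", 1:8)))
y <- rnorm(6)                        # response variable, e.g. a PC score
set_members <- c("m1", "m2", "m3")   # one hypothetical metabolite set

# Rank metabolites by correlation with the response (descending)
r <- cor(X, y)[, 1]
ranked <- names(sort(r, decreasing = TRUE))

# Unweighted running-sum enrichment score (Kolmogorov-Smirnov-like)
enrichment_score <- function(ranked, members) {
  hit <- ranked %in% members
  step <- ifelse(hit, 1 / sum(hit), -1 / sum(!hit))
  running <- cumsum(step)
  running[which.max(abs(running))]   # signed maximum deviation from zero
}

es_obs <- enrichment_score(ranked, set_members)

# Null distribution: permute the set membership, not the class labels
n_perm <- 200
es_null <- replicate(
  n_perm,
  enrichment_score(ranked, sample(ranked, length(set_members)))
)

# Empirical p-value on the same side as the observed score
p_perm <- if (es_obs >= 0) mean(es_null >= es_obs) else mean(es_null <= es_obs)
```

Random sets of the same size as the tested set form the null distribution, which is what distinguishes this scheme from permuting sample labels.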
Author(s)
Hiroyuki Yamamoto
References
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. & Mesirov, J. P. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545-15550.
Subramanian, A., Kuehn, H., Gould, J., Tamayo, P., Mesirov, J.P. (2007) GSEA-P: A desktop application for Gene Set Enrichment Analysis. Bioinformatics, doi: 10.1093/bioinformatics/btm369.
Examples
data(fasting)
data(pathway)
# pca and pca loading
pca <- prcomp(fasting$X, scale=TRUE)
pca <- pca_loading(pca)
# all detected metabolites
metabolites <- colnames(fasting$X)
# statistically significant negatively correlated metabolites in PC1 loading
SIG <- metabolites[pca$loading$R[,1] < 0 & pca$loading$p.value[,1] < 0.05]
ALL <- metabolites #all detected metabolites
# Set response variable
y <- pca$x[,1]
# preparing dataframe
D <- data.frame(ALL,t(fasting$X))
# MSEA by Subramanian et al.
M <- pathway$fasting
P <- msea_sub(M,D,y, maxiter = 10) # iteration was set to 10 for demonstration
ORA using all metabolites
Description
This function performs over-representation analysis (ORA) using all metabolites present in the given metabolite set list as the background, without specifying a reference metabolome. This corresponds to the behavior of MetaboAnalyst when no reference metabolome is uploaded.
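A minimal sketch of what "all metabolites present in the metabolite set list" means in practice, using hypothetical set and metabolite names. It only shows how such a background could be derived in base R; the internals of ora_all may differ.

```r
# Hypothetical metabolite sets
M <- list(
  set_A = c("a", "b", "c"),
  set_B = c("c", "d", "e", "f")
)

# Background: every metabolite that appears in at least one set,
# counted once even if it belongs to several sets
background_all <- unique(unlist(M, use.names = FALSE))
length(background_all)
```

The shared member "c" is counted once, so the background here has six metabolites, whether or not they were measured.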
Usage
ora_all(SIG, M)
Arguments
SIG |
Character vector of significant metabolites |
M |
Named list of metabolite sets |
Value
A list with:
Result of MSEA(ORA) |
Matrix of p-values and q-values |
significant metabolites |
List of significant metabolites per set |
Contingency tables |
A list of 2×2 contingency tables used in Fisher's exact tests. |
Author(s)
Hiroyuki Yamamoto
References
Yamamoto H, Fujimori T, Sato H, Ishikawa G, Kami K, Ohashi Y, Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis. BMC Bioinformatics, (2014) 15(1):51.
Yamamoto H. Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites. Jxiv, (2024).
Examples
# Example: Metabolome data
data(fasting_mseapca)
SIG <- fasting_mseapca$SIG
M <- fasting_mseapca$pathway
# Perform ORA using all metabolites in the pathway list as background
B <- ora_all(SIG, M)
B$`Result of MSEA(ORA)`
Over-representation analysis with binomial resampling adjustment
Description
Performs ORA while adjusting for undetected metabolites by binomial resampling.
Usage
ora_bino(SIG, DET, M, method = "naive", probs = c(0.025, 0.975), nsim = 1000, lambda = 5)
Arguments
SIG |
Character vector of significant metabolites. |
DET |
Character vector of detected metabolites. |
M |
Named list of metabolite sets. |
method |
|
probs |
Quantiles for the confidence interval (default 95%). |
nsim |
Number of binomial simulations (default 1000). |
lambda |
Shrinkage parameter used when |
Value
A list containing one matrix. Rows = metabolite sets; columns = lower, median, and upper p-values.
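The resampling idea can be sketched for a single metabolite set as below. Everything here is an assumption made for illustration: the counts are invented, the "naive" rate is taken to be the share of significant metabolites among the detected members of the set, and undetected members are added to the in-set column of the table. The actual estimators in ora_bino may be defined differently.

```r
set.seed(1)

# Hypothetical counts for one metabolite set
n_sig_in  <- 4    # detected and significant, in the set
n_non_in  <- 6    # detected and not significant, in the set
n_sig_out <- 10   # detected and significant, outside the set
n_non_out <- 80   # detected and not significant, outside the set
n_undet   <- 5    # members of the set that were not detected

# Assumed "naive" rate of significance among undetected members
p_hat <- n_sig_in / (n_sig_in + n_non_in)

nsim <- 1000
p_sim <- replicate(nsim, {
  k <- rbinom(1, size = n_undet, prob = p_hat)  # simulated significant undetected
  tab <- matrix(c(n_sig_in + k, n_non_in + n_undet - k,
                  n_sig_out,    n_non_out), nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
})

# Lower, median and upper p-values, analogous to probs = c(0.025, 0.975)
p_range <- quantile(p_sim, probs = c(0.025, 0.5, 0.975))
p_range
```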
Author(s)
Hiroyuki Yamamoto
Examples
# Example: Metabolome data
data(fasting_mseapca)
SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway
# Estimate the p-value range by binomial resampling (naive estimator)
B <- ora_bino(SIG, DET, M, method="naive", nsim = 10)
B$`Range of p-values for each pathway`
ORA using detected metabolites
Description
This function performs metabolite set enrichment analysis using over-representation analysis (ORA) under the assumption that only detected metabolites are used as the background. A one-sided Fisher's exact test is applied to each metabolite set.
Usage
ora_det(SIG, DET, M)
Arguments
SIG |
Character vector of significant metabolites |
DET |
Character vector of detected metabolites (background set) |
M |
A named list, where each element is a metabolite set (e.g., pathway) containing character vectors of metabolites |
Value
A list containing:
- Result of MSEA(ORA): A matrix with raw p-values and adjusted q-values (BH correction) for each metabolite set.
- significant metabolites: A list of significant metabolites overlapping with each metabolite set.
- Contingency tables: A list of 2×2 contingency tables used in Fisher's exact tests.
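The three outputs listed above can be reproduced on toy data with base R alone. The sketch below is an illustration of ORA with a detected-only background, not the ora_det source; the metabolite names, set definitions and the helper ora_one_set are hypothetical.

```r
# Toy inputs (hypothetical metabolite names)
DET <- paste0("m", 1:20)                 # detected metabolites (background)
SIG <- paste0("m", 1:6)                  # significant subset of DET
M <- list(
  set_A = paste0("m", c(1, 2, 3, 4, 10)),
  set_B = paste0("m", c(5, 11, 12, 13, 14, 15))
)

# Hypothetical helper: 2x2 table and one-sided Fisher's exact test for one set
ora_one_set <- function(set, SIG, DET) {
  in_set <- DET %in% set
  is_sig <- DET %in% SIG
  tab <- table(
    significant = factor(is_sig, levels = c(TRUE, FALSE)),
    in_set      = factor(in_set, levels = c(TRUE, FALSE))
  )
  list(table = tab,
       p.value = fisher.test(tab, alternative = "greater")$p.value)
}

res <- lapply(M, ora_one_set, SIG = SIG, DET = DET)
p <- sapply(res, function(x) x$p.value)
q <- p.adjust(p, method = "BH")          # Benjamini-Hochberg correction
cbind(p.value = p, q.value = q)
```

With alternative = "greater" the test asks whether significant metabolites are over-represented in the set relative to the rest of the detected background.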
Author(s)
Hiroyuki Yamamoto
References
Yamamoto H, Fujimori T, Sato H, Ishikawa G, Kami K, Ohashi Y, Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis. BMC Bioinformatics, (2014) 15(1):51.
Yamamoto H. Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites. Jxiv, (2024).
Examples
# Example: Metabolome data
data(fasting_mseapca)
SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway
# Perform ORA using detected metabolites only
B <- ora_det(SIG, DET, M)
B$`Result of MSEA(ORA)`
ORA adjusting for undetected metabolites
Description
This function performs metabolite set enrichment analysis using over-representation analysis (ORA), incorporating point estimates to adjust for potentially undetected metabolites. It supports three estimation methods: naive, weighted, and shrinkage-based adjustment.
Usage
ora_est(SIG, DET, M, method = "naive", lambda = 5)
Arguments
SIG |
Character vector of significant metabolites |
DET |
Character vector of detected metabolites (background set) |
M |
A named list, where each element is a metabolite set (e.g., pathway) containing character vectors of metabolites |
method |
A character string specifying the estimation method to use. One of |
lambda |
A numeric value used in the |
Details
This function estimates the impact of undetected metabolites on enrichment results. It builds upon the ORA results from detected metabolites, then adjusts the contingency tables by estimating how many undetected metabolites might be significant, based on a specified method.
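The adjustment of a contingency table by a point estimate might look like the following sketch for one metabolite set. The counts are invented and the "naive" rate is assumed to be the share of significant metabolites among the detected members of the set; the weighted and shrinkage variants are not shown, and the definitions used by ora_est may differ.

```r
# Hypothetical counts for one metabolite set
n_sig_in  <- 4;  n_non_in  <- 6     # detected members of the set
n_sig_out <- 10; n_non_out <- 80    # detected metabolites outside the set
n_undet   <- 5                      # members of the set that were not detected

# Assumed "naive" rate and the resulting point estimate
p_hat <- n_sig_in / (n_sig_in + n_non_in)
k_hat <- round(n_undet * p_hat)     # estimated significant undetected members

# Table from detected metabolites only, and the adjusted table
tab_det <- matrix(c(n_sig_in, n_non_in, n_sig_out, n_non_out), nrow = 2)
tab_adj <- matrix(c(n_sig_in + k_hat, n_non_in + n_undet - k_hat,
                    n_sig_out, n_non_out), nrow = 2)

p_det <- fisher.test(tab_det, alternative = "greater")$p.value
p_adj <- fisher.test(tab_adj, alternative = "greater")$p.value
c(detected_only = p_det, adjusted = p_adj)
```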
Value
A list containing:
- Result of MSEA (ORA with adjustment): A matrix with raw p-values and adjusted q-values (BH correction) for each metabolite set.
- significant metabolites: A list of significant metabolites overlapping with each metabolite set.
- Contingency tables: A list of 2×2 contingency tables used in Fisher's exact tests.
Author(s)
Hiroyuki Yamamoto
References
Yamamoto H, Fujimori T, Sato H, Ishikawa G, Kami K, Ohashi Y, Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis. BMC Bioinformatics, (2014) 15(1):51.
Yamamoto H. Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites. Jxiv, (2024).
Examples
# Example: Metabolome data
data(fasting_mseapca)
SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway
# Perform ORA adjusted for undetected metabolites (default: naive estimator)
B <- ora_est(SIG, DET, M)
B$`Result of MSEA (ORA with adjustment)`
Over-representation analysis with full enumeration of undetected metabolite patterns
Description
This function performs over-representation analysis (ORA) by enumerating all possible patterns of significant and non-significant assignments among undetected metabolites for each metabolite set. It returns the minimum, median, and maximum p-values from Fisher’s exact tests across these patterns, thereby estimating the full uncertainty range due to undetected metabolites.
Usage
ora_full(SIG, DET, M)
Arguments
SIG |
A character vector of statistically significant metabolites. |
DET |
A character vector of all detected metabolites (the background). |
M |
A named list of metabolite sets, where each element is a character vector of metabolites. |
Details
For each metabolite set, the number of undetected metabolites is calculated. The function then considers all possible numbers of significant metabolites (from 0 to the total number of undetected ones) among those undetected. For each case, a 2x2 contingency table is constructed and Fisher’s exact test is applied. The resulting p-values are aggregated to report the minimum, median, and maximum values.
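For one metabolite set, the enumeration described above can be sketched as follows. The counts are hypothetical, and the way undetected members are placed in the table (added to the in-set column) is an assumption for illustration rather than the confirmed layout used by ora_full.

```r
# Hypothetical counts for one metabolite set
n_sig_in  <- 4;  n_non_in  <- 6     # detected members of the set
n_sig_out <- 10; n_non_out <- 80    # detected metabolites outside the set
n_undet   <- 5                      # members of the set that were not detected

# One Fisher's exact test per possible number k of significant undetected members
p_all <- sapply(0:n_undet, function(k) {
  tab <- matrix(c(n_sig_in + k, n_non_in + n_undet - k,
                  n_sig_out,    n_non_out), nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
})

p_range <- c(lower = min(p_all), median = median(p_all), upper = max(p_all))
p_range
```

Because only the count k matters for the table, n_undet + 1 tests cover every possible pattern for the set.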
Value
A list containing:
Range of p-values |
A matrix with rows corresponding to metabolite sets and three columns:
|
Author(s)
Hiroyuki Yamamoto
References
Yamamoto H, Fujimori T, Sato H, Ishikawa G, Kami K, Ohashi Y, Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis. BMC Bioinformatics, (2014) 15(1):51.
Yamamoto H. Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites. Jxiv, (2024).
Examples
# Example: Metabolome data
data(fasting_mseapca)
SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway
# Perform ORA with full enumeration of undetected metabolite patterns
B <- ora_full(SIG, DET, M)
B$`Range of p-values`
Generate metabolite set list from PathBank database
Description
This function generates a metabolite set list from the PathBank database by referencing the AHPathbankDbs Bioconductor package.
Usage
pathbank2list(tbl_pathbank, subject, id)
Arguments
tbl_pathbank |
tibble from AHPathbankDbs |
subject |
Pathway subject (Metabolic, Disease, etc.) in tibble |
id |
database ID (HMDB ID, Uniprot ID, etc.) used for analysis |
Details
AHPathbankDbs needs to be installed separately.
Value
list of metabolite or protein set
Author(s)
Hiroyuki Yamamoto
Examples
## PathBank
#library(AnnotationHub)
#ah <- AnnotationHub()
#qr <- query(ah, c("pathbank", "Homo sapiens"))
##tbl_pathbank <- qr[[1]] # metabolomics
#tbl_pathbank <- qr[[2]] # proteomics
#ids <- names(tbl_pathbank)[-c(1:4)]
#id <- ids[1] # Uniprot ID
#subs <- unique(tbl_pathbank$`Pathway Subject`)
#subject <- subs[6] # Protein
# M <- pathbank2list(tbl_pathbank, subject, id)
Example of metabolite set list for fasting and covid19 datasets
Description
This data set contains the metabolite set lists for the fasting and covid19 datasets in the loadings package.
Usage
data(pathway)
Arguments
The list object pathway contains the following elements:
fasting : metabolite set list for fasting mouse dataset
covid19$proteomics : protein set list for covid19 dataset.
References
Yamamoto H., Fujimori T., Sato H., Ishikawa G., Kami K., Ohashi Y. (2014). "Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis". BMC Bioinformatics, 15(1):51.
B. Shen, et al, Proteomic and Metabolomic Characterization of COVID-19 Patient Sera, Cell. 182 (2020) 59-72.e15.
Examples
data(pathway)
Read metabolite set file (*.xml)
Description
This function generates a metabolite set list from a metabolite set file (XML). It is mainly called by other functions.
Usage
read_pathway(fullpath)
Arguments
fullpath |
file path of metabolite set (XML) |
Value
list of metabolite set name and metabolite IDs.
Author(s)
Hiroyuki Yamamoto
Examples
# filename <- "C:/R/pathway.xml" # load metabolite set file
# M <- read_pathway(filename) # Convert XML to metabolite set (list)
Generate binary label matrix of metabolite set
Description
This function generates a binary label matrix of metabolites and metabolite sets. It is mainly called by other functions, where it is used to count the number of metabolites in a specific metabolite set.
Usage
setlabel(MET, M)
Arguments
MET |
A character vector of metabolites (e.g., detected or significant metabolites). |
M |
A named list of metabolite sets, where each element is a character vector of metabolites. |
Details
This function is used internally in various ORA methods (e.g., ora_det, ora_all, ora_est) to compute contingency tables for enrichment analysis.
Value
binary label matrix of metabolites in metabolite sets
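A binary label matrix of this kind can be built in a few lines of base R, as in the sketch below. The names are hypothetical and the orientation (rows = metabolite sets, columns = metabolites) is an assumption; the matrix returned by setlabel may be laid out differently.

```r
MET <- c("a", "b", "c", "d")                       # hypothetical metabolites
M <- list(set_A = c("a", "b", "x"), set_B = c("c", "y"))

# 1 if the metabolite belongs to the set, 0 otherwise
# (rows = metabolite sets, columns = metabolites; orientation is an assumption)
L <- t(sapply(M, function(set) as.integer(MET %in% set)))
colnames(L) <- MET

rowSums(L)   # number of listed metabolites found in each set
```

Summing along the set dimension gives the per-set counts that enter a contingency table.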
Author(s)
Hiroyuki Yamamoto
Examples
# Example
data(fasting)
data(pathway)
MET <- colnames(fasting$X) # detected metabolites
M <- pathway$fasting # metabolite set list
# Fix for multiple annotations
MET[MET == "UDP-glucose ; UDP-galactose"] <- "UDP-glucose"
MET[MET == "trans-Glutaconic acid ; Itaconic acid"] <- "Itaconic acid"
MET[MET == "Isonicotinamide ; Nicotinamide"] <- "Nicotinamide"
MET[MET == "Isobutyric acid ; Butyric acid"] <- "Isobutyric acid"
MET[MET == "GDP-mannose ; GDP-galactose"] <- "GDP-mannose"
MET[MET == "ADP-glucose ; GDP-fucose"] <- "ADP-glucose"
MET[MET == "1-Methylhistidine ; 3-Methylhistidine"] <- "3-Methylhistidine"
L <- setlabel(MET, M)
# Example 2
data(fasting_mseapca)
MET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway
L <- setlabel(MET, M)
Single sample enrichment analysis by over representation analysis
Description
This function performs single sample enrichment analysis (SSEA) by over-representation analysis (ORA). SSEA performs MSEA by ORA between detected and undetected metabolites in each sample.
Usage
ssea_ora(det_list, det_all, M)
Arguments
det_list |
metabolite names of detected metabolites |
det_all |
metabolite names of all metabolites |
M |
list of metabolite set and metabolite names |
Details
The threshold for determining whether a metabolite is detected or not is typically set by the signal-to-noise (S/N) ratio. If the S/N ratio is unavailable, one might consider using the signal intensity or peak area for each metabolite as an alternative. In such cases, all values below the threshold can be set to 0.
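The thresholding step described above could be carried out as in this sketch. The intensity values, the cut-off of 30 and the sample and metabolite names are hypothetical, and representing the result as one character vector of detected metabolites per sample is an assumption about the input format expected for det_list.

```r
# Hypothetical intensity matrix: rows = samples, columns = metabolites
X <- matrix(c(120,   0,  35, 400,
                8, 250,   0,  90),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), c("a", "b", "c", "d")))

threshold <- 30            # assumed detection cut-off
X[X < threshold] <- 0      # values below the cut-off are set to 0

# One character vector of detected metabolites per sample
det_list <- lapply(seq_len(nrow(X)), function(i) colnames(X)[X[i, ] > 0])
names(det_list) <- rownames(X)
det_list
```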
Value
A matrix where each row represents a sample and each column represents a set of metabolites.
Author(s)
Hiroyuki Yamamoto
References
Yamamoto H., Single sample enrichment analysis for mass spectrometry-based omics data, Jxiv, (2023).
Examples
data(fasting)
data(pathway)
det_list <- pathway$data$fasting
M <- pathway$fasting
det_all <- unique(c(colnames(fasting$X), as.character(unlist(M))))
# SSEA
Z <- ssea_ora(det_list, det_all, M)
## PCA for SSEA score
pca <- prcomp(Z, scale=TRUE)
pca <- pca_loading(pca)