Help for package mseapca

Type: Package
Title: Metabolite Set Enrichment Analysis for Loadings
Version: 2.2.1
Date: 2025-07-05
Description: Computing metabolite set enrichment analysis (MSEA) (Yamamoto, H. et al. (2014) <doi:10.1186/1471-2105-15-51>), single sample enrichment analysis (SSEA) (Yamamoto, H. (2023) <doi:10.51094/jxiv.262>) and over-representation analysis (ORA) that accounts for undetected metabolites (Yamamoto, H. (2024) <doi:10.51094/jxiv.954>).
License: LGPL-3
Encoding: UTF-8
LazyData: true
Imports: XML
Depends: loadings
NeedsCompilation:
Packaged: 2025-07-05 12:24:25 UTC; yamamoto
Author: Hiroyuki Yamamoto [aut, cre]
Maintainer: Hiroyuki Yamamoto <h.yama2396@gmail.com>
Repository: CRAN
Date/Publication: 2025-07-05 12:40:02 UTC

Convert metabolite set / csv to list

Description

This function converts a user-defined metabolite set stored in a CSV file into a list.

Usage

csv2list(filepath)

Arguments

filepath

file path of metabolite set (csv file)

Details

The first row of the CSV file is a header naming the two columns (metabolite IDs and metabolite set name). The first column must contain the metabolite IDs and the second column must contain the metabolite set name.
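The two-column layout described above can be illustrated with base R alone. This is a minimal sketch, not the implementation of csv2list: the column names, the metabolite IDs, and the object name M_sketch are hypothetical, and the exact structure returned by csv2list may differ.

```r
# Hypothetical CSV content: metabolite IDs in column 1, set names in column 2.
csv_text <- "metabolite_id,set_name
C00022,Glycolysis
C00031,Glycolysis
C00022,TCA cycle
C00158,TCA cycle"

# Write the text to a temporary file so it can be read like a user-supplied CSV.
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Read the file and group the metabolite IDs by set name.
tab <- read.csv(tmp, stringsAsFactors = FALSE)
M_sketch <- split(tab[[1]], tab[[2]])

# Named list: one character vector of metabolite IDs per metabolite set.
str(M_sketch)
```

A metabolite that belongs to several sets simply appears on several rows, once per set.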

Value

list of metabolite set name and metabolite IDs

Author(s)

Hiroyuki Yamamoto

Examples

	# ---------------------------
	#  Convert csv file to list
	# ---------------------------
	# filepath <- "C:/pathway.csv"	# filepath of csv file
	# N <- csv2list(filepath)	# convert csv file to list

Metabolome data from a fasting mouse study

Description

Example data derived from a fasting mouse liver metabolomics study, originally published by Yamamoto et al. and distributed in the loadings package as fasting. This version is reformatted and annotated to reproduce the analysis setup used in MetaboAnalyst. It is intended for demonstrating ORA and MSEA functions in the mseapca package.

Usage

data(fasting_mseapca)

Format

A named list with the following components:

SIG: Character vector of significant metabolites (e.g., p < 0.05).
DET: Character vector of all detected metabolites.
pathway: Named list of metabolite sets (pathways). Each element is a character vector of metabolite IDs belonging to that pathway.

Details

The dataset is intended for vignette examples:

SIG — Significant metabolites (p < 0.05) obtained from statistical analysis of a fasting vs control comparison.
DET — Background list of all detected metabolites.
pathway — Pathway definitions from MetaboAnalyst, reused for mouse data.

The object can be used directly with functions such as msea_ora, ora_det, and ora_bino.

Source

Derived from the fasting dataset in the loadings package.

Original reference: Yamamoto H., Fujimori T., Sato H., Ishikawa G., Kami K., Ohashi Y. (2014). "Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis". BMC Bioinformatics, 15(1):51.

Examples

## Load data
data(fasting_mseapca)

SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M   <- fasting_mseapca$pathway

## Simple ORA on detected metabolites
res <- ora_det(SIG, DET, M)
head(res$`Result of MSEA(ORA)`)

Save compound set as XML file

Description

This function saves a compound set in list format as an XML file.

Usage

list2xml(filepath, M)

Arguments

filepath

filepath of XML file to save

M

list format of compound set and compound names

Details

This function is used to store a compound set. The saved XML file can be read back with the read_pathway function.

Value

filepath of saved XML file

Author(s)

Hiroyuki Yamamoto

Examples

## Not run: 
	data(pathway)
	M <- pathway$fasting
	xml_file <- "pathway_fasting.xml"
	N <- list2xml(xml_file, M)
	# XML::saveXML(N,filepath)
	
## End(Not run)

Wrapper function for over-representation analysis (ORA)

Description

msea_ora is a wrapper function that calls ORA implementations, allowing the user to choose how undetected metabolites are handled.

"det" – ORA using detected metabolites only as background (ora_det).
"all" – ORA using all metabolites appearing in the pathway list as background (ora_all).
"est_naive", "est_weighted", "est_shrink" – ORA adjusted for undetected metabolites by the "naive", "weighted", or "shrinkage" estimator (all via ora_est).

Usage

msea_ora(SIG, DET, M, option = "det", lambda=NULL)

Arguments

SIG

Character vector of significant metabolites

DET

Character vector of detected metabolites. Required for all options except "all".

M

A named list, where each element is a metabolite set (e.g., pathway) containing character vectors of metabolites.

option

One of "det", "all", "est_naive", "est_weighted", or "est_shrink".

lambda

Shrinkage parameter used only when option = "est_shrink".

Value

A list with:

Result of MSEA(ORA)

Matrix of p-values and q-values

significant metabolites

List of significant metabolites per set

Contingency tables

A list of 2×2 contingency tables used in Fisher's exact tests.
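The effect of the background choice on the 2×2 table can be seen with a small base-R sketch. It contrasts the "det" background (detected metabolites only) with the "all" background (every metabolite in the pathway list) on toy data. The helper ora_one_set and all metabolite names are hypothetical, and dropping significant metabolites that fall outside the background is an assumption of this sketch, not a documented behavior of msea_ora.

```r
# Toy inputs (hypothetical names, not package data).
M   <- list(setA = c("m1", "m2", "m3", "m4"), setB = c("m4", "m5", "m6"))
DET <- c("m1", "m2", "m4", "m5", "m7", "m8")
SIG <- c("m1", "m2", "m5")

# One-sided Fisher's exact test for a single set against a given background.
ora_one_set <- function(sig, background, set) {
  sig    <- intersect(sig, background)   # assumption: ignore SIG outside background
  in_set <- background %in% set
  is_sig <- background %in% sig
  tab <- table(factor(is_sig, levels = c(TRUE, FALSE)),
               factor(in_set, levels = c(TRUE, FALSE)))
  fisher.test(tab, alternative = "greater")$p.value
}

bg_det <- DET                 # "det": detected metabolites only
bg_all <- unique(unlist(M))   # "all": every metabolite in the pathway list

p_det <- sapply(M, function(s) ora_one_set(SIG, bg_det, s))
p_all <- sapply(M, function(s) ora_one_set(SIG, bg_all, s))
rbind(det = p_det, all = p_all)
```

With the same SIG and the same sets, the two backgrounds give different p-values because the non-member cells of the table change.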

Author(s)

Hiroyuki Yamamoto

References

Yamamoto H, Fujimori T, Sato H, Ishikawa G, Kami K, Ohashi Y, Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis. BMC Bioinformatics, (2014) 15(1):51.

Yamamoto H. "Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites", Jxiv, (2024).

Examples

# Example1 : Metabolome data
data(fasting)
data(pathway)

# pca and pca loading
pca <- prcomp(fasting$X, scale=TRUE)
pca <- pca_loading(pca)

# all detected metabolites
metabolites <- colnames(fasting$X)

# statistically significant negatively correlated metabolites in PC1 loading
SIG <- metabolites[pca$loading$R[,1] < 0 & pca$loading$p.value[,1] < 0.05]
DET <- metabolites

# Fix for multiple annotations
DET[DET == "UDP-glucose ; UDP-galactose"] <- "UDP-glucose"
DET[DET == "Isonicotinamide ; Nicotinamide"] <- "Nicotinamide"
DET[DET == "1-Methylhistidine ; 3-Methylhistidine"] <- "3-Methylhistidine"

# metabolite set list
M <- pathway$fasting

# MSEA by over representation analysis
B <- msea_ora(SIG, DET, M)
B$`Result of MSEA(ORA)`

## Example2 : Proteome data
data(covid19)
data(pathway)

X <- covid19$X$proteomics
Y <- covid19$Y
D <- covid19$D
tau <- covid19$tau

protein_name <- colnames(X)

# pls-rog and pls-rog loading
plsrog <- pls_rog(X,Y,D)
plsrog <- plsrog_loading(plsrog)

# statistically significant proteins
index_prot <- which(plsrog$loading$R[,1]>0 & plsrog$loading$p.value[,1]<0.05)
sig_prot <- protein_name[index_prot]

# protein set list
M <- pathway$covid19$proteomics

# MSEA by over representation analysis
B <- msea_ora(sig_prot, protein_name, M)
B$`Result of MSEA(ORA)`

## Example3: Metabolome data
data(fasting_mseapca)

SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway

# Perform ORA using detected metabolites only
B <- msea_ora(SIG, DET, M)
B$`Result of MSEA(ORA)`

Wrapper function for Over-Representation Analysis with p-value range estimation

Description

This function performs over-representation analysis (ORA) to assess metabolite set enrichment while considering uncertainty due to undetected metabolites. It wraps different methods for estimating a p-value range, including full enumeration and binomial resampling.

Usage

msea_ora_range(SIG, DET = NULL, M,
               option = "ora_full",
               probs = c(0.025, 0.975),
               nsim = 1000,
               lambda = 5)

Arguments

SIG

A character vector of statistically significant metabolites.

DET

A character vector of all detected metabolites. Required for all methods except ora_full.

M

A named list of metabolite sets, each containing a character vector of metabolites.

option

Method to use for estimating the p-value range. One of "ora_full", "bino_naive", "bino_weighted", or "bino_shrink".

probs

Numeric vector of quantile probabilities for binomial simulation (e.g., c(0.025, 0.975)). Ignored for ora_full.

nsim

Number of simulations for binomial-based estimation. Ignored for ora_full.

lambda

Shrinkage parameter for "bino_shrink" option.

Details

This wrapper function allows switching between multiple ORA implementations that estimate the uncertainty due to undetected metabolites. The ora_full method uses exhaustive enumeration of all possible detection patterns, while the other methods use binomial resampling with different estimation strategies (naive, weighted, or shrinkage-based).
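The binomial resampling idea can be sketched for a single metabolite set in base R. This is an illustration under stated assumptions, not the package's algorithm: the counts are hypothetical, and the sampling probability p_hat (the fraction of significant metabolites among the detected members of the set) is chosen here only for demonstration. The estimators actually used by the "bino_naive", "bino_weighted", and "bino_shrink" options may be defined differently.

```r
set.seed(1)

# Hypothetical counts for one metabolite set.
n_sig_in     <- 4    # detected, significant, in the set
n_sig_out    <- 6    # detected, significant, not in the set
n_nonsig_in  <- 3    # detected, non-significant, in the set
n_nonsig_out <- 37   # detected, non-significant, not in the set
n_undet      <- 5    # set members that were not detected

# Assumed probability that an undetected member would have been significant.
p_hat <- n_sig_in / (n_sig_in + n_nonsig_in)

# Draw the number of significant undetected members, then redo Fisher's test.
nsim <- 1000
k <- rbinom(nsim, size = n_undet, prob = p_hat)
p_sim <- vapply(k, function(ki) {
  # Rows: in set / not in set; columns: significant / non-significant.
  tab <- matrix(c(n_sig_in + ki, n_sig_out,
                  n_nonsig_in + (n_undet - ki), n_nonsig_out), nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
}, numeric(1))

# Lower, median, and upper p-values from the simulated distribution.
p_range <- quantile(p_sim, probs = c(0.025, 0.5, 0.975))
p_range
```

Increasing nsim stabilizes the quantiles at the cost of run time, which is why the examples in this manual use a small nsim for demonstration.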

Value

A list containing a matrix with p-value range results for each metabolite set. Columns include lower, median, and upper p-values.

Author(s)

Hiroyuki Yamamoto

Examples

# Example: Metabolome data
data(fasting_mseapca)

SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway

# Estimate the p-value range by binomial resampling (naive estimator)
B <- msea_ora_range(SIG, DET, M, option = "bino_naive", nsim=10)

B$`Range of p-values for each pathway`

MSEA by Subramanian et al.

Description

This function performs metabolite set enrichment analysis implemented in the same fashion as gene set enrichment analysis (Subramanian et al. 2005). In this function, a permutation procedure is performed for a metabolite set rather than class label. This procedure corresponds to a "gene set" of permutation type in GSEA-P software (Subramanian et al. 2007). A leading-edge subset analysis is also undertaken following the standard GSEA procedure.
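The metabolite-set permutation described above can be sketched in base R. This is a simplified illustration, not the code of msea_sub: it uses an unweighted running-sum enrichment score, omits normalization and the leading-edge analysis, and works on simulated data. The helper enrichment_score and all variable names are hypothetical.

```r
set.seed(42)

# Simulated data: 12 samples, 30 metabolites; the first 6 form the set of interest.
n_samp <- 12
n_met  <- 30
X <- matrix(rnorm(n_samp * n_met), nrow = n_samp,
            dimnames = list(NULL, paste0("m", seq_len(n_met))))
y <- rnorm(n_samp)                      # response variable (e.g. a PC score)
met_set <- paste0("m", 1:6)
X[, met_set] <- X[, met_set] + 2 * y    # make the set members track y

# Rank metabolites by their correlation with the response.
r <- cor(X, y)[, 1]
ranked <- names(sort(r, decreasing = TRUE))

# Unweighted running-sum statistic: step up on set members, down otherwise,
# and report the largest deviation from zero (keeping its sign).
enrichment_score <- function(ranked, set) {
  hit    <- ranked %in% set
  n_hit  <- sum(hit)
  n_miss <- length(ranked) - n_hit
  running <- cumsum(ifelse(hit, 1 / n_hit, -1 / n_miss))
  running[which.max(abs(running))]
}

es_obs <- enrichment_score(ranked, met_set)

# Null distribution: permute set membership (random sets of the same size),
# leaving the class labels / response untouched.
n_perm  <- 200
es_null <- replicate(n_perm,
                     enrichment_score(ranked, sample(ranked, length(met_set))))
p_perm  <- (sum(es_null >= es_obs) + 1) / (n_perm + 1)
c(ES = es_obs, p = p_perm)
```

Permuting set membership rather than sample labels keeps the ranking fixed, so the null distribution reflects how extreme the observed score is for a random set of the same size.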

Usage

msea_sub(M, D, y, maxiter = 1000)

Arguments

M

list of metabolite set name and metabolite IDs

D

data.frame(metabolite ID, data matrix)

y

response variable (e.g. PC score)

maxiter

maximum number of iterations in random permutation (default=1000)

Value

list of normalized enrichment score, p-value and q-value for metabolite set, and the results of leading edge subset

Author(s)

Hiroyuki Yamamoto

References

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. & Mesirov, J. P. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545-15550.

Subramanian, A., Kuehn, H., Gould, J., Tamayo, P., Mesirov, J.P. (2007) GSEA-P: A desktop application for Gene Set Enrichment Analysis. Bioinformatics, doi: 10.1093/bioinformatics/btm369.

Examples

data(fasting)
data(pathway)

# pca and pca loading
pca <- prcomp(fasting$X, scale=TRUE)
pca <- pca_loading(pca)

# all detected metabolites
metabolites <- colnames(fasting$X)

# statistically significant negatively correlated metabolites in PC1 loading
SIG <- metabolites[pca$loading$R[,1] < 0 & pca$loading$p.value[,1] < 0.05]
ALL <- metabolites #all detected metabolites

# Set response variable
y <- pca$x[,1]

# preparing dataframe
D <- data.frame(ALL, t(fasting$X))

# MSEA by Subramanian et al.
M <- pathway$fasting
P <- msea_sub(M,D,y, maxiter = 10) # iterations set to 10 for demonstration

ORA using all metabolites

Description

This function performs over-representation analysis (ORA) using all metabolites present in the given metabolite set list as the background, without specifying a reference metabolome. This corresponds to the behavior of MetaboAnalyst when no reference metabolome is uploaded.

Usage

ora_all(SIG, M)

Arguments

SIG

Character vector of significant metabolites

M

Named list of metabolite sets

Value

A list with:

Result of MSEA(ORA)

Matrix of p-values and q-values

significant metabolites

List of significant metabolites per set

Contingency tables

A list of 2×2 contingency tables used in Fisher's exact tests.

Author(s)

Hiroyuki Yamamoto

References

Yamamoto H. "Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites", Jxiv, (2024).

Examples

# Example: Metabolome data
data(fasting_mseapca)

SIG <- fasting_mseapca$SIG
M <- fasting_mseapca$pathway

# Perform ORA using all metabolites in the pathway list as background
B <- ora_all(SIG, M)
B$`Result of MSEA(ORA)`

Over-representation analysis with binomial resampling adjustment

Description

Performs ORA while adjusting for undetected metabolites by binomial resampling.

Usage

ora_bino(SIG, DET, M, method = "naive", probs = c(0.025, 0.975), nsim = 1000, lambda = 5)

Arguments

SIG

Character vector of significant metabolites.

DET

Character vector of detected metabolites.

M

Named list of metabolite sets.

method

"naive", "weighted", or "shrink".

probs

Quantiles for the confidence interval (default 95%).

nsim

Number of binomial simulations (default 1000).

lambda

Shrinkage parameter used when method = "shrink" (default: 5).

Value

A list containing one matrix. Rows = metabolite sets; columns = lower, median, and upper p-values.

Author(s)

Hiroyuki Yamamoto

Examples

# Example: Metabolome data
data(fasting_mseapca)

SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway

# Perform ORA with binomial resampling (naive estimator)
B <- ora_bino(SIG, DET, M, method="naive", nsim = 10)

B$`Range of p-values for each pathway`

ORA using detected metabolites

Description

This function performs metabolite set enrichment analysis using over-representation analysis (ORA) under the assumption that only detected metabolites are used as the background. A one-sided Fisher's exact test is applied to each metabolite set.

Usage

ora_det(SIG, DET, M)

Arguments

SIG

Character vector of significant metabolites

DET

Character vector of detected metabolites (background set)

M

A named list, where each element is a metabolite set (e.g., pathway) containing character vectors of metabolites

Value

A list containing:

Result of MSEA(ORA): A matrix with raw p-values and adjusted q-values (BH correction) for each metabolite set.
significant metabolites: A list of significant metabolites overlapping with each metabolite set.
Contingency tables: A list of 2×2 contingency tables used in Fisher's exact tests.
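The BH (Benjamini-Hochberg) correction mentioned above is available in base R through p.adjust. The sketch below applies it to hypothetical p-values; the pathway names and the column labels are illustrative and are not the labels used in the matrix returned by ora_det.

```r
# Hypothetical raw p-values, one per metabolite set.
p <- c(pathwayA = 0.001, pathwayB = 0.02, pathwayC = 0.03, pathwayD = 0.5)

# Benjamini-Hochberg adjusted values (q-values).
q <- p.adjust(p, method = "BH")

# Combine into a matrix with one row per metabolite set.
res_sketch <- cbind(p.value = p, q.value = q)
res_sketch
```

Because the adjustment depends on the number of sets tested, q-values from analyses with different pathway lists are not directly comparable.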

Author(s)

Hiroyuki Yamamoto

References

Yamamoto H. "Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites", Jxiv, (2024).

Examples

# Example: Metabolome data
data(fasting_mseapca)

SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway

# Perform ORA using detected metabolites only
B <- ora_det(SIG, DET, M)
B$`Result of MSEA(ORA)`

ORA adjusting for undetected metabolites

Description

This function performs metabolite set enrichment analysis using over-representation analysis (ORA), incorporating point estimates to adjust for potentially undetected metabolites. It supports three estimation methods: naive, weighted, and shrinkage-based adjustment.

Usage

ora_est(SIG, DET, M, method = "naive", lambda = 5)

Arguments

SIG

Character vector of significant metabolites

DET

Character vector of detected metabolites (background set)

M

A named list, where each element is a metabolite set (e.g., pathway) containing character vectors of metabolites

method

A character string specifying the estimation method to use. One of "naive", "weighted", or "shrink". Default is "naive".

lambda

A numeric value used in the "shrink" method as a shrinkage parameter. Default is 5.

Details

This function estimates the impact of undetected metabolites on enrichment results. It builds upon the ORA results from detected metabolites, then adjusts the contingency tables by estimating how many undetected metabolites might be significant, based on a specified method.
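The adjustment step can be sketched for one metabolite set as follows. The formulas here are assumptions made for illustration and are not taken from the package source: the "naive" line assumes the undetected members are significant at the same rate as the detected members of the set, and the "shrink" line uses a generic pseudo-count form that pulls that rate toward a global rate with weight lambda. The "weighted" method is not sketched. All counts are hypothetical.

```r
# Hypothetical counts for one metabolite set.
n_det_sig    <- 3     # detected, significant members
n_det_nonsig <- 2     # detected, non-significant members
n_undet      <- 4     # members that were not detected
n_sig_out    <- 6     # detected, significant, outside the set
n_nonsig_out <- 37    # detected, non-significant, outside the set
p_global     <- 0.10  # assumed overall fraction of significant metabolites
lambda       <- 5     # shrinkage weight (same default value as ora_est)

n_det <- n_det_sig + n_det_nonsig

# Assumed point estimates of the significance rate among undetected members.
p_naive  <- n_det_sig / n_det
p_shrink <- (n_det_sig + lambda * p_global) / (n_det + lambda)

# Estimated number of undetected members that would be significant.
est_naive  <- round(n_undet * p_naive)
est_shrink <- round(n_undet * p_shrink)

# Adjust the 2x2 table with an estimate and rerun the one-sided Fisher test.
adjusted_p <- function(est) {
  tab <- matrix(c(n_det_sig + est, n_sig_out,
                  n_det_nonsig + (n_undet - est), n_nonsig_out), nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
}

c(naive = adjusted_p(est_naive), shrink = adjusted_p(est_shrink))
```

In this generic form, a larger lambda moves the estimate closer to the global rate, which matters most for sets with few detected members.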

Value

A list containing:

Result of MSEA (ORA with adjustment): A matrix with raw p-values and adjusted q-values (BH correction) for each metabolite set.
significant metabolites: A list of significant metabolites overlapping with each metabolite set.
Contingency tables: A list of 2×2 contingency tables used in Fisher's exact tests.

Author(s)

Hiroyuki Yamamoto

References

Yamamoto H. "Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites", Jxiv, (2024).

Examples

# Example: Metabolome data
data(fasting_mseapca)

SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway

# Perform ORA adjusted for undetected metabolites (default: naive estimator)
B <- ora_est(SIG, DET, M)
B$`Result of MSEA (ORA with adjustment)`

Over-representation analysis with full enumeration of undetected metabolite patterns

Description

This function performs over-representation analysis (ORA) by enumerating all possible patterns of significant and non-significant assignments among undetected metabolites for each metabolite set. It returns the minimum, median, and maximum p-values from Fisher’s exact tests across these patterns, thereby estimating the full uncertainty range due to undetected metabolites.

Usage

ora_full(SIG, DET, M)

Arguments

SIG

A character vector of statistically significant metabolites.

DET

A character vector of all detected metabolites (the background).

M

A named list of metabolite sets, where each element is a character vector of metabolites.

Details

For each metabolite set, the number of undetected metabolites is calculated. The function then considers all possible numbers of significant metabolites (from 0 to the total number of undetected ones) among those undetected. For each case, a 2x2 contingency table is constructed and Fisher’s exact test is applied. The resulting p-values are aggregated to report the minimum, median, and maximum values.
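The enumeration described above can be sketched for a single metabolite set in base R. The counts are hypothetical, and the way the undetected metabolites are added to the table (significant ones to the "significant, in set" cell, the rest to the "non-significant, in set" cell) is an assumption of this sketch rather than a statement about the internals of ora_full.

```r
# Hypothetical counts for one metabolite set.
n_sig_in     <- 3    # detected, significant, in the set
n_sig_out    <- 6    # detected, significant, not in the set
n_nonsig_in  <- 2    # detected, non-significant, in the set
n_nonsig_out <- 37   # detected, non-significant, not in the set
n_undet      <- 4    # set members that were not detected

# Try every possible number k of significant metabolites among the undetected.
p_all <- vapply(0:n_undet, function(k) {
  # Rows: in set / not in set; columns: significant / non-significant.
  tab <- matrix(c(n_sig_in + k, n_sig_out,
                  n_nonsig_in + (n_undet - k), n_nonsig_out), nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
}, numeric(1))

# Summarize the uncertainty range for this set.
p_range <- c(lower = min(p_all), median = median(p_all), upper = max(p_all))
p_range
```

A wide gap between the lower and upper values indicates that the conclusion for the set depends strongly on metabolites that were never measured.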

Value

A list containing:

Range of p-values

A matrix with rows corresponding to metabolite sets and three columns: lower p-value, p-value(median), and upper p-value.

Author(s)

Hiroyuki Yamamoto

References

Yamamoto H. "Probabilistic Over-Representation Analysis for Metabolite Set Enrichment Analysis Considering Undetected Metabolites", Jxiv, (2024).

Examples

# Example: Metabolome data
data(fasting_mseapca)

SIG <- fasting_mseapca$SIG
DET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway

# Perform ORA with full enumeration of undetected metabolite patterns
B <- ora_full(SIG, DET, M)
B$`Range of p-values`

Generate metabolite set list from PathBank database

Description

This function generates a metabolite set list from the PathBank database by referencing the AHPathbankDbs Bioconductor package.

Usage

pathbank2list(tbl_pathbank, subject, id)

Arguments

tbl_pathbank

tibble from AHPathbankDbs

subject

Pathway subject (Metabolic, Disease, etc.) in tibble

id

database ID (HMDB ID, Uniprot ID, etc.) used for analysis

Details

AHPathbankDbs needs to be installed separately.

Value

list of metabolite or protein set

Author(s)

Hiroyuki Yamamoto

Examples

## PathBank
#library(AnnotationHub)
#ah <- AnnotationHub()

#qr <- query(ah, c("pathbank", "Homo sapiens"))

##tbl_pathbank <- qr[[1]] # metabolomics
#tbl_pathbank <- qr[[2]] # proteomics

#ids <- names(tbl_pathbank)[-c(1:4)]
#id <- ids[1] # Uniprot ID

#subs <- unique(tbl_pathbank$`Pathway Subject`)
#subject <- subs[6] # Protein

# M <- pathbank2list(tbl_pathbank, subject, id)

Example of metabolite set list for fasting and covid19 datasets

Description

This dataset contains the metabolite set lists for the fasting and covid19 datasets in the loadings package.

Usage

data(pathway)

Arguments

The list object pathway contains the following elements:

fasting : metabolite set list for fasting mouse dataset

covid19$proteomics : protein set list for covid19 dataset.

References

Yamamoto H., Fujimori T., Sato H., Ishikawa G., Kami K., Ohashi Y. (2014). "Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis". BMC Bioinformatics, 15(1):51.

B. Shen, et al, Proteomic and Metabolomic Characterization of COVID-19 Patient Sera, Cell. 182 (2020) 59-72.e15.

Examples

data(pathway)

Read metabolite set file (*.xml)

Description

This function generates a metabolite set list from a metabolite set file (XML). It is mainly called by other functions.

Usage

read_pathway(fullpath)

Arguments

fullpath

file path of metabolite set (XML)

Value

list of metabolite set name and metabolite IDs.

Author(s)

Hiroyuki Yamamoto

Examples

	# filename <- "C:/R/pathway.xml"	# load metabolite set file
	# M <- read_pathway(filename)		# Convert XML to metabolite set (list)

Generate binary label matrix of metabolite set

Description

This function generates a binary label matrix of metabolites and metabolite sets. It is mainly called by other functions and is used to count the number of metabolites in a specific metabolite set.

Usage

setlabel(MET, M)

Arguments

MET

A character vector of metabolites (e.g., detected or significant metabolites).

M

A named list of metabolite sets, where each element is a character vector of metabolites.

Details

This function is used internally in various ORA methods (e.g., ora_det, ora_all, ora_est) to compute contingency tables for enrichment analysis.
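A binary label matrix of this kind can be built in a few lines of base R. The sketch below is illustrative only: the metabolite names are hypothetical, and the orientation (metabolites in rows, sets in columns) is an assumption that may not match the matrix returned by setlabel.

```r
# Hypothetical inputs.
MET <- c("m1", "m2", "m3", "m4")
M   <- list(setA = c("m1", "m2", "m9"), setB = c("m2", "m4"))

# 1 if the metabolite belongs to the set, 0 otherwise.
L_sketch <- vapply(M, function(s) as.integer(MET %in% s), integer(length(MET)))
rownames(L_sketch) <- MET

L_sketch

# Column sums count how many of the metabolites in MET fall in each set.
colSums(L_sketch)
```

Metabolites that appear in a set but not in MET (here "m9") do not contribute, which is what makes the counts depend on the chosen background.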

Value

binary label matrix of metabolites in metabolite sets

Author(s)

Hiroyuki Yamamoto

Examples

# Example
data(fasting)
data(pathway)

MET <- colnames(fasting$X) # detected metabolites
M <- pathway$fasting # metabolite set list

# Fix for multiple annotations
MET[MET == "UDP-glucose ; UDP-galactose"] <- "UDP-glucose"
MET[MET == "trans-Glutaconic acid ; Itaconic acid"] <- "Itaconic acid"
MET[MET == "Isonicotinamide ; Nicotinamide"] <- "Nicotinamide"
MET[MET == "Isobutyric acid ; Butyric acid"] <- "Isobutyric acid"
MET[MET == "GDP-mannose ; GDP-galactose"] <- "GDP-mannose"
MET[MET == "ADP-glucose ; GDP-fucose"] <- "ADP-glucose"
MET[MET == "1-Methylhistidine ; 3-Methylhistidine"] <- "3-Methylhistidine"

L <- setlabel(MET, M)

# Example 2
data(fasting_mseapca)

MET <- fasting_mseapca$DET
M <- fasting_mseapca$pathway

L <- setlabel(MET, M)

Single sample enrichment analysis by over representation analysis

Description

This function performs single sample enrichment analysis (SSEA) by over representation analysis (ORA). SSEA performs MSEA by ORA between detected and not detected metabolites in each sample.

Usage

ssea_ora(det_list, det_all, M)

Arguments

det_list

metabolite names of detected metabolites

det_all

metabolite names of all metabolites

M

list of metabolite set and metabolite names

Details

The threshold for determining whether a metabolite is detected or not is typically set by the signal-to-noise (S/N) ratio. If the S/N ratio is unavailable, one might consider using the signal intensity or peak area for each metabolite as an alternative. In such cases, all values below the threshold can be set to 0.
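The thresholding step can be sketched in base R as follows. The intensity values, the threshold, and the object names are hypothetical, and representing the result as one character vector of detected metabolites per sample is an assumption about a convenient format, not a description of what ssea_ora requires for det_list.

```r
# Hypothetical intensity matrix: samples in rows, metabolites in columns.
X <- matrix(c(120,  0, 35,
                8, 60,  0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), c("m1", "m2", "m3")))

# Values below the threshold are treated as not detected and set to 0.
threshold <- 10
X_thr <- X
X_thr[X_thr < threshold] <- 0

# Names of the metabolites detected in each sample.
det_list_sketch <- lapply(seq_len(nrow(X_thr)),
                          function(i) colnames(X_thr)[X_thr[i, ] > 0])
names(det_list_sketch) <- rownames(X_thr)

det_list_sketch
```

The threshold directly controls which metabolites count as detected, so it is worth checking how sensitive the enrichment scores are to its value.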

Value

A matrix where each row represents a sample and each column represents a set of metabolites.

Author(s)

Hiroyuki Yamamoto

References

Yamamoto H., Single sample enrichment analysis for mass spectrometry-based omics data, Jxiv (2023).

Examples

data(fasting)
data(pathway)

det_list <- pathway$data$fasting
M <- pathway$fasting
det_all <- unique(c(colnames(fasting$X), as.character(unlist(M))))

# SSEA
Z <- ssea_ora(det_list, det_all, M)

## PCA for SSEA score
pca <- prcomp(Z, scale=TRUE)
pca <- pca_loading(pca)
