Help for package pathfindR.data

Title:

Data Package for 'pathfindR'

Version:

2.1.0

Maintainer:

Ege Ulgen <egeulgen@gmail.com>

Description:

This is a data-only package, containing data needed to run the CRAN package 'pathfindR', a package for enrichment analysis utilizing active subnetworks. This package contains protein-protein interaction network data, data related to gene sets and example input/output data.

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

Depends:

R (≥ 4.0)

RoxygenNote:

7.3.1

URL:

https://github.com/egeulgen/pathfindR.data

BugReports:

https://github.com/egeulgen/pathfindR.data/issues

NeedsCompilation:

no

Packaged:

2024-04-27 18:55:55 UTC; egeulgen

Author:

Ege Ulgen [cre, cph], Ozan Ozisik [aut]

Repository:

CRAN

Date/Publication:

2024-04-27 22:50:03 UTC

BioCarta Pathways - Descriptions

Description

A named vector containing the descriptions for each human BioCarta pathway. Generated on 27 Apr 2024.

Usage

biocarta_descriptions

Format

named vector containing 292 character values, the descriptions for the given pathways.

BioCarta Pathways - Gene Sets

Description

A list containing the genes involved in each human BioCarta pathway. Each element is a vector of gene symbols located in the given pathway. Generated on 27 Apr 2024.

Usage

biocarta_genes

Format

list containing 292 vectors of gene symbols. Each vector corresponds to a gene set.

Human Cell Markers - Descriptions

Description

A named vector containing descriptions of different cell types from different human tissues. Names of the vector are the Cell Ontology IDs (if available) of the cell types, and the descriptions follow the format "tissue type, cancer type, cell name". For more information, refer to the article: Hu C, Li T, Xu Y, Zhang X, Li F, Bai J, et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res. 2022 Oct 27;gkac947. Generated on 27 Apr 2024.

Usage

cell_markers_descriptions

Format

named vector containing 1986 character values, the descriptions for the given human cell types.

Human Cell Markers - Gene Sets

Description

A list containing the sets of genes that are cell markers of different cell types from different tissues in human. Each element is a vector of cell marker gene symbols for the given cell type. Names correspond to the Cell Ontology ID (if available) of the cell type. For more information, refer to the article: Hu C, Li T, Xu Y, Zhang X, Li F, Bai J, et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res. 2022 Oct 27;gkac947. Generated on 27 Apr 2024.

Usage

cell_markers_gsets

Format

list containing 1986 vectors. Each vector corresponds to a cell marker gene set for a given human cell type.

Example Active Subnetworks

Description

A list of vectors containing genes for each active subnetwork that passed the filtering step. Generated on 27 Apr 2024.

Usage

example_active_snws

Format

list containing 150 vectors. Each vector is the set of genes for the given active subnetwork.
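
Because example_active_snws is a plain list of gene-symbol vectors, base R list tools are enough to summarize it. The following is a minimal sketch that uses a small hypothetical list `snws` in place of the real object. With pathfindR.data installed, `pathfindR.data::example_active_snws` could be used instead.

```r
# Hypothetical stand-in for example_active_snws: a list of gene-symbol vectors
snws <- list(
  c("GENE1", "GENE2", "GENE3"),
  c("GENE4", "GENE5"),
  c("GENE1", "GENE6", "GENE7", "GENE2")
)

# Number of genes in each active subnetwork
snw_sizes <- lengths(snws)

# Number of subnetworks in which each gene appears
gene_counts <- table(unlist(snws))

# Genes shared by more than one subnetwork
shared_genes <- names(gene_counts)[gene_counts > 1]

print(snw_sizes)
print(shared_genes)
```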

Second Example Output for the pathfindR Enrichment Workflow (H. sapiens - Rheumatoid Arthritis data)

Description

The data frame containing the results of pathfindR's active-subnetwork-oriented enrichment workflow performed on the rheumatoid arthritis dataset GSE84074 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84074. Analysis via run_pathfindR was performed using the default settings. Generated on 27 Apr 2024.

Usage

example_comparison_output

Format

A data frame with 38 rows and 9 columns:

ID: ID of the enriched term
Term_Description: Description of the enriched term
Fold_Enrichment: Fold enrichment value for the enriched term
occurrence: the number of iterations in which the given term was found to be enriched
support: the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations
lowest_p: the lowest adjusted p-value of the given term over all iterations
highest_p: the highest adjusted p-value of the given term over all iterations
Up_regulated: the up-regulated genes in the input involved in the given term, comma-separated
Down_regulated: the down-regulated genes in the input involved in the given term, comma-separated
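
The Up_regulated and Down_regulated columns store genes as comma-separated strings. The following is a minimal sketch for turning them into character vectors. It uses a hypothetical two-row data frame with the same column names. The helper `split_genes()` and the toy IDs and genes are illustrative only. The separator is assumed to be a comma, optionally surrounded by spaces.

```r
# Hypothetical rows mirroring the gene columns of example_comparison_output
res <- data.frame(
  ID = c("term_A", "term_B"),
  Up_regulated = c("GENE1, GENE2", ""),
  Down_regulated = c("GENE3", "GENE4, GENE3")
)

# Split comma-separated gene strings into character vectors;
# empty strings and NA give zero-length vectors
split_genes <- function(x) {
  lapply(strsplit(x, "\\s*,\\s*"), function(g) g[!is.na(g) & nzchar(g)])
}

up <- split_genes(res$Up_regulated)
down <- split_genes(res$Down_regulated)

# Total number of input genes involved in each term
res$n_genes <- lengths(up) + lengths(down)
print(res[, c("ID", "n_genes")])
```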

Custom Gene Set Enrichment Results

Description

A data frame consisting of pathfindR enrichment analysis results on the example TF target genes data (target gene sets of CREB and MYC). Generated on 27 Apr 2024.

Usage

example_custom_genesets_result

Format

data frame containing 2 rows and 9 columns. Each row is a gene set (the TF target gene sets).

Example Experiment Matrix for pathfindR - Enriched Term Scoring

Description

A matrix containing the log₂-transformed and quantile-normalized expression values of the differentially-expressed genes for 18 rheumatoid arthritis (RA) patients and 15 healthy subjects. The matrix contains expression values of 572 significantly differentially-expressed genes (see example_pathfindR_input) with adj.P.Val <= 0.05. Generated on 28 Sep 2019.

Usage

example_experiment_matrix

Format

A matrix with 572 rows and 33 columns.

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15573
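
How pathfindR scores enriched terms from this matrix is not described here. The sketch below shows one common approach, and it is an assumption rather than pathfindR's method:

1. Standardize each gene (row) across samples.
2. Average the z-scores of a term's genes within each sample.

The matrix, gene names, and `term_genes` are hypothetical. In the real object, rows are the 572 genes and columns are the 33 samples.

```r
set.seed(1)
# Hypothetical 4-gene x 6-sample matrix: genes in rows, samples in columns
exp_mat <- matrix(
  rnorm(24, mean = 8),
  nrow = 4,
  dimnames = list(paste0("GENE", 1:4), paste0("S", 1:6))
)

# Standardize each gene across samples; scale() works column-wise, hence t()
z <- t(scale(t(exp_mat)))

# Hypothetical term and its per-sample score (mean z-score of its genes)
term_genes <- c("GENE1", "GENE3")
term_score <- colMeans(z[term_genes, , drop = FALSE])
print(round(term_score, 2))
```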

Example Input for Mus musculus - Myeloma Analysis

Description

A dataset containing the differentially-expressed genes and adjusted p-values for the GEO dataset GSE99393. The RNA microarray experiment was performed to detail the global program of gene expression underlying polarization of myeloma-associated macrophages by CSF1R antibody treatment. The samples were 6 murine bone marrow-derived macrophages co-cultured with myeloma cells (myeloma-associated macrophages). Of these, 3 were treated with CSF1R antibody (treatment group) and the remaining 3 with control IgG antibody (control group). In this dataset, differentially-expressed genes with |logFC| >= 2 and FDR < 0.05 are presented. Generated on 1 Nov 2019.

Usage

example_mmu_input

Format

A data frame with 45 rows and 2 variables:

Gene_Symbol: MGI gene symbols of the differentially-expressed genes
FDR: adjusted p values, via the Benjamini & Hochberg (1995) method

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99393
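
The FDR column holds p-values adjusted with the Benjamini & Hochberg (1995) method. For reference, the following is a minimal base R sketch of that adjustment, checked against `stats::p.adjust(method = "BH")`. The raw p-values are made up for illustration.

```r
# Benjamini-Hochberg adjustment written out step by step
bh_adjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)  # indices from largest to smallest p
  i <- n:1                          # ascending rank of each p in that order
  # p * n / rank, then a running minimum so adjusted values stay monotone
  adj <- pmin(1, cummin(p[o] * n / i))
  adj[order(o)]                     # back to the original order
}

# Made-up raw p-values for illustration
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205)
fdr <- bh_adjust(p)
all.equal(fdr, p.adjust(p, method = "BH"))
```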

Example Output for Mus musculus - Myeloma Analysis

Description

A dataset containing the results of pathfindR's active-subnetwork-oriented enrichment workflow performed on the Mus musculus myeloma differential expression dataset example_mmu_input. Generated on 27 Apr 2024.

Usage

example_mmu_output

Format

A data frame with 34 rows and 9 columns:

ID: ID of the enriched term
Term_Description: Description of the enriched term
Fold_Enrichment: Fold enrichment value for the enriched term
occurrence: the number of iterations in which the given term was found to be enriched
support: the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations
lowest_p: the lowest adjusted p-value of the given term over all iterations
highest_p: the highest adjusted p-value of the given term over all iterations
Up_regulated: the up-regulated genes in the input involved in the given term, comma-separated
Down_regulated: the down-regulated genes in the input involved in the given term, comma-separated
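
The Fold_Enrichment column is not defined further here. This sketch assumes a common definition, not one taken from pathfindR's source: the fraction of input genes annotated to the term, divided by the fraction of background genes annotated to it. A one-sided hypergeometric p-value for the same overlap can be computed with `stats::phyper`. All counts are hypothetical, and treating the PIN genes as the background is also an assumption.

```r
# Hypothetical counts for a single term (all values made up)
N <- 10000  # background genes (assumed here to be the genes in the PIN)
K <- 200    # background genes annotated to the term
n <- 100    # input genes tested for enrichment
k <- 10     # input genes annotated to the term

# Fold enrichment: observed fraction of input genes in the term
# divided by the fraction expected from the background
fold_enrichment <- (k / n) / (K / N)

# One-sided hypergeometric p-value, P(X >= k)
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

c(fold_enrichment = fold_enrichment, p_value = p_hyper)
```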

Example Input for the pathfindR Enrichment Workflow - Rheumatoid Arthritis (H. sapiens)

Description

A dataset containing the differentially-expressed genes along with the associated log₂(fold-change) values and FDR-adjusted p-values for the GEO dataset GSE15573. This microarray dataset aimed to characterize gene expression profiles in the peripheral blood mononuclear cells of 18 rheumatoid arthritis (RA) patients versus 15 healthy subjects. Differentially-expressed genes with adj.P.Val < 0.05 are presented in this data frame. Generated on 1 Nov 2019.

Usage

example_pathfindR_input

Format

A data frame with 572 rows and 3 variables:

Gene.symbol: HGNC gene symbols of the differentially-expressed genes
logFC: log₂(fold-change) values
adj.P.Val: adjusted p values, via the Benjamini & Hochberg (1995) method

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15573
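
Because logFC is on the log₂ scale, its sign gives the direction of change, and a logFC of 1 corresponds to a two-fold increase. The following is a minimal sketch on a hypothetical data frame with the same three columns. The gene names and values are made up. The real object already contains only rows with adj.P.Val < 0.05.

```r
# Hypothetical rows with the same three columns as example_pathfindR_input
input_df <- data.frame(
  Gene.symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  logFC = c(-0.7, 1.0, 0.9, -0.4),
  adj.P.Val = c(0.004, 0.004, 0.061, 0.012)
)

# Keep significant genes (the real object is already filtered this way)
sig <- input_df[input_df$adj.P.Val < 0.05, ]

# On the log2 scale, the sign of logFC gives the direction of change
up_genes <- sig$Gene.symbol[sig$logFC > 0]
down_genes <- sig$Gene.symbol[sig$logFC < 0]

# Back-transform to a linear fold change, e.g. logFC = 1 -> 2-fold
sig$fold_change <- 2^sig$logFC
```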

Example Output for the pathfindR Enrichment Workflow - Rheumatoid Arthritis

Description

The data frame containing the results of pathfindR's active-subnetwork-oriented enrichment workflow performed on the rheumatoid arthritis differential-expression data frame example_pathfindR_input. Analysis via run_pathfindR was performed using the default settings. Generated on 27 Apr 2024.

Usage

example_pathfindR_output

Format

A data frame with 121 rows and 9 columns:

ID: ID of the enriched term
Term_Description: Description of the enriched term
Fold_Enrichment: Fold enrichment value for the enriched term
occurrence: the number of iterations in which the given term was found to be enriched
support: the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations
lowest_p: the lowest adjusted p-value of the given term over all iterations
highest_p: the highest adjusted p-value of the given term over all iterations
Up_regulated: the up-regulated genes in the input involved in the given term, comma-separated
Down_regulated: the down-regulated genes in the input involved in the given term, comma-separated
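
Two common operations on this table are:

- ranking terms by lowest_p;
- keeping only the terms whose occurrence equals the number of iterations run.

The total iteration count is not stated here, so `n_iter` below is an assumption to be replaced with the real value. The rows are hypothetical and use a subset of the documented columns.

```r
# Hypothetical rows with a subset of example_pathfindR_output's columns
out <- data.frame(
  ID = c("term1", "term2", "term3"),
  Term_Description = c("Term one", "Term two", "Term three"),
  occurrence = c(10L, 4L, 10L),
  lowest_p = c(1e-4, 1e-6, 3e-3),
  highest_p = c(2e-3, 5e-2, 9e-3)
)

# All terms, most significant first
ranked <- out[order(out$lowest_p), ]

# Assumed number of iterations in the run; replace with the real value
n_iter <- 10L

# Terms enriched in every iteration, ordered by their lowest adjusted p-value
consistent <- out[out$occurrence == n_iter, ]
consistent <- consistent[order(consistent$lowest_p), ]
consistent$ID
```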

Example Output for the pathfindR Clustering Workflow - Rheumatoid Arthritis

Description

A dataset containing the results of pathfindR's clustering and partitioning workflow performed on the rheumatoid arthritis enrichment results example_pathfindR_output. The clustering and partitioning function cluster_enriched_terms was used with the default settings (i.e. hierarchical clustering was performed and the agglomeration method was "average"). Generated on 27 Apr 2024.

Usage

example_pathfindR_output_clustered

Format

A data frame with 121 rows and 11 columns:

ID: ID of the enriched term
Term_Description: Description of the enriched term
Fold_Enrichment: Fold enrichment value for the enriched term
occurrence: the number of iterations in which the given term was found to be enriched
support: the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations
lowest_p: the lowest adjusted p-value of the given term over all iterations
highest_p: the highest adjusted p-value of the given term over all iterations
Up_regulated: the up-regulated genes in the input involved in the given term, comma-separated
Down_regulated: the down-regulated genes in the input involved in the given term, comma-separated
Cluster: the cluster to which the enriched term is assigned
Status: whether the enriched term is the "Representative" term in its cluster or only a "Member"
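
The Cluster and Status columns allow the table to be reduced to one "Representative" term per cluster, or grouped by cluster. The following is a minimal sketch on hypothetical rows that use only the columns it needs.

```r
# Hypothetical rows with a subset of example_pathfindR_output_clustered's columns
clustered <- data.frame(
  ID = c("t1", "t2", "t3", "t4", "t5"),
  lowest_p = c(1e-5, 1e-3, 2e-4, 4e-2, 1e-2),
  Cluster = c(1L, 1L, 2L, 2L, 3L),
  Status = c("Representative", "Member", "Representative", "Member", "Representative")
)

# The representative term of each cluster
reps <- clustered[clustered$Status == "Representative", ]

# Term IDs grouped by cluster, and the size of each cluster
members <- split(clustered$ID, clustered$Cluster)
cluster_sizes <- lengths(members)
```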

Gene Ontology - All Gene Ontology Gene Sets

Description

A list containing the genes involved in each Gene Ontology (GO) term. Each element is a vector of gene symbols located in the given gene set. Generated on 27 Apr 2024.

Usage

go_all_genes

Format

list containing 15450 vectors of gene symbols. Each vector corresponds to a GO gene set.

KEGG Pathways - Descriptions

Description

A named vector containing the descriptions for each Homo sapiens KEGG pathway. Names of the vector correspond to the KEGG ID of the pathway. Pathways that did not contain any genes were discarded. Generated on 27 Apr 2024.

Usage

kegg_descriptions

Format

named vector containing 358 character values, the descriptions for the given pathways.

KEGG Pathways - Gene Sets

Description

A list containing the genes involved in each Homo sapiens KEGG pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the KEGG ID of the pathway. Pathways that did not contain any genes were discarded. Generated on 27 Apr 2024.

Usage

kegg_genes

Format

list containing 358 vectors of gene symbols. Each vector corresponds to a pathway.
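
kegg_descriptions and kegg_genes are both keyed by KEGG pathway ID, so they should be matched by name rather than by position. The following is a minimal sketch with small hypothetical stand-ins. The IDs, descriptions, and gene lists are an illustrative toy subset, not the package contents, and the "hsa" ID format is assumed.

```r
# Hypothetical stand-ins shaped like kegg_descriptions / kegg_genes
descs <- c(hsa00010 = "Glycolysis / Gluconeogenesis", hsa04110 = "Cell cycle")
gsets <- list(hsa04110 = c("CDK1", "CCNB1", "CDKN1A"), hsa00010 = c("HK1", "PKM"))

# Align by KEGG ID (names), not by position
ids <- intersect(names(descs), names(gsets))
pathway_table <- data.frame(
  ID = ids,
  Description = unname(descs[ids]),
  n_genes = unname(lengths(gsets[ids]))
)

# IDs of pathways containing a gene of interest
has_gene <- vapply(gsets, function(g) "CDKN1A" %in% g, logical(1))
names(gsets)[has_gene]
```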

Mus musculus KEGG Pathways - Descriptions

Description

A named vector containing the descriptions for each Mus musculus KEGG pathway. Names of the vector correspond to the KEGG ID of the pathway. Pathways that did not contain any genes were discarded. Generated on 27 Apr 2024.

Usage

mmu_kegg_descriptions

Format

named vector containing 355 character values, the descriptions for the given pathways.

Mus musculus KEGG Pathways - Gene Sets

Description

A list containing the genes involved in each Mus musculus KEGG pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the KEGG ID of the pathway. Pathways that did not contain any genes were discarded. Generated on 27 Apr 2024.

Usage

mmu_kegg_genes

Format

list containing 355 vectors of gene symbols. Each vector corresponds to a pathway.

Table of Data for pathfindR

Description

Data frame listing all the data for pathfindR, along with their descriptions and last update dates.

Usage

pathfindR.data_updates

Format

A data frame with 30 rows and 6 columns:

Category: Category of the data
Name: Name of the data
Description: Description of the data
Source: Source of the data
Version: Version of the data (if applicable)
Last Update: Last update date

Reactome Pathways - Descriptions

Description

A named vector containing the descriptions for each human Reactome pathway. Names of the vector correspond to the Reactome ID of the pathway. Generated on 27 Apr 2024.

Usage

reactome_descriptions

Format

named vector containing 2681 character values, the descriptions for the given pathways.

Reactome Pathways - Gene Sets

Description

A list containing the genes involved in each human Reactome pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the Reactome ID of the pathway. Generated on 27 Apr 2024.

Usage

reactome_genes

Format

list containing 2681 vectors of gene symbols. Each vector corresponds to a pathway.
