Title: Data Package for 'pathfindR'
Version: 2.1.0
Maintainer: Ege Ulgen <egeulgen@gmail.com>
Description: This is a data-only package, containing data needed to run the CRAN package 'pathfindR', a package for enrichment analysis utilizing active subnetworks. This package contains protein-protein interaction network data, data related to gene sets and example input/output data.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends: R (≥ 4.0)
RoxygenNote: 7.3.1
URL: https://github.com/egeulgen/pathfindR.data
BugReports: https://github.com/egeulgen/pathfindR.data/issues
NeedsCompilation: no
Packaged: 2024-04-27 18:55:55 UTC; egeulgen
Author: Ege Ulgen [cre, cph], Ozan Ozisik [aut]
Repository: CRAN
Date/Publication: 2024-04-27 22:50:03 UTC

BioCarta Pathways - Descriptions

Description

A named vector containing the descriptions for each human BioCarta pathway. Generated on 27 Apr 2024.

Usage

biocarta_descriptions

Format

named vector containing 292 character values, the descriptions for the given pathways.


BioCarta Pathways - Gene Sets

Description

A list containing the genes involved in each human BioCarta pathway. Each element is a vector of gene symbols located in the given pathway. Generated on 27 Apr 2024.

Usage

biocarta_genes

Format

list containing 292 vectors of gene symbols. Each vector corresponds to a gene set.


Human Cell Markers - Descriptions

Description

A named vector containing descriptions of different cell types from different human tissues. Names of the vector are Cell Ontology IDs (if available) of the cell types in the following format: "tissue type, cancer type, cell name". For more information, refer to the article: Hu C, Li T, Xu Y, Zhang X, Li F, Bai J, et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res. 2022 Oct 27;gkac947. Generated on 27 Apr 2024.

Usage

cell_markers_descriptions

Format

named vector containing 1986 character values, the descriptions for the given human cell types.


Human Cell Markers - Gene Sets

Description

A list containing the sets of genes that are cell markers of different cell types from different tissues in human. Each element is a vector of cell marker gene symbols for the given cell type. Names correspond to the Cell Ontology ID (if available) of the cell type. For more information, refer to the article: Hu C, Li T, Xu Y, Zhang X, Li F, Bai J, et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res. 2022 Oct 27;gkac947. Generated on 27 Apr 2024.

Usage

cell_markers_gsets

Format

list containing 1986 vectors. Each vector corresponds to a cell marker gene set for a given human cell type.
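
Because each element is a plain character vector of marker genes, a common task is finding which cell types list a given gene as a marker. The following is a minimal sketch on a toy list with the documented shape. The IDs, the gene assignments, and the helper name find_sets_with_gene are hypothetical and are not part of the package.

```r
# Toy stand-in for a named list of marker gene sets.
# The IDs and gene assignments below are illustrative only.
toy_gsets <- list(
  "CL:0000001" = c("CD3E", "CD4"),
  "CL:0000002" = c("CD19", "MS4A1"),
  "CL:0000003" = c("CD3E", "CD8A")
)

# Hypothetical helper: return the names of all gene sets containing `gene`.
find_sets_with_gene <- function(gsets, gene) {
  contains_gene <- vapply(gsets, function(g) gene %in% g, logical(1))
  names(gsets)[contains_gene]
}

hits <- find_sets_with_gene(toy_gsets, "CD3E")
print(hits)
```

Assuming the package is installed, the same helper should accept pathfindR.data::cell_markers_gsets in place of the toy list, since it relies only on the list being named and holding character vectors.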


Example Active Subnetworks

Description

A list of vectors containing genes for each active subnetwork that passed the filtering step. Generated on 27 Apr 2024.

Usage

example_active_snws

Format

list containing 150 vectors. Each vector is the set of genes for the given active subnetwork.


Second Example Output for the pathfindR Enrichment Workflow (H.sapiens - Rheumatoid Arthritis data)

Description

The data frame containing the results of pathfindR's active-subnetwork-oriented enrichment workflow performed on the rheumatoid arthritis dataset GSE84074 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84074). Analysis via run_pathfindR was performed using the default settings. Generated on 27 Apr 2024.

Usage

example_comparison_output

Format

A data frame with 38 rows and 9 columns:

ID

ID of the enriched term

Term_Description

Description of the enriched term

Fold_Enrichment

Fold enrichment value for the enriched term

occurrence

the number of iterations in which the given term was found to be enriched, over all iterations

support

the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations

lowest_p

the lowest adjusted-p value of the given term over all iterations

highest_p

the highest adjusted-p value of the given term over all iterations

Up_regulated

the up-regulated genes in the input involved in the given term, comma-separated

Down_regulated

the down-regulated genes in the input involved in the given term, comma-separated

See Also

example_pathfindR_input for the RA differentially-expressed genes data frame
example_pathfindR_output for the RA example pathfindR enrichment output
example_pathfindR_output_clustered for the RA example pathfindR clustering output
example_experiment_matrix for the RA differentially-expressed genes expression matrix
run_pathfindR for details on the pathfindR enrichment analysis


Custom Gene Set Enrichment Results

Description

A data frame consisting of pathfindR enrichment analysis results on the example TF target genes data (target gene sets of CREB and MYC). Generated on 27 Apr 2024.

Usage

example_custom_genesets_result

Format

data frame containing 2 rows and 9 columns. Each row is a gene set (the TF target gene sets).


Example Experiment Matrix for pathfindR - Enriched Term Scoring

Description

A matrix containing the log2-transformed and quantile-normalized expression values of the differentially-expressed genes for 18 rheumatoid arthritis (RA) patients and 15 healthy subjects. The matrix contains expression values of 572 significantly differentially-expressed genes (see example_pathfindR_input) with adj.P.Val <= 0.05. Generated on 28 Sep 2019.

Usage

example_experiment_matrix

Format

A matrix with 572 rows and 33 columns.

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15573

See Also

example_pathfindR_input for the RA differentially-expressed genes data frame
example_pathfindR_output for the RA example pathfindR enrichment output
score_terms for details on calculating agglomerated scores of enriched terms


Example Input for Mus musculus - Myeloma Analysis

Description

A dataset containing the differentially-expressed genes and adjusted p-values for the GEO dataset GSE99393. The RNA microarray experiment was performed to detail the global program of gene expression underlying polarization of myeloma-associated macrophages by CSF1R antibody treatment. The samples were 6 murine bone marrow-derived macrophages co-cultured with myeloma cells (myeloma-associated macrophages), 3 of which were treated with CSF1R antibody (treatment group) and the remaining 3 with control IgG antibody (control group). In this dataset, differentially-expressed genes with |logFC| >= 2 and FDR < 0.05 are presented. Generated on 1 Nov 2019.

Usage

example_mmu_input

Format

A data frame with 45 rows and 2 variables:

Gene_Symbol

MGI gene symbols of the differentially-expressed genes

FDR

adjusted p values, via the Benjamini & Hochberg (1995) method

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99393

See Also

example_mmu_output for the example mmu enrichment output. run_pathfindR for details on the pathfindR enrichment analysis.
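
The filtering described above (Benjamini & Hochberg adjustment, then |logFC| >= 2 and FDR < 0.05) can be sketched on a toy table as follows. This is an illustration under stated assumptions, not the authors' preprocessing pipeline: the gene names, values, and the P.Value and logFC columns are made up, and only p.adjust from base R's stats package is relied upon.

```r
# Toy differential-expression results (gene names and values are made up).
toy_de <- data.frame(
  Gene_Symbol = c("Gene1", "Gene2", "Gene3", "Gene4"),
  logFC = c(2.5, -3.0, 1.0, 2.2),
  P.Value = c(0.01, 0.04, 0.03, 0.20),
  stringsAsFactors = FALSE
)

# Benjamini & Hochberg (1995) adjustment via stats::p.adjust().
toy_de$FDR <- p.adjust(toy_de$P.Value, method = "BH")

# Keep genes with |logFC| >= 2 and FDR < 0.05,
# then retain only the two columns documented for this dataset.
keep <- abs(toy_de$logFC) >= 2 & toy_de$FDR < 0.05
toy_input <- toy_de[keep, c("Gene_Symbol", "FDR")]
print(toy_input)
```

The result has the same two-column layout (Gene_Symbol, FDR) as example_mmu_input.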


Example Output for Mus musculus - Myeloma Analysis

Description

A dataset containing the results of pathfindR's active-subnetwork-oriented enrichment workflow performed on the Mus musculus myeloma differential expression dataset example_mmu_input. Generated on 27 Apr 2024.

Usage

example_mmu_output

Format

A data frame with 34 rows and 9 columns:

ID

ID of the enriched term

Term_Description

Description of the enriched term

Fold_Enrichment

Fold enrichment value for the enriched term

occurrence

the number of iterations in which the given term was found to be enriched, over all iterations

support

the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations

lowest_p

the lowest adjusted-p value of the given term over all iterations

highest_p

the highest adjusted-p value of the given term over all iterations

Up_regulated

the up-regulated genes in the input involved in the given term, comma-separated

Down_regulated

the down-regulated genes in the input involved in the given term, comma-separated

See Also

example_mmu_input for the example mmu input. run_pathfindR for details on the pathfindR enrichment workflow.


Example Input for the pathfindR Enrichment Workflow - Rheumatoid Arthritis (H.sapiens)

Description

A dataset containing the differentially-expressed genes along with the associated log2(fold-change) values and FDR adjusted p-values for the GEO dataset GSE15573. This microarray dataset aimed to characterize gene expression profiles in the peripheral blood mononuclear cells of 18 rheumatoid arthritis (RA) patients versus 15 healthy subjects. Differentially-expressed genes with adj.P.Val < 0.05 are presented in this data frame. Generated on 1 Nov 2019.

Usage

example_pathfindR_input

Format

A data frame with 572 rows and 3 variables:

Gene.symbol

HGNC gene symbols of the differentially-expressed genes

logFC

log2(fold-change) values

adj.P.Val

adjusted p values, via the Benjamini & Hochberg (1995) method

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15573

See Also

example_pathfindR_output for the RA example pathfindR enrichment output
example_pathfindR_output_clustered for the RA example pathfindR clustering output
example_experiment_matrix for the RA differentially-expressed genes expression matrix
run_pathfindR for details on the pathfindR enrichment analysis


Example Output for the pathfindR Enrichment Workflow - Rheumatoid Arthritis

Description

The data frame containing the results of pathfindR's active-subnetwork-oriented enrichment workflow performed on the rheumatoid arthritis differential-expression data frame example_pathfindR_input. Analysis via run_pathfindR was performed using the default settings. Generated on 27 Apr 2024.

Usage

example_pathfindR_output

Format

A data frame with 121 rows and 9 columns:

ID

ID of the enriched term

Term_Description

Description of the enriched term

Fold_Enrichment

Fold enrichment value for the enriched term

occurrence

the number of iterations in which the given term was found to be enriched, over all iterations

support

the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations

lowest_p

the lowest adjusted-p value of the given term over all iterations

highest_p

the highest adjusted-p value of the given term over all iterations

Up_regulated

the up-regulated genes in the input involved in the given term, comma-separated

Down_regulated

the down-regulated genes in the input involved in the given term, comma-separated

See Also

example_pathfindR_input for the RA differentially-expressed genes data frame
example_pathfindR_output_clustered for the RA example pathfindR clustering output
example_experiment_matrix for the RA differentially-expressed genes expression matrix
run_pathfindR for details on the pathfindR enrichment analysis
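
As an illustration of working with this nine-column layout, the sketch below ranks terms by lowest_p and counts the genes in the comma-separated Up_regulated and Down_regulated columns. The toy data frame and the helper count_genes are hypothetical. The split pattern assumes entries are separated by a comma with optional whitespace, and that an empty string means no genes.

```r
# Toy enrichment result with the nine documented columns (all values are made up).
toy_output <- data.frame(
  ID = c("term1", "term2", "term3"),
  Term_Description = c("Toy term 1", "Toy term 2", "Toy term 3"),
  Fold_Enrichment = c(3.2, 1.8, 2.5),
  occurrence = c(10, 7, 9),
  support = c(0.30, 0.05, 0.12),
  lowest_p = c(1e-6, 1e-3, 1e-4),
  highest_p = c(1e-5, 5e-3, 2e-4),
  Up_regulated = c("A, B", "", "C"),
  Down_regulated = c("D", "E, F", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of genes in each comma-separated string.
# An empty string yields zero because strsplit("") returns character(0).
count_genes <- function(x) lengths(strsplit(x, ",\\s*"))

# Rank terms from most to least significant and add gene counts.
ranked <- toy_output[order(toy_output$lowest_p), ]
ranked$n_up <- count_genes(ranked$Up_regulated)
ranked$n_down <- count_genes(ranked$Down_regulated)
print(ranked[, c("ID", "lowest_p", "n_up", "n_down")])
```

Assuming the package is installed, replacing toy_output with pathfindR.data::example_pathfindR_output should work in the same way, provided the separator assumption holds for the real data.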


Example Output for the pathfindR Clustering Workflow - Rheumatoid Arthritis

Description

A dataset containing the results of pathfindR's clustering and partitioning workflow performed on the rheumatoid arthritis enrichment results example_pathfindR_output. The clustering and partitioning function cluster_enriched_terms was used with the default settings (i.e. hierarchical clustering was performed and the agglomeration method was "average"). Generated on 27 Apr 2024.

Usage

example_pathfindR_output_clustered

Format

A data frame with 121 rows and 11 columns:

ID

ID of the enriched term

Term_Description

Description of the enriched term

Fold_Enrichment

Fold enrichment value for the enriched term

occurrence

the number of iterations in which the given term was found to be enriched, over all iterations

support

the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations

lowest_p

the lowest adjusted-p value of the given term over all iterations

highest_p

the highest adjusted-p value of the given term over all iterations

Up_regulated

the up-regulated genes in the input involved in the given term, comma-separated

Down_regulated

the down-regulated genes in the input involved in the given term, comma-separated

Cluster

the cluster to which the enriched term is assigned

Status

whether the enriched term is the "Representative" term in its cluster or only a "Member"

See Also

example_pathfindR_input for the RA differentially-expressed genes data frame
example_experiment_matrix for the RA differentially-expressed genes expression matrix
run_pathfindR for details on the pathfindR enrichment analysis
example_pathfindR_output for the RA example pathfindR enrichment output
cluster_enriched_terms for details on clustering methods
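
The Cluster and Status columns make it straightforward to reduce a clustered result to one term per cluster. A minimal sketch on a toy data frame follows. The values are made up and only the columns needed for the illustration are included.

```r
# Toy clustered result; only a subset of the documented columns is shown
# and all values are illustrative.
toy_clustered <- data.frame(
  ID = c("term1", "term2", "term3", "term4"),
  lowest_p = c(1e-6, 1e-4, 1e-3, 1e-5),
  Cluster = c(1, 1, 2, 2),
  Status = c("Representative", "Member", "Member", "Representative"),
  stringsAsFactors = FALSE
)

# Keep only the representative term of each cluster.
representatives <- toy_clustered[toy_clustered$Status == "Representative", ]

# Number of enriched terms assigned to each cluster.
cluster_sizes <- table(toy_clustered$Cluster)

print(representatives)
print(cluster_sizes)
```

Assuming the package is installed, the same subsetting should apply to pathfindR.data::example_pathfindR_output_clustered, because it uses only the documented Status values "Representative" and "Member".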


Gene Ontology - All Gene Ontology Gene Sets

Description

A list containing the genes involved in each Gene Ontology (GO) term. Each element is a vector of gene symbols located in the given gene set. Generated on 27 Apr 2024.

Usage

go_all_genes

Format

list containing 15450 vectors of gene symbols. Each vector corresponds to a GO gene set.


KEGG Pathways - Descriptions

Description

A named vector containing the descriptions for each Homo sapiens KEGG pathway. Names of the vector correspond to the KEGG ID of the pathway. Pathways that did not contain any genes were discarded. Generated on 27 Apr 2024.

Usage

kegg_descriptions

Format

named vector containing 358 character values, the descriptions for the given pathways.


KEGG Pathways - Gene Sets

Description

A list containing the genes involved in each Homo sapiens KEGG pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the KEGG ID of the pathway. Pathways that did not contain any genes were discarded. Generated on 27 Apr 2024.

Usage

kegg_genes

Format

list containing 358 vectors of gene symbols. Each vector corresponds to a pathway.
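
Since both the gene-set list and the descriptions vector are named by KEGG ID, the two objects can be joined by name. The sketch below builds a small summary table from toy stand-ins with the documented shapes. The IDs, gene symbols, and the helper summarize_gene_sets are hypothetical.

```r
# Toy stand-ins with the documented shapes (IDs and symbols are illustrative only).
toy_genes <- list(
  hsa00001 = c("GENE1", "GENE2", "GENE3"),
  hsa00002 = c("GENE2", "GENE4")
)
toy_descriptions <- c(hsa00001 = "Toy pathway A", hsa00002 = "Toy pathway B")

# Hypothetical helper: one row per gene set, with its description and size.
# Descriptions are looked up by name, so the two objects need not share an order.
summarize_gene_sets <- function(genes, descriptions) {
  ids <- names(genes)
  data.frame(
    ID = ids,
    Term_Description = unname(descriptions[ids]),
    n_genes = unname(lengths(genes)),
    stringsAsFactors = FALSE
  )
}

gset_summary <- summarize_gene_sets(toy_genes, toy_descriptions)
print(gset_summary)
```

Assuming the package is installed, calling summarize_gene_sets(pathfindR.data::kegg_genes, pathfindR.data::kegg_descriptions) should produce the equivalent table for the real data. The same pattern applies to the other paired gene-set and description objects.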


Mus musculus KEGG Pathways - Descriptions

Description

A named vector containing the descriptions for each Mus musculus KEGG pathway. Names of the vector correspond to the KEGG ID of the pathway. Pathways that did not contain any genes were discarded. Generated on 27 Apr 2024.

Usage

mmu_kegg_descriptions

Format

named vector containing 355 character values, the descriptions for the given pathways.


Mus musculus KEGG Pathways - Gene Sets

Description

A list containing the genes involved in each Mus musculus KEGG pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the KEGG ID of the pathway. Pathways that did not contain any genes were discarded. Generated on 27 Apr 2024.

Usage

mmu_kegg_genes

Format

list containing 355 vectors of gene symbols. Each vector corresponds to a pathway.


Table of Data for pathfindR

Description

Data frame containing all the data for pathfindR along with descriptions and last update dates.

Usage

pathfindR.data_updates

Format

A data frame with 30 rows and 6 columns:

Category

Category of the data

Name

Name of the data

Description

Description of the data

Source

Source of the data

Version

Version of the data (if applicable)

Last Update

Last update date


Reactome Pathways - Descriptions

Description

A named vector containing the descriptions for each human Reactome pathway. Names of the vector correspond to the Reactome ID of the pathway. Generated on 27 Apr 2024.

Usage

reactome_descriptions

Format

named vector containing 2681 character values, the descriptions for the given pathways.


Reactome Pathways - Gene Sets

Description

A list containing the genes involved in each human Reactome pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the Reactome ID of the pathway. Generated on 27 Apr 2024.

Usage

reactome_genes

Format

list containing 2681 vectors of gene symbols. Each vector corresponds to a pathway.