| Type: | Package | 
| Title: | Principal Component BiSulfite | 
| Version: | 0.1.1 | 
| Description: | A system for fast, accurate, and flexible whole genome bisulfite sequencing (WGBS) data analysis of two-condition comparisons. Principal Component BiSulfite, 'PCBS', assigns methylated loci eigenvector values from the treatment-delineating principal component in lieu of running millions of pairwise statistical tests, which dramatically increases analysis flexibility and reduces computational requirements. Methods: https://katlande.github.io/PCBS/articles/Differential_Methylation.html. | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Depends: | R (≥ 2.10) | 
| URL: | https://github.com/katlande/PCBS | 
| BugReports: | https://github.com/katlande/PCBS/issues | 
| Imports: | ggplot2, tibble, ggrepel, dplyr, data.table | 
| NeedsCompilation: | no | 
| Packaged: | 2024-08-27 16:54:50 UTC; kathrynlande | 
| Author: | Kathryn Lande [aut, cre, cph] | 
| Maintainer: | Kathryn Lande <kathryn.lande@mail.mcgill.ca> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-08-27 18:00:02 UTC | 
Check if DMR calling seed number is ovcompressed.
Description
Identifies if seed number to use for DMR calling causes overcompression.
Usage
CheckOvercompression(ranks, CpG_cutoff, values, max.dmr.size, return.plot)
Arguments
| ranks | Rank data frame from getPCRanks. | 
| CpG_cutoff | NULL or numeric. If NULL, seed numbers tested will be input of the values argument. If numeric, seed numbers tested will be CpG_cutoff*values argument. Recommended to us rankDist estimate if not null | 
| values | Numeric vector, either seed numbers to test if CpG_cutoff=NULL or multipliers if CpG_cutoff is numeric | 
| max.dmr.size | Automatic=5000. Maximum DMR expansion size in downstream analysis. Note: pipeline is optimized for 5000bp max DMR size, it is not recommended to play with this value. | 
| return.plot | T/F, whether to return a plot or a numeric representing the best seed number for downstream analysis | 
Value
If return.plot=T, a grob plotting input seed number vs. compressed seed number is returned. Otherwise, a numeric is returned containing the largest tested input value without detectable overcompression.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
CheckOvercompression(ranks, 980)
Identify your best principle component.
Description
Defines the best principle component to use for downstream analysis.
Usage
DefineBestPC(mat, IDs, filter_thresh, return.plot)
Arguments
| mat | Bismark2Matrix.R output file, or data frame object | 
| IDs | A character vector of IDs containing the common names for compared conditions. E.g., for samples trt1, trt2 vs. ctl1, ctl2, IDs=c("trt", "ctl") | 
| filter_thresh | A coverage threshold for filtering, where CpG coverage of all samples must be larger than this value | 
| return.plot | T/F, whether to return a PCA plot or a numeric representing the best principle component for downstream analysis | 
Value
If return.plot=T, a grob plotting a PCA of percent methylation of all samples is returned. Otherwise, a numeric representing the best principal component to use for PCBS analysis is returned.
Examples
DefineBestPC(eigen, IDs = c("trt", "ctl"))
Call DMRs from WGBS data.
Description
DMR Calling.
Usage
Get_Novel_DMRs(ranks, nSeeds, chromDictObj, DMR_resolution, 
QueryLimit, minCpGs, minZ, perms)
Arguments
| ranks | Rank data frame from getPCRanks. | 
| nSeeds | Integer, number of input seeds for DMR expansion. | 
| chromDictObj | chromDict() output. If null, chromDict() is run internally. | 
| DMR_resolution | Automatic=NULL. Integer, number of bases to increase the DMR by with each expansion. If NULL, QueryLimit/25. | 
| QueryLimit | Automatic=5000. Maximum DMR expansion size (bp) | 
| minCpGs | Automatic=15. Minimum CpGs in a DMR region, regions with fewer CpGs will be discarded. | 
| minZ | Automatic=1. Absolute Z score threshold for DMR calling; internal value. Not recommended to play with this setting. | 
| perms | Automatic=1000. Number of permutations to use when defining the null distribution. Increasing this value largely influences computational time with minimal return | 
Value
Returns a data.frame of all novel DMRs.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMRs <- Get_Novel_DMRs(ranks, 2940, minCpGs=10)
Add a mean methylation column to a data.frame of regions.
Description
Using a chromDictMeth() output object, quickly calculate the mean methylation difference across set of regions.
Usage
MethyDiff_Set(chromDictMeth, regions)
Arguments
| chromDictMeth | chromDictMeth() output object | 
| regions | 
 | 
Value
Returns the regions object with a mean percent methylation column.
Examples
chromDictMethylDiff <- chromDictMeth(eigen, c("trt", "ctl"))
regions <- data.frame(chr=c("chr3", "chr3", "chr1"),
                      s=c(4920450, 3961576, 300000),
                      e=c(4923267, 3963805, 302900),
                      ID=c("Hypo-DMR", "partial Hyper-DMR", "random"))
MethyDiff_Set(chromDictMethylDiff, regions)
Get the mean methylation difference across a specified region.
Description
Using a chromDictMeth() output object, quickly calculate the mean methylation difference across a user-specified region.
Usage
MethylDiff(chromDictMeth, chrom, start, end)
Arguments
| chromDictMeth | chromDictMeth() output object | 
| chrom | 
 | 
| start | 
 | 
| end | 
 | 
Value
Returns a list of data.tables for each chromosome, for faster analysis.
Examples
chromDictMethylDiff <- chromDictMeth(eigen, c("trt", "ctl"))
MethylDiff(chromDictMethylDiff, "chr3", 4920450, 4923267)
PCBS ggplot theme.
Description
Custom theme for ggplot used by all PCBS output figures.
Usage
Ol_Reliable()
Value
Theme for ggplot objects used by PCBS.
Examples
df <- data.frame(A=c(1,2,3), B=c(1,2,3))
ggplot2::ggplot(df, ggplot2::aes(x=A, y=B))+
ggplot2::geom_point()+
Ol_Reliable()
Add ranks to eigenvector scores.
Description
Defines the best principle component to use for downstream analysis.
Usage
addRanks(ranks)
Arguments
| ranks | getPCRanks output data frame. | 
Value
The input data.frame with rank order and absolute rank order columns.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"))
ranks <- addRanks(ranks)
Check rank cut-off values manually.
Description
Plots a score vs. rank plot with a manually chosen rank cut-off for manual k selection.
Usage
checkRank(ranks, cutoff)
Arguments
| ranks | getPCRanks output data frame | 
| cutoff | integer, rank value to check | 
Value
Returns a grob plotting the input cutoff on a plot of absolute eigenvector score vs. absolute rank order.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
test_50 <- checkRank(ranks, 50) # set cut-off to 50
test_500 <- checkRank(ranks, 500) # set cut-off to 500
Convert a rank object into a chromDict.
Description
Internal to many functions; creates a chromDict for faster computing times. chromDict can be run separately to speed up functions run iteratively. A chromDict is a list of chromosome-specific data.tables generated from ranks.
Usage
chromDict(ranks)
Arguments
| ranks | getPCRanks output data frame | 
Value
Returns a list of data.tables for each chromosome, for faster analysis. Used internall by many PCBS functions.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
chromDictObj <- chromDict(ranks)
Create a chromDict of percent methylation difference at all sites.
Description
chromDicts are lists of keyed data.tables that enable very fast computing times. Much like the primary PCBS function chromDict() which makes a chromDict of all locus ranks, the chromDictMeth() function makes is a list of chromosome-specific data.tables containing percent methylation differences at all loci..
Usage
chromDictMeth(mat, IDs, filter_thresh)
Arguments
| mat | 
 | 
| IDs | 
 | 
| filter_thresh | 
 | 
Value
Returns a list of data.tables for each chromosome, for faster analysis.
Examples
chromDictMethylDiff <- chromDictMeth(eigen, c("trt", "ctl"))
Simulated WGBS data for PCBS vignettes
Description
simulated WGBS data with added DMRs and DMLs
Usage
eigen
Format
A data frame with 50000 observations on the following 13 variables.
- cpgID
- a character vector, chrom:locus 
- trt1_PercMeth
- a numeric vector, percent methylated of sample trt1 
- trt1_nCpG
- a numeric vector, depth of sample trt1 
- trt2_PercMeth
- a numeric vector, percent methylated of sample trt2 
- trt2_nCpG
- a numeric vector, depth of sample trt2 
- trt3_PercMeth
- a numeric vector, percent methylated of sample trt3 
- trt3_nCpG
- a numeric vector, depth of sample trt3 
- ctl1_PercMeth
- a numeric vector, percent methylated of sample ctl1 
- ctl1_nCpG
- a numeric vector, depth of sample ctl1 
- ctl2_PercMeth
- a numeric vector, percent methylated of sample ctl2 
- ctl2_nCpG
- a numeric vector, depth of sample ctl2 
- ctl3_PercMeth
- a numeric vector, percent methylated of sample ctl3 
- ctl3_nCpG
- a numeric vector, depth of sample ctl3 
Source
generated through simulation
Get CpG eigenvector scores from a principle component.
Description
Returns eigenvector scores for input CpG sites.
Usage
getPCRanks(mat, IDs, PC, filter_thresh)
Arguments
| mat | Bismark2Matrix.R output file, or data frame object | 
| IDs | A character vector of IDs containing the common names for compared conditions. E.g., for samples trt1 & trt2 vs. ctl1 & ctl2, IDs=c("trt1", "ctl") | 
| PC | Integer, which principle component to use. Use to DefineBestPC if unsure. | 
| filter_thresh | Integer, a coverage threshold for filtering, where CpG coverage of all samples must be larger than this value. | 
Value
Returns a data.frame of eigenvector scores for all loci.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
Calculated methylation significance in a set of regions.
Description
Returns p-values and Z-scores for CpGs in a set of regions, compared to a local null background distribution.
Usage
getRegionScores(ranks, regions,  chromDictObj)
Arguments
| ranks | getPCRanks output data.frame, only necessary if chromDictObj=NULL | 
| regions | A three-column dataframe containing a set of regions to test. Columns = chrom, start, end. | 
| chromDictObj | chromDict() output object, recommended input instead of ranks. | 
Value
Returns a data.frame with significance scores for all input regions.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMLs <- addRanks(ranks)
# data.frame of regions to test:
regions <- data.frame(chr=c("chr3", "chr3", "chr1"),
                      s=c(4920450, 3961576, 300000),
                      e=c(4923267, 3963805, 302900),
                      ID=c("Hypo-DMR", "partial Hyper-DMR", "random"))
                
getRegionScores(DMLs, regions)
Nested DMR calling function within Get_Novel_DMRs()
Description
Expands DMRs from collapsed seeds iteratively.
Usage
get_all_DMRs(chromDictObj, seeds, res=40, max.dmr.size=3000, min.dmr.cpgs=10,
min.absZscore, null)
Arguments
| chromDictObj | chromDict() output. | 
| seeds | compressed seeds | 
| res | Get_Novel_DMRs arg DMR_resolution | 
| max.dmr.size | Get_Novel_DMRs arg QueryLimit | 
| min.dmr.cpgs | Get_Novel_DMRs arg minCpGs | 
| min.absZscore | Get_Novel_DMRs arg minZ | 
| null | null distribution | 
Value
Returns a list with two indicies, representing intermediate DMR calls within the Get_Novel_DMRs() function and a list of background regions..
Find y value of linear regression given x.
Description
Find y value of linear regression given x.
Usage
lin(x, l)
Arguments
| x | x coordinate | 
| l | linear model of lm() | 
Value
Returns a column numeric representing the y coordinate at the input x of the linear model l.
PC-Intersect nested function.
Description
Finds the intersection point of two linear regression models, lm().
Usage
lmIntx(fit1, fit2, rnd=2)
Arguments
| fit1 | lm() model 1 | 
| fit2 | lm() model 2 | 
| rnd | number of significant figures | 
Value
Returns a 2 column data.frame of one row, containing the x and y coordinates of the intersection point of the input models.
Make a metagene from mean percent methylation differences.
Description
Uses mean binned percent methylation differences across a set of regions to draw a metagene.
Usage
methylDiff_metagene(chromDictMethObj, regions, bin, title,
xaxis, yaxis, return.data, linecol, value)
Arguments
| chromDictMethObj | chromDictMeth() output object | 
| regions | A three-column data.frame containing a set of regions to test. Columns = chrom, start, end. | 
| bin | integer, number of bins to use in metagenes. Default=100. | 
| title | Output plot title | 
| xaxis | Output plot x-axis title | 
| yaxis | Output plot y-axis title | 
| return.data | T/F, whether to return a plot, or data that can be run with plot_metagene() or multiple_metagenes(). | 
| linecol | Colour for line, auto="red" | 
| value | Name of the plotted metric in chromDictMethObj. Only needs to be set explicity for abnormal use cases where chromDictMethObj contains a non-rank value output by chromDict(). | 
Value
If return.data=F, returns a grob containing a metagene plot. Otherwise, returns a list of two data.frames containing metagene and metagene standard error plotting information.
Examples
chromDictMethylDiff <- chromDictMeth(eigen, c("trt", "ctl"))
regions <- data.frame(chr=c("chr3", "chr3", "chr1"),
                      s=c(4920450, 3961576, 300000),
                      e=c(4923267, 3963805, 302900),
                      ID=c("Hypo-DMR", "partial Hyper-DMR", "random"))
 
methylDiff_metagene(chromDictMethylDiff, regions)
                     
Plot multiple metagene data objects on a single plot.
Description
Plots multiple metagene object using the raw data generated by score_metagene().
Usage
multiple_metagenes(data_list, set_names, title, xaxis, yaxis, legend.title, col, se_alpha)
Arguments
| data_list | List of score_metagene() raw data output | 
| set_names | Character vector of names for score_metagene() object | 
| title | Output plot title | 
| xaxis | Output plot x-axis title | 
| yaxis | Output plot y-axis title | 
| legend.title | T/F, whether to show legend title | 
| col | Vector of colours to use for lines | 
| se_alpha | 0-1, alpha value for standard error shading | 
Value
Returns a grob containing a plot of the input metagene data.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMRs <- Get_Novel_DMRs(ranks, 2940, minCpGs=10)
# Select all significantly hypomethylated DMRs:
hypo_DMRs <- DMRs[DMRs$FDR <= 0.05 & DMRs$DMR_Zscore < 0,] 
# Select all significantly hypermethylated DMRs:
hyper_DMRs <- DMRs[DMRs$FDR <= 0.05 & DMRs$DMR_Zscore > 0,]
# select chrom, start, and end of all hyper DMRs
regions_hypo <- hypo_DMRs[c(1:3)] 
regions_hyper <- hyper_DMRs[c(1:3)] 
# return.data = T returns raw data instead of a plot:
hyper_metagene <- score_metagene(ranks, regions_hyper, return.data = TRUE)
hypo_metagene <- score_metagene(ranks, regions_hypo, return.data = TRUE)
# The multiple_metagenes function plots multiple metagenes 
# using a list of raw data objects from score_metagene().
multiple_metagenes(data_list = list(hyper_metagene, hypo_metagene),
set_names = c("Hyper DMRs", "Hypo DMRs"),
title="Metagenes of DMR Regions", legend.title = FALSE)
Nested DMR calling function within within get_all_DMRs(), Get_Novel_DMRs()
Description
Expands one DMR from a single point.
Usage
oneSeed(chroms, seed, resolution, max.size, mincpgs, null_list, Zlim=1)
Arguments
| chroms | chromDict() output. | 
| seed | seed value input | 
| resolution | Get_Novel_DMRs arg DMR_resolution | 
| max.size | Get_Novel_DMRs arg QueryLimit | 
| mincpgs | Get_Novel_DMRs arg minCpGs | 
| null_list | get_all_DMRs arg null | 
| Zlim | Get_Novel_DMRs arg minZ | 
Value
returns a list of two indices, containing a data.frame with the output from a single compressed seed expansion and a data.frame of locus information around the seed expansion, intended for use within the Get_Novel_DMRs() function.
Generate a metagene plot from raw metagene data.
Description
Plots a metagene object using the raw data generated by score_metagene().
Usage
plot_metagene(data, title, xaxis, yaxis, linecol)
Arguments
| data | list, score_metagene() raw data output | 
| title | Output plot title | 
| xaxis | Output plot x-axis title | 
| yaxis | Output plot y-axis title | 
| linecol | Colour for line, auto="red" | 
Value
Returns a grob containing a plot of the input metagene data.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMRs <- Get_Novel_DMRs(ranks, 2940, minCpGs=10)
# Select all significantly hypomethylated DMRs:
hypo_DMRs <- DMRs[DMRs$FDR <= 0.05 & DMRs$DMR_Zscore < 0,] 
# select chrom, start, and end of all hyper DMRs
regions_hypo <- hypo_DMRs[c(1:3)] 
# return.data = T returns raw data instead of a plot:
hypo_metagene <- score_metagene(ranks, regions_hypo, return.data = TRUE)
plot_metagene(hypo_metagene)
Identify the best rank cut-off for significant CpGs.
Description
Automated rank cut-off estimator for input CpGs.
Usage
rankDist(ranks, draw_intersects, noise_perc, mode, return.plot)
Arguments
| ranks | getPCRanks output data frame. | 
| draw_intersects | T/F whether to draw intersect lines if return.plot=T | 
| noise_perc | Automatic=0.5, numeric between 0 and 1. Fraction of ranks to use to model the background noise. Not recommended to play with this value. Increasing/decreasing returns a looser/stricter threshold, respectively. | 
| mode | "intersect" or "strict", determine cut-off with "intersect" or "strict" method. "Strict" is recommended for sets with lower variability | 
| return.plot | T/F, whether to return a plot or a numeric | 
Value
If return.plot=T, a grob plotting the estimated cutoff on a plot of absolute eigenvector score vs. absolute rank order is returned. Otherwise, a numeric of the estimated cut-off is returned.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
rankDist(ranks, mode="intersect")
Make a metagene from PC Scores.
Description
Uses mean binned PC scores across a set of regions to draw a metagene.
Usage
score_metagene(ranks, regions, bin, title, xaxis, yaxis, 
chromDictObj, return.data, linecol)
Arguments
| ranks | getPCRanks output data.frame | 
| regions | A three-column data.frame containing a set of regions to test. Columns = chrom, start, end. | 
| bin | integer, number of bins to use in metagenes. Default=100. | 
| title | Output plot title | 
| xaxis | Output plot x-axis title | 
| yaxis | Output plot y-axis title | 
| chromDictObj | Optional chromDictObject made from chromDict(), runs internally if set to NULL (default). Scripts that run this function multiple times will be sped up by setting this option. | 
| return.data | T/F, whether to return a plot, or data that can be run with plot_metagene() or multiple_metagenes(). | 
| linecol | Colour for line, auto="red" | 
Value
If return.data=F, returns a grob containing a metagene plot. Otherwise, returns a list of two data.frames containing metagene and metagene standard error plotting information.
Examples
ranks <- getPCRanks(eigen, IDs = c("trt", "ctl"), PC = 1)
DMRs <- Get_Novel_DMRs(ranks, 2940, minCpGs=10)
# select chrom, start, and end of all hyper DMRs:
hyper_DMRs <- DMRs[DMRs$FDR <= 0.05 & DMRs$DMR_Zscore > 0,]
regions_hyper <- hyper_DMRs[c(1:3)] 
score_metagene(ranks, regions_hyper)
Standard error of a vector.
Description
Takes the standard error of a vector.
Usage
se(x)
Arguments
| x | numeric vector. | 
Value
Returns a numeric, representing the standard error of the input vector.
Examples
x <- sample(1:100, 20)
se(x)
Component of PCBS ggplot theme.
Description
Wrapper to title x-axis text in ggplot objects.
Usage
tilt()
Value
Theme for ggplot objects that cleanly rotates x-axis text.
Nested DMR calling function within within Get_Novel_DMRs()
Description
Trims the edges off of DMR expansions.
Usage
trimDMR(df, region, min.dmr.cpgs, max.dmr.size, null_summary, null_values)
Arguments
| df | DMR expansion output dataframe. | 
| region | get_all_DMRs() region output | 
| min.dmr.cpgs | Get_Novel_DMRs arg minCpGs | 
| max.dmr.size | Get_Novel_DMRs arg QueryLimit | 
| null_summary | null distribution conrtainer | 
| null_values | null distribution conrtainer | 
Value
Returns a data.frame of all trimmed DMRs for use within the Get_Novel_DMRs() function.