| Type: | Package | 
| Title: | Tidy Verbs for Dealing with Genomic Data Frames | 
| Version: | 0.1.2 | 
| Description: | Handle genomic data within data frames just as you would with 'GRanges'. This package provides methods to deal with genomic intervals the "tidy way", which makes it simpler to integrate them into the general data munging process. The API is inspired by the popular 'bedtools' and the genome_join() method from the 'fuzzyjoin' package. | 
| URL: | https://github.com/const-ae/tidygenomics | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Imports: | dplyr, rlang, purrr, tidyr, fuzzyjoin (≥ 0.1.3), IRanges, Rcpp | 
| Suggests: | testthat, knitr, rmarkdown | 
| RoxygenNote: | 6.1.1 | 
| LinkingTo: | Rcpp | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | yes | 
| Packaged: | 2019-08-08 11:42:50 UTC; ahlmanne | 
| Author: | Constantin Ahlmann-Eltze | 
| Maintainer: | Constantin Ahlmann-Eltze <artjom31415@googlemail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2019-08-08 11:50:02 UTC | 
Cluster ranges which are implemented as 2 equal-length numeric vectors.
Description
Cluster ranges which are implemented as 2 equal-length numeric vectors.
Usage
cluster_interval(starts, ends, max_distance = 0L)
Arguments
| starts | A numeric vector that defines the starts of each interval | 
| ends | A numeric vector that defines the ends of each interval | 
| max_distance | The maximum distance up to which intervals are still considered to be the same cluster. Default: 0. | 
Examples
starts <- c(50, 100, 120)
ends <- c(75, 130, 150)
j <- cluster_interval(starts, ends)
j == c(0,1,1)
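The clustering can be pictured as a sweep over the intervals sorted by start. The following is a minimal base-R sketch under stated assumptions, not the package's implementation: `cluster_sketch` is a hypothetical helper, cluster ids are 0-based as the example above suggests, and two intervals are assumed to share a cluster when the gap between them is at most `max_distance`.

```r
# Hypothetical helper, not part of tidygenomics.
# Assumption: a new cluster starts when the gap to the running cluster end
# is strictly larger than max_distance.
cluster_sketch <- function(starts, ends, max_distance = 0) {
  stopifnot(length(starts) == length(ends))
  n <- length(starts)
  if (n == 0) return(integer(0))
  ord <- order(starts)            # visit intervals from left to right
  ids <- integer(n)
  current_id <- 0L
  current_end <- ends[ord[1]]
  for (i in ord) {
    if (starts[i] - current_end > max_distance) {
      current_id <- current_id + 1L
    }
    # Track the furthest end seen so far in the current cluster
    current_end <- max(current_end, ends[i])
    ids[i] <- current_id          # ids are returned in the input order
  }
  ids
}

starts <- c(50, 100, 120)
ends <- c(75, 130, 150)
cluster_sketch(starts, ends)                     # 0 1 1
cluster_sketch(starts, ends, max_distance = 25)  # 0 0 0
```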
Cluster intervals in a data frame based on chromosome, start and end.
Description
Cluster intervals in a data frame based on chromosome, start and end.
Usage
genome_cluster(x, by = NULL, max_distance = 0,
  cluster_column_name = "cluster_id")
Arguments
| x | A dataframe. | 
| by | A character vector with 3 entries which are the chromosome, start and end column.
For example:  | 
| max_distance | The maximum distance up to which intervals are still considered to be the same cluster. Default: 0. | 
| cluster_column_name | A string that is used as the new column name | 
Value
The dataframe with the additional column of the cluster
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 120, 300, 260),
                 end = c(150, 250, 350, 450))
genome_cluster(x1, by=c("chromosome", "start", "end"))
genome_cluster(x1, by=c("chromosome", "start", "end"), max_distance=10)
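To make the per-chromosome grouping concrete, here is a vectorised base-R sketch. It is an illustration only: `cluster_by_chromosome` is a hypothetical helper, and the numbering of clusters (0-based, counted across chromosomes in sorted order) is an assumption that may differ from what `genome_cluster()` returns.

```r
# Hypothetical helper, not part of tidygenomics.
# Assumptions: ids are 0-based and global across chromosomes; intervals are
# merged when the gap is at most max_distance.
cluster_by_chromosome <- function(x, by, max_distance = 0,
                                  cluster_column_name = "cluster_id") {
  stopifnot(length(by) == 3, nrow(x) > 0)
  chrom <- as.character(x[[by[1]]])
  ord <- order(chrom, x[[by[2]]])
  chrom_s <- chrom[ord]
  start_s <- x[[by[2]]][ord]
  end_s <- x[[by[3]]][ord]
  n <- length(ord)
  # Running maximum of the ends, restarted for every chromosome
  run_end <- ave(end_s, chrom_s, FUN = cummax)
  prev_end <- c(-Inf, run_end[-n])
  first_of_chrom <- c(TRUE, chrom_s[-1] != chrom_s[-n])
  # A row opens a new cluster on a new chromosome or after a large gap
  is_new <- first_of_chrom | (start_s - prev_end > max_distance)
  ids <- integer(n)
  ids[ord] <- cumsum(is_new) - 1L   # map back to the input row order
  x[[cluster_column_name]] <- ids
  x
}

x1 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 120, 300, 260),
                 end = c(150, 250, 350, 450))
cluster_by_chromosome(x1, by = c("chromosome", "start", "end"))
cluster_by_chromosome(x1, by = c("chromosome", "start", "end"), max_distance = 10)
```

With `max_distance = 0` the sketch puts rows 1 and 2 into one cluster and rows 3 and 4 into clusters of their own; with `max_distance = 10` row 4 joins rows 1 and 2, because the gap from 250 to 260 is no longer too large.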
Calculates the complement of the intervals in a data frame, i.e. the
regions that are not covered by any interval. It can optionally take a
chromosome_size data frame with 2 or 3 columns. The first column holds the
chromosome names. With 2 columns, the second column holds the chromosome size;
with 3 columns, the second and third hold the start and end index on the chromosome.
Description
Calculates the complement of the intervals in a data frame, i.e. the
regions that are not covered by any interval. It can optionally take a
chromosome_size data frame with 2 or 3 columns. The first column holds the
chromosome names. With 2 columns, the second column holds the chromosome size;
with 3 columns, the second and third hold the start and end index on the chromosome.
Usage
genome_complement(x, chromosome_size = NULL, by = NULL)
Arguments
| x | A data frame for which the complement is calculated | 
| chromosome_size | A dataframe with at least 2 columns that contains
first the chromosome name and then the size of that chromosome. Can be NULL
in which case the largest value per chromosome from  | 
| by | A character vector with 3 entries which are the chromosome, start and end column.
For example:  | 
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
genome_complement(x1, by=c("chromosome", "start", "end"))
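The idea of a complement can be sketched for a single chromosome in base R. This is only an illustration under explicit assumptions: `complement_sketch` is a hypothetical helper, coordinates are treated as closed integer positions, and the chromosome is assumed to run from `chrom_start` to `chrom_end` (defaulting to 1 and the largest end value). The coordinate convention of `genome_complement()` itself may differ.

```r
# Hypothetical helper, not part of tidygenomics.
# Assumption: closed integer coordinates, so the gap after an interval
# ending at e starts at e + 1.
complement_sketch <- function(starts, ends, chrom_start = 1,
                              chrom_end = max(ends)) {
  stopifnot(length(starts) == length(ends), length(starts) > 0)
  ord <- order(starts)
  starts <- starts[ord]
  ends <- ends[ord]
  run_end <- cummax(ends)              # furthest position covered so far
  gap_start <- c(chrom_start, run_end + 1)
  gap_end <- c(starts - 1, chrom_end)
  keep <- gap_start <= gap_end         # drop empty gaps (overlaps, edges)
  data.frame(start = gap_start[keep], end = gap_end[keep])
}

# The three chr1 intervals from the example above
complement_sketch(c(100, 200, 400), c(150, 250, 450))
# With an explicit chromosome end, the tail after 450 is reported as well
complement_sketch(c(100, 200, 400), c(150, 250, 450), chrom_end = 1000)
```

The first call yields the gaps 1 to 99, 151 to 199 and 251 to 399; the second adds 451 to 1000.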
Intersect data frames based on chromosome, start and end.
Description
Intersect data frames based on chromosome, start and end.
Usage
genome_intersect(x, y, by = NULL, mode = "both")
Arguments
| x | A dataframe. | 
| y | A dataframe. | 
| by | A character vector with 3 entries which are used to match the chromosome, start and end column.
For example:  | 
| mode | One of "both", "left", "right" or "anti". | 
Value
The intersected dataframe of x and y with the new boundaries.
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4, BLA=LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))
j <- genome_intersect(x1, x2, by=c("chromosome", "start", "end"), mode="both")
print(j)
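The "new boundaries" of an intersection are the larger of the two starts and the smaller of the two ends. A minimal base-R sketch of that rule follows. `intersect_sketch` is a hypothetical helper, it expects the column names `chromosome`, `start` and `end`, it treats intervals as closed, and it corresponds only roughly to `mode = "both"`; it is not the package's implementation.

```r
# Hypothetical helper, not part of tidygenomics.
# Assumption: closed intervals, so touching end points count as an overlap.
intersect_sketch <- function(x, y) {
  # All pairs of intervals that share a chromosome
  pairs <- merge(x, y, by = "chromosome", suffixes = c(".x", ".y"))
  overlaps <- pairs$start.x <= pairs$end.y & pairs$start.y <= pairs$end.x
  hits <- pairs[overlaps, , drop = FALSE]
  data.frame(chromosome = hits$chromosome,
             start = pmax(hits$start.x, hits$start.y),
             end = pmin(hits$end.x, hits$end.y))
}

a <- data.frame(chromosome = c("chr1", "chr1", "chr2", "chr2"),
                start = c(100, 200, 300, 400),
                end = c(150, 250, 350, 450))
b <- data.frame(chromosome = c("chr1", "chr2", "chr2", "chr1"),
                start = c(140, 210, 400, 300),
                end = c(160, 240, 415, 320))
intersect_sketch(a, b)
```

For these inputs the sketch reports two overlaps: 140 to 150 on chr1 and 400 to 415 on chr2.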
Join intervals on chromosomes in data frames, to the closest partner
Description
Join intervals on chromosomes in data frames, to the closest partner
Usage
genome_join_closest(x, y, by = NULL, mode = "inner",
  distance_column_name = NULL, max_distance = Inf, select = "all")
genome_inner_join_closest(x, y, by = NULL, ...)
genome_left_join_closest(x, y, by = NULL, ...)
genome_right_join_closest(x, y, by = NULL, ...)
genome_full_join_closest(x, y, by = NULL, ...)
genome_semi_join_closest(x, y, by = NULL, ...)
genome_anti_join_closest(x, y, by = NULL, ...)
Arguments
| x | A dataframe. | 
| y | A dataframe. | 
| by | A character vector with 3 entries which are used to match the chromosome, start and end column.
For example:  | 
| mode | One of "inner", "full", "left", "right", "semi" or "anti". | 
| distance_column_name | A string that is used as the new column name with the distance.
If  | 
| max_distance | The maximum distance that is allowed to join 2 entries. | 
| select | A string that is passed on to  | 
| ... | Additional arguments passed on to genome_join_closest. | 
Value
The joined dataframe of x and y.
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4, BLA=LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))
j <- genome_join_closest(x1, x2, by=c("chromosome", "start", "end"),
                         distance_column_name="distance", mode="left")
print(j)
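A closest-partner join can be pictured as: pair up intervals on the same chromosome, compute a distance for each pair, and keep the pairs with the smallest distance per `x` row. The base-R sketch below illustrates this under stated assumptions. `closest_join_sketch` is a hypothetical helper, overlapping intervals get distance 0, and the gap is measured as the difference between the nearer end and start; the exact distance convention used by `genome_join_closest()` may differ (for example by one position), and the sketch behaves like an inner join that keeps all ties.

```r
# Hypothetical helper, not part of tidygenomics.
# Assumption: distance = max(0, larger start - smaller end).
closest_join_sketch <- function(x, y, max_distance = Inf) {
  # Expects columns id, chromosome, start, end in both data frames
  pairs <- merge(x, y, by = "chromosome", suffixes = c(".x", ".y"))
  pairs$distance <- pmax(0, pmax(pairs$start.x, pairs$start.y) -
                              pmin(pairs$end.x, pairs$end.y))
  # Smallest distance found for each row of x
  best <- ave(pairs$distance, pairs$id.x, FUN = min)
  out <- pairs[pairs$distance == best & pairs$distance <= max_distance, ,
               drop = FALSE]
  out <- out[order(out$id.x), , drop = FALSE]
  rownames(out) <- NULL
  out
}

a <- data.frame(id = 1:4,
                chromosome = c("chr1", "chr1", "chr2", "chr2"),
                start = c(100, 200, 300, 400),
                end = c(150, 250, 350, 450))
b <- data.frame(id = 1:4,
                chromosome = c("chr1", "chr2", "chr2", "chr1"),
                start = c(140, 210, 400, 300),
                end = c(160, 240, 415, 320))
closest_join_sketch(a, b)
closest_join_sketch(a, b, max_distance = 45)
```

In the sketch, rows 1 and 2 of `a` are matched to row 1 of `b`, and rows 3 and 4 to row 3 of `b`. Setting `max_distance = 45` drops the match for row 3, whose nearest partner is 50 away under the assumed distance.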
Subtract one data frame from another based on chromosome, start and end.
Description
Subtract one data frame from another based on chromosome, start and end.
Usage
genome_subtract(x, y, by = NULL)
Arguments
| x | A dataframe. | 
| y | A dataframe. | 
| by | A character vector with 3 entries which are used to match the chromosome, start and end column.
For example:  | 
Value
The subtracted dataframe of x and y with the new boundaries.
Examples
library(dplyr)
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4, BLA=LETTERS[1:4],
                 chromosome = c("chr1", "chr2", "chr1", "chr1"),
                 start = c(120, 210, 300, 400),
                 end = c(125, 240, 320, 415))
j <- genome_subtract(x1, x2, by=c("chromosome", "start", "end"))
print(j)
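Subtracting intervals can split one interval into several pieces. The following base-R sketch shows this for a single interval of `x` and the `y` intervals on the same chromosome. It is an illustration only: `subtract_sketch` is a hypothetical helper and closed integer coordinates are assumed, so the boundaries returned by `genome_subtract()` may be off by one relative to this sketch.

```r
# Hypothetical helper, not part of tidygenomics.
# Assumption: closed integer coordinates; removing [s, e] leaves the
# positions up to s - 1 and from e + 1 onwards.
subtract_sketch <- function(start, end, sub_starts, sub_ends) {
  # Only intervals that overlap [start, end] can remove anything
  hit <- sub_starts <= end & sub_ends >= start
  sub_starts <- sub_starts[hit]
  sub_ends <- sub_ends[hit]
  if (length(sub_starts) == 0) {
    return(data.frame(start = start, end = end))
  }
  ord <- order(sub_starts)
  sub_starts <- sub_starts[ord]
  sub_ends <- cummax(sub_ends[ord])     # furthest removed position so far
  piece_start <- c(start, sub_ends + 1)
  piece_end <- c(sub_starts - 1, end)
  keep <- piece_start <= piece_end      # drop empty pieces
  data.frame(start = piece_start[keep], end = piece_end[keep])
}

# chr1 intervals of x2 from the example above
y_starts <- c(120, 300, 400)
y_ends <- c(125, 320, 415)
subtract_sketch(100, 150, y_starts, y_ends)  # pieces 100-119 and 126-150
subtract_sketch(400, 450, y_starts, y_ends)  # piece 416-450
subtract_sketch(200, 250, y_starts, y_ends)  # unchanged: 200-250
```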