Package: AhoCorasickTrie
Type: Package
Title: Fast Searching for Multiple Keywords in Multiple Texts
Version: 0.1.2
Authors@R: c(person("Matt", "Chambers", email="matt.chambers42@gmail.com", role=c("aut", "cre")),
    person("Tomas", "Petricek", role=c("aut", "cph")),
    person("Vanderbilt University", role="cph"))
Description: Aho-Corasick is an optimal algorithm for finding many
    keywords in a text. It can locate all matches in a text in O(N+M) time; i.e.,
    the time needed scales linearly with the number of keywords (N) and the size of
    the text (M). Compare this to the naive approach which takes O(N*M) time to loop
    through each pattern and scan for it in the text. This implementation builds the
    trie (the generic name of the data structure) and runs the search in a single
    function call. If you want to search multiple texts with the same trie, the
    function will take a list or vector of texts and return a list of matches to
    each text. By default, all 128 ASCII characters are allowed in both the keywords
    and the text. A more efficient trie is possible if the alphabet size can be
    reduced. For example, DNA sequences use at most 19 distinct characters and
    usually only 4; protein sequences use at most 26 distinct characters and usually
    only 20. UTF-8 (Unicode) matching is not currently supported.
License: Apache License 2.0
URL: https://github.com/chambm/AhoCorasickTrie
BugReports: https://github.com/chambm/AhoCorasickTrie/issues
Encoding: UTF-8
LazyData: true
Imports: Rcpp (>= 0.12.5)
LinkingTo: Rcpp
Suggests: Biostrings, microbenchmark, testthat
SystemRequirements: C++11
RoxygenNote: 7.1.1
NeedsCompilation: yes
Packaged: 2020-09-29 14:29:43 UTC; Matthew
Author: Matt Chambers [aut, cre],
  Tomas Petricek [aut, cph],
  Vanderbilt University [cph]
Maintainer: Matt Chambers <matt.chambers42@gmail.com>
Repository: CRAN
Date/Publication: 2020-09-29 15:10:05 UTC
Built: R 4.0.2; x86_64-apple-darwin17.0; 2020-09-30 10:23:44 UTC; unix
Archs: AhoCorasickTrie.so.dSYM
