Type: | Package |
Title: | Frequent Contiguous Sequential Pattern Mining of Text |
Version: | 0.1.2 |
Author: | Anantha Janakiraman |
Maintainer: | Anantha Janakiraman <anantharaman.j@gmail.com> |
Description: | Mines contiguous sequential patterns in text. |
Depends: | R (≥ 3.1.0) |
Imports: | NLP, tm, utils |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 6.0.1 |
NeedsCompilation: | no |
Packaged: | 2018-07-26 23:54:37 UTC; anantha |
Repository: | CRAN |
Date/Publication: | 2018-07-27 04:10:03 UTC |
Mining Frequent Contiguous Sequential Patterns in a Text Corpus
Description
Takes a file path and a minimum support threshold and mines the frequent contiguous sequential patterns (phrases) in the text.
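The sketch below shows the idea in base R. It is not CSeqpat's implementation. The function name `mine_contiguous_patterns`, the whitespace tokenization, and the definition of support as the total number of occurrences across all documents are assumptions made for illustration. A pattern here is a run of adjacent words whose length lies between `phraselenmin` and `phraselenmax`. It is kept only if its support is at least `minsupport`.

```r
# Hypothetical sketch, not the CSeqpat implementation.
# Assumption: support = total occurrences of a phrase across all documents.
mine_contiguous_patterns <- function(docs, phraselenmin = 1,
                                     phraselenmax = 99999, minsupport = 1) {
  phrases <- character(0)
  for (doc in docs) {
    # Tokenize on runs of whitespace and drop empty tokens
    words <- strsplit(doc, "[[:space:]]+")[[1]]
    words <- words[nzchar(words)]
    n <- length(words)
    maxlen <- min(phraselenmax, n)
    if (n == 0 || phraselenmin > maxlen) next
    # Enumerate every contiguous word sequence of each allowed length
    for (len in phraselenmin:maxlen) {
      starts <- seq_len(n - len + 1)
      phrases <- c(phrases, vapply(starts, function(s)
        paste(words[s:(s + len - 1)], collapse = " "), character(1)))
    }
  }
  # Count occurrences and keep phrases that meet the support threshold
  support <- table(phrases)
  support <- support[support >= minsupport]
  result <- data.frame(phrase = names(support),
                       support = as.integer(support),
                       stringsAsFactors = FALSE)
  # Most frequent phrases first, ties broken alphabetically
  result[order(-result$support, result$phrase), , drop = FALSE]
}

docs <- c("hoagie institution food year road ",
          "place little dated opened weekend fresh food")
mine_contiguous_patterns(docs, phraselenmin = 1, phraselenmax = 2,
                         minsupport = 2)
```

On the two sample documents from the Examples section, with phrases of one or two words and `minsupport = 2`, this sketch keeps only `food`, with support 2. CSeqpat's own output may differ, for example because it also applies the preprocessing options.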
Usage
CSeqpat(filepath, phraselenmin = 1, phraselenmax = 99999, minsupport = 1,
docdelim, stopword = FALSE, stemword = FALSE, lower = FALSE,
removepunc = FALSE)
Arguments
filepath | Path to the text file or text corpus |
phraselenmin | Minimum number of words in a phrase |
phraselenmax | Maximum number of words in a phrase |
minsupport | Minimum absolute support for mining the patterns |
docdelim | Document delimiter in the corpus |
stopword | Remove stopwords from the document corpus (boolean) |
stemword | Perform stemming on the document corpus (boolean) |
lower | Convert all words in the document corpus to lower case (boolean) |
removepunc | Remove punctuation from the document corpus (boolean) |
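The following sketch shows roughly what the `docdelim`, `lower`, `removepunc` and `stopword` switches do to a raw corpus. It is an assumption-laden illustration, not the package's code. The function name `preprocess_corpus` and its small stopword list are hypothetical. Stemming (`stemword`) is left out because it needs a stemmer. The package imports tm, which provides `stopwords()` and `stemDocument()`, but this page does not say whether CSeqpat calls them.

```r
# Hypothetical sketch of the preprocessing switches, not CSeqpat's code.
# The stopword list is a tiny illustrative stand-in.
preprocess_corpus <- function(text, docdelim, stopword = FALSE, lower = FALSE,
                              removepunc = FALSE,
                              stopwords = c("a", "an", "the", "and", "of")) {
  # Split the raw corpus into documents on the literal delimiter
  docs <- unlist(strsplit(text, docdelim, fixed = TRUE))
  if (lower) docs <- tolower(docs)
  if (removepunc) docs <- gsub("[[:punct:]]+", "", docs)
  if (stopword) {
    # Drop stopwords word by word, then rejoin each document
    docs <- vapply(strsplit(docs, "[[:space:]]+"), function(words) {
      words <- words[nzchar(words)]
      paste(words[!tolower(words) %in% stopwords], collapse = " ")
    }, character(1))
  }
  trimws(docs)
}

corpus <- "The Cat, and the hat.\tA dog of mine"
preprocess_corpus(corpus, "\t", stopword = TRUE, lower = TRUE,
                  removepunc = TRUE)
```

Lower-casing runs before stopword removal here, so capitalized stopwords such as "The" are also caught. Under these assumptions the call returns `"cat hat"` and `"dog mine"`.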
Value
A data frame containing the frequent phrase patterns and their absolute support
Examples
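# Write a small two-line sample corpus to a temporary file
test1 <- c("hoagie institution food year road ",
           "place little dated opened weekend fresh food")
tf <- tempfile()
writeLines(test1, tf)
# Mine phrases of 1 to 2 words with an absolute support of at least 2
CSeqpat(tf, phraselenmin = 1, phraselenmax = 2, minsupport = 2,
        docdelim = "\t", stopword = TRUE, stemword = FALSE,
        lower = TRUE, removepunc = FALSE)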