News for Package 'tm.plugin.koRpus'
Changes in tm.plugin.koRpus version 0.4-2 (2021-05-17)
fixed
- updated test standards after changes to koRpus' internal calculation of the number of lines in texts imported from TIF data frames
changed
- kRp.corpus: replaced prototype() in class definition with initialize method
Changes in tm.plugin.koRpus version 0.4-1 (2020-12-17)
fixed
- docTermMatrix(): results were wrong because numbers were assigned to wrong columns; now fixed in koRpus
- unit tests failed on Windows due to a UTF-8 issue
changed
- the nested object class kRp.hierarchy was replaced by kRp.corpus; instead of reproducing the file hierarchy in the object structure, kRp.corpus has a flat structure with all texts in one single data frame; this data frame was also renamed from "TT.res" to "tokens". The class name kRp.corpus was used in tm.plugin.koRpus before and is just being recycled ;) kRp.corpus inherits from class kRp.text as defined in the koRpus package
- status messages are currently only shown when only one CPU is used 
- corpusTagged(): now called taggedText() as in koRpus
- corpusDesc(): now called describe() as in koRpus
- [, [<-, [[ and [[<- methods no longer apply to the summary data frame but to the tokens slot, as in koRpus (where they apply to the TT.res slot)
- show(): kRp.corpus objects now list all available features
- read.corp.custom(): removed unused mc.cores argument
- docTermMatrix(): by default behaves like most other methods and adds its result to the input object rather than returning just the matrix; also, the generic is now defined by the koRpus package and was removed here, including all of the actual function code
- adjusted unit tests and vignette 
- updated all examples to use a new sample corpus (see added), so that many "\dontrun{}" cases could be removed
added
- readCorpus(): the hierarchy levels of a text corpus can now be derived directly from the directory structure by setting "hierarchy=TRUE"
- corpusHasFeatures(), corpusHasFeatures()<-, corpusFeatures(), corpusFeatures()<-, corpusHierarchy(), corpusHierarchy()<-, corpusCorpFreq(), corpusCorpFreq()<-, diffText(), diffText()<-, originalText(): new getter/setter methods for kRp.corpus objects
- split_by_doc_id(): new method transforms a kRp.corpus object into a list of kRp.text objects
- corpusDocTermMatrix(): new method to get/set the sparse document term matrix in kRp.corpus objects
- [[/[[<-: gained new argument "doc_id" to limit the scope to particular documents
- describe()/describe()<-: now support filtering by doc_id
- new sample corpus for use in examples 
removed
- removed all classes and methods dealing with kRp.hierarchy 
- removed deprecated methods of the pre-kRp.hierarchy era 
- removed generic of tif_as_tokens_df() as it was moved to the koRpus package
Changes in tm.plugin.koRpus version 0.3-1 (2019-05-14)
fixed
- readCorpus(): solved a cryptic warning when more than one text was tokenized
added
- docTermMatrix(): new method to generate document-term matrices, either with absolute frequencies or tf-idf values
- query(): new method, extending the generic of koRpus >= 0.12-1
- filterByClass(): new method, extending the generic of koRpus >= 0.12-1
- jumbleWords(): new method, extending the generic of koRpus >= 0.12-1
- clozeDelete(): new method, extending the generic of koRpus >= 0.12-1
- cTest(): new method, extending the generic of koRpus >= 0.12-1
- textTransform(): new method, extending the generic of koRpus >= 0.12-1
- show(): new method for objects of class kRp.hierarchy
changed
- depends on koRpus >= 0.12-1 now 
- depends on the Matrix package now (for docTermMatrix())
- adjusted test standards to include the additional POS tags from koRpus >= 0.12-1 
Changes in tm.plugin.koRpus version 0.02-2 (2019-01-18)
fixed
- readCorpus(), kRpSource(): added missing imports from packages tm, NLP and parallel
- readCorpus(): fixed status message formatting
- corpusTm(): removed useless "level" argument and corrected the output
- readCorpus(): removed unused "level" argument
- corpusFiles(): now also works with flat hierarchy objects
added
- readCorpus(): can now also import data frames in TIF format, including support for hierarchical categories
- tif_as_corpus_df(): new S4 method to transform a kRp.hierarchy object into a TIF-compliant data frame
changed
- readCorpus(): the tm corpora now include full hierarchy metadata
- removed pre-hierarchy portions from internal function whatIsAvailable()
Changes in tm.plugin.koRpus version 0.02-1 (2018-07-29)
changed
- vignette: also includes info on readCorpus()
- tests: adjusted test standards to new object class 
added
- kRp.hierarchy: new S4 class to replace kRp.sourcesCorpus and kRp.topicCorpus to allow more generic nesting of hierarchical levels 
- readCorpus(): new function to generate kRp.hierarchy objects recursively
- many corpus*() getter functions can now filter by hierarchy level or category ID 
- removed all code regarding simpleCorpus(), sourcesCorpus() and topicCorpus(), their object classes and methods; this is all handled much more flexibly by kRp.hierarchy and readCorpus() now
Changes in tm.plugin.koRpus version 0.01-4 (2018-03-07)
fixed
- sourcesCorpus(): speak of "text" instead of "texts" if it's only one
changed
- adjusted package to support koRpus >= 0.11 and sylly, especially with regard to summary(), hyphen(), and new class constructors
- summary(): for more coherence with the koRpus package the "text" column in the summary slot was renamed to "doc_id"
- reaktanz.de supports HTTPS now, updated references 
- vignette is now in RMarkdown/HTML format; the Sweave/PDF version was dropped
- hyphen()/lex.div()/readability(): 'quiet' is now TRUE by default
- lex.div(): 'char' is now an empty string by default; computing all characteristics was not a useful default for large text corpora
added
- README.md 
- new [, [<-, [[ and [[<- methods added for corpus object classes 
- new methods tif_as_tokens_df() to export corpus objects as a single data.frame in fully TIF-compliant format
- summary(): now also includes the total number of stopwords (if available)
- new class object constructors kRp_corpus(), kRp_sourcesCorpus(), and kRp_topicCorpus() can be used instead of new("kRp.corpus", ...) etc.
Changes in tm.plugin.koRpus version 0.01-3 (2016-07-12)
fixed
- the arguments that simpleCorpus() was supposed to pass on to DirSource() weren't used
changed
- the "paths" argument of topicCorpus() now expects a list, not a vector
- using the parallel package to be able to use more CPU cores 
added
- new argument "format" for simpleCorpus(), sourceCorpus(), and topicCorpus(), to be able to work with text objects directly, instead of files
Changes in tm.plugin.koRpus version 0.01-2 (2015-07-08)
changed
- using the S4 methods of koRpus 0.06-1 now, therefore renamed all methods removing the *.corpus suffix (e.g., lex.div.corpus() is now lex.div())
- renamed classes into kRp.corpus, kRp.sourcesCorpus and kRp.topicCorpus, and their generator functions accordingly 
added
- new methods read.corp.custom(), freq.analysis() and summary()
- new getter/setter methods: corpusSources(), corpusTopics(), corpusFreq(), corpusSummary()
- first basic unit tests, using the testthat package 
- new option "summary" for lex.div() and readability(), to automatically update the summary data.frames
- first notes in a vignette 
Changes in tm.plugin.koRpus version 0.01-1 (2015-06-29)
added
- initial release