Type: Package
Title: Extract Text from Microsoft Word Documents
Version: 1.3.4
Description: Wraps the 'AntiWord' utility to extract text from Microsoft Word documents. The utility only supports the old 'doc' format, not the newer XML-based 'docx' format. Use the 'xml2' package to read the latter.
Imports: sys (≥ 2.0)
URL: https://docs.ropensci.org/antiword/, https://ropensci.r-universe.dev/antiword
BugReports: https://github.com/ropensci/antiword/issues
License: GPL-2
Encoding: UTF-8
NeedsCompilation: yes
Packaged: 2024-10-03 14:12:08 UTC; jeroen
Author: Jeroen Ooms
Maintainer: Jeroen Ooms <jeroenooms@gmail.com>
Repository: CRAN
Date/Publication: 2024-10-04 13:20:02 UTC
Antiword
Description
Wraps the antiword utility. Takes a path or URL to a Word file in the old 'doc' format and returns the text of the document.
Usage
antiword(file = NULL, format = FALSE)
Arguments
file: path or URL to your Word file
format: format the output text (the -f parameter of the antiword utility)
Examples
text <- antiword("https://jeroen.github.io/files/UDHR-english.doc")
cat(text)
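The example above reads directly from a URL with the default format = FALSE. As a minimal sketch of the other documented options, the snippet below downloads the same file to a temporary local path and extracts it once with the default and once with format = TRUE. The variable names (doc_url, local_path, plain, formatted) are illustrative only, and the snippet assumes network access and that the antiword package is installed. How the formatted output differs from the plain output depends on the underlying antiword utility, so no output is shown here.

```r
library(antiword)

# Same sample document as in the example above
doc_url <- "https://jeroen.github.io/files/UDHR-english.doc"

# Download to a temporary local file; mode = "wb" keeps the binary
# .doc file intact on Windows
local_path <- tempfile(fileext = ".doc")
download.file(doc_url, local_path, mode = "wb", quiet = TRUE)

# Default extraction (format = FALSE)
plain <- antiword(local_path)

# Extraction with the -f parameter passed to antiword
formatted <- antiword(local_path, format = TRUE)

# Print the formatted text
cat(formatted)

# Remove the temporary file
unlink(local_path)
```

Working from a local copy avoids downloading the document again when comparing the two settings.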