pangoling 1.0.3
- Internal changes to comply with CRAN requirements.
- HF_HOME is now used to store the models rather than
TRANSFORMERS_CACHE.
pangoling 1.0.2
- Internal changes: OMP_THREAD_LIMIT was set to 1.
pangoling 1.0.1
New Features
- Added installed_py_pangoling() to check if required
Python dependencies (transformers and torch)
are installed.
Other changes
- Informative startup message if Python dependencies are not
installed.
- Documentation examples won’t run if Python dependencies are not
installed.
- Articles are now pre-computed vignettes. See
pangoling 1.0.0
- Changed the ownership of the repo to ropensci.
- Deprecated functions are now defunct and have been replaced with
their respective alternative functions.
pangoling 0.0.0.9011
- Added word_n argument in causal_words_pred() to indicate word order of the
texts.
- Allows for models with a larger vocabulary than the tokenizer.
pangoling 0.0.0.9010
New Features:
- Added checkpoint parameter to causal_preload() and masked_preload() to allow
loading models from checkpoints.
- Introduced causal_next_tokens_pred_tbl(), which
replaces causal_next_tokens_tbl() and provides improved
predictability calculations.
- Added causal_words_pred(), causal_targets_pred(), and causal_tokens_pred_lst() to compute predictability for
words, phrases, or tokens, replacing causal_lp() and causal_tokens_lp_tbl().
- Introduced masked_tokens_pred_tbl(), replacing masked_tokens_tbl(), for retrieving possible tokens and
their log probabilities.
- Introduced masked_targets_pred(), replacing masked_lp(), for calculating predictability based on left
and right context.
- Introduced transformer_vocab() with an optional decode parameter to return decoded tokenized words.
- New dataset df_jaeger14: Self-paced
reading data on Chinese relative clauses.
- New dataset df_sent: Example dataset
with two word-by-word sentences.
- New vignette: Added a worked-out example of a
causal model.
Enhancements:
- Added sep argument in causal_words_pred() to support languages without spaces between words (e.g., Chinese).
- New log.p argument across multiple functions to specify
how predictability is calculated (e.g., log base e, log base 2
for bits, or raw probabilities).
- Improved tokenization utilities: tokenize_lst() now
supports decoded outputs via the decode parameter.
- Updated install_py_pangoling() to enhance Python
environment handling.
- Added perplexity_calc() for computing perplexity from
probabilities.
Deprecations:
- Deprecated causal_next_tokens_tbl(), causal_lp(), causal_tokens_lp_tbl(), and causal_lp_mats(). Use causal_next_tokens_pred_tbl(), causal_targets_pred(), causal_words_pred(),
and causal_pred_mats() instead.
- Deprecated masked_tokens_tbl() and masked_lp(). Use masked_tokens_pred_tbl() and masked_targets_pred() instead.
pangoling 0.0.0.9009
- Deprecated .by in favor of by.
pangoling 0.0.0.9008
- Fixed a bug when .by is unordered.
pangoling 0.0.0.9007
- set_cache_folder()function added.
- Message when the package loads.
- New troubleshooting vignette.
pangoling 0.0.0.9006
- causal_lp gets an l_contexts argument.
- Checkpoints work for causal models (not yet for masked models).
- Ropensci badge added.
pangoling 0.0.0.9005
- Strings with no tokens no longer throw errors.
- Requires correct version of R.
pangoling 0.0.0.9004
- Causal models accept batches.
pangoling 0.0.0.9003
- Fixed a bug in causal_tokens_lp_tbl.
pangoling 0.0.0.9002
- Minor changes to function names to avoid conflicts with other packages.
pangoling 0.0.0.9001
- Tons of stuff. Fully functional package now.
pangoling 0.0.0.9000