Title: | Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers |
Version: | 1.0 |
Description: | Two classifiers for open set recognition and novelty detection based on extreme value theory. The first classifier is based on the generalized Pareto distribution (GPD) and the second classifier is based on the generalized extreme value (GEV) distribution. For details, see Vignotto, E., & Engelke, S. (2018) <doi:10.48550/arXiv.1808.09902>. |
Depends: | R (≥ 3.4.0) |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 6.1.0.9000 |
Imports: | RANN, evd, fitdistrplus |
NeedsCompilation: | no |
Packaged: | 2018-11-07 10:28:50 UTC; vignotto |
Author: | Edoardo Vignotto |
Maintainer: | Edoardo Vignotto <edoardo.vignotto@unige.ch> |
Repository: | CRAN |
Date/Publication: | 2018-11-16 16:40:11 UTC |
Database of character image features.
Description
A dataset containing 16 features extracted from 20000 handwritten characters.
Usage
LETTER
Format
A data frame with 20000 rows and 17 variables:
- class
class labels
- V1
first extracted feature
- V2
second extracted feature
- V3
third extracted feature
- V4
4th extracted feature
- V5
5th extracted feature
- V6
6th extracted feature
- V7
7th extracted feature
- V8
8th extracted feature
- V9
9th extracted feature
- V10
10th extracted feature
- V11
11th extracted feature
- V12
12th extracted feature
- V13
13th extracted feature
- V14
14th extracted feature
- V15
15th extracted feature
- V16
16th extracted feature
Source
https://archive.ics.uci.edu/ml/datasets/letter+recognition/
GEV Classifier - testing
Description
This function evaluates a test set with a pre-trained GEV classifier. It can be used to perform open set classification based on the generalized extreme value distribution.
Usage
gevcTest(train, test, pre, prob = TRUE, alpha)
Arguments
train |
a data matrix containing the train data. Class labels should not be included. |
test |
a data matrix containing the test data. |
pre |
a numeric vector of parameters obtained with the function gevcTrain. |
prob |
logical indicating whether p-values should be returned. |
alpha |
threshold to be used if prob is equal to FALSE. |
Details
For details on the method and parameters see Vignotto and Engelke (2018).
Value
If prob is equal to TRUE, a vector containing the p-values for each point is returned. A high p-value results in the classification of the corresponding test point as known, since this hypothesis cannot be rejected. If the p-value is small, the corresponding test point is classified as unknown. If prob is equal to FALSE, a vector of predicted values is returned.
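The step from p-values to known/unknown decisions can be sketched in base R. This is an illustration only: the p-values are made up, the labels "known" and "unknown" are hypothetical, and the package may encode its predictions differently or treat ties at the threshold in another way.

```r
# Hypothetical p-values for five test points (made-up numbers).
pvals <- c(0.80, 0.004, 0.35, 0.009, 0.02)

# Significance threshold, playing the role of the alpha argument.
alpha <- 0.01

# Reject the "known" hypothesis when the p-value falls below alpha.
# The labels are illustrative; the package may use a different encoding.
decision <- ifelse(pvals < alpha, "unknown", "known")

print(decision)
```

A smaller alpha flags fewer test points as unknown; a larger alpha flags more.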
Author(s)
Edoardo Vignotto
edoardo.vignotto@unige.ch
References
Vignotto, E., & Engelke, S. (2018). Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers. arXiv preprint arXiv:1808.09902.
See Also
Examples
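# Use the first 15000 rows for training and the rest (without labels) for testing.
trainset <- LETTER[1:15000, ]
testset <- LETTER[-(1:15000), -1]
# Treat class 1 as the only known class and drop the label column.
knowns <- trainset[trainset$class == 1, -1]
gevClassifier <- gevcTrain(train = knowns)
# With the default prob = TRUE, p-values are returned for each test point.
predicted <- gevcTest(train = knowns, test = testset, pre = gevClassifier)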
GEV Classifier - training
Description
This function is used to train a GEV classifier. It can be used to perform open set classification based on the generalized extreme value distribution.
Usage
gevcTrain(train)
Arguments
train |
a data matrix containing the train data. Class labels should not be included. |
Details
For details on the method and parameters see Vignotto and Engelke (2018).
Value
A numeric vector of two elements containing the estimated parameters of the fitted reversed Weibull distribution.
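For orientation, the reversed Weibull distribution function can be written in a few lines of base R. The sketch assumes a shape and a scale parameter with the upper endpoint fixed at 0; whether these are the two parameters returned by the function is an assumption, and the helper name prevweibull_sketch is hypothetical.

```r
# Reversed Weibull CDF with upper endpoint 0 (hypothetical helper):
# G(x) = exp(-(-x / scale)^shape) for x < 0, and G(x) = 1 for x >= 0.
prevweibull_sketch <- function(x, shape, scale) {
  ifelse(x < 0, exp(-(-pmin(x, 0) / scale)^shape), 1)
}

# If Y follows a Weibull distribution, then -Y follows a reversed Weibull,
# so P(-Y <= x) = P(Y >= -x) = 1 - pweibull(-x, shape, scale).
g_direct  <- prevweibull_sketch(-1, shape = 2, scale = 1.5)
g_via_wbl <- 1 - pweibull(1, shape = 2, scale = 1.5)

print(c(g_direct, g_via_wbl))
```

Both values agree, which confirms that the helper is the usual Weibull distribution mirrored at zero.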
Note
Data are not scaled internally; any preprocessing has to be done externally.
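One common way to do this preprocessing externally is to standardize the training data and reuse its column means and standard deviations for the test data. The sketch below uses base R only; the toy matrices and the choice of standardization are assumptions, not a recommendation taken from the package.

```r
# Toy data standing in for real feature matrices (hypothetical values).
set.seed(1)
train <- matrix(rnorm(40, mean = 5, sd = 2), ncol = 4)
test  <- matrix(rnorm(20, mean = 5, sd = 2), ncol = 4)

# Standardize the training data and keep the column means and sds.
train_scaled <- scale(train)
centers <- attr(train_scaled, "scaled:center")
scales  <- attr(train_scaled, "scaled:scale")

# Apply the training statistics to the test data so both share one scale.
test_scaled <- scale(test, center = centers, scale = scales)
```

Reusing the training statistics keeps information from the test set out of the fitted classifier.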
Author(s)
Edoardo Vignotto
edoardo.vignotto@unige.ch
References
Vignotto, E., & Engelke, S. (2018). Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers. arXiv preprint arXiv:1808.09902.
See Also
Examples
trainset <- LETTER[1:15000, ]
# Treat class 1 as the known class and drop the label column.
knowns <- trainset[trainset$class == 1, -1]
gevClassifier <- gevcTrain(train = knowns)
GPD Classifier - testing
Description
This function is used to evaluate a test set for a pre-trained GPD classifier. It can be used to perform open set classification based on the generalized Pareto distribution.
Usage
gpdcTest(train, test, pre, prob = TRUE, alpha = 0.01)
Arguments
train |
a data matrix containing the train data. Class labels should not be included. |
test |
a data matrix containing the test data. |
pre |
a list obtained with the function gpdcTrain. |
prob |
logical indicating whether p-values should be returned. |
alpha |
threshold to be used if prob is equal to FALSE. |
Details
For details on the method and parameters see Vignotto and Engelke (2018).
Value
If prob is equal to TRUE, a vector containing the p-values for each point is returned. A high p-value results in the classification of the corresponding test point as known, since this hypothesis cannot be rejected. If the p-value is small, the corresponding test point is classified as unknown. If prob is equal to FALSE, a vector of predicted values is returned.
Author(s)
Edoardo Vignotto
edoardo.vignotto@unige.ch
References
Vignotto, E., & Engelke, S. (2018). Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers. arXiv preprint arXiv:1808.09902.
See Also
Examples
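# Use the first 15000 rows for training and the rest (without labels) for testing.
trainset <- LETTER[1:15000, ]
testset <- LETTER[-(1:15000), -1]
# Treat class 1 as the only known class and drop the label column.
knowns <- trainset[trainset$class == 1, -1]
# Train with k = 10 upper order statistics.
gpdClassifier <- gpdcTrain(train = knowns, k = 10)
# With the default prob = TRUE, p-values are returned for each test point.
predicted <- gpdcTest(train = knowns, test = testset, pre = gpdClassifier)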
GPD Classifier - training
Description
This function is used to train a GPD classifier. It can be used to perform open set classification based on the generalized Pareto distribution.
Usage
gpdcTrain(train, k)
Arguments
train |
a data matrix containing the train data. Class labels should not be included. |
k |
the number of upper order statistics to be used. |
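The term "upper order statistics" can be illustrated on a toy sample in base R. The numbers are hypothetical, and the sketch only shows the general extreme value convention of taking the k largest observations and measuring them against the next largest value; how the package derives the quantities it sorts is not shown here.

```r
# Toy sample (hypothetical values).
x <- c(2.1, 0.4, 3.7, 1.5, 5.2, 0.9, 4.4, 2.8)
k <- 3

# Sort in decreasing order: x_sorted[1] is the largest observation.
x_sorted <- sort(x, decreasing = TRUE)

# The k upper order statistics are the k largest values.
upper <- x_sorted[1:k]

# The (k + 1)-th largest value is commonly used as the threshold,
# and the exceedances over it are what a tail model is fitted to.
threshold   <- x_sorted[k + 1]
exceedances <- upper - threshold

print(upper)
print(threshold)
```

A larger k uses more observations for the tail fit but moves further away from the extreme region of the sample.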
Details
For details on the method and parameters see Vignotto and Engelke (2018).
Value
A list of three elements.
pshapes |
the estimated rescaled shape parameters for each point in the training dataset. |
balls |
the estimated radius for each point in the training dataset. |
k |
the number of upper order statistics used. |
Note
Data are not scaled internally; any preprocessing has to be done externally.
Author(s)
Edoardo Vignotto
edoardo.vignotto@unige.ch
References
Vignotto, E., & Engelke, S. (2018). Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers. arXiv preprint arXiv:1808.09902.
See Also
Examples
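trainset <- LETTER[1:15000, ]
# Treat class 1 as the known class and drop the label column.
knowns <- trainset[trainset$class == 1, -1]
# Train with k = 10 upper order statistics.
gpdClassifier <- gpdcTrain(train = knowns, k = 10)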