| Title: | Estimate Percentiles from an Ordered Categorical Variable | 
| Version: | 1.0.5 | 
| Description: | An implementation of two functions that estimate values for percentiles from an ordered categorical variable as described by Reardon (2011, isbn:978-0-87154-372-1). One function estimates percentile differences from two percentiles while the other returns the values for every percentile from 1 to 100. | 
| Depends: | R (≥ 3.4.0) | 
| License: | MIT + file LICENSE | 
| URL: | https://cimentadaj.github.io/perccalc/, https://github.com/cimentadaj/perccalc | 
| Language: | en-US | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.0.1 | 
| Imports: | stats, tibble, multcomp | 
| Suggests: | magrittr, spelling, dplyr, knitr, rmarkdown, testthat, ggplot2, MASS, carData, tidyr (≥ 1.0.0), covr | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2019-12-17 17:22:57 UTC; jorge | 
| Author: | Jorge Cimentada | 
| Maintainer: | Jorge Cimentada <cimentadaj@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2019-12-17 20:10:02 UTC | 
Calculate percentile differences from an ordered categorical variable and a continuous variable.
Description
Calculate percentile differences from an ordered categorical variable and a continuous variable.
Usage
perc_diff(
  data_model,
  categorical_var,
  continuous_var,
  weights = NULL,
  percentiles = c(90, 10)
)
perc_diff_df(
  data_model,
  categorical_var,
  continuous_var,
  weights = NULL,
  percentiles = c(90, 10)
)
Arguments
| data_model | A data frame with at least the categorical and continuous variables from which to estimate the percentile differences | 
| categorical_var | The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise the function raises an error. | 
| continuous_var | The bare unquoted name of the continuous variable from which to estimate the percentiles | 
| weights | The bare unquoted name of the optional weight variable. If not specified, then estimation is done without weights | 
| percentiles | A numeric vector of two numbers specifying which percentiles to subtract | 
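Because categorical_var has to be an ordered factor, it can help to check and convert the column before calling perc_diff. The following is a minimal base-R sketch; the data frame `df` and its columns `bracket` and `score` are hypothetical names chosen for illustration and are not part of the package.

```r
# Hypothetical data: `df`, `bracket` and `score` are illustrative names.
df <- data.frame(
  bracket = c("low", "high", "mid", "low", "high"),
  score   = c(1.2, 3.4, 2.1, 0.8, 2.9)
)

# A character or plain factor column is not an ordered factor.
is.ordered(df$bracket)

# State the level order explicitly instead of relying on alphabetical sorting.
df$bracket <- factor(df$bracket, levels = c("low", "mid", "high"), ordered = TRUE)

is.ordered(df$bracket)
levels(df$bracket)
```

Spelling out `levels` matters because the default ordering is alphabetical, which would place "high" before "low" here.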
Details
perc_diff silently drops observations with missing values before
calculating the linear combination of coefficients.
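Since the rows are dropped without a message, one option is to count incomplete rows before calling the function. The sketch below uses only base R and assumes that "missing observations" means rows with an NA in the variables passed to the function; `df`, `bracket` and `score` are hypothetical names.

```r
# Hypothetical data with missing values; the column names are illustrative.
df <- data.frame(
  bracket = factor(c("low", "mid", NA, "high", "mid"),
                   levels = c("low", "mid", "high"), ordered = TRUE),
  score   = c(1.2, NA, 2.1, 3.4, 2.0)
)

# Rows that are complete on the variables that would be passed to perc_diff.
keep <- complete.cases(df[, c("bracket", "score")])

# Report how many rows contain a missing value.
n_dropped <- sum(!keep)
message(n_dropped, " of ", nrow(df), " rows have a missing value")

# Optionally filter explicitly so the analysis sample is documented.
df_complete <- df[keep, ]
```

If a weight variable is used, it would be reasonable to add it to the column selection as well.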
Value
perc_diff returns a vector with the percentile difference and
its associated standard error. perc_diff_df returns the same but as
a data frame.
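The estimate and its standard error can be combined into an approximate confidence interval. This sketch assumes a normal approximation, which is a choice made for this illustration and not something the package documentation states. The numbers are made up, and the element names `difference` and `se` are assumptions, so check names() on real output first.

```r
# Made-up values standing in for the output of perc_diff(); the element
# names `difference` and `se` are assumed, not taken from the package.
res <- c(difference = 1.50, se = 0.40)

# Critical value for a two-sided 95% interval under a normal approximation.
z <- qnorm(0.975)

# Lower and upper bounds: estimate minus/plus z times the standard error.
ci <- c(lower = res[["difference"]] - z * res[["se"]],
        upper = res[["difference"]] + z * res[["se"]])
ci
```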
Examples
set.seed(23131)
N <- 1000
K <- 20
toy_data <- data.frame(id = 1:N,
                       score = rnorm(N, sd = 2),
                       type = rep(paste0("inc", 1:K), each = N/K),
                       wt = 1)
# The call below would raise an error because type is not yet an ordered factor:
# perc_diff(toy_data, type, score)
toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)
perc_diff(toy_data, type, score, percentiles = c(90, 10))
perc_diff(toy_data, type, score, percentiles = c(50, 10))
perc_diff(toy_data, type, score, weights = wt, percentiles = c(30, 10))
# Results as data frame
perc_diff_df(toy_data, type, score, weights = wt, percentiles = c(30, 10))
Calculate a distribution of percentiles from an ordered categorical variable and a continuous variable.
Description
Calculate a distribution of percentiles from an ordered categorical variable and a continuous variable.
Usage
perc_dist(data_model, categorical_var, continuous_var, weights = NULL)
Arguments
| data_model | A data frame with at least the categorical and continuous variables from which to estimate the percentiles | 
| categorical_var | The bare unquoted name of the categorical variable. This variable must be an ordered factor; otherwise the function raises an error. | 
| continuous_var | The bare unquoted name of the continuous variable from which to estimate the percentiles | 
| weights | The bare unquoted name of the optional weight variable. If not specified, then equal weights are assumed. | 
Details
perc_dist silently drops observations with missing values before
calculating the linear combination of coefficients.
Value
A data frame with the estimated score and its standard error for each percentile from 1 to 100.
Examples
set.seed(23131)
N <- 1000
K <- 20
toy_data <- data.frame(id = 1:N,
                       score = rnorm(N, sd = 2),
                       type = rep(paste0("inc", 1:K), each = N/K),
                       wt = 1)
# The call below would raise an error because type is not yet an ordered factor:
# perc_dist(toy_data, type, score)
toy_data$type <- factor(toy_data$type, levels = unique(toy_data$type), ordered = TRUE)
perc_dist(toy_data, type, score)
Mathematics test scores of Spain, Germany and Estonia in the PISA 2006 test
Description
A dataset containing the test scores and other household information of students from Spain, Germany and Estonia from the PISA 2006 test.
Usage
pisa_2006
Format
A data frame with 25884 rows and 10 variables:
- year: Year of the survey
- CNT: Long country names
- STIDSTD: Unique student id
- father_edu: The father's highest achieved degree in the ISCED scale
- household_income: The household's total income in categories
- avg_math: The average math test score out of the 5 plausible values in Mathematics
Source
A subset extracted from the PISA2006lite R package, https://github.com/pbiecek/PISA2012lite
Mathematics test scores of Spain, Germany and Estonia in the PISA 2012 test
Description
A dataset containing the test scores and other household information of students from Spain, Germany and Estonia from the PISA 2012 test.
Usage
pisa_2012
Format
A data frame with 35093 rows and 10 variables:
- year: Year of the survey
- CNT: Long country names
- STIDSTD: Unique student id
- father_edu: The father's highest achieved degree in the ISCED scale
- household_income: The household's total income in categories
- avg_math: The average math test score out of the 5 plausible values in Mathematics
Source
A subset extracted from the PISA2012lite R package, https://github.com/pbiecek/PISA2012lite