| Type: | Package | 
| Title: | Categorical Data Analysis | 
| Version: | 0.1.4 | 
| Author: | Nick Williams | 
| Maintainer: | Nick Williams <ntwilliams.personal@gmail.com> | 
| Description: | Includes wrapper functions around existing functions for the analysis of categorical data and introduces functions for calculating risk differences and matched odds ratios. R currently supports a wide variety of tools for the analysis of categorical data. However, many functions are spread across a variety of packages with differing syntax and poor compatibility with one another. prop_test() combines the functions binom.test(), prop.test() and BinomCI() into one output. prop_power() allows for power and sample size calculations for both balanced and unbalanced designs. riskdiff() is used for calculating risk differences and matched_or() is used for calculating matched odds ratios. For further information on methods used that are not documented in other packages see Nathan Mantel and William Haenszel (1959) <doi:10.1093/jnci/22.4.719> and Alan Agresti (2002) <ISBN:0-471-36093-7>. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Imports: | epitools, DescTools, cli, magrittr, Hmisc, broom, rlang | 
| RoxygenNote: | 6.1.1 | 
| Suggests: | testthat, dplyr, forcats | 
| NeedsCompilation: | no | 
| Packaged: | 2019-06-14 13:52:19 UTC; niw4001 | 
| Repository: | CRAN | 
| Date/Publication: | 2019-06-14 14:10:03 UTC | 
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Matched pairs odds ratio and confidence interval
Description
Create odds ratio and confidence interval from matched pairs data.
Usage
matched_or(df, ...)
Arguments
| df | a dataframe with binary variables x and y or a 2 x 2 frequency table/matrix. If a table or matrix, x and y must be NULL. Used to select method. | 
| ... | further arguments passed to or from other methods. | 
Details
Calculating the matched-pairs odds ratio and its confidence interval is equivalent to calculating a Cochran-Mantel-Haenszel odds ratio in which each pair is treated as a stratum.
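The equivalence described above can be illustrated with a small base R sketch. It is not the package's internal code: the vectors `case_exposed` and `control_exposed` are made-up data, and the confidence interval is left out because the source does not state which interval method is used.

```r
# Hypothetical matched-pairs data: exposure status (1 = exposed, 0 = not)
# of each case and of its matched control, one element per pair.
case_exposed    <- c(1, 1, 1, 0, 0, 1, 0, 1)
control_exposed <- c(0, 0, 1, 1, 0, 0, 0, 1)

# Matched-pairs odds ratio: ratio of the two kinds of discordant pairs.
n10 <- sum(case_exposed == 1 & control_exposed == 0)
n01 <- sum(case_exposed == 0 & control_exposed == 1)
or_matched <- n10 / n01

# Mantel-Haenszel estimator with one stratum per pair. Each stratum is a
# 2 x 2 table (exposure by case/control status) holding two observations.
cell_a <- case_exposed         # exposed case
cell_b <- control_exposed      # exposed control
cell_c <- 1 - case_exposed     # unexposed case
cell_d <- 1 - control_exposed  # unexposed control
n_k <- 2                       # observations per stratum

or_mh <- sum(cell_a * cell_d / n_k) / sum(cell_b * cell_c / n_k)

c(matched = or_matched, mantel_haenszel = or_mh)
```

Concordant pairs contribute zero to both sums, so only the discordant pairs determine the estimate.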
Value
a list with class "matched_or" with the following components:
| tab | 2x2 table used for calculating the matched odds ratio | 
| or | dataframe with columns corresponding to matched-pairs OR, lower bound, and upper bound of CI | 
| conf.level | specified confidence level | 
Examples
set.seed(1)
gene <- data.frame(pair = seq_len(35),
                   ulcer = rbinom(35, 1, .7),
                   healthy = rbinom(35, 1, .4))
matched_or(gene, ulcer, healthy)
Matched pairs odds ratio from a data frame
Description
Create odds ratio and confidence interval from matched pairs data.
Usage
## S3 method for class 'data.frame'
matched_or(df, x, y, weight = NULL, alpha = 0.05,
  rev = c("neither", "rows", "columns", "both"), ...)
Arguments
| df | a dataframe with binary variables x and y. | 
| x | binary vector, used as rows for frequency table and calculations. | 
| y | binary vector, used as columns for frequency table and calculations. | 
| weight | an optional vector of count weights. | 
| alpha | level of significance for confidence interval. | 
| rev | reverse order of cells. Options are "rows", "columns", "both", and "neither" (default). | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "matched_or" with the following components:
| tab | 2x2 table used for calculating the matched odds ratio | 
| or | dataframe with columns corresponding to matched-pairs OR, lower bound, and upper bound of CI | 
| conf.level | specified confidence level | 
Examples
gene <- data.frame(pair = seq_len(35),
                   ulcer = rbinom(35, 1, .7),
                   healthy = rbinom(35, 1, .4))
matched_or(gene, ulcer, healthy)
Matched pairs odds ratio from a table
Description
Create odds ratio and confidence interval from matched pairs data.
Usage
## S3 method for class 'table'
matched_or(df, alpha = 0.05, rev = c("neither", "rows",
  "columns", "both"), ...)
Arguments
| df | a 2 x 2 frequency table. | 
| alpha | level of significance for confidence interval. | 
| rev | reverse order of cells. Options are "rows", "columns", "both", and "neither" (default). | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "matched_or" with the following components:
| tab | 2x2 table used for calculating the matched odds ratio | 
| or | dataframe with columns corresponding to matched-pairs OR, lower bound, and upper bound of CI | 
| conf.level | specified confidence level | 
Examples
gene <- data.frame(pair = seq_len(35),
                   ulcer = rbinom(35, 1, .7),
                   healthy = rbinom(35, 1, .4))
gene_tab <- xtabs(~ ulcer + healthy, data = gene)
gene_tab %>% matched_or()
Power and sample size for 2 proportions
Description
Calculate power and sample size for comparison of 2 proportions for both balanced and unbalanced designs.
Usage
prop_power(n, n1, n2, p1, p2, fraction = 0.5, alpha = 0.05,
  power = NULL, alternative = c("two.sided", "one.sided"), odds.ratio,
  percent.reduction, ...)
Arguments
| n | total sample size. | 
| n1 | sample size in group 1. | 
| n2 | sample size in group 2. | 
| p1 | group 1 proportion. | 
| p2 | group 2 proportion. | 
| fraction | fraction of total observations that are in group 1. | 
| alpha | significance level/type 1 error rate. | 
| power | desired power, between 0 and 1. | 
| alternative | alternative hypothesis, one- or two-sided test. | 
| odds.ratio | odds ratio comparing p2 to p1. | 
| percent.reduction | percent reduction from p1 to p2. | 
| ... | further arguments passed to or from other methods. | 
Details
Power calculations are done using the methods described in 'stats::power.prop.test', 'Hmisc::bsamsize', and 'Hmisc::bpower'.
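For the balanced case, the underlying calculation can be sketched directly with stats::power.prop.test(). This is an illustration under stated assumptions rather than the package's own code: it assumes a total of 220 subjects split evenly, and it relies on the fact that power.prop.test() expects the per-group sample size. Unbalanced designs are handled through the Hmisc functions named above and are not shown here.

```r
# Power for a balanced design: 220 subjects in total, i.e. 110 per group.
# power.prop.test() takes the sample size per group, not the total.
pw <- stats::power.prop.test(n = 110, p1 = 0.35, p2 = 0.2, sig.level = 0.05)
pw$power

# Per-group sample size needed to reach 85% power for the same proportions.
ss <- stats::power.prop.test(p1 = 0.35, p2 = 0.2, power = 0.85,
                             sig.level = 0.05)
ceiling(ss$n)      # per group, rounded up
2 * ceiling(ss$n)  # total for a balanced design
```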
Value
a list with class "prop_power" containing the following components:
| n | the total sample size | 
| n1 | the sample size in group 1 | 
| n2 | the sample size in group 2 | 
| p1 | the proportion in group 1 | 
| p2 | the proportion in group 2 | 
| power | calculated or desired power | 
| sig.level | level of significance | 
See Also
[stats::power.prop.test], [Hmisc::bsamsize], [Hmisc::bpower]
Examples
prop_power(n = 220, p1 = 0.35, p2 = 0.2)
prop_power(p1 = 0.35, p2 = 0.2, fraction = 2/3, power = 0.85)
prop_power(p1 = 0.35, n = 220, percent.reduction = 42.857)
prop_power(p1 = 0.35, n = 220, odds.ratio = 0.4642857)
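The last two examples appear to describe the same design as the first one, with p2 given indirectly. The arithmetic below is a sketch of that conversion, assuming that percent.reduction is the relative reduction from p1 and that odds.ratio is the odds of p2 divided by the odds of p1; the variable names are illustrative only.

```r
p1 <- 0.35

# p2 implied by a 42.857 percent reduction from p1
p2_from_reduction <- p1 * (1 - 42.857 / 100)

# p2 implied by an odds ratio of 0.4642857 (odds of p2 over odds of p1)
odds1 <- p1 / (1 - p1)
odds2 <- odds1 * 0.4642857
p2_from_or <- odds2 / (1 + odds2)

# Both values round to 0.2, the p2 used in the first example.
round(c(p2_from_reduction, p2_from_or), 4)
```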
Tests for equality of proportions
Description
Conduct 1-sample tests of proportions and tests for equality of k proportions.
Usage
prop_test(x, ...)
Arguments
| x | a vector of counts, a one-dimensional table with two entries, or a two-dimensional table with 2 columns. Used to select method. | 
| ... | further arguments passed to or from other methods. | 
Details
Calculations are done using the methods described in 'stats::binom.test()' and 'stats::prop.test()'.
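To show what the two underlying functions contribute, here is a minimal base R sketch for a single proportion (7 successes in 50 trials, null proportion 0.2). The Wald interval is computed by hand with the textbook formula; how prop_test() assembles these pieces internally is not documented here, so treat the sketch as an assumption about the ingredients only.

```r
x <- 7; n <- 50; p0 <- 0.2

# Pearson chi-squared test of H0: p = p0, without continuity correction
chisq <- stats::prop.test(x, n, p = p0, correct = FALSE)
chisq$statistic
chisq$p.value

# Exact binomial test and Clopper-Pearson interval
exact <- stats::binom.test(x, n, p = p0)
exact$p.value
exact$conf.int

# Wald confidence interval, computed from the usual formula
p_hat <- x / n
z <- stats::qnorm(0.975)
wald_ci <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)
wald_ci
```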
Value
a list with class "prop_test" containing the following components:
| x | number of successes | 
| n | number of trials | 
| p | null proportion | 
| statistic | the value of Pearson's chi-squared test statistic | 
| p_value | p-value corresponding to chi-squared test statistic | 
| df | degrees of freedom | 
| method | the method used to calculate the confidence interval | 
| method_ci | confidence interval calculated using specified method | 
| exact_ci | exact confidence interval | 
| exact_p | p-value from exact test | 
See Also
[stats::binom.test()], [stats::prop.test()]
Examples
prop_test(7, 50, method = "wald", p = 0.2)
prop_test(7, 50, method = "wald", p = 0.2, exact = TRUE)
prop_test(c(23, 24), c(50, 55))
vietnam <- data.frame(
   service = c(rep("yes", 2), rep("no", 2)),
   sleep = c(rep(c("yes", "no"), 2)),
   count = c(173, 160, 599, 851)
)
sleep <- xtabs(count ~ service + sleep, data = vietnam)
prop_test(sleep)
prop_test(vietnam, service, sleep, count)
Tests for equality of proportions
Description
Conduct 1-sample tests of proportions and tests for equality of k proportions.
Usage
## S3 method for class 'data.frame'
prop_test(x, pred, out, weight = NULL,
  rev = c("neither", "rows", "columns", "both"), method = c("wald",
  "wilson", "agresti-couli", "jeffreys", "modified wilson", "wilsoncc",
  "modified jeffreys", "clopper-pearson", "arcsine", "logit", "witting",
  "pratt"), alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95, correct = FALSE, exact = FALSE, ...)
Arguments
| x | a dataframe containing the categorical variables pred and out. | 
| pred | predictor/exposure, vector. | 
| out | outcome, vector. | 
| weight | an optional vector of count weights. | 
| rev | reverse order of cells. Options are "rows", "columns", "both", and "neither" (default). | 
| method | a character string indicating the method for calculating the confidence interval; the default is "wald". Options are wald, wilson, agresti-couli, jeffreys, modified wilson, wilsoncc, modified jeffreys, clopper-pearson, arcsine, logit, witting, and pratt. | 
| alternative | character string specifying the alternative hypothesis. Possible options are "two.sided" (default), "greater", or "less". | 
| conf.level | confidence level for confidence interval, default is 0.95. | 
| correct | a logical indicating whether Yates' continuity correction should be applied. | 
| exact | a logical indicating whether to output exact p-value, ignored if k-sample test. | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "prop_test" containing the following components:
| x | number of successes | 
| n | number of trials | 
| p | null proportion | 
| statistic | the value of Pearson's chi-squared test statistic | 
| p_value | p-value corresponding to chi-squared test statistic | 
| df | degrees of freedom | 
| method | the method used to calculate the confidence interval | 
| method_ci | confidence interval calculated using specified method | 
| exact_ci | exact confidence interval | 
| exact_p | p-value from exact test | 
Examples
vietnam <- data.frame(
   service = c(rep("yes", 2), rep("no", 2)),
   sleep = c(rep(c("yes", "no"), 2)),
   count = c(173, 160, 599, 851)
)
prop_test(vietnam, service, sleep, count)
Tests for equality of proportions
Description
Conduct 1-sample tests of proportions and tests for equality of k proportions.
Usage
## S3 method for class 'matrix'
prop_test(x, method = c("wald", "wilson",
  "agresti-couli", "jeffreys", "modified wilson", "wilsoncc",
  "modified jeffreys", "clopper-pearson", "arcsine", "logit", "witting",
  "pratt"), alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95, correct = FALSE, exact = FALSE, ...)
Arguments
| x | a 2 x k matrix. | 
| method | a character string indicating the method for calculating the confidence interval; the default is "wald". Options are wald, wilson, agresti-couli, jeffreys, modified wilson, wilsoncc, modified jeffreys, clopper-pearson, arcsine, logit, witting, and pratt. | 
| alternative | character string specifying the alternative hypothesis. Possible options are "two.sided" (default), "greater", or "less". | 
| conf.level | confidence level for confidence interval, default is 0.95. | 
| correct | a logical indicating whether Yates' continuity correction should be applied. | 
| exact | a logical indicating whether to output exact p-value, ignored if k-sample test. | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "prop_test" containing the following components:
| x | number of successes | 
| n | number of trials | 
| p | null proportion | 
| statistic | the value of Pearson's chi-squared test statistic | 
| p_value | p-value corresponding to chi-squared test statistic | 
| df | degrees of freedom | 
| method | the method used to calculate the confidence interval | 
| method_ci | confidence interval calculated using specified method | 
| exact_ci | exact confidence interval | 
| exact_p | p-value from exact test | 
Examples
matrix(c(23, 48, 76, 88), nrow = 2, ncol = 2) %>% prop_test()
Tests for equality of proportions
Description
Conduct 1-sample tests of proportions and tests for equality of k proportions.
Usage
## S3 method for class 'numeric'
prop_test(x, n, p = 0.5, method = c("wald", "wilson",
  "agresti-couli", "jeffreys", "modified wilson", "wilsoncc",
  "modified jeffreys", "clopper-pearson", "arcsine", "logit", "witting",
  "pratt"), alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95, correct = FALSE, exact = FALSE, ...)
Arguments
| x | a vector of counts of successes. | 
| n | a vector of counts of trials. | 
| p | a probability for the null hypothesis when testing a single proportion; ignored if comparing multiple proportions. | 
| method | a character string indicating the method for calculating the confidence interval; the default is "wald". Options are wald, wilson, agresti-couli, jeffreys, modified wilson, wilsoncc, modified jeffreys, clopper-pearson, arcsine, logit, witting, and pratt. | 
| alternative | character string specifying the alternative hypothesis. Possible options are "two.sided" (default), "greater", or "less". | 
| conf.level | confidence level for confidence interval, default is 0.95. | 
| correct | a logical indicating whether Yates' continuity correction should be applied. | 
| exact | a logical indicating whether to output exact p-value, ignored if k-sample test. | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "prop_test" containing the following components:
| x | number of successes | 
| n | number of trials | 
| p | null proportion | 
| statistic | the value of Pearson's chi-squared test statistic | 
| p_value | p-value corresponding to chi-squared test statistic | 
| df | degrees of freedom | 
| method | the method used to calculate the confidence interval | 
| method_ci | confidence interval calculated using specified method | 
| exact_ci | exact confidence interval | 
| exact_p | p-value from exact test | 
Examples
prop_test(7, 50, method = "wald", p = 0.2)
prop_test(7, 50, method = "wald", p = 0.2, exact = TRUE)
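The method argument selects among several interval formulas. As one worked instance, the sketch below computes the Wilson score interval by hand for 7 successes in 50 trials and compares it with the interval returned by stats::prop.test() without continuity correction, which uses the same formula. It assumes that the "wilson" option follows this standard definition; the variable names are illustrative.

```r
x <- 7; n <- 50
p_hat <- x / n
z <- stats::qnorm(0.975)  # two-sided 95% interval

# Wilson score interval
denom <- 1 + z^2 / n
centre <- (p_hat + z^2 / (2 * n)) / denom
half_width <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / denom
wilson_ci <- c(centre - half_width, centre + half_width)

# Reference: prop.test() reports the Wilson interval when correct = FALSE
reference <- as.numeric(stats::prop.test(x, n, correct = FALSE)$conf.int)

all.equal(wilson_ci, reference)
```

Unlike the Wald interval, the Wilson interval is not centred on the observed proportion, which matters most when the proportion is close to 0 or 1.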
Tests for equality of proportions
Description
Conduct 1-sample tests of proportions and tests for equality of k proportions.
Usage
## S3 method for class 'table'
prop_test(x, method = c("wald", "wilson",
  "agresti-couli", "jeffreys", "modified wilson", "wilsoncc",
  "modified jeffreys", "clopper-pearson", "arcsine", "logit", "witting",
  "pratt"), alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95, correct = FALSE, exact = FALSE, ...)
Arguments
| x | a 2 x k table. | 
| method | a character string indicating the method for calculating the confidence interval; the default is "wald". Options are wald, wilson, agresti-couli, jeffreys, modified wilson, wilsoncc, modified jeffreys, clopper-pearson, arcsine, logit, witting, and pratt. | 
| alternative | character string specifying the alternative hypothesis. Possible options are "two.sided" (default), "greater", or "less". | 
| conf.level | confidence level for confidence interval, default is 0.95. | 
| correct | a logical indicating whether Yates' continuity correction should be applied. | 
| exact | a logical indicating whether to output exact p-value, ignored if k-sample test. | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "prop_test" containing the following components:
| x | number of successes | 
| n | number of trials | 
| p | null proportion | 
| statistic | the value of Pearson's chi-squared test statistic | 
| p_value | p-value corresponding to chi-squared test statistic | 
| df | degrees of freedom | 
| method | the method used to calculate the confidence interval | 
| method_ci | confidence interval calculated using specified method | 
| exact_ci | exact confidence interval | 
| exact_p | p-value from exact test | 
Examples
vietnam <- data.frame(
     service = c(rep("yes", 2), rep("no", 2), rep("maybe", 2)),
     sleep = rep(c("yes", "no"), 3),
     count = c(173, 160, 599, 851, 400, 212)
)
xtabs(count ~ service + sleep, data = vietnam) %>% prop_test()
Risk difference
Description
Calculate the risk difference and its confidence interval (95 percent by default) using the Wald method.
Usage
riskdiff(df, ...)
Arguments
| df | a dataframe with binary variables x and y or a 2 x 2 frequency table/matrix. If a table or matrix, x and y must be NULL. Used to select method. | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "rdiff" containing the following components:
| rd | risk difference | 
| conf.level | specified confidence level | 
| ci | calculated confidence interval | 
| p1 | proportion one | 
| p2 | proportion two | 
| tab | 2x2 table used for calculating risk difference | 
Examples
trial <- data.frame(
  disease = c(rep("yes", 2), rep("no", 2)),
  treatment = c(rep(c("estrogen", "placebo"), 2)),
  count = c(751, 623, 7755, 7479))
riskdiff(trial, treatment, disease, count, rev = "columns")
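For reference, the Wald calculation can be reproduced by hand from the counts in the example above. The sketch assumes the difference is taken as estrogen minus placebo; the direction used by riskdiff() depends on the table orientation and on rev, which is not spelled out here.

```r
# Counts from the trial example: disease "yes" and group totals
events <- c(estrogen = 751, placebo = 623)
totals <- c(estrogen = 751 + 7755, placebo = 623 + 7479)

risk <- events / totals
rd <- unname(risk["estrogen"] - risk["placebo"])  # assumed direction

# Wald standard error and confidence interval
conf_level <- 0.95
se <- sqrt(sum(risk * (1 - risk) / totals))
z <- stats::qnorm(1 - (1 - conf_level) / 2)
ci <- rd + c(-1, 1) * z * se

round(c(rd = rd, lower = ci[1], upper = ci[2]), 4)
```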
Risk difference
Description
Calculate the risk difference and its confidence interval (95 percent by default) using the Wald method.
Usage
## S3 method for class 'data.frame'
riskdiff(df, x = NULL, y = NULL, weight = NULL,
  conf.level = 0.95, rev = c("neither", "rows", "columns", "both"),
  ...)
Arguments
| df | a dataframe with binary variables x and y. | 
| x | binary predictor/exposure, vector. | 
| y | binary outcome, vector. | 
| weight | an optional vector of count weights. | 
| conf.level | confidence level for confidence interval, default is 0.95. | 
| rev | reverse order of cells. Options are "rows", "columns", "both", and "neither" (default). | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "rdiff" containing the following components:
| rd | risk difference | 
| conf.level | specified confidence level | 
| ci | calculated confidence interval | 
| p1 | proportion one | 
| p2 | proportion two | 
| tab | 2x2 table used for calculating risk difference | 
Examples
trial <- data.frame(
  disease = c(rep("yes", 2), rep("no", 2)),
  treatment = c(rep(c("estrogen", "placebo"), 2)),
  count = c(751, 623, 7755, 7479))
riskdiff(trial, treatment, disease, count, rev = "columns")
Risk difference
Description
Calculate the risk difference and its confidence interval (95 percent by default) using the Wald method.
Usage
## S3 method for class 'matrix'
riskdiff(df, conf.level = 0.95, dnn = NULL,
  rev = c("neither", "rows", "columns", "both"), ...)
Arguments
| df | a 2 x 2 frequency matrix. | 
| conf.level | confidence level for confidence interval, default is 0.95. | 
| dnn | optional character vector of dimension names. | 
| rev | reverse order of cells. Options are "rows", "columns", "both", and "neither" (default). | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "rdiff" containing the following components:
| rd | risk difference | 
| conf.level | specified confidence level | 
| ci | calculated confidence interval | 
| p1 | proportion one | 
| p2 | proportion two | 
| tab | 2x2 table used for calculating risk difference | 
Examples
matrix(c(12, 45, 69, 15), nrow = 2, ncol = 2) %>%
   riskdiff(dnn = c("New Drug", "Adverse Outcome"))
Risk difference
Description
Calculate the risk difference and its confidence interval (95 percent by default) using the Wald method.
Usage
## S3 method for class 'table'
riskdiff(df, conf.level = 0.95, rev = c("neither",
  "rows", "columns", "both"), ...)
Arguments
| df | a 2 x 2 frequency table. | 
| conf.level | confidence level for confidence interval, default is 0.95. | 
| rev | reverse order of cells. Options are "rows", "columns", "both", and "neither" (default). | 
| ... | further arguments passed to or from other methods. | 
Value
a list with class "rdiff" containing the following components:
| rd | risk difference | 
| conf.level | specified confidence level | 
| ci | calculated confidence interval | 
| p1 | proportion one | 
| p2 | proportion two | 
| tab | 2x2 table used for calculating risk difference | 
Examples
trial <- data.frame(
  disease = c(rep("yes", 2), rep("no", 2)),
  treatment = c(rep(c("estrogen", "placebo"), 2)),
  count = c(751, 623, 7755, 7479))
xtabs(count ~ treatment + disease, data = trial) %>% riskdiff()
Create 2 x k frequency tables
Description
Helper function for creating 2 x k frequency tables.
Usage
tavolo(df, ...)
Arguments
| df | a dataframe with binary variable y and categorical variable x or a 2 x k frequency table/matrix. If a table or matrix, x and y must be NULL. Used to select method. | 
| ... | further arguments passed to or from other methods. | 
Value
| tab | 2 x k frequency table | 
Examples
trial <- data.frame(disease = c(rep("yes", 2), rep("no", 2)),
                    treatment = c(rep(c("estrogen", "placebo"), 2)),
                    count = c(751, 623, 7755, 7479))
tavolo(trial, treatment, disease, count)
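A comparable frequency table can be built in base R with xtabs(), which may help when checking what tavolo() returns. The sketch below also reverses the column order by plain indexing, as a rough analogue of rev = "columns"; whether tavolo() does exactly this internally is an assumption.

```r
trial <- data.frame(disease = c(rep("yes", 2), rep("no", 2)),
                    treatment = c(rep(c("estrogen", "placebo"), 2)),
                    count = c(751, 623, 7755, 7479))

# Weighted cross-tabulation: rows are treatment, columns are disease.
# Character columns are converted to factors, so levels sort alphabetically.
tab <- xtabs(count ~ treatment + disease, data = trial)
tab

# Reverse the column order so that "yes" comes first
tab_rev <- tab[, rev(colnames(tab))]
tab_rev
```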
Create 2 x k frequency tables
Description
Helper function for creating 2 x k frequency tables.
Usage
## S3 method for class 'data.frame'
tavolo(df, x, y, weight = NULL, rev = c("neither",
  "rows", "columns", "both"), ...)
Arguments
| df | a dataframe with binary variable y and categorical variable x. | 
| x | categorical predictor/exposure, vector. | 
| y | binary outcome, vector. | 
| weight | an optional vector of count weights. | 
| rev | character string indicating whether to switch row or column order, possible options are "neither", "rows", "columns", or "both". The default is "neither". | 
| ... | further arguments passed to or from other methods. | 
Value
| tab | 2 x k frequency table | 
Examples
trial <- data.frame(disease = c(rep("yes", 2), rep("no", 2)),
                    treatment = c(rep(c("estrogen", "placebo"), 2)),
                    count = c(751, 623, 7755, 7479))
tavolo(trial, treatment, disease, count)
Create 2 x k frequency tables
Description
Helper function for creating 2 x k frequency tables.
Usage
## S3 method for class 'matrix'
tavolo(df, dnn = NULL, rev = c("neither", "rows",
  "columns", "both"), ...)
Arguments
| df | a 2 x k frequency matrix. | 
| dnn | optional character vector of dimension names. | 
| rev | character string indicating whether to switch row or column order, possible options are "neither", "rows", "columns", or "both". The default is "neither". | 
| ... | further arguments passed to or from other methods. | 
Value
| tab | 2 x k frequency table | 
Examples
tavolo(matrix(c(23, 45, 67, 12), nrow = 2, ncol = 2), rev = "both")
Create 2 x k frequency tables
Description
Helper function for creating 2 x k frequency tables.
Usage
## S3 method for class 'table'
tavolo(df, rev = c("neither", "rows", "columns", "both"),
  ...)
Arguments
| df | a 2 x k frequency table. | 
| rev | character string indicating whether to switch row or column order, possible options are "neither", "rows", "columns", or "both". The default is "neither". | 
| ... | further arguments passed to or from other methods. | 
Value
| tab | 2 x k frequency table | 
Examples
trial <- data.frame(disease = c(rep("yes", 3), rep("no", 3)),
                    treatment = rep(c("estrogen", "placebo", "other"), 2),
                    count = c(751, 623, 7755, 7479, 9000, 456))
xtabs(count ~ treatment + disease, data = trial) %>% tavolo(rev = "columns")