---
title: "Customize Forest Plots and Tables"
output:
  rmarkdown::html_vignette:
    highlight: pygments
vignette: >
  %\VignetteIndexEntry{Customize Forest Plots and Tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8.5,
  fig.height = 5.5
)
```

```{r setup}
library(ggforestplotR)
library(ggplot2)
```

This article covers utilities and advanced customization options for forest plots and their accompanying tables.

## Group rows and control strip placement

`grouping` creates section panels, and `grouping_strip_position` controls which side of the plot the strip labels appear on.

```{r grouping-right}
coefs <- data.frame(
  term = c("Age", "BMI", "Smoking", "Stage II", "Stage III"),
  estimate = c(0.12, -0.10, 0.18, 0.30, 0.46),
  conf.low = c(0.03, -0.18, 0.04, 0.10, 0.18),
  conf.high = c(0.21, 0.02, 0.32, 0.50, 0.74),
  sample_size = c(120, 115, 98, 87, 83),
  p_value = c(0.04, 0.15, 0.29, 0.001, 0.075),
  section = c("Clinical", "Clinical", "Clinical", "Tumor", "Tumor")
)

ggforestplot(
  coefs,
  grouping = "section",
  grouping_strip_position = "right",
  striped_rows = TRUE
)
```

## Distinct variable separation

Use `separate_groups` and `separate_lines` when you want a clearer visual separation between variables. This is especially useful for categorical variables with many levels. `separate_groups` automatically prepends the variable name to the level label (for example, `Race: White`).

```{r separators}
block_coefs <- data.frame(
  term = c("race_black", "race_white", "race_other", "age", "bmi"),
  label = c("Black", "White", "Other", "Age", "BMI"),
  estimate = c(0.24, 0.08, -0.04, 0.12, -0.09),
  conf.low = c(0.10, -0.04, -0.18, 0.03, -0.17),
  conf.high = c(0.38, 0.20, 0.10, 0.21, -0.01),
  variable_block = c("Race", "Race", "Race", "Age", "BMI")
)

ggforestplot(
  block_coefs,
  label = "label",
  separate_groups = "variable_block",
  separate_lines = TRUE,
  striped_rows = TRUE
) +
  scale_y_discrete(limits = rev(c("BMI", "Age", "Race: White", "Race: Black", "Race: Other")))
```

## Add a side table

`add_forest_table()` attaches model information to the coefficient plot. The table can be placed on either the left or the right side and supports some customization.

You should **always** add the table **LAST**, after styling your plot, because the function calls `patchwork` internally. Once the plot has been combined with the table, customizing it requires `patchwork`-specific syntax and is generally harder to get working correctly.

Use the `columns` argument to choose which columns from your data frame appear in the table, and `column_labels` to change their labels. If some of the term labels need to be changed, use `term_labels` to assign them new values. Some column labels are assigned automatically if no value is provided.

Notice how we are explicitly naming the *n* and *p.value* columns? This is necessary in most cases because aliases are not yet incorporated (but they will be... I promise I'm getting to it).

```{r left-side-table}
ggforestplot(
  coefs,
  grouping = "section",
  grouping_strip_position = "right",
  n = "sample_size",
  p.value = "p_value",
  striped_rows = TRUE,
  term_labels = c("Smoking" = "Smoking status")
) +
  add_forest_table(
    columns = c("term", "sample_size", "estimate", "p_value"),
    column_labels = c(
      "term" = "Variable",
      "sample_size" = "N",
      "estimate" = "Beta (95% CI)",
      "p_value" = "P-value"
    )
  )
```

## Customize the table

`add_forest_table()` also lets you change some minor styling elements of the forest table.

```{r}
ggforestplot(
  coefs,
  n = "sample_size",
  p.value = "p_value",
  striped_rows = TRUE
) +
  add_forest_table(
    position = "left",
    grid_lines = TRUE,
    grid_line_linetype = 2,
    grid_line_colour = "red"
  )
```

## Split tables

`add_split_table()` can be used to create more traditional-looking forest plots. You can choose which summary information goes on which side. Like `add_forest_table()`, it should be added after any plot-level styling.

Use the `estimate_fmt` argument to change how your estimates are displayed. You can also control the number of digits via `estimate_digits` and `interval_digits`.

```{r split-table}
ggforestplot(
  coefs,
  n = "sample_size",
  p.value = "p_value",
  striped_rows = TRUE
) +
  scale_x_continuous(limits = c(-0.8, 0.8)) +
  add_split_table(
    left_columns = c("term", "n"),
    right_columns = c("estimate", "p"),
    column_labels = c("estimate" = "Beta [95% CI]"),
    estimate_fmt = "{estimate} [{conf.low}, {conf.high}]",
    estimate_digits = 2,
    interval_digits = 3
  )
```

## Plotting other types of model coefficients

Use `exponentiate = TRUE` for models fitted on a log scale, such as log-odds, so that the estimates are displayed as ratios (for example, odds ratios).

```{r logistic-regression-data}
data(CO2)
l1 <- glm(
  Treatment ~ conc + uptake + Type,
  family = binomial(link = "logit"),
  data = CO2
)
```

```{r logistic-regression, warning=FALSE}
ggforestplot(
  l1,
  exponentiate = TRUE,
  striped_rows = TRUE,
  term_labels = c("TypeMississippi" = "Mississippi")
) +
  add_forest_table(position = "left", show_p = FALSE)
```

We can do this for survival models as well.

```{r survival-analysis-data}
lung <- survival::lung

# Recode status from 1/2 (censored/dead) to 0/1
lung <- lung |>
  dplyr::mutate(
    status = dplyr::recode(status, `1` = 0, `2` = 1)
  )

s1 <- survival::coxph(Surv(time, status) ~ sex + age + ph.karno + pat.karno, data = lung)
```

```{r survival-analysis-plot}
ggforestplot(s1, exponentiate = TRUE, striped_rows = TRUE) +
  add_forest_table()
```

## Compare multiple estimates

The `group` argument is handy when comparing estimates from several models.

```{r comparison}
comparison_coefs <- data.frame(
  term = rep(c("Age", "BMI", "Smoking", "Stage II", "Stage III"), 2),
  estimate = c(0.12, -0.10, 0.18, 0.30, 0.46, 0.08, -0.05, 0.24, 0.40, 0.58),
  conf.low = c(0.03, -0.18, 0.04, 0.10, 0.18, 0.00, -0.13, 0.10, 0.20, 0.30),
  conf.high = c(0.21, -0.02, 0.32, 0.50, 0.74, 0.16, 0.03, 0.38, 0.60, 0.86),
  model = rep(c("Model A", "Model B"), each = 5),
  section = rep(c("Clinical", "Clinical", "Clinical", "Tumor", "Tumor"), 2)
)

ggforestplot(
  comparison_coefs,
  group = "model",
  grouping = "section",
  striped_rows = TRUE,
  dodge_width = 0.5,
  grouping_strip_position = "right"
) +
  theme(legend.position = "top") +
  scale_color_manual(values = c("#1F968BFF", "#453781FF")) +
  add_forest_table()
```
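
In practice, a comparison data frame like `comparison_coefs` usually comes from fitted models rather than hand-typed numbers. The following is a minimal sketch of one way to build it with base R only. It is not part of the package: `tidy_coefs()` is a hypothetical helper written for this illustration, and the two `lm()` fits on the built-in `mtcars` data are placeholders for your own models.

```r
# Two illustrative linear models on a dataset that ships with R.
model_a <- lm(mpg ~ wt + hp, data = mtcars)
model_b <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Hypothetical helper (not a ggforestplotR function): convert one fitted
# model into a coefficient table with the column names used above.
tidy_coefs <- function(fit, model_name) {
  ci <- confint(fit)  # 95% confidence intervals by default
  out <- data.frame(
    term = names(coef(fit)),
    estimate = unname(coef(fit)),
    conf.low = unname(ci[, 1]),
    conf.high = unname(ci[, 2]),
    model = model_name,
    stringsAsFactors = FALSE
  )
  # Drop the intercept, which is rarely shown in a forest plot.
  out[out$term != "(Intercept)", ]
}

# Stack the per-model tables; the `model` column identifies each fit.
model_coefs <- rbind(
  tidy_coefs(model_a, "Model A"),
  tidy_coefs(model_b, "Model B")
)
rownames(model_coefs) <- NULL
```

The resulting `model_coefs` has the same `term`, `estimate`, `conf.low`, `conf.high`, and `model` columns as `comparison_coefs`, so it should be usable with `group = "model"` in the same way as the example above.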