---
title: "Introduction to nisrarr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to nisrarr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

# The plotting chunk attaches dplyr and ggplot2 and formats labels with scales,
# so only evaluate it when all three are installed
can_plot <- requireNamespace("dplyr", quietly = TRUE) &&
  requireNamespace("ggplot2", quietly = TRUE) &&
  requireNamespace("scales", quietly = TRUE)

# The metadata field chunk calls lubridate as well as prettyunits
has_prettyunits <- requireNamespace("prettyunits", quietly = TRUE) &&
  requireNamespace("lubridate", quietly = TRUE)
```

# Overview of the NISRA Data Portal API

This package interacts with the [NISRA Data Portal](https://data.nisra.gov.uk/), which allows users to access data produced by the Northern Ireland Statistics and Research Agency (NISRA) in a variety of machine-readable formats, and to interactively query, plot, or map the data.
The NISRA Data Portal is built on [PxStat](https://github.com/CSOIreland/PxStat), a dissemination system developed by the Central Statistics Office (CSO) of Ireland.
[Guidance on the NISRA Data Portal](https://data.nisra.gov.uk/guide.html) is also available.

## Searching for data

We can search for data using `nisra_search()`, which returns information on the available datasets, such as when each one was last updated and which variables it contains.

```{r search}
library(nisrarr)

x <- nisra_search()
head(x)
```

If we don't know the exact name of the dataset we're interested in, we can search using a keyword that appears in the label or a set of variables that we need:

```{r search-keyword}
nisra_search(keyword = "employ")

nisra_search(variables = "Free School Meal Entitlement")
```

## Fetching data

We can use `nisra_read_dataset()` with a dataset code from the search results to request the dataset from the API and convert it to a tibble.
Every dataset has a `Statistic` column, a `value` column, a column for the time period, and a column for each other variable included in the breakdown:

```{r read-data, eval=can_plot}
mye <- nisra_read_dataset("MYE01T04")
head(mye)

library(dplyr)
library(ggplot2)

# Keep the 65+ age band and drop any combined-sex total
mye <- mye |>
  filter(
    `Broad age band (4 cat)` == "Age 65+",
    Sex %in% c("Females", "Males")
  ) |>
  mutate(Year = as.numeric(Year))

# One panel per local government district, each with its own y-axis scale
ggplot(mye, aes(Year, value, colour = Sex)) +
  geom_line() +
  scale_y_continuous(labels = scales::label_comma()) +
  facet_wrap(
    vars(`Local Government District`),
    scales = "free_y",
    labeller = label_wrap_gen(width = 18)
  ) +
  labs(
    title = "Population aged 65+ by sex and local government district, 2001 to 2022",
    x = NULL,
    y = NULL,
    colour = NULL
  ) +
  theme(legend.position = "top")
```

## Metadata

nisrarr has some functionality for working with metadata.
We can use the `get_metadata()` function on any dataset we download from the API to fetch some of the common or useful fields, such as whether these are official statistics, the subject of the statistics, and contact information:

```{r meta}
get_metadata(mye)
```

If we need to work with any of these fields programmatically, we can fetch specific fields using `get_metadata_field()`:

```{r meta-field, eval=has_prettyunits}
updated <- get_metadata_field(mye, "updated")

# Parse the timestamp and print how long ago the dataset was updated
updated |>
  lubridate::ymd_hms() |>
  prettyunits::time_ago()
```

## Caching

By default, nisrarr caches data fetched from the data portal API to speed up repeated requests for the same data.
Results are cached for 1 hour and then removed.
To ignore the cached values, set `flush_cache = TRUE` in `nisra_search()` or `nisra_read_dataset()`.
Caching is useful when working interactively, but in a larger script or pipeline it is better to fetch the data directly from the API.
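
As a minimal sketch of bypassing the cache, the chunk below passes `flush_cache = TRUE` to each of the two functions named above.
It reuses the `MYE01T04` dataset code and the `employ` keyword from the earlier examples. The object name `mye_fresh`, the chunk label, and `eval = FALSE` (set so that building the vignette makes no extra API requests) are choices made here for illustration only.

```{r cache-flush, eval = FALSE}
library(nisrarr)

# Ignore any cached copy and request the dataset from the API again
# (`mye_fresh` is a hypothetical name used only in this sketch)
mye_fresh <- nisra_read_dataset("MYE01T04", flush_cache = TRUE)

# The same argument can be used when searching for datasets
nisra_search(keyword = "employ", flush_cache = TRUE)
```

In a scheduled script, a call like the first one could replace the default cached read, so that each run works from the data the portal returns at that moment.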