Help for package epidm

Version:

1.0.4

Title:

UK Epidemiological Data Management

Description:

Contains utilities and functions for the cleaning, processing and management of patient level public health data for surveillance and analysis held by the UK Health Security Agency, UKHSA.

URL:

https://github.com/alexbhatt/epidm, https://alexbhatt.github.io/epidm/

BugReports:

https://github.com/alexbhatt/epidm/issues

License:

GPL (≥ 3)

Depends:

R (≥ 3.1)

Imports:

data.table, DBI, odbc, phonics, purrr, readr, stats, stringi, stringr, utils

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.1.2

NeedsCompilation:

Packaged:

2022-07-11 10:55:44 UTC; Alex.Bhattacharya

Author:

Alex Bhattacharya

[aut, cre]

Maintainer:

Alex Bhattacharya <alex.bhatt@gmail.com>

Repository:

CRAN

Date/Publication:

2022-07-11 23:40:02 UTC

Continuous Inpatient (CIP) Spells

Description

A continuous inpatient (CIP) spell is a continuous period of care within the NHS, which does allow specific types of transfers to take place. It can therefore be made up of one or more provider spells. A CIP spell starts when a decision has been made to admit the patient, and a consultant has taken responsibility for their care. The spell ends when the patient dies or is discharged from hospital. This follows the NHS Digital Provider Spells Methodology: http://content.digital.nhs.uk/media/11859/Provider-Spells-Methodology/pdf/Spells_Methodology.pdf

Usage

cip_spells(
  x,
  group_vars,
  spell_start_date,
  admission_method,
  admission_source,
  spell_end_date,
  discharge_destination,
  patient_classification,
  .forceCopy = FALSE
)

Arguments

x

a data frame; will be converted to a data.table

group_vars

a vector containing any variables to be used for record grouping, minimum is a patient identifier

spell_start_date

Inpatient provider spell or episode admission date

admission_method

CDS admission method code

admission_source

CDS admission source code

spell_end_date

Inpatient provider spell or episode discharge date

discharge_destination

CDS discharge destination code

patient_classification

CDS patient classification code

.forceCopy

default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference

Value

the original data.frame as a data.table with the following new fields:

cip_indx: an id field for the CIP spell
cip_spell_start: the start date for the CIP spell
cip_spell_end: the end date for the CIP spell

Examples


cip_test <- data.frame(
  id = c('465','465','465','465','8418','8418','8418',
         '8418','8418','8418','8418','8418','26443',
         '26443','26443','33299','33299','33299','33299',
         '33299','33299','33299','33299','33299','33299',
         '52635','52635','52635','52635','52635','52635',
         '52635','52635','52635','52635','52635','52635',
         '52635','52635','52635','52635','52635','52635',
         '52635','52635','52635','52635','52635','52635',
         '52635','52635','52635','78915','78915','78915'),
  provider = c('X1T','X1T','X1T','X1T','KHA','KHA','KHA',
               'KHA','KHA','KHA','KHA','KHA','BX2','BX2',
               'BX2','PXH','PXH','PXH','PXH','PXH','PXH',
               'PXH','PXH','PXH','PXH','9HA','9HA','9HA',
               '9HA','9HA','9HA','9HA','9HA','9HA','9HA',
               '9HA','9HA','9HA','9HA','9HA','9HA','YYT',
               'YYT','YYT','YYT','YYT','YYT','YYT','YYT',
               'YYT','YYT','YYT','ABX','ABX','ABX'),
  spell_start = as.Date(c(
    '2020-03-07','2020-03-07','2020-03-25','2020-04-03','2020-01-25',
    '2020-01-26','2020-07-14','2020-08-02','2020-08-12','2020-08-19',
    '2020-08-19','2020-11-19','2019-11-12','2020-04-17','2020-04-23',
    '2020-07-03','2020-01-17','2020-02-07','2020-03-20','2020-04-27',
    '2020-06-21','2020-07-02','2020-10-17','2020-11-27','2021-01-02',
    '2019-12-31','2020-01-02','2020-01-14','2020-01-16','2020-02-07',
    '2020-02-11','2020-02-14','2020-02-18','2020-02-21','2020-02-25',
    '2020-02-28','2020-03-09','2020-03-11','2020-03-12','2020-03-13',
    '2020-03-14','2020-02-04','2020-02-07','2020-02-11','2020-02-14',
    '2020-02-18','2020-02-21','2020-02-25','2020-02-28','2020-03-09',
    '2020-03-11','2020-03-12','2020-04-16','2020-04-24','2020-05-13')),
  spell_end = as.Date(c(
    '2020-03-07','2020-03-25','2020-04-02','2020-04-27','2020-01-25',
    '2020-01-27','2020-07-17','2020-08-07','2020-08-14','2020-08-19',
    '2020-08-22','2020-12-16','2020-04-17','2020-04-23','2020-05-20',
    '2020-07-24','2020-01-28','2020-02-07','2020-03-23','2020-04-29',
    '2020-06-21','2020-07-03','2020-11-27','2021-01-02','2021-01-10',
    '2019-12-31','2020-01-11','2020-01-14','2020-02-04','2020-02-07',
    '2020-02-11','2020-02-14','2020-02-18','2020-02-21','2020-02-25',
    '2020-02-28','2020-03-09','2020-03-11','2020-03-12','2020-03-13',
    '2020-03-30','2020-02-07','2020-02-11','2020-02-14','2020-02-18',
    '2020-02-21','2020-02-25','2020-02-28','2020-03-09','2020-03-11',
    '2020-03-12','2020-03-13','2020-04-24','2020-05-13','2020-06-11')),
  adm_meth = c('21','81','21','81','21','21','11','21','21','21','21',
               '21','21','81','21','81','21','21','21','21','21','21',
               '21','13','13','12','22','12','2D','13','13','13','13',
               '13','13','13','13','13','13','13','21','81','81','81',
               '81','81','13','81','81','13','13','13','21','11','81'),
  adm_src = c('19','51','19','51','19','51','19','51','19','19','19',
              '51','19','51','19','51','19','19','19','19','19','19',
              '19','51','19','19','19','19','19','19','19','19','19',
              '19','19','19','51','51','51','51','19','51','51','51',
              '51','51','51','51','51','51','51','51','19','51','51'),
  dis_meth = c('1','1','1','1','1','1','1','1','1','1','1','4','1','1',
               '4','1','1','1','1','1','1','1','8','1','4','1','1','1',
               '1','1','1','1','1','1','1','1','1','1','1','1','1','1',
               '1','1','1','1','1','1','1','1','1','1','1','1','2'),
  dis_dest = c('51','51','51','54','51','19','19','19','19','51','19',
               '79','51','51','79','65','19','19','19','19','19','29',
               '98','51','79','19','19','19','51','19','19','19','51',
               '51','51','19','19','51','51','19','51','51','51','51',
               '51','51','51','51','51','51','51','51','29','54','19'),
  patclass = c('1','1','1','1','1','1','1','1','1','1','1','1','1','1',
               '1','1','1','1','1','1','1','1','1','1','1','2','1','2',
               '1','2','2','2','2','2','2','2','2','2','2','2','1','1',
               '1','1','1','1','1','1','1','1','1','1','1','1','1')
)

cip_spells(x=cip_test,
  group_vars = c('id','provider'),
  patient_classification = 'patclass',
  spell_start_date = 'spell_start',
  admission_method = 'adm_meth',
  admission_source = 'adm_src',
  spell_end_date = 'spell_end',
  discharge_destination = 'dis_dest'
)[]

Download a csv from a zip

Description

A convenience function to allow you to pull data from NHS, ONS and ODR assets

Usage

csv_from_zip(x)

Arguments

x

a zip file from the web

Value

a zip file for ingestion into your chosen readr

Examples

## Not run: 
read.csv(csv_from_zip("https://files.digital.nhs.uk/assets/ods/current/succarc.zip"))

## End(Not run)

Bacterial Genus Gram Stain Lookup Table

Description

A reference table of bacterial gram stain results by genus to allow faster filtering of bacterial results. This dataset has been maintained manually against the PHE SGSS database. If there are organisms missing, please raise and issue or push request on the epidm GitHub

Usage

genus_gram_stain

Format

A data frame with four columns

organism_genus: The bacterial genus
gram_stain: A character string to indicate POSITIVE or NEGATIVE type
gram_positive: A 0/1 flag to indicate if the genus is gram positive
gram_negative: A 0/1 flag to indicate if the genus is gram negative

Grouping of intervals or events in time together

Description

Group across multiple observations of overlapping time intervals, with defined start and end dates, or events within a static/fixed or rolling window of time.

Usage

group_time(
  x,
  date_start,
  date_end,
  window,
  window_type = c("rolling", "static"),
  group_vars,
  indx_varname = "indx",
  min_varname = "date_min",
  max_varname = "date_max",
  .forceCopy = FALSE
)

Arguments

x

data frame, this will be converted to a data.table

date_start

column containing the start dates for the grouping, provided quoted

date_end

column containing the end dates for the interval, quoted

window

an integer representing a time window in days which will be applied to the start date for grouping events

window_type

character, to determine if a 'rolling' or 'static' grouping method should be used when grouping events

group_vars

in a vector, the all columns used to group records, quoted

indx_varname

a character string to set variable name for the index column which provides a grouping key; default is indx

min_varname

a character string to set variable name for the time period minimum

max_varname

a character string set variable name for the time period maximum

.forceCopy

default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference

Value

the original data.frame as a data.table with the following new fields:

indx; renamed using indx_varname: an id field for the new aggregated events/intervals; note that where the date_start is NA, an indx value will also be NA
min_date; renamed using min_varname: the start date for the aggregated events/intervals
max_date; renamed using max_varname: the end date for the aggregated events/intervals

Examples

episode_test <- structure(
  list(
    pat_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
               1L, 1L, 1L, 1L, 2L, 2L, 2L),
    species = c(rep("E. coli",7),rep("K. pneumonia",7)),
    spec_type = c(rep("Blood",7),rep("Blood",4),rep("Sputum",3)),
    sp_date = structure(c(18262, 18263, 18281, 18282, 18262, 18263, 18281,
                          18265, 18270, 18281, 18283, 18259, 18260, 18281),
                        class = "Date")
  ),
  row.names = c(NA, -14L), class = "data.frame")

group_time(x=episode_test,
           date_start='sp_date',
           window=14,
           window_type = 'static',
           indx_varname = 'static_indx',
           group_vars=c('pat_id','species','spec_type'))[]

spell_test <- data.frame(
  id = c(rep(99,6),rep(88,4),rep(3,3)),
  provider = c("YXZ",rep("ZXY",5),rep("XYZ",4),rep("YZX",3)),
  spell_start = as.Date(
    c(
      "2020-03-01",
      "2020-07-07",
      "2020-02-08",
      "2020-04-28",
      "2020-03-15",
      "2020-07-01",
      "2020-01-01",
      "2020-01-12",
      "2019-12-25",
      "2020-03-28",
      "2020-01-01",
      rep(NA,2)
    )
  ),
  spell_end = as.Date(
    c(
      "2020-03-10",
      "2020-07-26",
      "2020-05-22",
      "2020-04-30",
      "2020-05-20",
      "2020-07-08",
      "2020-01-23",
      "2020-03-30",
      "2020-01-02",
      "2020-04-20",
      "2020-01-01",
      rep(NA,2)
    )
  )
)

group_time(x = spell_test,
           date_start = 'spell_start',
           date_end = 'spell_end',
           group_vars = c('id','provider'),
           indx_varname = 'spell_id',
           min_varname = 'spell_min_date',
           max_varname = 'spell_max_date')[]

Inpatient Codes cleanup

Description

When HES/SUS ICD/OPCS codes are provided in wide format you may want to clean them up into long for easier analysis. This function helps by reshaping long as a separate table. Ensuring they're separate allows you to retain source data, and aggregate appropriately later.

Usage

inpatient_codes(
  x,
  field_strings,
  patient_id_vars,
  type = c("icd9", "icd10", "opcs"),
  .forceCopy = FALSE
)

Arguments

x

a data.frame or data.table containing inpatient data

field_strings

a vector or string containing the regex for the the columns

patient_id_vars

a vector containing colnames used to identify a patient episode or spell

type

a string to denote if the codes are diagnostic or procedural

.forceCopy

default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference

Value

a separate table with codes and id in long form

Examples

inpatient_test <- data.frame(
id = c(1053L,5487L,8180L,528L,1085L,344L,2021L,2040L,
       6504L,10867L,12411L,7917L,2950L,2812L,7757L,12227L,2675L,
       8548L,536L,11830L,12708L,10421L,5503L,2494L,14001L),
spell_id = c("dwPDw","iSpUq","qpgk5","8vrJ1","BAur9","l6LZk",
             "KJllb","tgZID","fJkh8","Y9IPv","DAlUZ",
             "9Ooc4","hUxGn","wtMG9","dw3dO","cz3fI",
             "gdxZK","npplb","tynBh","Uu0Sd","gV1Ac",
             "vOpA1","ttlcD","Fqo29","ivTmN"),
primary_diagnosis_code = c("K602","U071-","I501","U071  ","J22X","J189",
                           "J189","I951","N130","U071","K510 D",NA,
                           "G409-","C780","N185","J955","K573","U071",
                           "I330","L309","M513","U071","A419","U071",
                           "N185-"),
secondary_diagnosis_code_1 = c("K641","J128-","I489","J128  ","Q348","F059",
                               "R296","R296","N131","J128","M0750A",NA,
                               "R401-","C782","Z491","C321","D125","J128",
                               "B952","J459","M4780","B972","N390","J128",
                               "Z491-"),
secondary_diagnosis_code_2 = c("E039","B972-","I10X","L031  ","Z115","I509",
                               "F051","I251","K862","B972","K590-",NA,
                               "E876-","C798","N085","Z938","I209","B972",
                               "I214","Z880","M8588","R296","B962","B972",
                               NA),
secondary_diagnosis_code_3 = c("I422","J9691","E119","I489  ","D509","I489",
                               "D509","I252","T391","J440","R21X-",NA,
                               "R945-","E119","M310","I480","I252","J9690",
                               "E111",NA,"Z115","R410","J181","Z518",NA),
secondary_diagnosis_code_4 = c(NA,"I10X-","E669","E109  ","K219","Z921","I251",
                               "I259","R458","B972","F200-",NA,"E039-",
                               "I10X",NA,"I500","F171","I489","E162",NA,
                               "I480","M2551","L892","E86X",NA),
secondary_diagnosis_code_5 = c(NA,"E119-","J449","F03X  ",NA,"Z518","I252",
                               "I209","C61X","A419","R761-",NA,"E119-",
                               "K219",NA,"Z115","F329","N179","N179",NA,
                               "H353","Z638","L033","R54X",NA),
secondary_diagnosis_code_6 = c(NA,NA,"Z966","I10X  ",NA,"N179","N183","Z115",
                               "K627","N390",NA,NA,"J459-","M4780",NA,
                               "Z900",NA,"I10X","R34X",NA,"I951","I10X",
                               "D510","F059",NA),
secondary_diagnosis_code_7 = c(NA,NA,"Z854","I679  ",NA,"N183","Z951","M190",
                               "R634","L031",NA,NA,"I10X-","M512",NA,
                               "Z921",NA,"E119","I959",NA,"H903","I678",
                               "K639","F03X",NA),
secondary_diagnosis_code_8 = c(NA,NA,"Z864","J459  ",NA,"E115","E119","N183",
                               "E111","E871",NA,NA,"R51X-","H409",NA,
                               "Z870",NA,NA,"J90X",NA,"M199","J459",
                               "N133","F29X",NA),
secondary_diagnosis_code_9 = c(NA,NA,"Z921","R296  ",NA,"L97X","I10X","M4806",
                               "E114","S099",NA,NA,"Q070-","H544",NA,
                               NA,NA,NA,"I501",NA,"K811","F03X","J90X",
                               "N189",NA),
secondary_diagnosis_code_10 = c(NA,NA,NA,"Z921  ",NA,"L089","Z921","N40X",
                                "G590","R296",NA,NA,"E668-","Z858",NA,NA,NA,
                                NA,"I489",NA,"K219","G20X","N202",
                                "F719",NA),
secondary_diagnosis_code_11 = c(NA,NA,NA,"Z515  ",NA,"R02X","Z507","Z864",
                                "E162","I489",NA,NA,"G473-","Z923",NA,NA,NA,
                                NA,"I447",NA,"J459","E119","L031",
                                "Z960",NA),
secondary_diagnosis_code_12 = c(NA,NA,NA,"Z501  ",NA,"B370","K579","Z955",
                                "E46X","Z921",NA,NA,"R600-","Z926",NA,NA,NA,
                                NA,"E86X",NA,"I10X",NA,"J981","Z922",
                                NA),
secondary_diagnosis_code_13 = c(NA,NA,NA,"Z507  ",NA,"E039","M109",NA,"I259",
                                "K709",NA,NA,"M1999","Z895",NA,NA,NA,NA,
                                "R33X",NA,"J40X",NA,"E119",NA,NA),
secondary_diagnosis_code_14 = c(NA,NA,NA,NA,NA,NA,"J459",NA,"N131","Z864",NA,
                                NA,"R468-","Z902",NA,NA,NA,NA,"R296",
                                NA,NA,NA,"I739",NA,NA),
secondary_diagnosis_code_15 = c(NA,NA,NA,NA,NA,NA,"Z880",NA,"K862","Z501",NA,
                                NA,"Z115-","Z971",NA,NA,NA,NA,"R468",
                                NA,NA,NA,"N183",NA,NA),
secondary_diagnosis_code_16 = c(NA,NA,NA,NA,NA,NA,"Z867",NA,"T391","Z505",NA,
                                NA,"Z501-","Z878",NA,NA,NA,NA,"R31X",
                                NA,NA,NA,"I489",NA,NA),
secondary_diagnosis_code_17 = c(NA,NA,NA,NA,NA,NA,"Z864",NA,"R458","Z518",NA,
                                NA,"Z507-","Z958",NA,NA,NA,NA,"Z115",
                                NA,NA,NA,"M549",NA,NA),
secondary_diagnosis_code_18 = c(NA,NA,NA,NA,NA,NA,"F03X",NA,"C61X",NA,NA,NA,
                                NA,"Z867",NA,NA,NA,NA,"I252",NA,NA,
                                NA,"I252",NA,NA),
secondary_diagnosis_code_19 = c(NA,NA,NA,NA,NA,NA,NA,NA,"K627",NA,NA,NA,NA,
                                "Z864",NA,NA,NA,NA,"I259",NA,NA,NA,
                                "I259",NA,NA),
secondary_diagnosis_code_20 = c(NA,NA,NA,NA,NA,NA,NA,NA,"R634",NA,NA,NA,NA,
                                "Z880",NA,NA,NA,NA,"I10X",NA,NA,NA,
                                "E669",NA,NA),
secondary_diagnosis_code_21 = c(NA,NA,NA,NA,NA,NA,NA,NA,"E111",NA,NA,NA,NA,
                                "Z800",NA,NA,NA,NA,"I352",NA,NA,NA,
                                "Z867",NA,NA),
secondary_diagnosis_code_22 = c(NA,NA,NA,NA,NA,NA,NA,NA,"E114",NA,NA,NA,NA,
                                "Z801",NA,NA,NA,NA,"R15X",NA,NA,NA,
                                "Z896",NA,NA),
secondary_diagnosis_code_23 = c(NA,NA,NA,NA,NA,NA,NA,NA,"G590",NA,NA,NA,NA,
                                NA,NA,NA,NA,NA,"R32X",NA,NA,NA,
                                "Z960",NA,NA),
secondary_diagnosis_code_24 = c(NA,NA,NA,NA,NA,NA,NA,NA,"E162",NA,NA,NA,NA,
                                NA,NA,NA,NA,NA,"R418",NA,NA,NA,
                                "Z874",NA,NA),
primary_procedure_code = c("H289",NA,"K634",NA,"X292",NA,NA,NA,NA,NA,
                           "H251",NA,"U051","L913","X403",NA,"H231",
                           "U071","M473","X384",NA,NA,NA,NA,"X403"),
primary_procedure_date = c("20170730",NA,"20201202",NA,"20170914",NA,NA,NA,
                           NA,NA,"20210105",NA,"20170724",
                           "20210111","20171114",NA,"20170622","20210104",
                           "20171013","20170313",NA,NA,NA,NA,
                           "20171107"),
secondary_procedure_code_1 = c("H626",NA,"Y534",NA,"U297",NA,NA,NA,NA,NA,
                               "Z286",NA,"Y981","Y031",NA,NA,"Z286",
                               "Y981",NA,NA,NA,NA,NA,NA,NA),
secondary_procedure_date_1 = c("20170730",NA,"20201202",NA,"20170928",NA,NA,NA,
                               NA,NA,"20210105",NA,"20170724",
                               "20210111",NA,NA,"20170622","20210104",NA,NA,NA,
                               NA,NA,NA,NA),
secondary_procedure_code_2 = c("H444",NA,"Z941",NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               "U212",NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               NA,NA,NA),
secondary_procedure_date_2 = c("20170730",NA,"20201202",NA,NA,NA,NA,NA,NA,NA,
                               NA,NA,"20170729",NA,NA,NA,NA,NA,NA,
                               NA,NA,NA,NA,NA,NA),
secondary_procedure_code_3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Y973",
                               NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               NA),
secondary_procedure_date_3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               NA,NA),
secondary_procedure_code_4 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Y982",
                               NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               NA),
secondary_procedure_date_4 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               NA,NA),
secondary_procedure_code_5 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Z926",
                               NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               NA),
secondary_procedure_date_5 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               NA,NA),
secondary_procedure_code_6 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"O161",
                               NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               NA),
secondary_procedure_date_6 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               "20170729",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                               NA,NA)
)

inpatient_codes(x=inpatient_test,
                field_strings='diagnosis',
                patient_id_vars = c('id','spell_id'),
                type = 'icd10')

inpatient_codes(x=inpatient_test,
                field_strings=c('procedure_code','procedure_date'),
                patient_id_vars = c('id','spell_id'),
                type = 'opcs')

Link A&E to Inpatient records

Description

Link together ECDS A&E records to HES/SUS inpatient records on NHS number, Hospital Number and Date of Birth.

Usage

link_ae_inpatient(
  ae_data,
  ae_in,
  ae_out,
  inpatient_data,
  admission_date,
  spell_id,
  nhs_number = c("nhs_number", "nhs_number"),
  hospital_number = c("local_patient_identifier", "local_patient_identifier"),
  patient_dob = c("patient_birth_date", "date_birth"),
  org_code = c("organisation_code_of_provider", "organisation_code_code_of_provider"),
  .forceCopy = FALSE
)

Arguments

ae_data

the ECDS A&E dataset

ae_in

the ECDS arrival date

ae_out

the ECDS discharge date

inpatient_data

the HES/SUS inpatient dataset

admission_date

a vector containing the inpatient (HES/SUS) admission date

spell_id

the HES/SUS spell id

nhs_number

a vector containing the columns for the NHS numbers

hospital_number

a vector containing the columns for the Hospital numbers

patient_dob

a vector containing the columns for the date of birth

org_code

a vector containing the columns for the organisation codes

.forceCopy

a boolean to control if you want to copy the dataset before linking together

Value

a patient level linked hospital record

Lookup table switch handler

Description

A function to call an epidm lookup table and recode where we are aware of a new value.

Built in are the organism re-classifications and specimen_type groupings and a manual mode.

Usage

lookup_recode(src, type = c("species", "specimen", "manual"), .import = NULL)

Arguments

src

a character, vector or column containing the value(s) to be referenced

type

a character value to denote the lookup table used

.import

a list in the order list(new,old) containing the values for another lookup table existing in the environment

Value

a list object of the recoded field

Examples

df <- data.frame(
  spec = c(
    sample(grep(")",
                respeciate_organism$previous_organism_name,
                value=TRUE,
                invert = TRUE),
           9),
    "ESCHERICHIA COLI","SARS-COV-2","CANDIDA AUREUS"),
  type = sample(specimen_type_grouping$specimen_type,12),
  date = sample(seq.Date(from = Sys.Date()-365,
                         to = Sys.Date(),
                         by = "day"),12)
)
df <- df[order(df$date),]

# show the data before the changes
df

# check the lookup tables
# observe the changes
head(respeciate_organism[1:2])
df$species <- lookup_recode(df$spec,'species')
df[,c('spec','species','date')]

head(specimen_type_grouping)
df$grp <- lookup_recode(df$type,'specimen')
df[,c('species','type','grp','date')]

# for a tidyverse use
# df %>% mutate(spec=lookup_recode(spec,'species))

# manual input of your own lookup
# .import=list(new,old)
lookup_recode(
  "ALCALIGENES DENITRIFICANS",
  type = 'manual',
  .import=list(respeciate_organism$organism_species_name,
               respeciate_organism$previous_organism_name)
  )

HES/SUS Episode Date Cleaning

Description

Correcting for missing end dates on HES/SUS episodes

Usage

proxy_episode_dates(
  x,
  group_vars,
  spell_start_date,
  spell_end_date,
  discharge_destination,
  .dropTmp = TRUE,
  .forceCopy = FALSE
)

Arguments

x

a data frame; will be converted to a data.table

group_vars

a vector containing any variables to be used for record grouping, minimum is a patient identifier

spell_start_date

Inpatient provider spell or episode admission date

spell_end_date

Inpatient provider spell or episode discharge date

discharge_destination

CDS discharge destination code

.dropTmp

default TRUE; a logical to drop all tmp values used

.forceCopy

default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference

Value

a data.table with cleaned start and end dates, and an indicator proxy_missing where the value has changed

Examples


proxy_test <- data.frame(
  id = c(
    rep(3051, 4),
    rep(7835,3),
    rep(9891,3),
    rep(1236,3)
  ),
  provider = c(
    rep("QKJ", 4),
    rep("JSD",3),
    rep("YJG",3),
    rep("LJG",3)
  ),
  spell_start = as.Date(c(
    "2020-07-03", "2020-07-14", "2020-07-23", "2020-08-05",
    "2020-11-01", "2020-11-13", "2020-12-01",
    "2020-03-28", "2020-04-06", "2020-04-09",
    "2020-10-06", "2020-11-05", "2020-12-25"
  )),
  spell_end = as.Date(c(
    "2020-07-11", "2020-07-22", "2020-07-30", "2020-07-30",
    "2020-11-11", NA, "2020-12-03",
    "2020-03-28", NA, "2020-04-09",
    "2020-10-06", "2020-11-05", NA
  )),
  disdest = c(
    19, 19, 51, 19,
    19, 19, 19,
    51, 98, 19,
    19, 19, 98
  )
)


proxy_episode_dates(
  x=proxy_test,
  group_vars = c('id','provider'),
  spell_start_date = 'spell_start',
  spell_end_date = 'spell_end',
  discharge_destination = 'disdest'
)[]

Respeciate unspecified samples

Description

Some samples within SGSS are submitted by laboratories as "GENUS SP" or "GENUS UNNAMED". However, they may also have a fully identified sample taken from the same site within a recent time period. This function captures species_col from another sample within X-days of an unspeciated isolate.

Usage

respeciate_generic(
  x,
  group_vars,
  species_col,
  date_col,
  window = c(0:Inf),
  .forceCopy = FALSE
)

Arguments

x

a data.frame or data.table object

group_vars

the minimum grouping set of variables for like samples in a character vector; suggest c('patient_id','specimen_type','genus')

species_col

a character containing the column with the organism species_col name

date_col

a character containing the column with the specimen/sample date_col

window

an integer representing the number of days for which you will allow a sample to be respeciated

.forceCopy

default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference

Value

a data.table with a recharacterised species_col column

Examples

df <- data.frame(
ptid = c(round(runif(25,1,5))),
spec = sample(c("KLEBSIELLA SP",
                "KLEBSIELLA UNNAMED",
                "KLEBSIELLA PNEUMONIAE",
                "KLEBEIELLA OXYTOCA"),
              25,replace = TRUE),
type = "BLOOD",
specdate = sample(seq.Date(Sys.Date()-21,Sys.Date(),"day"),25,replace = TRUE)
)

respeciate_generic(x=df,
                   group_vars=c('ptid','type'),
                   species_col='spec',
                   date_col='specdate',
                   window = 14)[]

Respeciated organisms

Description

Occasionally, research shows that two organisms, previously thought to be different are in fact one and the same. The reverse is also true. This is a manually updated list. If there are organisms missing, or new respeciates to be added, please raise and issue or push request on the epidm GitHub

Usage

respeciate_organism

Format

previous_organism_name: What the organism used to be known as, in the form GENUS SPECIES
organism_species_name: What the organism is known as now, in the form GENUS SPECIES
organism_genus_name: The genus of the recoded organism
genus_change: A 0/1 flag to indicate if the genus has changed
genu_all_species: A 0/1 flag to indicate if all species under that genus should change

Specimen type grouping

Description

In order to help clean up an analysis based on a group of specimen types, a lookup table has been created to help group sampling sites. This is a manually updated list. If there are organisms missing, or new respeciates to be added, please raise and issue or push request on the epidm GitHub

Usage

specimen_type_grouping

Format

specimen_type: The primary specimen type with detail
specimen_group: A simple grouping of like specimen sites

Clean and Read a SQL query

Description

A utility function to read in a SQL query from a character object, clipboard or text file and remove all comments for use with database query packages

Usage

sql_clean(sql)

Arguments

sql

a SQL file or text string

Value

a cleaned SQL query without comments as a character string

Examples

testSQL <- c(
"/********* INTRO HEADER COMMENTS",
"*********/",
"  SELECT ",
"  [VAR 1]  -- with comments",
",[VAR 2]",",[VAR 3]",
"FROM DATASET ","-- output here")
sql_clean(testSQL)

Connect to a SQL database

Description

An function to help setup connections to SQL databases acting as a wrapper for the odbc and DBI packages. Used by other sql_* tools within epidm. This uses the credential manager within the system and assumes you are using a trusted connection.

Usage

sql_connect(server, database)

Arguments

server

a string containing the server connection; note that servers may require the use of double backslash ⁠\\⁠

database

a string containing the database name within the data store

Value

a SQL connection object

Examples

## Not run: 
sql <- list(
  dsn = list(ser = 'covid.ukhsa.gov.uk',
             dbn = 'infections')
)

sgss_con = sql_connect(server = sql$dsn$ser, database = sql$dsn$dbn)

## End(Not run)

Read a table from a SQL database

Description

Read a table object to a SQL database. Acts a wrapper for odbc and DBI packages.

Usage

sql_read(server, database, sql)

Arguments

server

a string containing the server connection

database

a string containing the database name within the data store

sql

a string containing a SQL query or to a .sql/.txt SQL query

Value

a table from a SQL database

Write a table to a SQL database

Description

Write a table object to a SQL database. Acts a wrapper for odbc and DBI packages with additional checks to ensure upload completes.

Usage

sql_write(x, server, database, tablename)

Arguments

x

a data.frame/data.table/tibble object

server

a string containing the server connection

database

a string containing the database name within the data store

tablename

a string containing the chosen SQL database table name

Value

writes a data.frame/data.table/tibble to a SQL database

Patient ID record grouping

Description

Groups patient records from multiple isolates with a single integer patientID by grouping patient identifiers.

Grouping is based on five stages:

matching nhs number and date of birth
Hospital number & Date of Birth
NHS number & Hospital Number
Date of Birth & Surname IF nhs unknown
Sex & Date of Birth & Fuzzy Name

Identifiers are copied over where they are missing or invalid to the grouped records.

Usage

uk_patient_id(
  x,
  nhs_number,
  hospital_number,
  date_of_birth,
  sex_mfu,
  forename = "NONAME",
  surname = "NONAME",
  .sortOrder,
  .keepValidNHS = FALSE,
  .forceCopy = FALSE,
  .experimental = FALSE
)

Arguments

x

a data.frame or data.table containing the cleaned line list

nhs_number

a column as a character containing the patient NHS numbers

hospital_number

a column as a character containing the patient Hospital numbers

date_of_birth

a column as a date variable containing the patient date of birth in date format

sex_mfu

column as a character containing the patient sex; NOTE only works if coded only as character versions of Male/Female/Unknown; does not currently work with additional options ⁠#future update⁠

forename

a column as a character containing the patient forename; leave as NONAME if unavailable

surname

a column as a character containing the patient surname; leave as NONAME if unavailable

.sortOrder

optional; a column as a character to allow a sorting order on the id generation

.keepValidNHS

optional, default FALSE; set TRUE if you wish to retain the column with the NHS checksum result stored as a BOOLEAN

.forceCopy

optional, default FALSE; TRUE will force data.table to take a copy instead of editing the data without reference

.experimental

optional, default FALSE; TRUE will enable the experimental features for recoding NA values based on the mode

Value

A dataframe with one new variable:

id: a unique patient id
valid_nhs: if retained using argument .keepValidNHS=TRUE, a BOOLEAN containing the result of the NHS checksum validation

Examples

id_test <- data.frame(
  nhs_n = c(
    9434765919,9434765919,9434765919,NA,NA,
    3367170666,5185293519,5185293519,5185293519,8082318562,NA,NA,NA
  ),
  hosp_n = c(
    '13','13','13','UNKNOWN','13','13','13','31','31','96','96',NA,'96'),
  sex = c(rep('F',6),rep('Male',4), 'U', 'U', 'M'),
  dateofbirth = as.Date(
    c(
      '1988-10-06','1988-10-06','1900-01-01','1988-10-06','1988-10-06',
      '1988-10-06','1988-10-06','1988-10-06','1988-10-06','2020-01-28',
      '2020-01-28','2020-01-28','2020-01-28'
    )
  ),
  firstname = c(
    'Danger','Danger','Denger','Danger','Danger','DANGER','Danger',
    'Danger','Danger','Crazy','Crazy','Krazy','C'
  ),
  lastname = c(
    'Mouse','Mause','Mouse','Moose','Moose','Mouse','MOUSe',
    'Mouse','Mouse','Frog','FROG','Frug','Frog'
  ),
  testdate = sample(seq.Date(Sys.Date()-21,Sys.Date(),"day"),13,replace = TRUE)
)
uk_patient_id(x = id_test,
              nhs_number = 'nhs_n',
              hospital_number = 'hosp_n',
              forename = 'firstname',
              surname = 'lastname',
              sex_mfu = 'sex',
              date_of_birth = 'dateofbirth',
              .sortOrder = 'testdate')[]

NHS Number Validity Check

Description

Check if NHS numbers are valid based on the checksum algorithm

This uses the first 9 digits, multiplied by 10 down to 2 eg digit 1x10, d2x9

The sum of the products of the first 9 digits are divided by 11

The remainder is checked against the 10th digit

Where the remainder is 11, it is replaced with 0

Usage

valid_nhs(nhs_number)

Arguments

nhs_number

a vector

Value

a vector, 1 if NHS number is valid, 0 if not valid

Examples

test <- floor(runif(1000,1000000000,9999999999))
valid_nhs(test)
valid_nhs(9434765919)

Continuous Inpatient (CIP) Spells

Description

Usage

Arguments

Value

Examples

Download a csv from a zip

Description

Usage

Arguments

Value

Examples

Bacterial Genus Gram Stain Lookup Table

Description

Usage

Format

Grouping of intervals or events in time together

Description

Usage

Arguments

Value

Examples

Inpatient Codes cleanup

Description

Usage

Arguments

Value

Examples

Link A&E to Inpatient records

Description

Usage

Arguments

Value

See Also

Lookup table switch handler

Description

Usage

Arguments

Value

Examples

HES/SUS Episode Date Cleaning

Description

Usage

Arguments

Value

Examples

Respeciate unspecified samples

Description

Usage

Arguments

Value

Examples

Respeciated organisms

Description

Usage

Format

Specimen type grouping

Description

Usage

Format

Clean and Read a SQL query

Description

Usage

Arguments

Value

Examples

Connect to a SQL database

Description

Usage

Arguments

Value

See Also

Examples

Read a table from a SQL database

Description

Usage

Arguments

Value

See Also

Write a table to a SQL database