Help for package mlspatial

Title:

Machine Learning and Mapping for Spatial Epidemiology

Version:

0.1.0

Description:

Provides tools for the integration, visualisation, and modelling of spatial epidemiological data using the method described in Azeez, A., & Noel, C. (2025). 'Predictive Modelling and Spatial Distribution of Pancreatic Cancer in Africa Using Machine Learning-Based Spatial Model' <doi:10.5281/zenodo.16529986> and <doi:10.5281/zenodo.16529016>. It facilitates the analysis of geographic health data by combining modern spatial mapping tools with advanced machine learning (ML) algorithms. 'mlspatial' enables users to import and pre-process shapefile and associated demographic or disease incidence data, generate richly annotated thematic maps, and apply predictive models, including Random Forest, 'XGBoost', and Support Vector Regression, to identify spatial patterns and risk factors. It is suited for spatial epidemiologists, public health researchers, and GIS analysts aiming to uncover hidden geographic patterns in health-related outcomes and inform evidence-based interventions.

RoxygenNote:

7.3.2

Suggests:

knitr, rmarkdown, tidyr, kernlab, writexl, testthat (≥ 3.0.0)

VignetteBuilder:

knitr

Depends:

R (≥ 4.1)

Imports:

sf, readxl, dplyr, ggplot2, randomForest, xgboost, e1071, caret, tmap, spdep, ggpubr, stats, methods

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

Config/testthat/edition:

NeedsCompilation:

Packaged:

2025-08-21 07:19:53 UTC; azeez

Author:

Adeboye Azeez [aut, cre], Colin Noel [aut]

Maintainer:

Adeboye Azeez <azizadeboye@gmail.com>

Repository:

CRAN

Date/Publication:

2025-08-26 19:40:02 UTC

Africa shapefile data

Description

A dataset containing spatial polygons of Africa.

Usage

africa_shp

Format

An sf object with spatial features.

Source

Your data source

Africa shapefile data 2

Description

A dataset containing spatial polygons of Africa.

Usage

africa_shps

Format

An sf object with spatial features.

Source

Your data source

Compute Moran's I & LISA, classify clusters

Description

Computes global and local Moran’s I to assess spatial autocorrelation and classifies observations into spatial cluster types (e.g., High-High).

Usage

compute_spatial_autocorr(sf_data, values, signif = 0.05)

Arguments

sf_data

An sf object containing spatial features.

values

A numeric vector or column name with the variable to test.

signif

Numeric significance level threshold for clusters (default 0.05).

Value

A named list with elements:

data: An sf object with added columns for standardized values, spatial lag, local Moran's I values, z-scores, p-values, and cluster classification.
moran: An object of class htest with global Moran's I test results.

Examples


library(sf)
library(spdep)
library(dplyr)

# Load and prepare spatial data
mapdata <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
mapdata <- st_make_valid(mapdata)

# Variable to analyze
values <- rnorm(nrow(mapdata))

# Run function
result <- compute_spatial_autocorr(mapdata, values, signif = 0.05)

# Inspect results
head(result$data)
result$moran
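
The steps named in the description (global Moran's I, local Moran's I, and cluster labelling) can be sketched directly with spdep. This is an illustration under stated assumptions, not the package's implementation: a 5 x 5 rook-contiguity grid stands in for real polygons, the weights are row-standardised, and the labelling rule (sign of the standardised value and of its spatial lag, filtered by the p-value) is one common convention.

```r
library(spdep)

set.seed(42)

# Assumption: a 5 x 5 rook-contiguity grid stands in for real polygons.
nb <- cell2nb(5, 5, type = "rook")
lw <- nb2listw(nb, style = "W")      # row-standardised spatial weights

x <- rnorm(25)                       # mock variable, one value per cell
z <- as.numeric(scale(x))            # standardised values
lag_z <- lag.listw(lw, z)            # spatial lag of the standardised values

global <- moran.test(x, lw)          # global Moran's I test (class "htest")
lisa <- localmoran(x, lw)            # local Moran's I, one row per cell
p_val <- lisa[, 5]                   # the fifth column holds the p-values

# Label each cell by the sign of its value and of its neighbours' average.
signif <- 0.05
sig <- p_val < signif
cluster <- rep("Not significant", length(x))
cluster[sig & z > 0 & lag_z > 0] <- "High-High"
cluster[sig & z < 0 & lag_z < 0] <- "Low-Low"
cluster[sig & z > 0 & lag_z < 0] <- "High-Low"
cluster[sig & z < 0 & lag_z > 0] <- "Low-High"

table(cluster)
```

With purely random input such as this, most or all cells are expected to fall in the "Not significant" class.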

Get RMSE/MAE/R² metrics on training data

Description

Evaluates model performance by calculating RMSE, MAE, and R² metrics.

Usage

eval_model(model, data, formula, model_type = c("rf", "xgb", "svr"))

Arguments

model

A trained model

data

A data frame

formula

A formula object

model_type

Character string: one of "rf", "xgb", or "svr"

Value

A numeric value representing the model's accuracy
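
For reference, the three metrics named in the description can be computed in base R as follows. This is a minimal sketch: calc_metrics is a hypothetical helper, not part of the package, and R² is taken here as 1 - SS_res / SS_tot, which may differ from the definition the package uses (for example, a squared correlation).

```r
# Hypothetical helper: RMSE, MAE and R-squared from observed and predicted values.
calc_metrics <- function(observed, predicted) {
  resid <- observed - predicted
  ss_res <- sum(resid^2)                          # residual sum of squares
  ss_tot <- sum((observed - mean(observed))^2)    # total sum of squares
  c(
    RMSE = sqrt(mean(resid^2)),
    MAE = mean(abs(resid)),
    R2 = 1 - ss_res / ss_tot                      # assumed definition of R-squared
  )
}

observed <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 39)
metrics <- calc_metrics(observed, predicted)
print(metrics)
```

For these four points the residuals are -2, 2, -3 and 1, so MAE is 2, RMSE is sqrt(4.5), and R² is 1 - 18/500 = 0.964.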

Global variables used in evaluation functions

Description

Declares known global variables to suppress R CMD check NOTEs about undefined global variables.

Join spatial and incidence datasets

Description

Joins an sf object with a table of incidence data on a shared key column.

Usage

join_data(sf_data, tbl_data, by)

Arguments

sf_data

sf object

tbl_data

tibble of incidence

by

Column name to join on

Value

sf object with joined attributes
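
A join of this kind can be sketched with dplyr::left_join(), which keeps the sf class and geometry of the left-hand table. Whether join_data uses a left join internally is an assumption; the toy points, the NAME key and the incidence values below are made up for illustration.

```r
library(sf)
library(dplyr)

# Toy sf object: three points keyed by NAME (illustrative data only).
pts <- st_sf(
  NAME = c("A", "B", "C"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)),
                    crs = 4326)
)

# Toy incidence table with no row for "C".
inc <- tibble(NAME = c("A", "B"), incidence = c(1.2, 3.4))

# A left join keeps every spatial feature; unmatched features get NA.
joined <- left_join(pts, inc, by = "NAME")
print(joined)
```

With a left join, a feature that has no matching row in the table is retained with NA attributes instead of being dropped, which is usually what a map needs.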

Load incidence data from Excel

Description

Reads incidence data from an Excel file and returns it as a tibble.

Usage

load_incidence_data(xlsx_path)

Arguments

xlsx_path

Path to Excel file

Value

tibble of data
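
Reading an Excel sheet into a tibble can be sketched with readxl::read_excel(). That load_incidence_data wraps this call is an assumption; the example file is the one shipped with readxl, used only so that the snippet runs without external data.

```r
library(readxl)

# Example workbook bundled with readxl (stands in for a real incidence file).
xlsx_path <- readxl_example("datasets.xlsx")

# read_excel() reads the first sheet by default and returns a tibble.
tbl <- read_excel(xlsx_path)
head(tbl)
```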

Load shapefile as sf + optionally convert to sp

Description

Loads a shapefile as an sf object and optionally converts it to an sp Spatial object.

Usage

load_shapefile(shp_path, to_sp = FALSE)

Arguments

shp_path

Path to shapefile (.shp)

to_sp

logical: also return Spatial object?

Value

list with sf and optionally sp object
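
The documented behaviour can be approximated as below. read_shp_sketch is a hypothetical helper written for illustration, not the package's code, and the conversion through as(x, "Spatial") requires the sp package to be installed.

```r
library(sf)

# Hypothetical helper: read a shapefile and optionally add an sp version.
read_shp_sketch <- function(shp_path, to_sp = FALSE) {
  sf_obj <- st_read(shp_path, quiet = TRUE)
  out <- list(sf = sf_obj)
  if (to_sp) {
    # Coercion to a Spatial* object; needs the 'sp' package.
    out$sp <- as(sf_obj, "Spatial")
  }
  out
}

# Use the shapefile bundled with sf so the example needs no external data.
shp <- read_shp_sketch(system.file("shape/nc.shp", package = "sf"))
names(shp)
```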

Examples for model evaluation functions

Description

Examples for model evaluation functions

Examples


library(randomForest)
library(caret)
data(panc_incidence)
mapdata <- join_data(africa_shp, panc_incidence, by = "NAME")
rf_model <- randomForest(
  incidence ~ female + male + agea + ageb + agec + fagea + fageb + fagec +
    magea + mageb + magec + yrb + yrc + yrd + yre,
  data = mapdata, ntree = 500, importance = TRUE
)

rf_preds <- predict(rf_model, newdata = mapdata)
rf_metrics <- postResample(pred = rf_preds, obs = mapdata$incidence)
print(rf_metrics)

Pancreatic Cancer Incidence Data

Description

This dataset contains pancreatic cancer incidence rates across African countries.

Usage

data(panc_incidence)

Format

A data frame with the following variables:

NAME: Character. Name of the country.
incidence: Double. Incidence rate per 100,000 population.
female: Double. Female pancreatic cancer patients.
male: Double. Male pancreatic cancer patients.
ageb: Double. Patients aged 20-54 years.
agec: Double. Patients aged above 55 years.
agea: Double. Patients aged below 20 years.
fageb: Double. Female patients aged 20-54 years.
fagec: Double. Female patients aged above 55 years.
fagea: Double. Female patients aged below 20 years.
mageb: Double. Male patients aged 20-54 years.
magec: Double. Male patients aged above 55 years.
magea: Double. Male patients aged below 20 years.
yra: Double. Incidence rate in year 2017.
yrb: Double. Incidence rate in year 2018.
yrc: Double. Incidence rate in year 2019.
yrd: Double. Incidence rate in year 2020.
yre: Double. Incidence rate in year 2021.

Source

Global Burden of Disease (GBD) 2021 estimates, Seattle, United States https://vizhub.healthdata.org/gbd-results/

Pancreatic Cancer Prevalence Data

Description

This dataset contains pancreatic cancer prevalence rates across African countries.

Usage

data(panc_prevalence)

Format

A data frame with the following variables:

NAME: Character. Name of the country.
prevalence: Numeric. Prevalence rate per 100,000 population.
female: Numeric. Female pancreatic cancer patients.
male: Numeric. Male pancreatic cancer patients.
ageb: Numeric. Patients aged 20-54 years.
agec: Numeric. Patients aged above 55 years.
agea: Numeric. Patients aged below 20 years.
fageb: Numeric. Female patients aged 20-54 years.
fagec: Numeric. Female patients aged above 55 years.
fagea: Numeric. Female patients aged below 20 years.
mageb: Numeric. Male patients aged 20-54 years.
magec: Numeric. Male patients aged above 55 years.
magea: Numeric. Male patients aged below 20 years.
yra: Numeric. Incidence rate in year 2017.
yrb: Numeric. Incidence rate in year 2018.
yrc: Numeric. Incidence rate in year 2019.
yrd: Numeric. Incidence rate in year 2020.
yre: Numeric. Incidence rate in year 2021.

Source

Global Burden of Disease (GBD) 2021 estimates, Seattle, United States https://vizhub.healthdata.org/gbd-results/

Pancreatic Cancer Mortality Data

Description

This dataset contains pancreatic cancer mortality rates across African countries.

Usage

data(pancre_mort)

Format

A data frame with the following variables:

NAME: Character. Name of the country.
mortality: Numeric. Mortality rate per 100,000 population.
female: Numeric. Female pancreatic cancer patients.
male: Numeric. Male pancreatic cancer patients.
ageb: Numeric. Patients aged 20-54 years.
agec: Numeric. Patients aged above 55 years.
agea: Numeric. Patients aged below 20 years.
fageb: Numeric. Female patients aged 20-54 years.
fagec: Numeric. Female patients aged above 55 years.
fagea: Numeric. Female patients aged below 20 years.
mageb: Numeric. Male patients aged 20-54 years.
magec: Numeric. Male patients aged above 55 years.
magea: Numeric. Male patients aged below 20 years.
yra: Numeric. Incidence rate in year 2017.
yrb: Numeric. Incidence rate in year 2018.
yrc: Numeric. Incidence rate in year 2019.
yrd: Numeric. Incidence rate in year 2020.
yre: Numeric. Incidence rate in year 2021.

Source

Global Burden of Disease (GBD) 2021 estimates, https://vizhub.healthdata.org/gbd-results/

Arrange Multiple tmap Plots in a Grid

Description

Arrange a list of tmap objects into a grid layout.

Usage

plot_map_grid(maps, ncol = 2)

Arguments

maps

A list of tmap objects.

ncol

Number of columns in the grid (default is 2).

Value

A tmap object representing arranged maps.

Examples


library(sf)
library(tmap)

# Load sample spatial data
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Add mock variables to map
nc$var1 <- runif(nrow(nc), 0, 100)
nc$var2 <- runif(nrow(nc), 10, 200)

# Create individual maps
map1 <- tm_shape(nc) + tm_fill("var1", title = "Variable 1")
map2 <- tm_shape(nc) + tm_fill("var2", title = "Variable 2")

# Arrange the maps in a grid using your function
plot_map_grid(list(map1, map2), ncol = 2)
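
For comparison, tmap's own tmap_arrange() accepts a list of tmap objects and a column count. That plot_map_grid wraps this function is an assumption; the mock variables below are random and purely illustrative.

```r
library(sf)
library(tmap)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

set.seed(1)
nc$var1 <- runif(nrow(nc), 0, 100)    # mock variable
nc$var2 <- runif(nrow(nc), 10, 200)   # mock variable

# One choropleth per variable, collected in a list.
maps <- list(
  tm_shape(nc) + tm_polygons("var1"),
  tm_shape(nc) + tm_polygons("var2")
)

# Arrange the maps side by side; the layout is drawn when the object is printed.
grid <- tmap_arrange(maps, ncol = 2)
```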

Plot observed vs predicted values with correlation

Description

Creates a scatterplot of observed vs predicted values, with a 1:1 reference line and Pearson's R².

Usage

plot_obs_vs_pred(observed, predicted, title = "")

Arguments

observed

Numeric vector of observed values.

predicted

Numeric vector of predicted values.

title

String for the plot title (default: "").

Value

No return value; called for side effect of displaying a plot.

Examples

observed <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 39)
plot_obs_vs_pred(observed, predicted, title = "Observed vs Predicted")
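
A plot of this kind can be sketched with ggplot2. obs_pred_sketch is a hypothetical helper, not the package's code: it returns the ggplot object instead of only drawing it, and it reports R² as the squared Pearson correlation in the subtitle.

```r
library(ggplot2)

# Hypothetical helper: observed vs predicted scatterplot with a 1:1 line.
obs_pred_sketch <- function(observed, predicted, title = "") {
  r2 <- cor(observed, predicted)^2    # squared Pearson correlation
  df <- data.frame(observed = observed, predicted = predicted)
  ggplot(df, aes(x = observed, y = predicted)) +
    geom_point() +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed") +  # 1:1 reference
    labs(title = title,
         subtitle = sprintf("R-squared = %.3f", r2),
         x = "Observed", y = "Predicted")
}

p <- obs_pred_sketch(c(10, 20, 30, 40), c(12, 18, 33, 39),
                     title = "Observed vs Predicted")
print(p)
```

Points on the dashed line are perfect predictions; points above it are over-predicted and points below it are under-predicted.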

Build a tmap for a single variable

Description

Creates a thematic map using the tmap package for a single variable in an sf object.

Usage

plot_single_map(sf_data, var, title, palette = "reds")

Arguments

sf_data

An sf object containing spatial data.

var

Variable name as a string to map.

title

Legend title for the fill legend.

palette

Color palette for the map (default is "reds").

Value

A tmap object representing the thematic map.

Examples


library(sf)
# Create example sf object
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc$incidence <- runif(nrow(nc), 0, 100)

# Plot
p1 <- plot_single_map(nc, "incidence", "Incidence")

Train Random Forest model

Description

Trains a Random Forest regression model.

Usage

train_rf(data, formula, ntree = 500, seed = 123)

Arguments

data

A data frame containing the training data.

formula

A formula describing the model structure.

ntree

Number of trees to grow (default 500).

seed

Random seed for reproducibility (default 123).

Value

A trained randomForest model object.

Examples


library(randomForest)
data(mtcars)
rf_model <- train_rf(mtcars, mpg ~ cyl + hp + wt, ntree = 100)
print(rf_model)
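
The role of the seed argument can be illustrated by calling randomForest() directly. rf_sketch is a hypothetical stand-in that assumes the seed is set immediately before fitting; the package's internals may differ.

```r
library(randomForest)

# Hypothetical stand-in: fix the seed, then fit a regression forest.
rf_sketch <- function(data, formula, ntree = 500, seed = 123) {
  set.seed(seed)
  randomForest(formula, data = data, ntree = ntree)
}

m1 <- rf_sketch(mtcars, mpg ~ cyl + hp + wt, ntree = 100)
m2 <- rf_sketch(mtcars, mpg ~ cyl + hp + wt, ntree = 100)

# Same seed and same data give identical out-of-bag predictions.
identical(predict(m1), predict(m2))
```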

Train Support Vector Regression (SVR) model

Description

Trains a Support Vector Regression (SVR) model.

Usage

train_svr(data, formula)

Arguments

data

A data frame containing the training data.

formula

A formula specifying the model.

Details

Trains an SVR model using the radial kernel.

Value

A trained svm model object from the e1071 package.

Examples


# Load required package
library(e1071)

# Use built-in dataset
data(mtcars)

# Define regression formula
svr_formula <- mpg ~ cyl + disp + hp + wt

# Train SVR model
svr_model <- train_svr(data = mtcars, formula = svr_formula)

# Print model summary
print(svr_model)

# Predict on the same data (for illustration)
preds <- predict(svr_model, newdata = mtcars)
head(preds)
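
An equivalent model can be fitted with e1071::svm() directly. The arguments shown are an assumption about what train_svr passes; only the radial kernel is stated in the documentation.

```r
library(e1071)

# Epsilon-regression with a radial (RBF) kernel; other settings are svm() defaults.
svr_direct <- svm(mpg ~ cyl + disp + hp + wt, data = mtcars,
                  type = "eps-regression", kernel = "radial")

# In-sample predictions, for illustration only.
svr_preds <- predict(svr_direct, newdata = mtcars)
head(svr_preds)
```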

Train XGBoost model

Description

Train XGBoost model

Usage

train_xgb(data, formula, nrounds = 100, max_depth = 4, eta = 0.1)

Arguments

data

A data frame with the training data.

formula

A formula defining the model structure.

nrounds

Number of boosting iterations.

max_depth

Maximum tree depth.

eta

Learning rate.

Details

Trains an XGBoost regression model.

Value

A trained xgboost model object.

Examples


# Load required package
library(xgboost)

# Use built-in dataset
data(mtcars)

# Define regression formula
xgb_formula <- mpg ~ cyl + disp + hp + wt

# Train XGBoost model
xgb_model <- train_xgb(data = mtcars, formula = xgb_formula, nrounds = 50)

# Print model summary
print(xgb_model)
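
XGBoost works on numeric matrices, not formulas, so a formula interface needs a conversion step. The sketch below shows one way to do it with model.matrix() and xgb.train(). It is an assumption about the approach, not the package's code, and the squared-error objective is chosen here for regression.

```r
library(xgboost)

xgb_formula <- mpg ~ cyl + disp + hp + wt

# Build the predictor matrix (dropping the intercept column) and the response.
X <- model.matrix(xgb_formula, data = mtcars)[, -1]
y <- model.response(model.frame(xgb_formula, data = mtcars))

dtrain <- xgb.DMatrix(data = X, label = y)

# Assumed settings: squared-error regression with the documented defaults.
params <- list(objective = "reg:squarederror", max_depth = 4, eta = 0.1)
xgb_direct <- xgb.train(params = params, data = dtrain, nrounds = 50)

# In-sample predictions, for illustration only.
xgb_preds <- predict(xgb_direct, dtrain)
head(xgb_preds)
```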