---
title: "mirai - Minimalist Async Evaluation Framework for R"
vignette: >
  %\VignetteIndexEntry{mirai - Minimalist Async Evaluation Framework for R}
  %\VignetteEngine{litedown::vignette}
  %\VignetteEncoding{UTF-8}
---

This vignette provides a quick introduction through 3 typical use cases. Other package vignettes contain more in-depth information on a variety of topics.

### Table of Contents

1. [Example 1: Compute-intensive Operations](#example-1-compute-intensive-operations)
2. [Example 2: I/O-bound Operations](#example-2-io-bound-operations)
3. [Example 3: Resilient Pipelines](#example-3-resilient-pipelines)

### Example 1: Compute-intensive Operations

Use case: minimise execution times by performing long-running tasks concurrently in separate processes.

Multiple long computes (model fits etc.) can be performed in parallel on available computing cores.

Use `mirai()` to evaluate an expression asynchronously in a separate, clean R process.

The following mimics an expensive calculation that eventually returns a random value.

``` r
library(mirai)

args <- list(time = 2L, mean = 4)

m <- mirai(
  {
    Sys.sleep(time)
    rnorm(5L, mean)
  },
  time = args$time,
  mean = args$mean
)
```

The mirai expression is evaluated in another process and hence must be self-contained, not referring to variables that do not already exist there. Above, the variables `time` and `mean` are passed as part of the `mirai()` call.

A 'mirai' object is returned immediately - creating a mirai never blocks the session.

Whilst the async operation is ongoing, attempting to access a mirai's data yields an 'unresolved' logical NA.

``` r
m
#> < mirai [] >
m$data
#> 'unresolved' logi NA
```

To check whether a mirai remains unresolved (yet to complete):

``` r
unresolved(m)
#> [1] TRUE
```

To wait for and collect the return value, use `collect_mirai()` or equivalently the mirai's `[]` method:

``` r
collect_mirai(m)
#> [1] 6.223694 2.189400 4.695696 4.807338 6.597992
m[]
#> [1] 6.223694 2.189400 4.695696 4.807338 6.597992
```

As a mirai represents an async operation, it is never necessary to wait for it - other code can continue to be run. Once it completes, the return value automatically becomes available at `$data`.

``` r
m
#> < mirai [$data] >
m$data
#> [1] 6.223694 2.189400 4.695696 4.807338 6.597992
```

For easy programmatic use of `mirai()`, '.expr' accepts a pre-constructed language object, and '.args' accepts a list of named arguments. So, the following would be equivalent to the above:

``` r
expr <- quote({Sys.sleep(time); rnorm(5L, mean)})

m <- mirai(.expr = expr, .args = args)
m[]
#> [1] 4.703916 1.963779 4.930849 5.174166 3.128183
```

[« Back to ToC](#table-of-contents)

### Example 2: I/O-bound Operations

Use case: ensure execution flow of the main process is not blocked.

High-frequency real-time data cannot be written to file/database synchronously without disrupting the execution flow.

Cache data in memory and use `mirai()` to perform periodic write operations concurrently in a separate process.

Below, '.args' is used to pass `environment()`, which is the calling environment. This provides a convenient method of passing in existing objects.

``` r
library(mirai)

x <- rnorm(1e6)
file <- tempfile()

m <- mirai(write.csv(x, file = file), .args = environment())
```

A 'mirai' object is returned immediately.

`unresolved()` may be used in control flow statements to perform actions which depend on resolution of the 'mirai', both before and after.

This means there is no need to actually wait (block) for a 'mirai' to resolve, as the example below demonstrates.

``` r
while (unresolved(m)) {
  cat("while unresolved\n")
  Sys.sleep(0.5)
}
#> while unresolved
#> while unresolved

cat("Write complete:", is.null(m$data))
#> Write complete: TRUE
```

Now actions which depend on the resolution may be processed, for example the next write.

[« Back to ToC](#table-of-contents)

### Example 3: Resilient Pipelines

Use case: isolating code that can potentially fail in a separate process to ensure continued uptime.

As part of a data science / machine learning pipeline, iterations of model training may periodically fail for stochastic and uncontrollable reasons (e.g. buggy memory management on graphics cards).

Running each iteration in a 'mirai' isolates this potentially-problematic code such that it does not bring down the entire pipeline, even if it fails.

``` r
library(mirai)

run_iteration <- function(i) {
  # simulates a stochastic error rate
  if (runif(1) < 0.1) stop("random error\n", call. = FALSE)
  sprintf("iteration %d successful\n", i)
}

for (i in 1:10) {
  m <- mirai(run_iteration(i), environment())
  while (is_error_value(m[])) {
    cat(m$data)
    m <- mirai(run_iteration(i), environment())
  }
  cat(m$data)
}
#> iteration 1 successful
#> iteration 2 successful
#> iteration 3 successful
#> iteration 4 successful
#> Error: random error
#> iteration 5 successful
#> iteration 6 successful
#> iteration 7 successful
#> iteration 8 successful
#> iteration 9 successful
#> Error: random error
#> iteration 10 successful
```

Further, by testing the return value of each 'mirai' for errors, error-handling code is then able to automate recovery and re-attempts, as in the above example.

The end result is a resilient and fault-tolerant pipeline that minimises downtime by eliminating interruptions of long computes.

#### Further details on error handling

If execution in a mirai fails, the error message is returned as a character string of class 'miraiError' and 'errorValue' to facilitate debugging.

`is_mirai_error()` may be used to test for mirai execution errors.

``` r
m1 <- mirai(stop("occurred with a custom message", call. = FALSE))
m1[]
#> 'miraiError' chr Error: occurred with a custom message

m2 <- mirai(mirai::mirai())
m2[]
#> 'miraiError' chr Error in mirai::mirai(): missing expression, perhaps wrap in {}?

is_mirai_error(m2$data)
#> [1] TRUE
is_error_value(m2$data)
#> [1] TRUE
```

A full stack trace of evaluation within the mirai is recorded and accessible at `$stack.trace` on the error object.

``` r
f <- function(x) if (x > 0) stop("positive")

m3 <- mirai({f(-1); f(1)}, f = f)
m3[]
#> 'miraiError' chr Error in f(1): positive

m3$data$stack.trace
#> [[1]]
#> stop("positive")
#>
#> [[2]]
#> f(1)
```

Elements of the original error condition are also accessible via `$` on the error object. For example, additional metadata recorded by `rlang::abort()` is preserved:

``` r
m4 <- mirai(rlang::abort("aborted", meta_uid = "UID001"))
m4[]
#> 'miraiError' chr Error: aborted

m4$data$meta_uid
#> [1] "UID001"
```

If a daemon instance is sent a user interrupt, the mirai will resolve to an object of class 'miraiInterrupt' and 'errorValue'.

`is_mirai_interrupt()` may be used to test for such interrupts.

``` r
m4 <- mirai(rlang::interrupt()) # simulates a user interrupt
is_mirai_interrupt(m4[])
#> [1] TRUE
```

If execution of a mirai surpasses the timeout set via the '.timeout' argument, the mirai will resolve to an 'errorValue' of 5L (timed out). This can, amongst other things, guard against mirai processes that have the potential to hang and never return.

``` r
m5 <- mirai(nanonext::msleep(1000), .timeout = 500)
m5[]
#> 'errorValue' int 5 | Timed out

is_mirai_error(m5$data)
#> [1] FALSE
is_mirai_interrupt(m5$data)
#> [1] FALSE
is_error_value(m5$data)
#> [1] TRUE
```

`is_error_value()` tests for all mirai execution errors, user interrupts and timeouts.

[« Back to ToC](#table-of-contents)
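
The three tests described in the error-handling section can be combined to label the outcome of a resolved mirai. The following is a minimal sketch only: `classify_result()` is a hypothetical helper written for illustration and is not part of the mirai package. It checks the more specific classes first, as `is_error_value()` is also `TRUE` for execution errors and interrupts.

``` r
library(mirai)

# Hypothetical helper (not part of mirai): label the value of a resolved mirai
classify_result <- function(x) {
  if (is_mirai_error(x)) {
    "error"                  # execution failed inside the mirai
  } else if (is_mirai_interrupt(x)) {
    "interrupt"              # a user interrupt was received
  } else if (is_error_value(x)) {
    "other error value"      # any remaining 'errorValue', such as a timeout
  } else {
    "success"
  }
}

ok <- mirai(1 + 1)
failed <- mirai(stop("failed", call. = FALSE))
slow <- mirai(nanonext::msleep(1000), .timeout = 500)

# m[] waits for each mirai to resolve before its value is classified
classify_result(ok[])
classify_result(failed[])
classify_result(slow[])
```

A pipeline such as the one in Example 3 could branch on these labels, for instance retrying on a timeout while logging and skipping on an execution error.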