--- title: "Databricks REPL" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Databricks REPL} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) library(brickster) ``` # Running Code Remotely `{brickster}` provides mechanisms to run code against Databricks, below is an overview of the available of those in the package: +--------------------------------------------------------------------------------------------------------------+-----------------------+-------------------------------------------------------------------------------+ | Method | Compatible Compute | Notes | +==============================================================================================================+=======================+===============================================================================+ | `db_sql_query` | SQL Warehouse | Simple and efficient function to run SQL | +--------------------------------------------------------------------------------------------------------------+-----------------------+-------------------------------------------------------------------------------+ | [SQL execution API](https://docs.databricks.com/api/workspace/statementexecution)\ | SQL Warehouse | Lower level functions that align 1:1 with API endpoints | | ([`db_sql_exec_*`](https://databrickslabs.github.io/brickster/reference/index.html#sql-statement-execution)) | | | +--------------------------------------------------------------------------------------------------------------+-----------------------+-------------------------------------------------------------------------------+ | Command execution context manager\ | Clusters\ | Higher level `R6` class for command execution contexts. | | (`db_context_manager`) | (Shared, Single User) | | +--------------------------------------------------------------------------------------------------------------+-----------------------+-------------------------------------------------------------------------------+ | [Command execution API](https://docs.databricks.com/api/workspace/commandexecution)\ | Clusters\ | Lower level functions that align 1:1 with API endpoints | | ([`db_context_*`](https://databrickslabs.github.io/brickster/reference/index.html#execution-contexts)) | (Shared, Single User) | | +--------------------------------------------------------------------------------------------------------------+-----------------------+-------------------------------------------------------------------------------+ | Databricks REPL\ | Clusters\ | Supports all notebook languages, R is only supported on single user clusters. | | (`db_repl`) | (Shared, Single User) | | +--------------------------------------------------------------------------------------------------------------+-----------------------+-------------------------------------------------------------------------------+ Databricks [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop) (`db_repl()`) will be the focus of this article. # What is the Databricks REPL? The REPL temporarily connects the existing R console to a Databricks cluster (via [command execution APIs](https://docs.databricks.com/api/workspace/commandexecution)) and allows code in all supported languages to be sent interactively - as if it were running locally. ## Getting Started Using the REPL is simple, to start just provide `cluster_id`: ```{r} # start REPL db_repl(cluster_id = "") ``` The REPL will check the clusters state and start the cluster if inactive. The default language is `R`. After successfully connecting to the cluster you can run commands against the remote compute from the local session. ## Switching Languages The REPL has a shortcut you can enter `:` to change the active language. You can change between the following languages: | Language | Shortcut | |----------|----------| | R | `:r` | | Python | `:py` | | SQL | `:sql` | | Scala | `:scala` | | Shell | `:sh` | When you change between languages all variables should persist unless REPL is exited. ## Limitations - Development environments (e.g. RStudio, Positron) won't display variables from the remote contexts in the environment pane - HTML content will only render for Python, `{htmlwidgets}` rendering is restricted due to [notebook limitations that require a workaround](https://docs.databricks.com/en/visualizations/htmlwidgets.html) currently - Not designed to work with interactive serverless compute - Cannot persist or recover sessions - Multi-line expressions are only supported for R. Python, Scala, and SQL are limited to single line expressions.