Merge pull request #77 from mlverse/updates-1
RSTUDIO_PYTHON in Posit Connect
edgararuiz authored Nov 29, 2023
2 parents c1baa21 + 4f65380 commit 8d30970
Showing 4 changed files with 36 additions and 9 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: pysparklyr
Title: Provides a 'PySpark' Back-End for the 'sparklyr' Package
Version: 0.1.1.9003
Version: 0.1.1.9004
Authors@R: c(
person("Edgar", "Ruiz", , "[email protected]", role = c("aut", "cre")),
person(given = "Posit Software, PBC", role = c("cph", "fnd"))
35 changes: 29 additions & 6 deletions NEWS.md
@@ -2,11 +2,34 @@

### Improvements

* When connecting (`spark_connect()`), it will automatically prompt the
user to install a Python environment if a viable one is not found.
This way, the R user will not have to run `install_databricks()`/
`install_pyspark()` manually when using the package for the first time. (#69)

* Instead of simply warning that `RETICULATE_PYTHON` is set, it will now un-set
the variable. This allows `pysparklyr` to select the correct Python environment.
It will output a console message to the user when the variable is un-set. (#65)

* Adds an enhanced RStudio Snippet for Databricks connections. It will automatically
check the cluster's version by polling the Databricks REST API with the cluster's
ID. This is to check whether there is a pre-installed Python environment that will
support the cluster's version. All of these checks generate notifications in the
snippet's UI. It also adds integration with Posit Workbench's new 'Databricks' pane.
The snippet looks for a specific environment variable that Posit Workbench temporarily
sets with the value of the cluster ID, and initializes the snippet with that
value. (#53)

* Adds `install_ml` argument to `install_databricks()` and `install_pyspark()`.
This will allow the R user to avoid installing the large Python libraries
required for ML operations. This will be especially helpful for machines with
scant storage.
scant storage. (#63)

* Adds support for Databricks OAuth by adding a handler to the Posit Connect
integration. Internally, it centralizes the authentication processing into
one un-exported function. (#68)

* General improvements to all console outputs
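
The `RETICULATE_PYTHON` item above (#65) can be pictured with a minimal sketch like the one below. It is not the package's actual code: the helper name `unset_reticulate_python()` and its `verbose` argument are hypothetical, and the wording of the console message is an assumption.

```r
# Hypothetical helper, not pysparklyr's implementation: un-sets
# RETICULATE_PYTHON when it is present and reports what it did.
unset_reticulate_python <- function(verbose = TRUE) {
  env_var <- Sys.getenv("RETICULATE_PYTHON", unset = NA)
  if (is.na(env_var)) {
    # Nothing to do: the variable was not set
    return(invisible(FALSE))
  }
  Sys.unsetenv("RETICULATE_PYTHON")
  if (verbose) {
    # Assumed message text, shown only to illustrate the console notice
    message("Un-setting 'RETICULATE_PYTHON' (was: ", env_var, ")")
  }
  invisible(TRUE)
}
```

Returning a logical invisibly lets a caller find out whether anything was changed without printing extra output at the console.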

### Machine Learning

@@ -21,15 +44,15 @@ scant storage.

* Adds `ml_prepare_dataset()` in lieu of a Vector Assembler transformer

### Fixes

* Fixes error in `use_envname()`: "No environment name provided, and no
environment was automatically identified" (#71)

# pysparklyr 0.1.1

### Improvements

* When connecting (`spark_connect()`), it will automatically prompt the
R user to install a Python Environment if one with the expected name is
not found. This way, the R user will not have to run `install_databricks()`/
`install_pyspark()` manually when using the package for the first time.

* Adds URL sanitation routine for the Databricks Host. It will remove trailing
forward slashes, and add scheme (https) if missing. The Host sanitation can be
skipped by passing `host_sanitize = FALSE` to `spark_connect()`.
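
A minimal sketch of the host sanitation described above, under stated assumptions: the function name `sanitize_host_sketch()` is hypothetical and the package's own routine may differ in its details. Only the behavior named in the entry is modeled (strip trailing forward slashes, add `https` when no scheme is present, skip everything when `host_sanitize = FALSE`).

```r
# Hypothetical sketch of the described behaviour, not the package's code.
sanitize_host_sketch <- function(host, host_sanitize = TRUE) {
  if (!host_sanitize) {
    # Caller opted out: return the host untouched
    return(host)
  }
  # Drop one or more trailing forward slashes
  host <- sub("/+$", "", host)
  # Prepend https:// when the host carries no scheme
  if (!grepl("^[A-Za-z][A-Za-z0-9+.-]*://", host)) {
    host <- paste0("https://", host)
  }
  host
}

# Example with a made-up workspace host
sanitize_host_sketch("my-workspace.cloud.databricks.com/")
```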
2 changes: 1 addition & 1 deletion R/databricks-utils.R
@@ -41,7 +41,7 @@ databricks_token <- function(token = NULL, fail = FALSE) {
" - No Databricks OAuth token found \n",
" - Not passed as a function argument"
),
"Please add your Host to 'DATABRICKS_TOKEN' inside your .Renviron file."
"Please add your Token to 'DATABRICKS_TOKEN' inside your .Renviron file."
))
} else {
name <- NULL
6 changes: 5 additions & 1 deletion R/utils.R
@@ -13,9 +13,13 @@
NULL

reticulate_python_check <- function(ignore = FALSE, unset = TRUE, message = TRUE) {
if (ignore) {

in_connect <- Sys.getenv("R_CONFIG_ACTIVE") == "rsconnect"

if (ignore || in_connect) {
return(invisible())
}

env_var <- Sys.getenv("RETICULATE_PYTHON", unset = NA)
if (!is.na(env_var)) {
if (unset) {
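
The guard added in this hunk skips the `RETICULATE_PYTHON` check when the code runs in Posit Connect, which the diff detects through `R_CONFIG_ACTIVE` being equal to `"rsconnect"`. The sketch below isolates that decision; the helper names `running_in_connect()` and `should_skip_check()` are hypothetical and exist only for illustration.

```r
# Illustrative only: mirrors the environment test shown in the diff.
running_in_connect <- function() {
  identical(Sys.getenv("R_CONFIG_ACTIVE"), "rsconnect")
}

# Hypothetical wrapper around the skip decision
should_skip_check <- function(ignore = FALSE) {
  # `||` short-circuits and is meant for length-one logicals,
  # which is what an `if ()` condition needs
  ignore || running_in_connect()
}
```

Keeping the environment test in a small function of its own makes it possible to exercise both branches by setting or un-setting `R_CONFIG_ACTIVE` in a test session.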
