quarto_render changes the class of vectors containing NA values #168

Open
debdagybra opened this issue Mar 25, 2024 · 10 comments
Labels
bug Something isn't working


@debdagybra

quarto_render() changes NA of numeric vectors to character ".na.real"

Similar to #124


library(quarto)
quarto_render("test.qmd",
              execute_params = list(
                test_vec = c(1, NA, 2.5, -6.33, NaN, Inf)
              ))

Content of "test.qmd"

---
title: "test"
format: html
params:
  test_vec: "test_vec"
---

### With params:  
class: `r class(params$test_vec)`  
values: `r params$test_vec`  
is.na: `r is.na(params$test_vec)`  

```{r setup, include = FALSE}
test_vec <- c(1, NA, 2.5, -6.33, NaN, Inf)
```

### Created within qmd file:  
class: `r class(test_vec)`  
values: `r test_vec`  
is.na: `r is.na(test_vec)`  

Returns:

With params:

class: character
values: 1, .na.real, 2.5, -6.33, NA, NA
is.na: FALSE, FALSE, FALSE, FALSE, TRUE, TRUE

Created within qmd file:

class: numeric
values: 1, NA, 2.5, -6.33, NaN,
is.na: FALSE, TRUE, FALSE, FALSE, TRUE, FALSE


This behaviour also occurs with columns of data frames and tibbles.


Windows 11
rstudio version: 2023.12.1+402
Quarto 1.5.26
[>] Checking versions of quarto binary dependencies...
Pandoc version 3.1.11: OK
Dart Sass version 1.70.0: OK
Deno version 1.41.0: OK
[>] Checking versions of quarto dependencies......OK
[>] Checking Quarto installation......OK
Version: 1.5.26
Path: C:\Users\xxxxxx\AppData\Local\Programs\Quarto\bin
CodePage: 1252

[>] Checking tools....................OK
TinyTeX: (not installed)
Chromium: (not installed)

[>] Checking LaTeX....................OK
Tex: (not detected)

[>] Checking basic markdown render....OK

[>] Checking Python 3 installation....(None)
Unable to locate an installed version of Python 3.
Install Python 3 from https://www.python.org/downloads/

[>] Checking R installation...........OK
Version: 4.3.3
Path: C:/PROGRA~1/R/R-43~1.3
LibPaths:
- C:/Users/xxxxxx/AppData/Local/R/win-library/4.3
- C:/Program Files/R/R-4.3.3/library
knitr: 1.45
rmarkdown: 2.26

[>] Checking Knitr engine render......OK

@debdagybra
Author

Same with

  • Character : c("a", NA, "b") returns "a", ".na.character", "b"
  • Date and Datetime from the lubridate package: lubridate::as_date(c("2024-03-26", NA)) returns "19808", ".na.real"

@cderv
Collaborator

cderv commented Mar 26, 2024

Indeed. Thanks for the report.

We use the yaml R package to write the R objects, and it uses those special values because the YAML spec has nothing for NA:
https://github.com/vubiostat/r-yaml/blob/81f8903232bf125853901f62cdff3934b96eb1a5/inst/CHANGELOG#L112-L124
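To see these special values without going through Quarto, the yaml package can be called directly. This is a minimal sketch, assuming the yaml package is installed; the variable names are only for illustration.

```r
library(yaml)

# Numeric and character vectors containing NA, as in the report
num_yaml <- as.yaml(list(test_vec = c(1, NA, 2.5)))
chr_yaml <- as.yaml(list(test_vec = c("a", NA, "b")))

# Print the generated YAML; the NA entries appear as special tokens
cat(num_yaml)
cat(chr_yaml)

# Check for the tokens reported in this issue
grepl(".na.real", num_yaml, fixed = TRUE)
grepl(".na.character", chr_yaml, fixed = TRUE)
```

Anything that reads this YAML without knowing the yaml package's conventions will see those tokens as plain strings.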

What would you expect an NA value in R to be in YAML?

I don't think R's NA can be represented in YAML without losing information.

We could add a handler that converts to NULL, but forcing that coercion would probably cause issues: NULL and NA are not the same in R.

To be conservative, we could throw an error when we detect any NA in the conversion, asking the user to check the execute_params object. With the --execute-params CLI flag to quarto render, you would not be able to add NA.

Curious about your thoughts on this.
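The conservative option could look roughly like the sketch below. `check_execute_params()` is a hypothetical helper, not a function of the quarto package, and it only illustrates the idea of failing early. Note that `anyNA()` also reports `NaN` as missing.

```r
# Hypothetical helper: stop if any parameter contains NA (or NaN)
check_execute_params <- function(params) {
  # One TRUE/FALSE per top-level parameter; recursive = TRUE looks inside lists
  has_na <- vapply(params, function(p) anyNA(p, recursive = TRUE), logical(1))
  if (any(has_na)) {
    stop(
      "`execute_params` contains NA values in: ",
      paste(names(params)[has_na], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(params)
}

# Passes silently
check_execute_params(list(year = 2024, label = "test"))

# Fails with an error naming the offending parameter
result <- try(check_execute_params(list(test_vec = c(1, NA, 2.5))), silent = TRUE)
inherits(result, "try-error")
```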

@cderv cderv added the enhancement New feature or request label Mar 26, 2024
@cderv cderv self-assigned this Mar 26, 2024
@debdagybra
Author

I don't have enough knowledge about YAML to answer your question about the NAs.

I don't think you should return an error when any value in execute_params is NA. It's quite restrictive, and since base R allows vectors containing NA, it should be possible to use them with Quarto.

But the original object class and values should be preserved when accessed through params$, shouldn't they?
It's odd to put a numeric vector in execute_params and get a character vector in the .qmd file.

With rmarkdown, the vector is preserved correctly.

library(rmarkdown)
rmarkdown::render("test.qmd",
                  params = list(
                test_vec = c(1, NA, 2.5, -6.33, NaN, Inf)
              ))

returns:

With params:

class: numeric
values: 1, NA, 2.5, -6.33, NaN,
is.na: FALSE, TRUE, FALSE, FALSE, TRUE, FALSE

Created within qmd file:

class: numeric
values: 1, NA, 2.5, -6.33, NaN,
is.na: FALSE, TRUE, FALSE, FALSE, TRUE, FALSE


And in quarto itself, for logical vectors, the class is preserved correctly.

library(quarto)
quarto_render("test.qmd",
              execute_params = list(
                test_vec = c(TRUE, NA, FALSE)
              ))

returns :

With params:

class: logical
values: TRUE, NA, FALSE
is.na: FALSE, TRUE, FALSE

Created within qmd file:

class: logical
values: TRUE, NA, FALSE
is.na: FALSE, TRUE, FALSE

@debdagybra debdagybra changed the title quarto_render changes NA of numeric vectors to character ".na.real" quarto_render changes the class of vectors containing NA values Mar 26, 2024
@cderv
Collaborator

cderv commented Mar 27, 2024

Let me reintroduce the context here.

quarto_render() is a wrapper around quarto render, one of whose flags is --execute-params, which can take a YAML file that defines the parameters. The doc is here: https://quarto.org/docs/computations/parameters.html#rendering

quarto render document.qmd --execute-params params.yml

This means that for Quarto, the only way to pass parameters is YAML syntax, and YAML syntax does not know about R objects.

Now, the quarto_render() R function is a wrapper, as I said. Instead of asking for a YAML file as an argument, it accepts an R list of objects and takes care of writing the YAML to pass to Quarto.

This is the big difference with rmarkdown::render(), where parameters are processed directly in R because rmarkdown itself runs in R. Quarto does not.

So, for example, you can't pass a data frame, or any other R-specific object, directly to quarto render, because you would not be able to provide it as a YAML value. And so you can't in quarto_render() either, because the YAML spec has no conversion for such objects.

NA and its family are among those objects: there is no 1-1 representation in YAML. So when I asked "what value would you expect", I meant:

If you were using the CLI with quarto render and not calling from R, how would you set up your params? You would not be able to pass some values.

That is why I am thinking of throwing an error if unsupported values are passed to execute_params: they are just not supported in Quarto. Unfortunately, this is a limitation, and you can't pass R-specific objects.
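The wrapper step described here can be approximated by hand. This is a sketch under assumptions: the parameter names and `document.qmd` are made up, and the real quarto_render() does more than this. It writes a params file with the yaml package and leaves the CLI call as a comment.

```r
library(yaml)

# Hypothetical parameters that YAML can represent without loss
params <- list(year = 2024L, label = "test")

# Write them to a temporary YAML file
params_file <- tempfile(fileext = ".yml")
write_yaml(params, params_file)

# The file could then be handed to the CLI, for example:
# system2("quarto", c("render", "document.qmd", "--execute-params", params_file))

# Plain values like these survive the trip through YAML
read_back <- read_yaml(params_file)
read_back$label
```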

For more examples, this has also been discussed at

I won't close this, though, because we do need to do something (prevent rendering, or force coercion to NULL?) to avoid this .na.real problem.

Maybe in the future we'll find a way in Quarto to have an API for YAML params that handles the specifics of each computation language.

@cderv cderv added bug Something isn't working and removed enhancement New feature or request labels Mar 27, 2024
@debdagybra
Author

Thanks for the explanation and the new epic :)

I still don't understand why it works with logical vectors.

According to the link in your first message, if I understand correctly, the vector c(TRUE, NA, FALSE) should be converted to character c("TRUE", ".na", "FALSE")? But it isn't; instead we get c(TRUE, NA, FALSE).

Can't we do the same with numeric and character vectors?

> So for example you can't pass a dataframe, or any other R specific object directly to quarto render because you would not be able to provide this as a YAML value. And so you cannot either in quarto_render() because there is no conversion in YAML spec for such object.

When I pass a dataframe or a tibble, I also get a df or tibble with params$, so I guess quarto_render has done some magic to pass the class and/or attributes to the YAML?

Maybe I'm naive, but can we also pass the class of vectors in order to convert them back later?

@cderv
Collaborator

cderv commented Mar 27, 2024

> According to the link in your first message, if I understand correctly, the vector c(TRUE, NA, FALSE) should be converted to character c("TRUE", ".na", "FALSE")? But it isn't; instead we get c(TRUE, NA, FALSE).

Oh, that is interesting! Thanks for pointing this out!

This is a side effect of trying to solve #124 with 5207b6c:

quarto-r/R/utils.R

Lines 6 to 16 in ba8485a

write_yaml <- function(x, file) {
  handlers <- list(
    # Handle yes/no from 1.1 to 1.2
    # https://github.com/vubiostat/r-yaml/issues/131
    logical = function(x) {
      value <- ifelse(x, "true", "false")
      structure(value, class = "verbatim")
    }
  )
  yaml::write_yaml(x, file, handlers = handlers)
}

The handler for logical does not handle NA specifically, so if it encounters a logical NA, it writes NA verbatim instead of the .na that yaml::as.yaml() would have output.

It does not seem to cause issues for a .qmd file using engine: knitr, but it will for one using engine: jupyter.
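The behaviour of the handler body on NA can be checked in base R, without the yaml package. The second function below is a hypothetical NA-aware variant, shown only to make the difference concrete; it is not the package's code, and whether Quarto or the engines would then interpret `.na` usefully is a separate question.

```r
x <- c(TRUE, NA, FALSE)

# What the current handler body computes: NA stays NA
ifelse(x, "true", "false")

# Hypothetical NA-aware variant (illustration only)
logical_handler <- function(x) {
  value <- ifelse(x, "true", "false")
  # Emit the token that the comment above attributes to yaml::as.yaml()
  value[is.na(x)] <- ".na"
  structure(value, class = "verbatim")
}

handled <- logical_handler(x)
unclass(handled)
```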

> When I pass a dataframe or a tibble, I also get a df or tibble with params$, so I guess quarto_render has done some magic to pass the class and/or attributes to the YAML?

Can you share an example of this, please?

> Maybe I'm naive, but can we also pass the class of vectors in order to convert them back later?

This is not so simple right now. Quarto is a tool that works with any computation engine, so anything built in must work for R, Python, Julia, and maybe others in the future. Hence the epic, as the parameter feature is not yet at that level.

Here is what we are doing:

  • Writing a list from R to YAML using the yaml R package (which follows the YAML 1.1 spec)
  • The file is then read by Quarto directly, using a YAML reader that follows the YAML 1.2 spec
  • I think it is then encoded to JSON to be passed as a message to the engines

So c(TRUE, NA, FALSE) actually becomes [ true, 'NA', false ] internally in Quarto. That is wrong, but it happens to work with the knitr engine because the value is read with jsonlite::parse_json(..., simplifyVector = TRUE), which coerces the string "NA" to a logical NA:

str(jsonlite::parse_json('[true, "NA", false]', simplifyVector = TRUE))
#>  logi [1:3] TRUE NA FALSE
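For contrast, the same call on a numeric array containing the `.na.real` token gives a character vector, which matches the class reported at the top of this issue. This is a sketch assuming the JSON that reaches knitr looks like the string below; the actual message Quarto sends was not inspected here.

```r
library(jsonlite)

# Assumed shape of the JSON for c(1, NA, 2.5) after the YAML round trip
from_json <- parse_json('[1, ".na.real", 2.5]', simplifyVector = TRUE)

# The unknown string forces the whole vector to character
class(from_json)
is.na(from_json)
```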

This is indeed R-specific. Python does not have an NA equivalent, I think.

Take this .qmd file:

---
title: "test"
format: html
---

```{python}
#| tags: [parameters]
#| echo: true

test_vec = "test_vec"
```

```{python}
test_vec
```

If you render with your example,

library(quarto)
quarto_render("index.qmd",
              execute_params = list(
                  test_vec = c(TRUE, NA, FALSE)
              ))

The NA is a string:
[image: screenshot of the rendered output]

I got into some details, but I hope this illustrates why this is not so simple.

In R Markdown, rmarkdown::render(params = ) runs in R and passes the params as is, without conversion, to the knitting process.

So this explains the current limitation and why it requires more design work (quarto-dev/quarto-cli#9197) if we want an API for parameters that allows engine-specific handling.

@debdagybra
Author

> > When I pass a dataframe or a tibble, I also get a df or tibble with params$, so I guess quarto_render has done some magic to pass the class and/or attributes to the YAML?
>
> Can you share an example of this, please?

I was wrong: the data frames and tibbles are converted to lists.
I thought they were preserved because the dplyr functions were still working.
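The conversion to a list can be reproduced with the yaml package alone. This is a minimal sketch, assuming yaml is installed and using a made-up data frame; it does not go through Quarto.

```r
library(yaml)

# Hypothetical data frame
df <- data.frame(x = c(1, 2), y = c("a", "b"))

# Write to YAML text and read it back
yaml_text <- as.yaml(df)
back <- yaml.load(yaml_text)

is.data.frame(df)
#> [1] TRUE
is.data.frame(back)
#> [1] FALSE
is.list(back)
#> [1] TRUE
```

The columns come back as elements of a named list, so the class and other attributes of the data frame are gone.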

@debdagybra
Author

debdagybra commented Mar 27, 2024

By the way, the workaround you suggested here https://forum.posit.co/t/param-converted-from-data-frame-to-list/155556/8 with RDS files works very well!
Thanks!
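For readers who have not followed the forum link, the RDS approach amounts to passing a file path instead of the object. The sketch below is my reading of that idea, not a copy of the linked answer; the parameter name `test_vec_path` is hypothetical and the render call is left as a comment.

```r
test_vec <- c(1, NA, 2.5, -6.33, NaN, Inf)

# Save the object to an RDS file; a path is a plain string, so YAML can carry it
rds_path <- tempfile(fileext = ".rds")
saveRDS(test_vec, rds_path)
rds_path <- normalizePath(rds_path, winslash = "/")

# Pass only the path as a parameter (hypothetical parameter name):
# quarto::quarto_render("test.qmd", execute_params = list(test_vec_path = rds_path))

# Inside the .qmd, the object would be restored with readRDS(params$test_vec_path);
# here the same call is made directly on the path
restored <- readRDS(rds_path)
class(restored)
identical(restored, test_vec)
```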

@debdagybra
Author

Until it's resolved, maybe you can add a warning in quarto_render() to notify the user that some data will be modified (NA values, data frames, ...) and point them to the RDS file workaround. A warning could save them a lot of time.

@cderv
Collaborator

cderv commented Mar 28, 2024

Thanks for the feedback. I'll make it more apparent in the doc, and I'll probably throw an error for those specific R values that can't be translated. IMO, they shouldn't be used in execute_params at all.
