This repository has been archived by the owner on Jun 30, 2024. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #7 from extendr/conversions
Data-conversion chapter
Showing 5 changed files with 187 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"hash": "83a91459805cb87056cd6423ed50e8dd", | ||
"result": { | ||
"engine": "knitr", | ||
"markdown": "---\ntitle: \"Conversion to and from R data\"\n---\n\n\nOne of the key goals of extendr is to provide a framework that lets you write Rust functions that interact with R without having to know the intricacies of R internals, or even R's C facilities. However, touching on those internals is unavoidable if one wishes to understand why the extendr-api is the way it is.\n\nThus, when introducing extendr, we shall mention facts about R internals, but these are not necessary to keep in mind going forward.\n\n\n::: {.cell}\n\n:::\n\n\nA fundamental data type in R is the 32-bit integer, `int` in C, and `i32` in Rust. Passing that type around is essential, and straightforward:\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn ultimate_answer() -> i32 {\n return 42_i32;\n}\n```\n:::\n\n\nThis function is now available within your R session, and calling it returns 42.\n\nAnother fundamental data type in R is `numeric` / `f64`, which we can also pass back and forth unhindered, e.g.\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn return_tau() -> f64 {\n std::f64::consts::TAU\n} \n```\n:::\n\n\nwhere $\\tau := 2\\pi =$ $6.2831853$.\n\nHowever, passing data from R to Rust must be done with a bit of care: in R, writing a true integer literal requires an `L` suffix (e.g. `21L`); a bare `21` is a double. \n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn bit_left_shift_once(number: i32) -> i32 {\n number << 1\n}\n```\n:::\n\n\nThis function is supposedly a clever way to multiply by two; however, calling `bit_left_shift_once(21.1)` results in\n\n\n::: {.cell}\n::: {.cell-output .cell-output-error}\n\n```\nError in bit_left_shift_once(21.1): Expected an integer or a float representing a whole number, got 21.1\n```\n\n\n:::\n:::\n\nwhereas `bit_left_shift_once(21)` is 42, as expected.\n\nR also has the concept of missing values, `NA`, encoded within its data model. 
However, `i32`/`f64` do not natively have a representation for `NA`, e.g.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbit_left_shift_once(NA_integer_)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in bit_left_shift_once(NA_integer_): Must not be NA.\n```\n\n\n:::\n\n```{.r .cell-code}\nbit_left_shift_once(NA_real_)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in bit_left_shift_once(NA_real_): Must not be NA.\n```\n\n\n:::\n\n```{.r .cell-code}\nbit_left_shift_once(NA)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in bit_left_shift_once(NA): Must not be NA.\n```\n\n\n:::\n:::\n\n\nInstead, we have to rely on extendr's scalar variants of R types, `Rint` / `Rfloat`, to encompass the notion of `NA` in our functions:\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn double_me(value: Rint) -> Rint {\n if value.is_na() {\n Rint::na()\n } else {\n (value.inner() << 1).into()\n }\n}\n```\n:::\n\nwhich means we can now handle missing values in the arguments:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndouble_me(NA_integer_)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] NA\n```\n\n\n:::\n\n```{.r .cell-code}\ndouble_me(NA_real_)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] NA\n```\n\n\n:::\n\n```{.r .cell-code}\ndouble_me(NA)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] NA\n```\n\n\n:::\n:::\n\n\nOne may notice here that `NA_real_` was accepted even for an `Rint`. The reason\nfor this is that when you specify a type without `&`/`&mut`, the value is coerced\nin a similar way to how R coerces values. 
To get strict type-checking\nat run time, use `&` / `&mut`, as in\n\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn wrong_input(value: &Rint) -> Rint {\n value.clone()\n}\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nwrong_input(NA_integer_)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in wrong_input(NA_integer_): Must not be NA.\n```\n\n\n:::\n\n```{.r .cell-code}\nwrong_input(NA_real_)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in wrong_input(NA_real_): expected 13, got 14\n```\n\n\n:::\n\n```{.r .cell-code}\nwrong_input(21.0)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in wrong_input(21): expected 13, got 14\n```\n\n\n:::\n\n```{.r .cell-code}\nwrong_input(21L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 21\n```\n\n\n:::\n:::\n\n\nHere, only the last literal is a true `Rint`. The numbers in the error messages are R's internal type codes: 13 is `INTSXP` (integer) and 14 is `REALSXP` (double).\n\n## Vectors\n\nMost data in R are vectors. Scalar values are in fact vectors of length one, and\neven lists are defined by a vector type. A vector type in Rust is `Vec`. A\n`Vec` has an element type, a length, and a capacity. This means that, if necessary,\nwe may grow any given `Vec` to hold more values, and only when the capacity\nis exceeded will there be a reallocation.\n\nNaively, we may define a function like so\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn repeat_us(mut values: Vec<i32>) -> Vec<i32> {\n assert_eq!(values.capacity(), values.len(), \"must have zero capacity left\");\n values[0] = 100;\n values.push(55);\n values\n}\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1L, 2L, 33L)\nrepeat_us(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 100 2 33 55\n```\n\n\n:::\n:::\n\n\nEven if the argument is `mut Vec<_>`, what happens is that the R vector gets\nconverted to an owned Rust type, and it is that copy that we can modify and extend, without syncing to the original data.\n\nOf course, a slice, e.g. 
`&[i32]` / `&mut [i32]` could be used instead, and this allows us to modify the original data, i.e.\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn zero_middle_element(values: &mut [i32]) {\n let len = values.len();\n let middle = len / 2;\n values[middle] = 0;\n}\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(100L, 200L, 300L)\nzero_middle_element(x)\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 100 0 300\n```\n\n\n:::\n:::\n\n\nThis is great! If we wanted to insert an `NA` in the middle, we would have had to operate on `&mut [Rint]` instead. \n\nA slice is a view of a sequence of elements that are part of a larger collection. Since it represents only part of a collection (a vector, in this case), we cannot add new elements to it. To do so, we have to rely on extendr-provided types that offer a `Vec`-like API to R's vector types. These are the `Integers`, `Logicals`, `Doubles`, and `Strings` types.\n\n## Strings are special\n\n", | ||
"supporting": [], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": true | ||
} | ||
} |
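The `NA` handling that the chapter text above describes can be modelled in plain Rust, without extendr-api. The following is only a sketch under stated assumptions: `NA_INTEGER`, `from_r_integer`, `whole_number_to_i32`, and `double_me_model` are hypothetical names introduced for illustration, `Option<i32>` stands in for `Rint`, and the error strings are made up rather than taken from extendr.

```rust
// Plain-Rust model of the NA handling described above. It does NOT use
// extendr-api; every name here is hypothetical and for illustration only.

/// R stores `NA_integer_` as the smallest 32-bit integer (C's `INT_MIN`).
const NA_INTEGER: i32 = i32::MIN;

/// Read a raw R integer into an `Option`, mapping the NA sentinel to `None`.
fn from_r_integer(raw: i32) -> Option<i32> {
    if raw == NA_INTEGER {
        None
    } else {
        Some(raw)
    }
}

/// Accept a double only when it is a whole number that fits an R integer.
/// (Assumed behaviour for this sketch; the real conversion lives in extendr-api.)
fn whole_number_to_i32(x: f64) -> Result<i32, String> {
    if x.is_nan() {
        // R's `NA_real_` is a NaN, so it is rejected here as well.
        return Err("must not be NA".to_string());
    }
    // `i32::MIN` itself is excluded because it is the NA sentinel.
    let in_range = x > i32::MIN as f64 && x <= i32::MAX as f64;
    if x.fract() != 0.0 || !in_range {
        return Err(format!("expected a whole number, got {}", x));
    }
    Ok(x as i32)
}

/// NA-propagating analogue of `double_me`: `None` plays the role of NA.
/// Overflow is not handled, matching the plain shift used in the chapter.
fn double_me_model(value: Option<i32>) -> Option<i32> {
    value.map(|v| v << 1)
}

fn main() {
    println!("{:?}", double_me_model(from_r_integer(21)));
    println!("{:?}", double_me_model(from_r_integer(NA_INTEGER)));
    println!("{:?}", whole_number_to_i32(21.0));
    println!("{:?}", whole_number_to_i32(21.1));
}
```

Modelling `NA` as `None` makes the missing case impossible to ignore at compile time, which is roughly the role `Rint` / `Rfloat` play at the R boundary.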
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"hash": "f9505dcfe899fb366467fde237addb75", | ||
"result": { | ||
"engine": "knitr", | ||
"markdown": "---\ntitle: \"Conversion to and from R data\"\nformat: typst\n---\n\n\nOne of the key goals of extendr is to provide a framework that lets you write Rust functions that interact with R without having to know the intricacies of R internals, or even R's C facilities. However, touching on those internals is unavoidable if one wishes to understand why the extendr-api is the way it is.\n\nThus, when introducing extendr, we shall mention facts about R internals, but these are not necessary to keep in mind going forward.\n\nA fundamental data type in R is the 32-bit integer, `int` in C, and `i32` in Rust. Passing that type around is essential, and straightforward:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rextendr)\nnames(knitr::knit_engines$get())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n [1] \"awk\" \"bash\" \"coffee\" \"gawk\" \"groovy\" \n [6] \"haskell\" \"lein\" \"mysql\" \"node\" \"octave\" \n[11] \"perl\" \"php\" \"psql\" \"Rscript\" \"ruby\" \n[16] \"sas\" \"scala\" \"sed\" \"sh\" \"stata\" \n[21] \"zsh\" \"asis\" \"asy\" \"block\" \"block2\" \n[26] \"bslib\" \"c\" \"cat\" \"cc\" \"comment\" \n[31] \"css\" \"ditaa\" \"dot\" \"embed\" \"eviews\" \n[36] \"exec\" \"fortran\" \"fortran95\" \"go\" \"highlight\" \n[41] \"js\" \"julia\" \"python\" \"R\" \"Rcpp\" \n[46] \"sass\" \"scss\" \"sql\" \"stan\" \"targets\" \n[51] \"tikz\" \"verbatim\" \"ojs\" \"mermaid\" \"glue\" \n[56] \"glue_sql\" \"gluesql\" \"extendr\" \"extendrsrc\"\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn ultimate_answer() -> i32 {\n return 42_i32;\n}\n```\n:::\n\n\nThis function is now available within your R session, and calling it returns 42.\n\nAnother fundamental data type in R is `numeric` / `f64`, which we can also pass back and forth unhindered, e.g.\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr]\nfn return_tau() -> f64 {\n std::f64::consts::TAU\n} \n```\n:::\n\n\nwhere $\\tau$\n\n", | ||
"supporting": [], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": null, | ||
"postProcess": false | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
project: | ||
type: website | ||
execute-dir: project | ||
|
||
execute: | ||
freeze: auto | ||
|