This repository has been archived by the owner on Jun 30, 2024. It is now read-only.

Merge pull request #7 from extendr/conversions
Data-conversion chapter
JosiahParry authored Mar 2, 2024
2 parents 394195b + 2775247 commit 5708d46
Showing 5 changed files with 187 additions and 0 deletions.
15 changes: 15 additions & 0 deletions _freeze/conversion/execute-results/html.json
@@ -0,0 +1,15 @@
{
"hash": "83a91459805cb87056cd6423ed50e8dd",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Conversion to and from R data\"\n---\n\n\nOne of the key goals of extendr is to provide a framework that lets you write Rust functions that interact with R without having to know the intricacies of R internals, or even R's C facilities. However, some of that knowledge is unavoidable if one wishes to understand why the extendr-api is the way it is.\n\nThus, in introducing extendr, we shall mention facts about R internals, but these are not necessary to keep in mind going forward.\n\n\n::: {.cell}\n\n:::\n\n\nA fundamental data type in R is the 32-bit integer, `int` in C and `i32` in Rust. Passing that type around is essential, and straightforward:\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn ultimate_answer() -> i32 {\n    return 42_i32;\n}\n```\n:::\n\n\nThis function is now available within your R session, and its output is 42.\n\nAnother fundamental data type in R is `numeric` / `f64`, which we can also pass back and forth freely, e.g.\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn return_tau() -> f64 {\n    std::f64::consts::TAU\n} \n```\n:::\n\n\nwhere $\\tau := 2\\pi =$ $6.2831853$.\n\nHowever, passing data from R to Rust must be done with a bit of care: in R, a plain numeric literal such as `21` is a double, and writing a true integer literal requires the `L` suffix, as in `21L`. \n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn bit_left_shift_once(number: i32) -> i32 {\n    number << 1\n}\n```\n:::\n\n\nThis function is supposedly a clever way to multiply by two. However, calling `bit_left_shift_once(21.1)` results in\n\n\n::: {.cell}\n::: {.cell-output .cell-output-error}\n\n```\nError in bit_left_shift_once(21.1): Expected an integer or a float representing a whole number, got 21.1\n```\n\n\n:::\n:::\n\nwhereas `bit_left_shift_once(21)` is 42, as expected.\n\nR also has the concept of missing values, `NA`, encoded within its data model. 
However, `i32`/`f64` do not natively have a representation for `NA`, e.g.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbit_left_shift_once(NA_integer_)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in bit_left_shift_once(NA_integer_): Must not be NA.\n```\n\n\n:::\n\n```{.r .cell-code}\nbit_left_shift_once(NA_real_)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in bit_left_shift_once(NA_real_): Must not be NA.\n```\n\n\n:::\n\n```{.r .cell-code}\nbit_left_shift_once(NA)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in bit_left_shift_once(NA): Must not be NA.\n```\n\n\n:::\n:::\n\n\nInstead, we have to rely on extendr's scalar variants of R types, `Rint` / `Rfloat`, to encompass the notion of `NA` in our functions:\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn double_me(value: Rint) -> Rint {\n    if value.is_na() {\n        Rint::na()\n    } else {\n        (value.inner() << 1).into()\n    }\n}\n```\n:::\n\nwhich means we can now handle missing values in the arguments:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndouble_me(NA_integer_)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] NA\n```\n\n\n:::\n\n```{.r .cell-code}\ndouble_me(NA_real_)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] NA\n```\n\n\n:::\n\n```{.r .cell-code}\ndouble_me(NA)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] NA\n```\n\n\n:::\n:::\n\n\nOne may notice here that `NA_real_` was accepted even for an `Rint`. The reason\nfor this is that when you specify a type without `&`/`&mut`, the value is coerced\nin a way similar to how R coerces values. 
In order to have strict type-checking\nat run-time, use `&` / `&mut`, as in\n\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn wrong_input(value: &Rint) -> Rint {\n    value.clone()\n}\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nwrong_input(NA_integer_)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in wrong_input(NA_integer_): Must not be NA.\n```\n\n\n:::\n\n```{.r .cell-code}\nwrong_input(NA_real_)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in wrong_input(NA_real_): expected 13, got 14\n```\n\n\n:::\n\n```{.r .cell-code}\nwrong_input(21.0)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\nError in wrong_input(21): expected 13, got 14\n```\n\n\n:::\n\n```{.r .cell-code}\nwrong_input(21L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 21\n```\n\n\n:::\n:::\n\n\nHere, only the last literal, `21L`, is accepted as a true `Rint`. \n\n## Vectors\n\nMost data in R are vectors. Scalar values are in fact vectors of length one, and\neven lists are defined by a vector type. The corresponding growable vector type in Rust is `Vec`. A\n`Vec` has an element type, a length, and a capacity. This means that, if necessary,\nwe may grow any given `Vec` to contain more values, and only when the capacity\nis exceeded will there be a reallocation.\n\nNaively, we may define a function like so:\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn repeat_us(mut values: Vec<i32>) -> Vec<i32> {\n    assert_eq!(values.capacity(), values.len(), \"must have zero capacity left\");\n    values[0] = 100;\n    values.push(55);\n    values\n}\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1L, 2L, 33L)\nrepeat_us(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 100 2 33 55\n```\n\n\n:::\n:::\n\n\nEven though the argument is `mut Vec<_>`, the R vector gets\nconverted to an owned Rust type, and it is that copy that we can modify and augment, without syncing to the original data.\n\nOf course, a slice e.g. 
`&[i32]` / `&mut [i32]` could be used instead, and this allows us to modify the original data, i.e.\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn zero_middle_element(values: &mut [i32]) {\n    let len = values.len();\n    let middle = len / 2;\n    values[middle] = 0;\n}\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(100L, 200L, 300L)\nzero_middle_element(x)\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 100 0 300\n```\n\n\n:::\n:::\n\n\nThis is great! If we wanted to insert an `NA` in the middle, we would have had to operate on `&mut [Rint]` instead. \n\nA slice is a view of a sequence of elements that are part of a larger collection. Since a slice represents only part of a collection (a vector, in this case), we cannot add new elements to it. To do so, we have to rely on extendr-provided types that offer a `Vec`-like API to R's vector types. These are the `Integers`, `Logicals`, `Doubles`, and `Strings` types.\n\n## Strings are special\n\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
15 changes: 15 additions & 0 deletions _freeze/conversion/execute-results/typ.json
@@ -0,0 +1,15 @@
{
"hash": "f9505dcfe899fb366467fde237addb75",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Conversion to and from R data\"\nformat: typst\n---\n\n\nOne of the key goals of extendr is to provide a framework that lets you write Rust functions that interact with R without having to know the intricacies of R internals, or even R's C facilities. However, some of that knowledge is unavoidable if one wishes to understand why the extendr-api is the way it is.\n\nThus, in introducing extendr, we shall mention facts about R internals, but these are not necessary to keep in mind going forward.\n\nA fundamental data type in R is the 32-bit integer, `int` in C and `i32` in Rust. Passing that type around is essential, and straightforward:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rextendr)\nnames(knitr::knit_engines$get())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n [1] \"awk\" \"bash\" \"coffee\" \"gawk\" \"groovy\" \n [6] \"haskell\" \"lein\" \"mysql\" \"node\" \"octave\" \n[11] \"perl\" \"php\" \"psql\" \"Rscript\" \"ruby\" \n[16] \"sas\" \"scala\" \"sed\" \"sh\" \"stata\" \n[21] \"zsh\" \"asis\" \"asy\" \"block\" \"block2\" \n[26] \"bslib\" \"c\" \"cat\" \"cc\" \"comment\" \n[31] \"css\" \"ditaa\" \"dot\" \"embed\" \"eviews\" \n[36] \"exec\" \"fortran\" \"fortran95\" \"go\" \"highlight\" \n[41] \"js\" \"julia\" \"python\" \"R\" \"Rcpp\" \n[46] \"sass\" \"scss\" \"sql\" \"stan\" \"targets\" \n[51] \"tikz\" \"verbatim\" \"ojs\" \"mermaid\" \"glue\" \n[56] \"glue_sql\" \"gluesql\" \"extendr\" \"extendrsrc\"\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr(use_try_from = true)]\nfn ultimate_answer() -> i32 {\n    return 42_i32;\n}\n```\n:::\n\n\nThis function is now available within your R session, and its output is 42.\n\nAnother fundamental data type in R is `numeric` / `f64`, which we can also pass back and forth freely, e.g.\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr]\nfn return_tau() -> f64 {\n    std::f64::consts::TAU\n} \n```\n:::\n\n\nwhere $\\tau$\n\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": null,
"postProcess": false
}
}
7 changes: 7 additions & 0 deletions _freeze/site_libs/clipboard/clipboard.min.js


1 change: 1 addition & 0 deletions _quarto.yml
@@ -1,5 +1,6 @@
project:
type: website
execute-dir: project

execute:
freeze: auto
