Add sd.greta_array method #506

njtierney · 2022-03-18T05:23:06Z

resolves #504

tests currently fail

njtierney · 2022-03-18T07:42:23Z

Oh this is juicy, turns out that tensorflow implements the same use of SD as numpy.

Numpy uses N, not N-1 , when computing variance
R uses N-1 when computing variance

If we want to make greta "R equivalent", then we need to use N - 1.
Which to me makes sense, since greta is about providing an interface into TF while remaining familiar to an R user.
We could also have an argument where you can control which one you use...which adds a bit of confusion / spice to everything here.

Here's a bit of proof about this, using the current implementation of it in this branch:

# ugh, OK so tensorflow divides by N, not N - 1

regular_var <- function(x, n_minus_1 = TRUE){
  total_ss <- sum((x - mean(x))^2)
  if (n_minus_1){
    return(total_ss / (length(x) - 1))
  } else if (!n_minus_1){
    return(total_ss / length(x))
  }
}

regular_sd <- function(x, n_minus_1 = TRUE){
  var <- regular_var(x, n_minus_1 = n_minus_1)
  sqrt(var)
}

# this is what R does
sd(1:100)
#> [1] 29.01149
regular_sd(1:100, n_minus_1 = TRUE)
#> [1] 29.01149

# this is what numpy does
# https://numpy.org/doc/stable/reference/generated/numpy.var.html#numpy.var
# which is what tensorflow replicated
# 
regular_sd(1:100, n_minus_1 = FALSE)
#> [1] 28.86607

library(greta)
#> 
#> Attaching package: 'greta'
#> The following objects are masked from 'package:stats':
#> 
#>     binomial, cov2cor, poisson, sd
#> The following objects are masked from 'package:base':
#> 
#>     %*%, apply, backsolve, beta, chol2inv, colMeans, colSums, diag,
#>     eigen, forwardsolve, gamma, identity, rowMeans, rowSums, sweep,
#>     tapply
greta_sitrep()
#> ℹ checking if python available
#> ✓ python (version 3.7) available
#> 
#> ℹ checking if TensorFlow available
#> ✓ TensorFlow (version 1.14.0) available
#> 
#> ℹ checking if TensorFlow Probability available
#> ✓ TensorFlow Probability (version 0.7.0) available
#> 
#> ℹ checking if greta conda environment available
#> ✓ greta conda environment available
#> 
#> ℹ Initialising python and checking dependencies, this may take a moment.
#> ✓ Initialising python and checking dependencies ... done!
#> 
#> ℹ greta is ready to use!
x <- as_data(1:100)

greta_sd <- as.numeric(calculate(val = sd(x)))
greta_sd
#> [1] 28.86607

greta_sd == regular_sd(1:100, n_minus_1 = FALSE)
#> [1] TRUE

^{Created on 2022-03-18 by the reprex package (v2.0.1)}

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       macOS Big Sur/Monterey 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Perth
#>  date     2022-03-18
#>  pandoc   2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  backports     1.4.1      2021-12-13 [1] CRAN (R 4.1.0)
#>  base64enc     0.1-3      2015-07-28 [1] CRAN (R 4.1.0)
#>  callr         3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
#>  cli           3.2.0      2022-02-14 [1] CRAN (R 4.1.2)
#>  coda          0.19-4     2020-09-30 [1] CRAN (R 4.1.0)
#>  codetools     0.2-18     2020-11-04 [1] CRAN (R 4.1.3)
#>  crayon        1.5.0      2022-02-14 [1] CRAN (R 4.1.2)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.15       2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi         1.0.2      2022-01-14 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.0)
#>  future        1.24.0     2022-02-19 [1] CRAN (R 4.1.2)
#>  globals       0.14.0     2020-11-22 [1] CRAN (R 4.1.0)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
#>  greta       * 0.4.1.9000 2022-03-18 [1] local
#>  here          1.0.1      2020-12-13 [1] CRAN (R 4.1.0)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  hms           1.1.1      2021-09-26 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
#>  jsonlite      1.8.0      2022-02-22 [1] CRAN (R 4.1.2)
#>  knitr         1.37       2021-12-16 [1] CRAN (R 4.1.0)
#>  lattice       0.20-45    2021-09-22 [1] CRAN (R 4.1.3)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
#>  listenv       0.8.0      2019-12-05 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
#>  Matrix        1.4-0      2021-12-08 [1] CRAN (R 4.1.3)
#>  parallelly    1.30.0     2021-12-17 [1] CRAN (R 4.1.0)
#>  pillar        1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  png           0.1-7      2013-12-03 [1] CRAN (R 4.1.0)
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
#>  processx      3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
#>  progress      1.2.2      2019-05-16 [1] CRAN (R 4.1.0)
#>  ps            1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0     2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0     2021-09-26 [1] CRAN (R 4.1.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
#>  Rcpp          1.0.8.2    2022-03-11 [1] CRAN (R 4.1.2)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  reticulate    1.24       2022-01-26 [1] CRAN (R 4.1.2)
#>  rlang         1.0.2      2022-03-04 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.0)
#>  rprojroot     2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.2.9000 2022-03-01 [1] Github (r-lib/sessioninfo@d70760d)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2      2021-09-23 [1] CRAN (R 4.1.0)
#>  tensorflow    2.8.0      2022-02-09 [1] CRAN (R 4.1.2)
#>  tfruns        1.5.0      2021-02-26 [1] CRAN (R 4.1.0)
#>  tibble        3.1.6      2021-11-07 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  whisker       0.4        2019-08-28 [1] CRAN (R 4.1.0)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.30       2022-03-02 [1] CRAN (R 4.1.2)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
#> 
#> ─ Python configuration ───────────────────────────────────────────────────────
#>  python:         /Users/njtierney/Library/r-miniconda/envs/greta-env/bin/python
#>  libpython:      /Users/njtierney/Library/r-miniconda/envs/greta-env/lib/libpython3.7m.dylib
#>  pythonhome:     /Users/njtierney/Library/r-miniconda/envs/greta-env:/Users/njtierney/Library/r-miniconda/envs/greta-env
#>  version:        3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 05:59:23)  [Clang 11.1.0 ]
#>  numpy:          /Users/njtierney/Library/r-miniconda/envs/greta-env/lib/python3.7/site-packages/numpy
#>  numpy_version:  1.16.4
#>  tensorflow:     /Users/njtierney/Library/r-miniconda/envs/greta-env/lib/python3.7/site-packages/tensorflow
#>  
#>  NOTE: Python version was forced by use_python function
#> 
#> ──────────────────────────────────────────────────────────────────────────────

M1 workaround CRAN patch

Increment version number to 0.4.2.9000

njtierney · 2022-03-30T02:44:51Z

One alternative would be to write out the following in tensorflow:

regular_sd <- function(x, n_minus_1 = TRUE){
  if (!n_minus_1) {
    n_dim <- length(dim(x))
    reduction_dims <- seq_len(n_dim - 1)
    sd_result <- tf$math$reduce_std(x, axis = reduction_dims, keepdims = !drop)
  } else if (n_minus_1){
    # replace these parts with tf_sum and friends?
    x_mean_sq <- tf_mean(x) * tf_mean(x)
    total_ss <- tf_sum(x - tf_mean_sq)
    var <- total_ss / (length(x) - 1)
    sd_result <- sqrt(var)
    # total_ss <- sum((x - mean(x))^2)
    # var <- (total_ss / (length(x) - 1))
    # sd_result <- sqrt(var)
  }
  return(sd_result)
}

And provide an optional ddof argument, as provided in NumPy (but not TF): https://numpy.org/doc/stable/reference/generated/numpy.std.html

with ddof being 1, which is R's default. If it is 0, perhaps just run reduce_std

…tandard deviation

github-actions · 2022-03-30T03:14:31Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if dbadadf is merged into master:

:ballot_box_with_check:basic_example: 27.8s -> 27.5s [-16.27%, +14.16%]
:ballot_box_with_check:create_model: 586ms -> 585ms [-2.36%, +2.01%]
:ballot_box_with_check:create_normal: 151ms -> 151ms [-7.38%, +7.55%]
:ballot_box_with_check:run_mcmc: 10.8s -> 10.7s [-3.21%, +1.28%]
Further explanation regarding interpretation and methodology can be found in the documentation.

tests currently fail

…tandard deviation

…reta into implement-sd-var-504

…dimension that is being returned

github-actions · 2022-03-31T06:53:45Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1ab817d is merged into master:

:ballot_box_with_check:basic_example: 23.8s -> 24.1s [-17.16%, +19.76%]
:ballot_box_with_check:create_model: 483ms -> 484ms [-1.23%, +1.57%]
:ballot_box_with_check:create_normal: 125ms -> 127ms [-4.11%, +6.93%]
:ballot_box_with_check:run_mcmc: 9.15s -> 9.16s [-0.84%, +1.08%]
Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions · 2022-03-31T07:11:44Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 897c035 is merged into master:

:ballot_box_with_check:basic_example: 26.4s -> 26.5s [-11.7%, +11.86%]
:ballot_box_with_check:create_model: 551ms -> 554ms [-2.71%, +3.7%]
:ballot_box_with_check:create_normal: 144ms -> 144ms [-10.02%, +9.99%]
:ballot_box_with_check:run_mcmc: 10.4s -> 10.4s [-2.99%, +1.99%]
Further explanation regarding interpretation and methodology can be found in the documentation.

lorenzwalthert · 2022-03-31T20:09:51Z

Looks like {touchstone} is working. :D. Note that you can make confidence intervals arbitrary tight by running more iterations in each expression you benchmark, see the doc for benchmark_run().

implement first pass at using sd.greta_array method

71ebf73

tests currently fail

njtierney added this to the 0.4.2 milestone Mar 18, 2022

njtierney mentioned this pull request Mar 18, 2022

sd() on greta_array does not return a greta_array #504

Open

njtierney and others added 14 commits March 21, 2022 08:41

export is_mac_arm64()

1ccf56b

downgrade check for M1 from an error to a message

069043a

only evaluate if not an M1 mac

d57b83e

change back to error not message

65095c4

remove documentation for is_mac_arm64; add is_not_cran() function

83140c2

evaluate code depending on whether this is being assessed on CRAN

981abca

use message instead of stop

76f5475

change to 0.4.2

c1fbc4f

update codemeta

924c6ea

update cran-comments

afaef4e

remove is_not_cran() function

6426e2f

Merge pull request greta-dev#508 from njtierney/m1-workaround-507

1b69af2

M1 workaround CRAN patch

Increment version number to 0.4.2.9000

45657cd

Merge pull request greta-dev#510 from njtierney/use-dev-version

4f61e0b

Increment version number to 0.4.2.9000

njtierney added the Up Next label Mar 28, 2022

first attempt at using tensorflow to make different calculations of s…

ef1069d

…tandard deviation

njtierney removed the Up Next label Mar 30, 2022

njtierney added 6 commits March 31, 2022 14:35

implement using correct tensorflow dimensions etc

7b9c5b9

implement first pass at using sd.greta_array method

0ebf81d

tests currently fail

first attempt at using tensorflow to make different calculations of s…

da46fdd

…tandard deviation

implement using correct tensorflow dimensions etc

908be48

Merge branch 'implement-sd-var-504' of https://github.com/njtierney/g…

3f0447a

…reta into implement-sd-var-504

might need to explore how things are happening with skip dim and the …

7321aa0

…dimension that is being returned

njtierney modified the milestones: 0.5.0, 0.6.0 Feb 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sd.greta_array method #506

Add sd.greta_array method #506

njtierney commented Mar 18, 2022

njtierney commented Mar 18, 2022

njtierney commented Mar 30, 2022

github-actions bot commented Mar 30, 2022

github-actions bot commented Mar 31, 2022

github-actions bot commented Mar 31, 2022

lorenzwalthert commented Mar 31, 2022 •

edited

Loading

Add sd.greta_array method #506

Are you sure you want to change the base?

Add sd.greta_array method #506

Conversation

njtierney commented Mar 18, 2022

njtierney commented Mar 18, 2022

njtierney commented Mar 30, 2022

github-actions bot commented Mar 30, 2022

github-actions bot commented Mar 31, 2022

github-actions bot commented Mar 31, 2022

lorenzwalthert commented Mar 31, 2022 • edited Loading

lorenzwalthert commented Mar 31, 2022 •

edited

Loading