Should we change the way random walks are generated to allow dimensions #53

spsanderson · 2024-07-30T16:54:23Z

spsanderson
Jul 30, 2024
Maintainer

should we change the way random walks are generated in order to allow users to choose a type and thereby only need to call a function like 1D, 2D or 3D

library(plotly)

# Function to generate a single random walk
single_random_walk <- function(.n, .mu, .sd, .type) {
  if (.type != "random") {
    cumsum(sample(c(-1,1), .n, .5))
  } else {
    cumsum(stats::rnorm(.n, .mu, .sd))
  }
}

double_random_walk <- function(.n, .mu, .sd, .type){
  dplyr::tibble(
    x = single_random_walk(.n, .mu, .sd, .type),
    y = single_random_walk(.n, .mu, .sd, .type)
  )
}
double_random_walk(10, 0, 1, "random")

triple_random_walk <- function(.n, .mu, .sd, .type){
  dplyr::tibble(
    x = single_random_walk(.n, .mu, .sd, .type),
    y = single_random_walk(.n, .mu, .sd, .type),
    z = single_random_walk(.n, .mu, .sd, .type)
  )
}
test_3d <- triple_random_walk(1000, 0, .1, "not random")

plot_ly(
  data = test_3d,
  x = ~x,
  y = ~y,
  z = ~z,
  type = "scatter3d",
  mode = 'lines',
  opacity = 1, line = list(width = 1, color='steelblue', reverscale = FALSE)
)

What it really comes down to is how do we efficiently generate random walks of dimensions 1, 2 or 3 without having a function for each dimension

spsanderson · 2024-09-16T20:10:14Z

spsanderson
Sep 16, 2024
Maintainer Author

Here is an attempt at a new function to replace the current:

#' Generate Multiple Random Normal Walks in Multiple Dimensions
#'
#' @family Generator Functions
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @description
#' The `random_normal_walk` function generates multiple random walks in 1, 2, or 3 dimensions.
#' Each walk is a sequence of steps where each step is a random draw from a normal distribution.
#' The user can specify the number of walks, the number of steps in each walk, and the
#' parameters of the normal distribution (mean and standard deviation). The function
#' also allows for sampling a proportion of the steps and optionally sampling with replacement.
#'
#' @param .num_walks An integer specifying the number of random walks to generate. Default is 25.
#' @param .n An integer specifying the number of steps in each walk. Default is 100.
#' @param .mu A numeric value indicating the mean of the normal distribution. Default is 0.
#' @param .sd A numeric value indicating the standard deviation of the normal distribution. Default is 0.1.
#' @param .initial_value A numeric value indicating the initial value of the walks. Default is 0.
#' @param .samp A logical value indicating whether to sample the normal distribution values. Default is TRUE.
#' @param .replace A logical value indicating whether sampling is with replacement. Default is TRUE.
#' @param .sample_size A numeric value between 0 and 1 specifying the proportion of `.n` to sample. Default is 0.8.
#' @param .dimensions An integer specifying the number of dimensions (1, 2, or 3). Default is 1.
#'
#' @return A tibble containing the generated random walks with columns depending on the number of dimensions:
#' \itemize{
#'   \item `walk_number`: Factor representing the walk number.
#'   \item `step_number`: Step index.
#'   \item `y`: If `.dimensions = 1`, the value of the walk at each step.
#'   \item `x`, `y`: If `.dimensions = 2`, the values of the walk in two dimensions.
#'   \item `x`, `y`, `z`: If `.dimensions = 3`, the values of the walk in three dimensions.
#' }
#'
#' The following are also returned based upon how many dimensions there are and could be any of x, y and or z:
#' \itemize{
#'   \item `walk_number`: Factor representing the walk number.
#'   \item `x`: Step index.
#'   \item `y`: Normal distribution values.
#'   \item `cum_sum`: Cumulative sum of `y`.
#'   \item `cum_prod`: Cumulative product of `y`.
#'   \item `cum_min`: Cumulative minimum of `y`.
#'   \item `cum_max`: Cumulative maximum of `y`.
#' }
#' 
#' The tibble includes attributes for the function parameters.
#'
#' @examples
#' # Generate 10 random walks in 1 dimension with 50 steps each
#' set.seed(123)
#' random_normal_walk(.num_walks = 10, .n = 50)
#'
#' # Generate random walks in 2 dimensions
#' set.seed(123)
#' random_normal_walk(.num_walks = 10, .n = 50, .dimensions = 2)
#'
#' # Generate random walks in 3 dimensions
#' set.seed(123)
#' random_normal_walk(.num_walks = 10, .n = 50, .dimensions = 3)
#'
#' @export
random_normal_walk <- function(.num_walks = 25, .n = 100, .mu = 0, .sd = 0.1,
                               .initial_value = 0, .samp = TRUE, .replace = TRUE,
                               .sample_size = 0.8, .dimensions = 1) {
  
  # Defensive checks
  if (.num_walks < 0) {
    rlang::abort(".num_walks cannot be less than 0", use_cli_format = TRUE)
  }
  if (.n < 0) {
    rlang::abort(".n cannot be less than 0", use_cli_format = TRUE)
  }
  if (.mu < 0) {
    rlang::abort(".mu cannot be less than 0", use_cli_format = TRUE)
  }
  if (.sd < 0) {
    rlang::abort(".sd cannot be less than 0", use_cli_format = TRUE)
  }
  if (.sample_size < 0 || .sample_size > 1) {
    rlang::abort(".sample_size cannot be less than 0 or more than 1", 
                 use_cli_format = TRUE)
  }
  if (!.dimensions %in% c(1, 2, 3)) {
    rlang::abort("Number of dimensions must be 1, 2, or 3.", use_cli = TRUE)
  }
  
  # Variables
  num_walks     <- as.integer(.num_walks)
  n             <- as.integer(.n)
  mu            <- as.numeric(.mu)
  sd            <- as.numeric(.sd)
  initial_value <- as.numeric(.initial_value)
  replace       <- as.logical(.replace)
  samp          <- as.logical(.samp)
  samp_size     <- round(.sample_size * n, 0)
  periods       <- if (samp) samp_size else n
  
  # Define dimension names
  dim_names <- switch(.dimensions,
                      `1` = c("y"),
                      `2` = c("x", "y"),
                      `3` = c("x", "y", "z"))
  
  # Function to generate a single random walk
  generate_walk <- function(walk_num) {
    # Generate random steps for each dimension
    rand_steps <- purrr::map(
      dim_names, 
      ~ if (samp) {
          sample(stats::rnorm(n, mu, sd), size = periods, replace = replace)
        } else {
          stats::rnorm(periods, mu, sd)
        }
      )
    
    # Set column names
    rand_steps <- stats::setNames(rand_steps, dim_names)
    rand_steps <- purrr::map(rand_steps, \(x) dplyr::as_tibble(x)) |> 
      purrr::list_cbind()
    colnames(rand_steps) <- dim_names
    rand_steps <- purrr::map(
      rand_steps, \(x) x |> 
        unlist(use.names = FALSE)) |> 
      dplyr::as_tibble()
    
    # Combine into a tibble
    dplyr::tibble(
      walk_number = factor(walk_num),
      step_number = 1:periods
    ) |>
      dplyr::bind_cols(rand_steps)
  }
  
  # Generate all walks
  res <- map_dfr(1:num_walks, generate_walk)
  res <- res |>
    dplyr::group_by(walk_number) |>
    std_cum_sum_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_prod_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_min_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_max_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_mean_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    dplyr::ungroup()
  
  # Add attributes
  attr(res, "n")             <- n
  attr(res, "num_walks")     <- num_walks
  attr(res, "mu")            <- mu
  attr(res, "sd")            <- sd
  attr(res, "initial_value") <- initial_value
  attr(res, "replace")       <- replace
  attr(res, "samp")          <- samp
  attr(res, "samp_size")     <- samp_size
  attr(res, "periods")       <- periods
  attr(res, "fns")           <- "random_normal_walk"
  attr(res, "dimensions")    <- .dimensions
  
  # Return the result
  return(res)
}

Sample Output:

> random_normal_walk(.dimensions = 1) |> 
+   glimpse()
Rows: 2,000
Columns: 8
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ y           <dbl> -0.1002532219, -0.0797601456, 0.1147956309, -0.0575512640, -0.0071935353, -0…
$ cum_sum_y   <dbl> -0.100253222, -0.180013367, -0.065217737, -0.122769001, -0.129962536, -0.333…
$ cum_prod_y  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_min_y   <dbl> -0.1002532, -0.1002532, -0.1002532, -0.1002532, -0.1002532, -0.2030538, -0.2…
$ cum_max_y   <dbl> -0.10025322, -0.07976015, 0.11479563, 0.11479563, 0.11479563, 0.11479563, 0.…
$ cum_mean_y  <dbl> -1.002532e-01, -9.000668e-02, -2.173925e-02, -3.069225e-02, -2.599251e-02, -…
> random_normal_walk(.dimensions = 2) |>
+   glimpse()
Rows: 2,000
Columns: 14
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ x           <dbl> 6.738246e-03, 7.403996e-02, -4.919611e-02, -7.715558e-02, -2.005732e-01, -1.…
$ y           <dbl> -0.019300710, -0.122338203, 0.045713054, 0.142202671, 0.010560777, -0.160084…
$ cum_sum_x   <dbl> 0.006738246, 0.080778210, 0.031582097, -0.045573484, -0.246146678, -0.398783…
$ cum_sum_y   <dbl> -0.019300710, -0.141638913, -0.095925859, 0.046276812, 0.056837589, -0.10324…
$ cum_prod_x  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_prod_y  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_min_x   <dbl> 0.006738246, 0.006738246, -0.049196113, -0.077155581, -0.200573194, -0.20057…
$ cum_min_y   <dbl> -0.01930071, -0.12233820, -0.12233820, -0.12233820, -0.12233820, -0.16008473…
$ cum_max_x   <dbl> 0.006738246, 0.074039964, 0.074039964, 0.074039964, 0.074039964, 0.074039964…
$ cum_max_y   <dbl> -0.01930071, -0.01930071, 0.04571305, 0.14220267, 0.14220267, 0.14220267, 0.…
$ cum_mean_x  <dbl> 0.006738246, 0.040389105, 0.010527366, -0.011393371, -0.049229336, -0.066463…
$ cum_mean_y  <dbl> -0.0193007100, -0.0708194567, -0.0319752864, 0.0115692029, 0.0113675177, -0.…
> random_normal_walk(.dimensions = 3) |>
+   glimpse()
Rows: 2,000
Columns: 20
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ x           <dbl> -0.1257393403, 0.1444225094, -0.0614083652, 0.0951444936, 0.0676052630, -0.0…
$ y           <dbl> -0.083598254, -0.089906891, -0.055312913, 0.044652238, 0.134638354, 0.123363…
$ z           <dbl> 0.0722389305, 0.0023475955, -0.0002119357, 0.0320583149, 0.0707547864, -0.07…
$ cum_sum_x   <dbl> -0.12573934, 0.01868317, -0.04272520, 0.05241930, 0.12002456, 0.06812772, 0.…
$ cum_sum_y   <dbl> -0.08359825, -0.17350515, -0.22881806, -0.18416582, -0.04952747, 0.07383572,…
$ cum_sum_z   <dbl> 0.07223893, 0.07458653, 0.07437459, 0.10643291, 0.17718769, 0.10357439, 0.17…
$ cum_prod_x  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_prod_y  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_prod_z  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_min_x   <dbl> -0.1257393, -0.1257393, -0.1257393, -0.1257393, -0.1257393, -0.1257393, -0.1…
$ cum_min_y   <dbl> -0.08359825, -0.08990689, -0.08990689, -0.08990689, -0.08990689, -0.08990689…
$ cum_min_z   <dbl> 0.0722389305, 0.0023475955, -0.0002119357, -0.0002119357, -0.0002119357, -0.…
$ cum_max_x   <dbl> -0.1257393, 0.1444225, 0.1444225, 0.1444225, 0.1444225, 0.1444225, 0.1444225…
$ cum_max_y   <dbl> -0.08359825, -0.08359825, -0.05531291, 0.04465224, 0.13463835, 0.13463835, 0…
$ cum_max_z   <dbl> 0.07223893, 0.07223893, 0.07223893, 0.07223893, 0.07223893, 0.07223893, 0.07…
$ cum_mean_x  <dbl> -0.125739340, 0.009341585, -0.014241732, 0.013104824, 0.024004912, 0.0113546…
$ cum_mean_y  <dbl> -0.083598254, -0.086752573, -0.076272686, -0.046041455, -0.009905493, 0.0123…
$ cum_mean_z  <dbl> 0.072238931, 0.037293263, 0.024791530, 0.026608226, 0.035437538, 0.017262398…

0 replies

spsanderson · 2024-09-24T16:00:40Z

spsanderson
Sep 24, 2024
Maintainer Author

Here is another example of an updated random_normal_walk_with_drift()

random_normal_drift_walk <- function(.num_walks = 25, .n = 100, .mu = 0, .sd = 1, 
                                     .drift = 0.1, .initial_value = 0, .dimensions = 1) {
  
  # Convert inputs to appropriate types
  num_walks <- as.integer(.num_walks)
  num_steps <- as.integer(.n)
  mu <- as.numeric(.mu)
  sd <- as.numeric(.sd)
  drift <- as.numeric(.drift)
  initial_value <- as.numeric(.initial_value)
  
  # Checks
  if (num_walks <= 0) {
    rlang::abort("Number of walks must be a positive integer.", use_cli = TRUE)
  }
  if (num_steps <= 0) {
    rlang::abort("Number of steps must be a positive integer.", use_cli = TRUE)
  }
  if (sd <= 0) {
    rlang::abort("Standard deviation must be a positive number.", use_cli = TRUE)
  }
  if (is.na(mu)) {
    rlang::abort("Mean must be a number.", use_cli = TRUE)
  }
  if (is.na(drift)) {
    rlang::abort("Drift must be a number.", use_cli = TRUE)
  }
  if (is.na(initial_value)) {
    rlang::abort("Initial value must be a number.", use_cli = TRUE)
  }
  if (!.dimensions %in% c(1, 2, 3)) {
    rlang::abort("Number of dimensions must be 1, 2, or 3.", use_cli = TRUE)
  }
  
  # Create drift sequences for each dimension
  dr <- purrr::map(
    1:.dimensions, 
    ~ seq(from = drift, to = drift * num_steps, length.out = num_steps)
    )
  
  # Define dimension names
  dim_names <- switch(.dimensions,
                      `1` = c("y"),
                      `2` = c("x", "y"),
                      `3` = c("x", "y", "z"))
  
  # Function to generate a single random walk with drift for multiple dimensions
  single_random_walk_with_drift <- function(num_steps, mu, sd, dr) {
    walks_per_dim <- purrr::map2(dr, dim_names, function(drift_seq, dim) {
      wn <- stats::rnorm(n = num_steps, mean = mu, sd = sd)
      rw <- cumsum(wn)
      res <- wn + rw + drift_seq
      res
    })
    
    # Set Column Names
    rand_steps <- stats::setNames(walks_per_dim, dim_names)
    rand_steps <- purrr::map(rand_steps, \(x) dplyr::as_tibble(x)) |>
      purrr::list_cbind()
    colnames(rand_steps) <- dim_names
    rand_steps <- purrr::map(
      rand_steps, \(x) x |>
        unlist(use.names = FALSE)) |>
      dplyr::as_tibble()
  }
  
  # Generate all walks for each dimension
  walks <- purrr::map(
    1:num_walks, 
    ~ single_random_walk_with_drift(num_steps, mu, sd, dr)
    )
    
  
  # Create a tibble with all walks for all dimensions
  res <- dplyr::bind_rows(walks, .id = "walk_number") |>
    dplyr::mutate(walk_number = as.factor(walk_number)) |>
    dplyr::group_by(walk_number) |>
    dplyr::mutate(step_number = 1:num_steps) |>
    dplyr::select(walk_number, step_number, dplyr::all_of(dim_names)) |>
    std_cum_sum_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_prod_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_min_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_max_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_mean_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    dplyr::ungroup()
  
  # Add attributes
  attr(res, "n") <- num_steps
  attr(res, "num_walks") <- num_walks
  attr(res, "mu") <- mu
  attr(res, "sd") <- sd
  attr(res, "drift") <- drift
  attr(res, "fns") <- "random_normal_drift_walk"
  attr(res, "dimensions") <- .dimensions
  
  return(res)
}

> random_normal_drift_walk() |> dplyr::glimpse()
Rows: 2,500
Columns: 8
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ y           <dbl> -0.704207824, 0.005405342, -1.630592090, 2.540589878, -1.391934499, 0.230809…
$ cum_sum_y   <dbl> -0.7042078, -0.6988025, -2.3293946, 0.2111953, -1.1807392, -0.9499297, -3.74…
$ cum_prod_y  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_min_y   <dbl> -0.7042078, -0.7042078, -1.6305921, -1.6305921, -1.6305921, -1.6305921, -2.7…
$ cum_max_y   <dbl> -0.704207824, 0.005405342, 0.005405342, 2.540589878, 2.540589878, 2.54058987…
$ cum_mean_y  <dbl> -0.70420782, -0.34940124, -0.77646486, 0.05279883, -0.23614784, -0.15832161,…
> random_normal_drift_walk(.dimensions = 2) |> dplyr::glimpse()
Rows: 2,500
Columns: 14
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ x           <dbl> 1.3000643, -0.5741537, 1.2487231, -1.8004786, 2.5158755, 0.8701070, -0.27102…
$ y           <dbl> 0.84614643, 0.90587021, 4.98652494, 2.05560699, 2.91931159, 3.88258039, 0.17…
$ cum_sum_x   <dbl> 1.3000643, 0.7259106, 1.9746337, 0.1741550, 2.6900305, 3.5601376, 3.2891090,…
$ cum_sum_y   <dbl> 0.8461464, 1.7520166, 6.7385416, 8.7941486, 11.7134602, 15.5960405, 15.77078…
$ cum_prod_x  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_prod_y  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_min_x   <dbl> 1.3000643, -0.5741537, -0.5741537, -1.8004786, -1.8004786, -1.8004786, -1.80…
$ cum_min_y   <dbl> 0.8461464, 0.8461464, 0.8461464, 0.8461464, 0.8461464, 0.8461464, 0.1747481,…
$ cum_max_x   <dbl> 1.300064, 1.300064, 1.300064, 1.300064, 2.515875, 2.515875, 2.515875, 2.5158…
$ cum_max_y   <dbl> 0.8461464, 0.9058702, 4.9865249, 4.9865249, 4.9865249, 4.9865249, 4.9865249,…
$ cum_mean_x  <dbl> 1.30006431, 0.36295528, 0.65821123, 0.04353876, 0.53800610, 0.59335626, 0.46…
$ cum_mean_y  <dbl> 0.8461464, 0.8760083, 2.2461805, 2.1985371, 2.3426920, 2.5993401, 2.2529698,…
>random_normal_drift_walk(.dimensions = 3) |> dplyr::glimpse()
Rows: 2,500
Columns: 20
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ x           <dbl> -0.782059844, -3.404440324, -3.190737459, -0.872538854, -3.275845149, -0.158…
$ y           <dbl> -0.943494797, -2.753556318, -2.827901366, -3.170963854, -6.324274173, -6.328…
$ z           <dbl> -1.0782543, 0.9098405, -0.9205521, 0.1392489, 0.5606996, -0.3060099, -1.6772…
$ cum_sum_x   <dbl> -0.7820598, -4.1865002, -7.3772376, -8.2497765, -11.5256216, -11.6845978, -1…
$ cum_sum_y   <dbl> -0.9434948, -3.6970511, -6.5249525, -9.6959163, -16.0201905, -22.3483258, -2…
$ cum_sum_z   <dbl> -1.0782543, -0.1684139, -1.0889659, -0.9497171, -0.3890174, -0.6950273, -2.3…
$ cum_prod_x  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_prod_y  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_prod_z  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_min_x   <dbl> -0.7820598, -3.4044403, -3.4044403, -3.4044403, -3.4044403, -3.4044403, -3.4…
$ cum_min_y   <dbl> -0.9434948, -2.7535563, -2.8279014, -3.1709639, -6.3242742, -6.3281353, -6.3…
$ cum_min_z   <dbl> -1.078254, -1.078254, -1.078254, -1.078254, -1.078254, -1.078254, -1.677289,…
$ cum_max_x   <dbl> -0.7820598, -0.7820598, -0.7820598, -0.7820598, -0.7820598, -0.1589761, 1.35…
$ cum_max_y   <dbl> -0.9434948, -0.9434948, -0.9434948, -0.9434948, -0.9434948, -0.9434948, -0.9…
$ cum_max_z   <dbl> -1.0782543, 0.9098405, 0.9098405, 0.9098405, 0.9098405, 0.9098405, 0.9098405…
$ cum_mean_x  <dbl> -0.78205984, -2.09325008, -2.45907921, -2.06244412, -2.30512433, -1.94743296…
$ cum_mean_y  <dbl> -0.9434948, -1.8485256, -2.1749842, -2.4239791, -3.2040381, -3.7247210, -3.8…
$ cum_mean_z  <dbl> -1.078254341, -0.084206938, -0.362988647, -0.237429272, -0.077803488, -0.115…

0 replies

spsanderson · 2024-09-24T19:15:08Z

spsanderson
Sep 24, 2024
Maintainer Author

Here is an example of discrete_walk()

test_discrete_walk <- function(.num_walks = 25, .n = 100, .upper_bound = 1,
                                  .lower_bound = -1, .upper_probability = 0.5,
                                  .initial_value = 100, .dimensions = 1) {
  
  # Variables
  num_walks <- as.integer(.num_walks)
  periods <- as.integer(.n)
  upper_bound <- as.numeric(.upper_bound)
  lower_bound <- as.numeric(.lower_bound)
  upper_probability <- as.numeric(.upper_probability)
  lower_probability <- 1 - upper_probability
  initial_value <- as.numeric(.initial_value)
  dimensions <- as.integer(.dimensions)
  
  # Checks
  if (!is.integer(num_walks) | num_walks < 1) {
    rlang::abort(
      message = "The number of walks must be an integer greater than 0.",
      use_cli_format = TRUE
    )
  }
  
  if (!is.integer(periods) | periods < 1) {
    rlang::abort(
      message = "The number of periods must be an integer greater than 0.",
      use_cli_format = TRUE
    )
  }
  
  if (!is.numeric(upper_bound)) {
    rlang::abort(
      message = "The upper bound must be a numeric value.",
      use_cli_format = TRUE
    )
  }
  
  if (!is.numeric(lower_bound)) {
    rlang::abort(
      message = "The lower bound must be a numeric value.",
      use_cli_format = TRUE
    )
  }
  
  if (!is.numeric(upper_probability) | upper_probability < 0 | upper_probability > 1) {
    rlang::abort(
      message = "The upper probability must be a numeric value between 0 and 1.",
      use_cli_format = TRUE
    )
  }
  
  if (!is.numeric(initial_value)) {
    rlang::abort(
      message = "The initial value must be a numeric value.",
      use_cli_format = TRUE
    )
  }
  
  if (!is.integer(dimensions) | dimensions < 1 | dimensions > 3) {
    rlang::abort(
      message = "The number of dimensions must be an integer between 1 and 3.",
      use_cli_format = TRUE
    )
  }
  
  if (!.dimensions %in% c(1, 2, 3)) {
    rlang::abort(
      message = "The number of dimensions must be an integer between 1 and 3.",
      use_cli_format = TRUE
    )
  }
  
  # Define dimension names
  dim_names <- switch(.dimensions,
                      `1` = c("y"),
                      `2` = c("x", "y"),
                      `3` = c("x", "y", "z"))
  
  # Generate walks for each dimension
  single_discrete_walk <- function(periods, upper_bound, lower_bound,
                                   upper_probability, lower_probability){
    rand_steps <- purrr::map(
      dim_names,
      ~ replicate(
          n = periods,
          sample(
            x = c(upper_bound, lower_bound),
            size = 1,
            prob = c(upper_probability, lower_probability))
          )
    )
      
    # Set Column Names
    rand_steps <- stats::setNames(rand_steps, dim_names)
    rand_steps <- purrr::map(rand_steps, \(x) dplyr::as_tibble(x)) |> 
      purrr::list_cbind()
    colnames(rand_steps) <- dim_names
    rand_steps <- purrr::map(
      rand_steps, \(x) x |> 
        unlist(use.names = FALSE)) |> 
      dplyr::as_tibble()
    
    # Combine into a tibble
    dplyr::tibble(
      walk_number = factor(num_walks),
      step_number = 1:periods
    ) |>
      dplyr::bind_cols(rand_steps)
  }
  
  # Generate walks
  walks <- purrr::map(
    1:num_walks, 
    ~ single_discrete_walk(periods, upper_bound, lower_bound,
                            upper_probability, lower_probability)
  )
  
  # Create a tibble with all walks for all dimensions
  res <- dplyr::bind_rows(walks, .id = "walk_number") |>
    dplyr::mutate(walk_number = as.factor(walk_number)) |>
    dplyr::group_by(walk_number) |>
    dplyr::select(walk_number, step_number, dplyr::all_of(dim_names)) |>
    std_cum_sum_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_prod_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_min_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_max_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_mean_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    dplyr::ungroup()
  
  # Attributes
  attr(res, "n")             <- periods
  attr(res, "num_walks")     <- .num_walks
  attr(res, "upper_bound")   <- upper_bound
  attr(res, "lower_bound")   <- lower_bound
  attr(res, "upper_probability") <- upper_probability
  attr(res, "lower_probability") <- lower_probability
  attr(res, "initial_value") <- initial_value
  attr(res, "fns")           <- "discrete_walk"
  attr(res, "dimension")     <- dimensions
  
  # Return
  return(res)
  
}

> test_discrete_walk() |> dplyr::glimpse()
Rows: 2,500
Columns: 8
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ y           <dbl> 1, 1, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1, -1, 1, 1, -1, -1, 1, -1, 1, 1, …
$ cum_sum_y   <dbl> 101, 102, 103, 102, 103, 102, 103, 104, 103, 104, 103, 102, 101, 100, 101, 1…
$ cum_prod_y  <dbl> 200, 400, 800, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_min_y   <dbl> 101, 101, 101, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 9…
$ cum_max_y   <dbl> 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 1…
$ cum_mean_y  <dbl> 101.00000, 101.00000, 101.00000, 100.50000, 100.60000, 100.33333, 100.42857,…
> test_discrete_walk(.dimensions = 2) |> dplyr::glimpse()
Rows: 2,500
Columns: 14
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ x           <dbl> 1, 1, 1, 1, -1, 1, -1, 1, -1, -1, 1, 1, 1, -1, -1, -1, -1, 1, -1, 1, 1, 1, -…
$ y           <dbl> 1, -1, 1, 1, -1, 1, 1, 1, 1, 1, -1, 1, -1, -1, -1, -1, 1, -1, -1, 1, 1, -1, …
$ cum_sum_x   <dbl> 101, 102, 103, 104, 103, 104, 103, 104, 103, 102, 103, 104, 105, 104, 103, 1…
$ cum_sum_y   <dbl> 101, 100, 101, 102, 101, 102, 103, 104, 105, 106, 105, 106, 105, 104, 103, 1…
$ cum_prod_x  <dbl> 200, 400, 800, 1600, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_prod_y  <dbl> 200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ cum_min_x   <dbl> 101, 101, 101, 101, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, …
$ cum_min_y   <dbl> 101, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,…
$ cum_max_x   <dbl> 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 1…
$ cum_max_y   <dbl> 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 1…
$ cum_mean_x  <dbl> 101.00000, 101.00000, 101.00000, 101.00000, 100.60000, 100.66667, 100.42857,…
$ cum_mean_y  <dbl> 101.0000, 100.0000, 100.3333, 100.5000, 100.2000, 100.3333, 100.4286, 100.50…
> test_discrete_walk(.dimensions = 3) |> dplyr::glimpse()
Rows: 2,500
Columns: 20
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ x           <dbl> -1, -1, -1, 1, 1, 1, -1, -1, 1, 1, -1, -1, 1, -1, -1, -1, -1, -1, 1, -1, 1, …
$ y           <dbl> 1, -1, 1, 1, 1, 1, -1, -1, -1, 1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1, 1, -1, …
$ z           <dbl> 1, 1, 1, -1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1, -1, 1, 1, -1, -1, -1, 1, …
$ cum_sum_x   <dbl> 99, 98, 97, 98, 99, 100, 99, 98, 99, 100, 99, 98, 99, 98, 97, 96, 95, 94, 95…
$ cum_sum_y   <dbl> 101, 100, 101, 102, 103, 104, 103, 102, 101, 102, 101, 102, 101, 100, 99, 10…
$ cum_sum_z   <dbl> 101, 102, 103, 102, 103, 104, 103, 102, 101, 100, 101, 102, 101, 102, 103, 1…
$ cum_prod_x  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_prod_y  <dbl> 200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ cum_prod_z  <dbl> 200, 400, 800, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ cum_min_x   <dbl> 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, …
$ cum_min_y   <dbl> 101, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,…
$ cum_min_z   <dbl> 101, 101, 101, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 9…
$ cum_max_x   <dbl> 99, 99, 99, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101,…
$ cum_max_y   <dbl> 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 1…
$ cum_max_z   <dbl> 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 1…
$ cum_mean_x  <dbl> 99.00000, 99.00000, 99.00000, 99.50000, 99.80000, 100.00000, 99.85714, 99.75…
$ cum_mean_y  <dbl> 101.00000, 100.00000, 100.33333, 100.50000, 100.60000, 100.66667, 100.42857,…
$ cum_mean_z  <dbl> 101.0000, 101.0000, 101.0000, 100.5000, 100.6000, 100.6667, 100.4286, 100.25…
> test_discrete_walk(.dimensions = 4)
Error in `test_discrete_walk()`:
! The number of dimensions must be an integer between 1 and 3.
Run `rlang::last_trace()` to see where the error occurred.
Called from: signal_abort(cnd, .file)

tst <- test_discrete_walk(.dimensions = 2, .num_walks = 9, .n = 250)
  
tst |>
  ggplot(aes(x = cum_sum_x, y = cum_sum_y)) +
  facet_wrap(~ walk_number, scales = "free") +
  geom_path(aes(color = step_number), linewidth = 0.5) +
  geom_point(data = tst |> 
               group_by(walk_number) |> 
               filter(step_number == min(step_number) | step_number == max(step_number)) |> 
               ungroup(), 
             aes(color = step_number), size = 3) +
  scale_color_viridis_c(option = "plasma") +
  theme_minimal()

0 replies

spsanderson · 2024-09-25T18:47:45Z

spsanderson
Sep 25, 2024
Maintainer Author

Here is an example of Brownian Motion:

#' Brownian Motion
#'
#' @family Generator Functions
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @description Create a Brownian Motion Tibble
#'
#' @details Brownian Motion, also known as the Wiener process, is a
#' continuous-time random process that describes the random movement of particles
#' suspended in a fluid. It is named after the physicist Robert Brown,
#' who first described the phenomenon in 1827.
#'
#' The equation for Brownian Motion can be represented as:
#'
#'     W(t) = W(0) + sqrt(t) * Z
#'
#' Where W(t) is the Brownian motion at time t, W(0) is the initial value of the
#' Brownian motion, sqrt(t) is the square root of time, and Z is a standard
#' normal random variable.
#'
#' Brownian Motion has numerous applications, including modeling stock prices in
#' financial markets, modeling particle movement in fluids, and modeling random
#' walk processes in general. It is a useful tool in probability theory and
#' statistical analysis.
#'
#' @param .n Total time of the simulation.
#' @param .num_walks Total number of simulations.
#' @param .delta_time Time step size.
#' @param .initial_value Integer representing the initial value.
#' @param .dimensions The default is 1. Allowable values are 1, 2 and 3.
#'
#' @examples
#' library(ggplot2)
#'
#' set.seed(123)
#' brownian_motion()
#'
#' set.seed(123)
#' brownian_motion() |>
#'   ggplot(aes(x = x, y = y, group = walk_number, color = walk_number)) +
#'   geom_line() +
#'   labs(title = "Brownian Motion", x = "Time", y = "Value") +
#'   theme_minimal() +
#'   theme(legend.position = "none")
#'
#' @return A tibble containing the generated random walks with columns depending on the number of dimensions:
#' \itemize{
#'   \item `walk_number`: Factor representing the walk number.
#'   \item `step_number`: Step index.
#'   \item `y`: If `.dimensions = 1`, the value of the walk at each step.
#'   \item `x`, `y`: If `.dimensions = 2`, the values of the walk in two dimensions.
#'   \item `x`, `y`, `z`: If `.dimensions = 3`, the values of the walk in three dimensions.
#' }
#'
#' The following are also returned based upon how many dimensions there are and could be any of x, y and or z:
#' \itemize{
#'   \item `cum_sum`: Cumulative sum of `dplyr::all_of(.dimensions)`.
#'   \item `cum_prod`: Cumulative product of `dplyr::all_of(.dimensions)`.
#'   \item `cum_min`: Cumulative minimum of `dplyr::all_of(.dimensions)`.
#'   \item `cum_max`: Cumulative maximum of `dplyr::all_of(.dimensions)`.
#'   \item `cum_mean`: Cumulative mean of `dplyr::all_of(.dimensions)`.
#' }
#'
#' @name brownian_motion
NULL

#' @export
#' @rdname brownian_motion

brownian_motion_v2 <- function(.num_walks = 25, .n = 100, .delta_time = 1,
                            .initial_value = 0, .dimensions = 1) {
  
  # Tidyeval ----
  num_sims <- as.numeric(.num_walks)
  t <- as.numeric(.n)
  initial_value <- as.numeric(.initial_value)
  delta_time <- as.numeric(.delta_time)
  
  # Checks
  if (!is.numeric(num_sims) | !is.numeric(t) | !is.numeric(initial_value) |
      !is.numeric(delta_time)){
    rlang::abort(
      message = "The parameters `.num_walks`, `.n`, `.delta_time`, and `.initial_value` must be numeric.",
      use_cli_format = TRUE
    )
  }
  
  # .num_walks and .n must be >= 1
  if (num_sims < 1 | t < 1){
    rlang::abort(
      message = "The parameters of `.num_walks` and `.n` must be >= 1.",
      use_cli_format = TRUE
    )
  }
  
  # .delta_time must be > 0
  if (delta_time <= 0){
    rlang::abort(
      message = "The parameter `.delta_time` must be > 0.",
      use_cli_format = TRUE
    )
  }
  
  if (!is.logical(return_tibble)){
    rlang::abort(
      message = "The parameter `.return_tibble` must be either TRUE/FALSE",
      use_cli_format = TRUE
    )
  }
  
  if (!.dimensions %in% c(1, 2, 3)) {
    rlang::abort("Number of dimensions must be 1, 2, or 3.", use_cli = TRUE)
  }
  
  # Define dimension names
  dim_names <- switch(.dimensions,
                      `1` = c("y"),
                      `2` = c("x", "y"),
                      `3` = c("x", "y", "z"))
  
  # Matrix of random draws - one for each simulation
  generate_brownian_motion <- function(num_sims) {
    rand_steps <- purrr::map(
      dim_names,
      ~ stats::rnorm(t, mean = 0, sd = sqrt(delta_time))
    )
    
    # Set column names
    rand_steps <- stats::setNames(rand_steps, dim_names)
    rand_steps <- purrr::map(rand_steps, \(x) dplyr::as_tibble(x)) |> 
      purrr::list_cbind()
    colnames(rand_steps) <- dim_names
    rand_steps <- purrr::map(
      rand_steps, \(x) x |> 
        unlist(use.names = FALSE)) |> 
      dplyr::as_tibble()
    
    # Combine into a tibble
    dplyr::tibble(
      walk_number = factor(num_sims),
      step_number = 1:t
    ) |>
      dplyr::bind_cols(rand_steps)
  }
  
  # Get the Brownian Motion and convert to price paths
  res <- purrr::map(1:num_sims, ~ generate_brownian_motion(.x)) |> 
    dplyr::bind_rows() |>
    dplyr::select(walk_number, step_number, dplyr::any_of(dim_names)) |>
    dplyr::group_by(walk_number) |>
    std_cum_sum_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_prod_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_min_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_max_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_mean_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    dplyr::ungroup()
  
  # Return ----
  attr(res, "n") <- .n
  attr(res, "num_walks") <- .num_walks
  attr(res, "delta_time") <- .delta_time
  attr(res, "initial_value") <- .initial_value
  attr(res, "return_tibble") <- .return_tibble
  attr(res, "fns") <- "brownian_motion"
  attr(res, "dimension")     <- 1
  
  return(res)
}

Output:

> brownian_motion_v2()
# A tibble: 2,500 × 8
   walk_number step_number      y cum_sum_y cum_prod_y cum_min_y cum_max_y cum_mean_y
   <fct>             <int>  <dbl>     <dbl>      <dbl>     <dbl>     <dbl>      <dbl>
 1 1                     1  0.463     0.463          0     0.463     0.463      0.463
 2 1                     2  0.365     0.829          0     0.365     0.463      0.414
 3 1                     3  1.11      1.93           0     0.365     1.11       0.645
 4 1                     4 -0.204     1.73           0    -0.204     1.11       0.433
 5 1                     5  0.701     2.43           0    -0.204     1.11       0.486
 6 1                     6  0.599     3.03           0    -0.204     1.11       0.505
 7 1                     7 -0.101     2.93           0    -0.204     1.11       0.419
 8 1                     8 -0.117     2.81           0    -0.204     1.11       0.352
 9 1                     9 -0.658     2.15           0    -0.658     1.11       0.239
10 1                    10  2.28      4.44           0    -0.658     2.28       0.444
# ℹ 2,490 more rows
# ℹ Use `print(n = ...)` to see more rows
> brownian_motion_v2(.dimensions = 2)
# A tibble: 2,500 × 14
   walk_number step_number      x       y cum_sum_x cum_sum_y cum_prod_x cum_prod_y cum_min_x
   <fct>             <int>  <dbl>   <dbl>     <dbl>     <dbl>      <dbl>      <dbl>     <dbl>
 1 1                     1 -1.93   0.0384    -1.93     0.0384          0          0     -1.93
 2 1                     2  2.09  -1.03       0.160   -0.995           0          0     -1.93
 3 1                     3 -0.914 -1.16      -0.754   -2.16            0          0     -1.93
 4 1                     4 -1.21  -2.18      -1.96    -4.34            0          0     -1.93
 5 1                     5 -1.77  -0.184     -3.73    -4.52            0          0     -1.93
 6 1                     6 -1.25   0.316     -4.97    -4.21            0          0     -1.93
 7 1                     7  0.789  0.826     -4.19    -3.38            0          0     -1.93
 8 1                     8  1.22  -0.653     -2.97    -4.03            0          0     -1.93
 9 1                     9 -0.275 -0.787     -3.24    -4.82            0          0     -1.93
10 1                    10 -1.35  -0.971     -4.59    -5.79            0          0     -1.93
# ℹ 2,490 more rows
# ℹ 5 more variables: cum_min_y <dbl>, cum_max_x <dbl>, cum_max_y <dbl>, cum_mean_x <dbl>,
#   cum_mean_y <dbl>
# ℹ Use `print(n = ...)` to see more rows
> brownian_motion_v2(.dimensions = 3)
# A tibble: 2,500 × 20
   walk_number step_number      x      y        z cum_sum_x cum_sum_y cum_sum_z cum_prod_x
   <fct>             <int>  <dbl>  <dbl>    <dbl>     <dbl>     <dbl>     <dbl>      <dbl>
 1 1                     1  0.425 -0.189 -0.253      0.425    -0.189     -0.253          0
 2 1                     2  1.89   0.350 -1.23       2.31      0.161     -1.48           0
 3 1                     3 -0.383 -1.29  -0.533      1.93     -1.13      -2.01           0
 4 1                     4 -0.539  1.02  -0.838      1.39     -0.106     -2.85           0
 5 1                     5 -0.285 -0.706 -0.0792     1.10     -0.812     -2.93           0
 6 1                     6 -2.65  -0.176  0.00443   -1.55     -0.988     -2.93           0
 7 1                     7  0.869  0.292 -0.760     -0.676    -0.696     -3.69           0
 8 1                     8  0.707  0.405 -1.34       0.0307   -0.291     -5.03           0
 9 1                     9  0.686  0.952  0.176      0.717     0.661     -4.85           0
10 1                    10  0.663 -0.578  0.209      1.38      0.0826    -4.65           0
# ℹ 2,490 more rows
# ℹ 11 more variables: cum_prod_y <dbl>, cum_prod_z <dbl>, cum_min_x <dbl>, cum_min_y <dbl>,
#   cum_min_z <dbl>, cum_max_x <dbl>, cum_max_y <dbl>, cum_max_z <dbl>, cum_mean_x <dbl>,
#   cum_mean_y <dbl>, cum_mean_z <dbl>
# ℹ Use `print(n = ...)` to see more rows

0 replies

spsanderson · 2024-09-25T19:09:48Z

spsanderson
Sep 25, 2024
Maintainer Author

Geometric Brownian Motion:

#' Geometric Brownian Motion
#'
#' @family Generator Functions
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @description Create a Geometric Brownian Motion.
#'
#' @details Geometric Brownian Motion (GBM) is a statistical method for modeling
#' the evolution of a given financial asset over time. It is a type of stochastic
#' process, which means that it is a system that undergoes random changes over
#' time.
#'
#' GBM is widely used in the field of finance to model the behavior of stock
#' prices, foreign exchange rates, and other financial assets. It is based on
#' the assumption that the asset's price follows a random walk, meaning that it
#' is influenced by a number of unpredictable factors such as market trends,
#' news events, and investor sentiment.
#'
#' The equation for GBM is:
#'
#'      dS/S = mdt + sdW
#'
#' where S is the price of the asset, t is time, m is the expected return on the
#' asset, s is the volatility of the asset, and dW is a small random change in
#' the asset's price.
#'
#' GBM can be used to estimate the likelihood of different outcomes for a given
#' asset, and it is often used in conjunction with other statistical methods to
#' make more accurate predictions about the future performance of an asset.
#'
#' This function provides the ability of simulating and estimating the parameters
#' of a GBM process. It can be used to analyze the behavior of financial
#' assets and to make informed investment decisions.
#'
#' @param .n Total time of the simulation, how many `n` points in time.
#' @param .num_walks Total number of simulations.
#' @param .delta_time Time step size.
#' @param .initial_value Integer representing the initial value.
#' @param .mu Expected return
#' @param .sigma Volatility
#' @param .dimensions The default is 1. Allowable values are 1, 2 and 3.
#'
#' @examples
#' library(ggplot2)
#'
#' set.seed(123)
#' geometric_brownian_motion()
#'
#' set.seed(123)
#' geometric_brownian_motion() |>
#'   ggplot(aes(x = x, y = y, group = walk_number, color = walk_number)) +
#'   geom_line() +
#'   labs(title = "Geometric Brownian Motion", x = "Time", y = "Value") +
#'   theme_minimal() +
#'   theme(legend.position = "none")
#'
#' @return A tibble containing the generated random walks with columns depending
#' on the number of dimensions:
#' \itemize{
#'   \item `walk_number`: Factor representing the walk number.
#'   \item `step_number`: Step index.
#'   \item `y`: If `.dimensions = 1`, the value of the walk at each step.
#'   \item `x`, `y`: If `.dimensions = 2`, the values of the walk in two dimensions.
#'   \item `x`, `y`, `z`: If `.dimensions = 3`, the values of the walk in three dimensions.
#' }
#'
#' The following are also returned based upon how many dimensions there are and 
#' could be any of x, y and or z:
#' \itemize{
#'   \item `cum_sum`: Cumulative sum of `dplyr::all_of(.dimensions)`.
#'   \item `cum_prod`: Cumulative product of `dplyr::all_of(.dimensions)`.
#'   \item `cum_min`: Cumulative minimum of `dplyr::all_of(.dimensions)`.
#'   \item `cum_max`: Cumulative maximum of `dplyr::all_of(.dimensions)`.
#'   \item `cum_mean`: Cumulative mean of `dplyr::all_of(.dimensions)`.
#' }
#'
#' @name geometric_brownian_motion
NULL

#' @export
#' @rdname geometric_brownian_motion

geometric_brownian_motion_v2 <- function(.num_walks = 25, .n = 100,
                                      .mu = 0, .sigma = 0.1,
                                      .initial_value = 100,
                                      .delta_time = 0.003,
                                      .dimensions = 1) {
  
  # Tidyeval ----
  # Thank you to https://robotwealth.com/efficiently-simulating-geometric-brownian-motion-in-r/
  num_sims <- as.numeric(.num_walks)
  t <- as.numeric(.n)
  mu <- as.numeric(.mu)
  sigma <- as.numeric(.sigma)
  initial_value <- as.numeric(.initial_value)
  delta_time <- as.numeric(.delta_time)
  
  # Checks ----
  
  if (!is.numeric(num_sims) | !is.numeric(t) | !is.numeric(mu) |
      !is.numeric(sigma) | !is.numeric(initial_value) | !is.numeric(delta_time)){
    rlang::abort(
      message = "The parameters of `.n', `.num_walks`, `.mu`, `.sigma`,
            `.initial_value`, and `.delta_time` must be numeric.",
      use_cli_format = TRUE
    )
  }
  
  # .mu and .sigma and .detla_time must be >= 0
  if (mu < 0 | sigma < 0 | delta_time < 0){
    rlang::abort(
      message = "The parameters of `.mu`, `.sigma`, and `.delta_time` must be >= 0.",
      use_cli_format = TRUE
    )
  }
  
  # .num_walks and .n must be >= 1
  if (num_sims < 1 | t < 1){
    rlang::abort(
      message = "The parameters of `.num_walks` and `.n` must be >= 1.",
      use_cli_format = TRUE
    )
  }
  if (!.dimensions %in% c(1, 2, 3)) {
    rlang::abort("Number of dimensions must be 1, 2, or 3.", use_cli = TRUE)
  }
  
  # Define dimension names
  dim_names <- switch(.dimensions,
                      `1` = c("y"),
                      `2` = c("x", "y"),
                      `3` = c("x", "y", "z"))
  
  # matrix of random draws - one for each day for each simulation
  generate_gbm <- function(num_sims){
    rand_steps <- purrr::map(
      dim_names,
      ~ exp((mu - sigma * sigma / 2) * delta_time + sigma * stats::rnorm(t) * sqrt(delta_time)) |>
        cumprod()
    )
    
    # Set column names
    rand_steps <- stats::setNames(rand_steps, dim_names)
    rand_steps <- purrr::map(rand_steps, \(x) dplyr::as_tibble(x)) |> 
      purrr::list_cbind()
    colnames(rand_steps) <- dim_names
    rand_steps <- purrr::map(
      rand_steps, \(x) x |> 
        unlist(use.names = FALSE)) |> 
      dplyr::as_tibble()
    
    # Combine into a tibble
    dplyr::tibble(
      walk_number = factor(num_sims),
      step_number = 1:t
    ) |>
      dplyr::bind_cols(rand_steps)
  }
  
  res <- purrr::map(1:num_sims, ~ generate_gbm(.x)) |>
    dplyr::bind_rows() |>
    dplyr::select(walk_number, step_number, dplyr::all_of(dim_names)) |>
    dplyr::group_by(walk_number) |>
    std_cum_min_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_max_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    std_cum_mean_augment(.value = dplyr::all_of(dim_names), .initial_value = initial_value) |>
    dplyr::ungroup()
  
  # Return
  
  attr(res, "n") <- .n
  attr(res, "num_walks") <- .num_walks
  attr(res, "mean") <- .mu
  attr(res, "sigma") <- .sigma
  attr(res, "initial_value") <- .initial_value
  attr(res, "delta_time") <- .delta_time
  attr(res, "fns") <- "geometric_brownian_motion"
  attr(res, "dimension")     <- .dimensions
  
  return(res)
}

Output:

> geometric_brownian_motion_v2() |> dplyr::glimpse()
Rows: 2,500
Columns: 6
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ y           <dbl> 1.0032995, 0.9950700, 0.9950563, 0.9939038, 0.9992407, 0.9955925, 0.9961966,…
$ cum_min_y   <dbl> 101.0033, 100.9951, 100.9951, 100.9939, 100.9939, 100.9939, 100.9939, 100.99…
$ cum_max_y   <dbl> 101.0033, 101.0033, 101.0033, 101.0033, 101.0033, 101.0033, 101.0033, 101.00…
$ cum_mean_y  <dbl> 101.0033, 100.9992, 100.9978, 100.9968, 100.9973, 100.9970, 100.9969, 100.99…
> geometric_brownian_motion_v2(.dimensions = 2) |> dplyr::glimpse()
Rows: 2,500
Columns: 10
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ x           <dbl> 0.9900641, 0.9942326, 0.9967875, 1.0028786, 0.9901350, 0.9843250, 0.9902295,…
$ y           <dbl> 0.9949847, 0.9999695, 0.9929160, 1.0004247, 0.9977902, 0.9912838, 0.9940214,…
$ cum_min_x   <dbl> 100.9901, 100.9901, 100.9901, 100.9901, 100.9901, 100.9843, 100.9843, 100.98…
$ cum_min_y   <dbl> 100.9950, 100.9950, 100.9929, 100.9929, 100.9929, 100.9913, 100.9913, 100.99…
$ cum_max_x   <dbl> 100.9901, 100.9942, 100.9968, 101.0029, 101.0029, 101.0029, 101.0029, 101.00…
$ cum_max_y   <dbl> 100.9950, 101.0000, 101.0000, 101.0004, 101.0004, 101.0004, 101.0004, 101.00…
$ cum_mean_x  <dbl> 100.9901, 100.9921, 100.9937, 100.9960, 100.9948, 100.9931, 100.9927, 100.99…
$ cum_mean_y  <dbl> 100.9950, 100.9975, 100.9960, 100.9971, 100.9972, 100.9962, 100.9959, 100.99…
> geometric_brownian_motion_v2(.dimensions = 3) |> dplyr::glimpse()
Rows: 2,500
Columns: 14
$ walk_number <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ step_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 2…
$ x           <dbl> 0.9994094, 1.0055157, 1.0106583, 1.0017301, 0.9962237, 0.9957554, 1.0023433,…
$ y           <dbl> 0.9929238, 0.9964699, 0.9971869, 1.0032907, 1.0052320, 1.0103826, 1.0215538,…
$ z           <dbl> 0.9920106, 0.9802242, 0.9715859, 0.9724813, 0.9825850, 0.9928515, 0.9939654,…
$ cum_min_x   <dbl> 100.9994, 100.9994, 100.9994, 100.9994, 100.9962, 100.9958, 100.9958, 100.99…
$ cum_min_y   <dbl> 100.9929, 100.9929, 100.9929, 100.9929, 100.9929, 100.9929, 100.9929, 100.99…
$ cum_min_z   <dbl> 100.9920, 100.9802, 100.9716, 100.9716, 100.9716, 100.9716, 100.9716, 100.97…
$ cum_max_x   <dbl> 100.9994, 101.0055, 101.0107, 101.0107, 101.0107, 101.0107, 101.0107, 101.01…
$ cum_max_y   <dbl> 100.9929, 100.9965, 100.9972, 101.0033, 101.0052, 101.0104, 101.0216, 101.02…
$ cum_max_z   <dbl> 100.9920, 100.9920, 100.9920, 100.9920, 100.9920, 100.9929, 100.9940, 100.99…
$ cum_mean_x  <dbl> 100.9994, 101.0025, 101.0052, 101.0043, 101.0027, 101.0015, 101.0017, 101.00…
$ cum_mean_y  <dbl> 100.9929, 100.9947, 100.9955, 100.9975, 100.9990, 101.0009, 101.0039, 101.00…
$ cum_mean_z  <dbl> 100.9920, 100.9861, 100.9813, 100.9791, 100.9798, 100.9820, 100.9837, 100.98…

geometric_brownian_motion_v2(.dimensions = 2) |>
  ggplot(aes(x = x, y = y)) +
  facet_wrap(~ walk_number, scales = "free") +
  geom_path() +
  theme_minimal()

0 replies

spsanderson · 2024-11-11T14:25:58Z

spsanderson
Nov 11, 2024
Maintainer Author

I am implementing this into the development branch.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should we change the way random walks are generated to allow dimensions #53

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 6 comments

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Should we change the way random walks are generated to allow dimensions #53

spsanderson Jul 30, 2024 Maintainer

Replies: 6 comments

spsanderson Sep 16, 2024 Maintainer Author

spsanderson Sep 24, 2024 Maintainer Author

spsanderson Sep 24, 2024 Maintainer Author

spsanderson Sep 25, 2024 Maintainer Author

spsanderson Sep 25, 2024 Maintainer Author

spsanderson Nov 11, 2024 Maintainer Author

spsanderson
Jul 30, 2024
Maintainer

spsanderson
Sep 16, 2024
Maintainer Author

spsanderson
Sep 24, 2024
Maintainer Author

spsanderson
Sep 24, 2024
Maintainer Author

spsanderson
Sep 25, 2024
Maintainer Author

spsanderson
Sep 25, 2024
Maintainer Author

spsanderson
Nov 11, 2024
Maintainer Author