Some suggestions #22

ramnathv · 2019-05-12T17:21:07Z

@robinsones First off, great package! It makes creating and analyzing funnels clean and easy. Based on my navigation of the package's API, I had some suggestions:

You might want to ship landed and registered as datasets in the package so users can run the examples without having to define them first. This could be extended to out-of-memory versions of the data by taking advantage of dbplyr::src_dbi.
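As a rough sketch of the out-of-memory idea (assuming DBI and RSQLite are installed, and using an in-memory SQLite database purely as a stand-in for a real backend), a copy of landed could be pushed to a database and handed back as a lazy table. The table name and the two rows are only for illustration; timestamps are kept as strings here because SQLite has no native date type.

```r
library(dplyr)

# Stand-in backend: an in-memory SQLite database (assumption, any DBI backend would do)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Two illustrative rows of the landed example data
landed_local <- tibble(
  user_id   = c(1, 2),
  timestamp = c("2018-07-01", "2018-07-01")
)

# copy_to() uploads the data and returns a lazy table backed by the database
landed_db <- copy_to(con, landed_local, name = "landed")

# The row count is computed in the database and only the result is pulled into R
n_rows <- landed_db %>% count() %>% pull(n)

DBI::dbDisconnect(con)
```

Whether after_join accepts such lazy tables directly is something the package documentation would have to confirm.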

The notion of a join type is great and factors in the multiple scenarios that one might run into. However, I had trouble visualizing the execution and how each type results in a different output compared to the pure version of the join function. So I put together a helper function that allows one to visualize the differences. Based on this, here is my understanding of the working of after_join:

1. Join the tables using the regular version of {mode}_join.
2. Keep only the records where event_x occurs on or before event_y (that is, drop those where it occurs after).
3. For each user_id, retain only the versions of events specified by type:
   - any-any retains all records from Step 2.
   - first-first, first-firstafter, and lastbefore-firstafter retain at most ONE record per user.

Is my understanding correct? It might be useful to add something like this to the documentation so it is really clear to users how these after_joins work.
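To make the three steps concrete, here is a minimal sketch of how I read the any-firstafter column of the table further down, written in plain dplyr rather than with after_join itself. It is not the package's implementation, only my interpretation of the output; the toy rows are users 4 and 6 from that table, and any_firstafter_sketch is just an illustrative name.

```r
library(dplyr)

# Toy data: users 4 and 6 only
landed <- tibble(
  user_id   = c(4, 4, 6, 6),
  timestamp = as.Date(c("2018-07-01", "2018-07-04", "2018-07-07", "2018-07-08"))
)
registered <- tibble(
  user_id   = c(4, 4, 6, 6),
  timestamp = as.Date(c("2018-06-10", "2018-07-02", "2018-07-10", "2018-07-11"))
)

any_firstafter_sketch <- landed %>%
  # Step 1: regular inner join on the user column
  inner_join(registered, by = "user_id", suffix = c(".x", ".y")) %>%
  # Step 2: keep pairs where the x event happens on or before the y event
  filter(timestamp.x <= timestamp.y) %>%
  # Step 3 (any-firstafter): for every x event keep only its earliest following y
  group_by(user_id, timestamp.x) %>%
  filter(timestamp.y == min(timestamp.y)) %>%
  ungroup()
```

This leaves three rows (one for user 4, two for user 6), which lines up with the rows flagged in the any-firstafter column for those users.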

after_join_all

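library(dplyr)  # %>%, mutate(), left_join() and the *_join() functions

# Assumes after_join() is available (presumably via library(funneljoin)).
# by_user and by_time are passed on to after_join() through `...`.
after_join_all <- function(x, y, by, mode = 'inner', ...){
  types <- c(
    'first-first', 'first-firstafter', 'lastbefore-firstafter',
    'any-firstafter', 'any-any'
  )
  # Run after_join() for one type and flag its rows in a column named after the type
  by_type <- function(type){
    after_join(x, y, ..., mode = mode, type = type) %>%
      mutate(!!type := 'Y')
  }
  # Look up the plain dplyr join matching `mode`, e.g. inner_join
  join_fun <- match.fun(paste0(mode, '_join'))
  all_types <- types %>%
    purrr::map(by_type)
  # Left-join every typed result onto the plain join; unmatched rows stay NA
  join_fun(x, y, by = by) %>%
    Reduce(left_join, all_types, init = .)
}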

user_id	timestamp.x	timestamp.y	first-first	first-firstafter	lastbefore-firstafter	any-firstafter	any-any
1	2018-07-01	2018-07-02	✅	✅	✅	✅	✅
3	2018-07-02	2018-07-02	✅	✅	✅	✅	✅
4	2018-07-01	2018-06-10	NA	NA	NA	NA	NA
4	2018-07-01	2018-07-02	NA	✅	✅	✅	✅
4	2018-07-04	2018-06-10	NA	NA	NA	NA	NA
4	2018-07-04	2018-07-02	NA	NA	NA	NA	NA
5	2018-07-10	2018-07-11	✅	✅	✅	✅	✅
5	2018-07-12	2018-07-11	NA	NA	NA	NA	NA
6	2018-07-07	2018-07-10	✅	✅	NA	✅	✅
6	2018-07-07	2018-07-11	NA	NA	NA	NA	✅
6	2018-07-08	2018-07-10	NA	NA	✅	✅	✅
6	2018-07-08	2018-07-11	NA	NA	NA	NA	✅


robinsones · 2019-05-13T18:07:17Z

Thanks Ramnath! I'll work on these this week.

robinsones · 2019-05-23T18:47:49Z

@ramnathv From my understanding of this function, you'd need to specify by_user and by_time when calling it, and the by argument would always be the same as by_user. Is that right? If so, it seems simpler to replace by with explicit by_user and by_time arguments.

How I reproduced your result:

landed <- tibble::tribble(
  ~user_id, ~timestamp,
  1, "2018-07-01",
  2, "2018-07-01",
  3, "2018-07-02",
  4, "2018-07-01",
  4, "2018-07-04",
  5, "2018-07-10",
  5, "2018-07-12",
  6, "2018-07-07",
  6, "2018-07-08"
) %>%
  mutate(timestamp = as.Date(timestamp))

registered <- tibble::tribble(
  ~user_id, ~timestamp,
  1, "2018-07-02",
  3, "2018-07-02",
  4, "2018-06-10",
  4, "2018-07-02",
  5, "2018-07-11",
  6, "2018-07-10",
  6, "2018-07-11",
  7, "2018-07-07"
) %>%
  mutate(timestamp = as.Date(timestamp))

after_join_all(x = landed, 
               y = registered, 
               by = "user_id",
               by_user = "user_id", 
               by_time = "timestamp")
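For reference, a sketch of what the simplified signature could look like, with by dropped and by_user reused as the key for the plain join. The name after_join_all2 is hypothetical, and the sketch assumes after_join is the function exported by the funneljoin package and that it accepts by_user, by_time, mode and type as named arguments.

```r
library(dplyr)

# Sketch only: `by` is gone and by_user doubles as the key for the plain join.
after_join_all2 <- function(x, y, by_user, by_time, mode = "inner") {
  types <- c(
    "first-first", "first-firstafter", "lastbefore-firstafter",
    "any-firstafter", "any-any"
  )
  # One after_join() per type, each flagged with a column named after the type
  # (assumes the funneljoin package is installed)
  by_type <- function(type) {
    funneljoin::after_join(x, y, by_user = by_user, by_time = by_time,
                           mode = mode, type = type) %>%
      mutate(!!type := "Y")
  }
  # Plain dplyr join matching `mode`, used as the base table
  join_fun <- match.fun(paste0(mode, "_join"))
  # Left-join every typed result onto the base; unmatched rows stay NA
  Reduce(left_join, lapply(types, by_type), init = join_fun(x, y, by = by_user))
}
```

The call would then shrink to after_join_all2(landed, registered, by_user = "user_id", by_time = "timestamp").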

ramnathv · 2019-05-23T19:12:43Z

@robinsones That will work! Note that I am proposing this only as an internal utility function to help make clear the distinction of the output produced by the different variations, since it is a little nuanced. It will only be useful when working with toy datasets.

robinsones self-assigned this May 23, 2019
