Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accommodating Leap Years #8

Open
tpakeman opened this issue Dec 4, 2020 · 0 comments
Open

Accommodating Leap Years #8

tpakeman opened this issue Dec 4, 2020 · 0 comments

Comments

@tpakeman
Copy link

tpakeman commented Dec 4, 2020

Unfortunately this block does not correctly handle leap years.

It's quite a hard problem to diagnose and debug, but the issue lies in the day_in_period dimension.

What does this dimension do?

  • This dimension is used to calculate an ordinal - the position of the event_date within the period.
  • For example, if you are comparing January 2019 to January 2020, then you want January 3rd 2019 to be day '3', and January 3rd 2020 to also be day '3'.

Here is the SQL in the block:

sql:
   {% if current_date_range._is_filtered %}
     CASE
       WHEN {% condition current_date_range %} ${event_raw} {% endcondition %}
       THEN TIMESTAMP_DIFF(${event_raw},{% date_start current_date_range %},DAY)+1

       WHEN ${event_raw} between ${period_2_start} and ${period_2_end}
       THEN TIMESTAMP_DIFF(${event_raw}, ${period_2_start},DAY)+1

       WHEN ${event_raw} between ${period_3_start} and ${period_3_end}
       THEN TIMESTAMP_DIFF(${event_raw}, ${period_3_start},DAY)+1

       WHEN ${event_raw} between ${period_4_start} and ${period_4_end}
       THEN TIMESTAMP_DIFF(${event_raw}, ${period_4_start},DAY)+1
     END

   {% else %} NULL
   {% endif %} ;;

The leap year problem

  • Let's take the example of comparing 2020 to 2019. 2020 is a leap year and 2019 is not.
  • To simplify things, let's just assume the user is filtering to show the past 1 year, compared to the previous year. Here is the relevant part of the dimension:
CASE
        WHEN {% condition current_date_range %} ${event_raw} {% endcondition %}
        THEN TIMESTAMP_DIFF(${event_raw},{% date_start current_date_range %},DAY)+1
        WHEN ${event_raw} between ${period_2_start} and ${period_2_end}
        THEN TIMESTAMP_DIFF(${event_raw}, ${period_2_start},DAY)+1
...
  • This dimension compares the event date to the start of the period it falls in.
  • In our example, every date in 2020 will fall into 'This period' and every date in 2019 will fall into 'Last period'.
  • That means 2020 dates will have their distance in days measured from the start of the period - 'January 1st 2020', while 2019 dates will have their distance measured from 'January 1st 2019'
  • The issue comes with dates that fall after February 29th. Why?
  • Let's take a date like March 1st:
    • In 2020, March 1st is 31 + 29 days after Jan 1st = 60 days
    • In 2019, March 1st is 31 + 28 days after Jan 1st = 59 days

The effects

Why is this a problem?

  • We use this dimension to create the date_in_period dimension which is what coerces all of our dates to line up with the 'current period' dates
    • i.e. we want March 2019 to line up against the 'March 2020' label so we can compare them in the same row
    • sql: TIMESTAMP_ADD({% date_start current_date_range %},INTERVAL (${day_in_period}-1) DAY) ;;
  • Because leap years offset the calculation by one day, it means every date in a non-leap year is effectively off by one and the dates don't line up properly
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant