Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvement to subscription_history #9

Closed
1 of 4 tasks
dejanzalik opened this issue Jul 17, 2023 · 17 comments
Closed
1 of 4 tasks

improvement to subscription_history #9

dejanzalik opened this issue Jul 17, 2023 · 17 comments
Assignees

Comments

@dejanzalik
Copy link

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

My overall question / issue is regarding recharge tables needing additional details where an order_line_item can be connected to its relevant subscription so that further details can be extracted (e.g. subscription interval frequency)

Backstory: in the provided recharge model "recharge__subscription_overview" the row that's being pulled from the subscription_history table is the most recent one. While this might work for some subscription setups, it does not for ours for the following reason: customers can swap the product that's tied to a subscription, which means that the external_product_id_ecommerce, as well as the external_variant_id_ecommerce, will change within one subscription. For us this creates the following mismatch:

if, after an order has happened (which will include the specific external_product_id_ecommerce and external_variant_id_ecommerce), the customer swaps the product (which will change the external_product_id_ecommerce and external_variant_id_ecommerce in the most recent row in the subscription_history table), the link from the most recent record in the subscription_history table and the order_line_item (or charge_line_item) table will break, since both the external_product_id_ecommerce and the external_variant_id_ecommerce are now different. this will lead to those specific order_line_items from historical records to not have these fields populated unless the subscription hasn't been changed (and even then problems arise since customers - at least in our case - can change the quantity and interval frequency, which will not change the external_product_id_ecommerce and external_variant_id_ecommerce, but will simply have wrong records attached for quantity and interval frequency that might not equal the ones from the past)

I wonder if there is a way to know which record in subscription_history has been "used" when an order was created, which would then allow the specific row to be matched with the item from the order_line_item table (or charge_line_item). Unfortunately subscription_id being removed added more difficulty to this, so I wonder if it's at all possible to reach out to recharge and see if this (or potentially additional fields) could be added to the table.

thanks in advance for any effort and possible solutions!

Describe alternatives you've considered

i've tried the following (up until subscription_id was still available in the order_line_item table)

  • using date stamps to see if a link can be established between order_date and subscription_updated_at
  • using row_number() to see if logic can be build for which subscription state will most likely be the one that's "used" during the upcoming order

Are you interested in contributing this feature?

  • Yes.
  • Yes, but I will need assistance and will schedule time during your office hours for guidance.
  • No.

Anything else?

No response

@fivetran-joemarkiewicz
Copy link
Collaborator

Hi @dejanzalik thanks so much for taking the time to write this issue up and let us know your experience using the package and some nuances you have identified.

Our current sprint just kicked off, but we will be sure to continue investigating this and scope out a possible solution. You can expect us to spend more dedicated time on this investigation in our upcoming sprint (starting mid next week). My team or I will be sure to share more updates and finding here.

Thank you again 😄

@fivetran-catfritz
Copy link
Contributor

Hi @dejanzalik to update you, I am picking up this issue and am aiming to have an update in the next couple weeks. Stay tuned!

@fivetran-catfritz fivetran-catfritz self-assigned this Aug 10, 2023
@fivetran-catfritz
Copy link
Contributor

Hello again @dejanzalik, we have heard some preliminary news from the folks at Recharge! The purchase_item_id found in the charge_line_item and order_line_item tables will be the same as the subscription_id if it is for a subscription, so we can easily join on that column. (Also to note purchase_item_id will be the same as onetime_id if it is associated with a one-time charge.) We will need to update our models and test it out on our end, but I wanted to let you know in case you wanted to try it yourself in the meantime.

@fivetran-catfritz
Copy link
Contributor

@dejanzalik I have now also created a test branch with the changes. If you would like to try it out, it would be much appreciated!

You can install the test branch by adding the below in place of the normal recharge package lines in your packages.yml:

  - git: https://github.com/fivetran/dbt_recharge.git
    revision: feature/subscription-updates
    warn-unpinned: false

@dejanzalik
Copy link
Author

great, thanks so much @fivetran-catfritz! I'll look into it!

@fivetran-catfritz
Copy link
Contributor

HI @dejanzalik just following up if you've had a chance to take a look? I'll be working on our own validations in the next couple weeks, but having user input is invaluable! 😃

@dejanzalik
Copy link
Author

hey @fivetran-catfritz, sorry had a couple of packed days, but also forgot to respond here 😃 whoopsy.

So from what I can see in the changes in the branch, the info was added where I can identify and link the product in the order_line_item (or charge_line_item) table with its corresponding subscription (from the subscription or subscription history table).

What I still cannot do however, is determine the actual status of the subscription at the point of when the order was made (e.g. what was the frequency at that point) since the product itself can be changed, as well as the frequency and quantity, by the customer.

not really sure how this could be done - I tried multiple approaches by utilizing dates as well as SKUs, but could not find a proper way. am I missing something here?

thanks in advance for working on it!

@fivetran-catfritz
Copy link
Contributor

fivetran-catfritz commented Aug 31, 2023

Thanks @dejanzalik for taking a look! Hmm I wonder if this is something you could get by bringing in the subscription_history source where updated_at = created_at, to get the state at the time of creation? In this history table, whenever the updated_at value changes, that is when a new row is created. This makes me realize I need to make some changes in the source, but curious what you think in the meantime.

@dejanzalik
Copy link
Author

I was thinking the same when I was testing it for the first time, but unfortunately the customer can make multiple changes within the same day, which means that multiple rows link to one row from the orders (or charges) table. I'll try to work on it a bit more and see what I can come up with - maybe I can use the last update within a day and take it the status of that day and can test if it checks out with the real order status. Will keep you posted!

@fivetran-catfritz
Copy link
Contributor

Hi @dejanzalik would having a model that includes the subscription history with details be beneficial? You can take a look at an example here. I've was thinking a flag for the earliest associated order might help, although that might not address everything. Perhaps it would be simpler to discuss this further if you're open to a call?

@dejanzalik
Copy link
Author

I think the subscription history table as it is now is perfectly fine in terms of details (at least for us), the only thing that would be great to have out of the box is some sort of way to more easily connect the order line item with the corresponding actual subscription state at that point. However, I am not sure if the fact that our case might be different from other use cases simply means that we need to custom model it on our side for our setup.

Also: I unfortunately haven't been able to spend too much time on this in the recent past because of other projects, but will put some time on the side to dig deeper and we could maybe have a call after that to align, if it works for you.

@dejanzalik
Copy link
Author

hey @fivetran-catfritz, so I tried the above logic and so far it looks good on my side. basically what I did is

  1. aggregate the table subscription_history to the last change of the day
  2. join to the orders / order_lines table using purchase_item_id and subscription_id and order_date >= subscription_update_at (to only get the records that were updated before the order was processed)
  3. once joined, rank records from the subscription_history table by partitioning by order_id, purchase_item_id and ordering by subscription_updated_at descending (in order to get the closest record to the order date)
  4. filter for those where rank = 1

I will keep looking into it and test whether everything fits, but this basically delivered what I was looking for - now I know which order frequency was active when order was placed

@fivetran-catfritz
Copy link
Contributor

Thanks for the update, @dejanzalik! Would you be open to sharing the model you made? It might be useful to incorporate all or parts of it. Also I will have capacity to pick this task back up in the next week--would you have any availability for a call?

@dejanzalik
Copy link
Author

hey @fivetran-catfritz, sure - please find below. I would be available for a call on Tuesday or Thursday (I'm on Vienna time - CMT+2 - it's currently 13:32). Let me know if that works for you.

with

recharge_order_detail_raw as (

    select
    orders.*,
    order_line_item.* except(market, order_id),
    from {{ ref('stg_recharge__order') }} as orders
    left join {{ ref('stg_recharge__order_line_item') }} as order_line_item on order_line_item.order_id = orders.order_id
)

,subscription_history as (

    select
    *
    from (

        select

        market,
        subscription_id,
        date(subscription_updated_at) as subscription_updated_at,
        product_title,
        sku,
        order_interval_frequency,
        quantity,
        row_number() over (partition by subscription_id, date(subscription_updated_at) order by subscription_updated_at desc) as last_record_of_the_day
        
        from {{ ref('stg_recharge__subscription_history') }}

    ) 
    where last_record_of_the_day = 1

)

,recharge_order_detail as (

    select
    * except(order_without_subscription),
    count(external_product_id_ecommerce) over (partition by external_order_id_ecommerce) as count_order_products_recharge,
    row_number() over (partition by external_order_id_ecommerce order by external_product_id_ecommerce, external_variant_id_ecommerce) as rank_line_item,
    row_number() over (partition by external_order_id_ecommerce, external_product_id_ecommerce order by external_product_id_ecommerce, external_variant_id_ecommerce) as rank_matching_items
    from (

        select

        market,
        customer_id,
        order_id,
        cast(external_order_id_ecommerce as int64) as external_order_id_ecommerce,
        order_processed_at,
        order_type,

        order_line_item_title,
        sku,
        cast(external_product_id_ecommerce as int64) as external_product_id_ecommerce,
        cast(external_variant_id_ecommerce as int64) as external_variant_id_ecommerce,
        quantity,
        purchase_item_id,
        purchase_item_type, --subscription yes/no identifier

        avg(case when purchase_item_type != "subscription" then 1 else 0 end) over (partition by order_id) as order_without_subscription

    from recharge_order_detail_raw

    ) 
    where 1=1
    and order_without_subscription < 1 -- ko orders without subscriptions

)

,add_subscription_history_details as (

    select
    * except(closest_record_to_order_date)
    from (

        select
        recharge_order_detail.*,
        subscription_history.subscription_updated_at,
        subscription_history.order_interval_frequency,
        case
            when purchase_item_type = "onetime" then 1
            else row_number() over (partition by order_id, purchase_item_id order by subscription_updated_at desc)
        end as closest_record_to_order_date 
        from recharge_order_detail as recharge_order_detail
        left join subscription_history as subscription_history on subscription_history.subscription_id = recharge_order_detail.purchase_item_id and date(order_processed_at) >= subscription_updated_at

    )
    where closest_record_to_order_date = 1

)

,check_tables as (

  select "a. recharge_order_detail" as table_name, count(*) as total_rows from recharge_order_detail
  union all
  select "b. add_subscription_history_details" as table_name, count(*) as total_rows from add_subscription_history_details

)

{# select * from check_tables order by 1 #}

select * from add_subscription_history_details --some subscription products do not have a match the with subscription history table because the historical table started being recorded after the order has happened (subscription history table only starts when source is connected)

@fivetran-catfritz
Copy link
Contributor

Thank you @dejanzalik. I will message you on dbt's Slack to set up the call!

@fivetran-catfritz
Copy link
Contributor

fivetran-catfritz commented Sep 19, 2023

Hi @dejanzalik To give you an update, I still haven't heard back from Recharge, but I did a little more investigation on my end. I think we can accomplish what you need without getting an answer from them, however curious what you think!

I focused on the timestamps instead of date so even if there were multiple updates in one day, I think we can still match. I joined the subscription_history record where the order_processed_at timestamp was between its subscription_updated_at and the next subscription_history record's subscription_updated_at. I added a lead function in the subscription_history CTE, which I then used that for the join in the final CTE. I think this way also avoids having to rank via row number.

with
recharge_order_detail_raw as (

    select
    orders.*,
    order_line_item.* except(market, order_id),
    from {{ ref('stg_recharge__order') }} as orders
    left join {{ ref('stg_recharge__order_line_item') }} as order_line_item on order_line_item.order_id = orders.order_id

), subscription_history as (

    select
    *
    from (

        select

        market,
        subscription_id,
        date(subscription_updated_at) as subscription_updated_at,
        product_title,
        sku,
        order_interval_frequency,
        quantity,
        lead(subscription_updated_at) over (partition by subscription_id 
            order by subscription_updated_at asc)
            as next_subscription_update
        
        from {{ ref('stg_recharge__subscription_history') }}

    ) 

), add_subscription_history_details as (

    select
    * except(closest_record_to_order_date)
    from (

        select
        recharge_order_detail.*,
        subscription_history.subscription_updated_at,
        subscription_history.order_interval_frequency,
        case
            when purchase_item_type = "onetime" then 1
            else row_number() over (partition by order_id, purchase_item_id order by subscription_updated_at desc)
        end as closest_record_to_order_date 
        from recharge_order_detail as recharge_order_detail
        left join subscription_history as subscription_history on subscription_history.subscription_id = recharge_order_detail.purchase_item_id 
            and date(order_processed_at) >= subscription_history.subscription_updated_at
            and date(order_processed_at) <= coalesce(subscription_history.next_subscription_update, order_processed_at)
    )
)

select * from add_subscription_history_details 

@fivetran-catfritz
Copy link
Contributor

For another update, while I still haven't heard back from Recharge, I've created a new issue #14 where we can continue this conversation. I've made a new issue since we will need to create a new model rather than modify an existing one, and this will allow our team to allocate the right time and resources to it.

I'm closing this issue for now, however please feel free to follow along or reach out again to us in the new issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants