"Time elapsed since event" type of features #1114

JnsLns · 2023-03-10T21:46:04Z

Hi,

We are in the early stages of evaluating feathr and are trying to determine its flexibility and usability compared to our current in-house solution. One important point is which types of features it can handle easily.

One type we often use is exemplified by "number of days since last transaction": we look back in time from the instance timestamp, find the most recent transaction, and compute the difference between the instance timestamp and the transaction timestamp. By "instance" I mean the training example in the offline case and the current request in the online case.

In our current solution we cover this by making the instance's timestamp available to the aggregation function that is applied to the backward window. In feathr this wouldn't help much, of course, since the logic available to windowed aggregations is very restricted (perhaps deliberately, to force the user to break up complex feature logic into simpler, reusable steps).

Anyway, based on the feathr docs, I came up with the following way to achieve it in the offline case. First, use WindowAggTransformation to create a feature that holds the latest transaction's timestamp. Second, create another feature that is simply the observation timestamp. Finally, create a derived feature that is the difference of the two.
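
To make the intended semantics of these three steps concrete, here is a minimal plain-Python sketch. It deliberately does not use the feathr API; the function names, the sorted-list input, and the choice to return None when no transaction falls inside the window are all assumptions made for illustration only.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import Optional, Sequence


def latest_event_before(
    event_times: Sequence[datetime],
    observation_time: datetime,
    window: timedelta,
) -> Optional[datetime]:
    """Step 1: latest event timestamp inside the backward window, else None.

    Assumes event_times is sorted in ascending order (hypothetical input shape).
    """
    # Index of the first event strictly after the observation time.
    idx = bisect_right(event_times, observation_time)
    if idx == 0:
        return None  # no event at or before the observation time
    latest = event_times[idx - 1]
    if observation_time - latest > window:
        return None  # the most recent event is older than the window
    return latest


def days_since_last_event(
    event_times: Sequence[datetime],
    observation_time: datetime,
    window: timedelta,
) -> Optional[float]:
    """Steps 2 and 3: take the observation timestamp and subtract step 1."""
    latest = latest_event_before(event_times, observation_time, window)
    if latest is None:
        return None
    return (observation_time - latest).total_seconds() / 86400.0


transactions = [datetime(2023, 3, 1), datetime(2023, 3, 5), datetime(2023, 3, 20)]
observation = datetime(2023, 3, 10)
print(days_since_last_event(transactions, observation, timedelta(days=90)))  # 5.0
```

The transaction on March 20 is ignored because it lies after the observation timestamp, which is the point-in-time behaviour I would expect from the offline join.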

However, I'm not sure how this logic could translate elegantly to the online case, because the "instance timestamp" that was available in the raw data is not going to be available there, at least not naturally. I suppose we could create an input column in the live streaming data that is simply the request timestamp, but that seems a little too complicated. Or we could use the current time, which I think is available via the supported SparkSQL expressions. But this would mean the online case would be based on partly different sources than the offline case. That is certainly doable, but it appears to me to go against the spirit of streamlining the offline-to-online switch in terms of feature management.
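
For illustration, the "use the current time" option could look roughly like the following plain-Python sketch, in which a single function serves both cases: offline, the observation timestamp comes from the data, and online it falls back to a clock. This is not feathr code; `elapsed_days`, the `clock` parameter, and the use of timezone-aware UTC timestamps are assumptions of mine.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def _utc_now() -> datetime:
    """Default clock for the online case: the current time in UTC."""
    return datetime.now(timezone.utc)


def elapsed_days(
    latest_event_time: datetime,
    observation_time: Optional[datetime] = None,
    clock: Callable[[], datetime] = _utc_now,
) -> float:
    """Days between the latest event and the reference time.

    Offline: pass the observation timestamp taken from the training data.
    Online: omit it, so the clock supplies the request time instead.
    All timestamps are assumed to be timezone-aware (hypothetical convention).
    """
    reference = observation_time if observation_time is not None else clock()
    return (reference - latest_event_time).total_seconds() / 86400.0


last_tx = datetime(2023, 3, 5, tzinfo=timezone.utc)

# Offline: the observation timestamp is a column in the training data.
print(elapsed_days(last_tx, datetime(2023, 3, 10, tzinfo=timezone.utc)))  # 5.0

# Online: no observation timestamp is passed, so the clock is used.
print(elapsed_days(last_tx))
```

The feature logic itself is shared here, but the reference timestamp still comes from two different places, which is exactly the asymmetry that bothers me.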

I'd be interested in learning whether this is a requirement you also encounter and how you solve it. My main question is whether there's a more direct way in feathr that eludes me.

Thanks and best regards
Jonas
