Replies: 1 comment 1 reply
I'm sorry to be blunt, but I have no idea what you're talking about. What's a case? Do you have a reference to something we can read?
-
Hi! I'm working on my thesis and using River for streaming analysis of process data. So far I like River, but I'm running into performance issues because I use many transformers to represent an entire "case" at once. This is necessary because, when training a model, I cannot just use the last value seen in the stream: it lacks values that appeared earlier in the process's lifecycle. I work around this with transformers such as mean and min/max, and I wrote some of my own to get the first/last (non-NA) value within a case/group, or the order of events within a case, as separate feature columns.
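For readers unfamiliar with this kind of per-case feature, here is a minimal sketch of a "last non-NA value within a case" transformer. It is a standalone toy, not River's API and not the code from the thesis: the class name `LastNonMissingByCase`, the helper `is_missing`, and the `case_id`/`amount` fields are all hypothetical, chosen only for illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption about what 'NA' means)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class LastNonMissingByCase:
    """Hypothetical transformer: remembers the last non-missing value of `on` per case."""

    def __init__(self, on, by):
        self.on = on
        self.by = by
        self._last = {}  # case key -> last non-missing value seen

    def learn_one(self, x):
        value = x.get(self.on)
        if not is_missing(value):
            self._last[x[self.by]] = value

    def transform_one(self, x):
        # Returns None for a case that has not had a non-missing value yet.
        return {f"{self.on}_last_by_{self.by}": self._last.get(x[self.by])}


events = [
    {"case_id": "A", "amount": 100.0},
    {"case_id": "A", "amount": None},  # missing: the earlier value is carried forward
    {"case_id": "B", "amount": 5.0},
]

tf = LastNonMissingByCase(on="amount", by="case_id")
features = []
for x in events:
    tf.learn_one(x)
    features.append(tf.transform_one(x))
```

Note that the state grows by one entry per case, which is where the memory pressure described next comes from.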
This works, but since I'm using quite large process logs (over a million records, with tens of thousands of cases), these transformers are starting to slow down my system. However, there is a point at which I no longer need them. Until the final label has appeared in the input data, I predict for each incoming instance, so I need the transformers. Once the label has appeared (for instance, a 'customer accepted' event), the model can be trained on that data point, and after that there is no point in using the model for that case because its final label is known. It is therefore fine to remove that group from all of the transformers, because its data is no longer needed to train the model.
Since my laptop can't cope if the finished groups are kept around, I have found a way to remove them manually, for both individual transformers and TransformerUnions (inside a for loop, run after learn_one is called on x):
As far as I know, transformers offer no way to remove groups other than this (admittedly crude) approach. I realise this may be quite a niche use of transformers, and most applications wouldn't need to remove groups, but built-in support would make for much cleaner code, and perhaps someone else would find it useful.
Would it make sense to add a 'remove_groups' method to the base transformer and TransformerUnion classes, or is there a better way to remove unnecessary transformer groups?
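To make the proposal concrete, here is a rough sketch of how a 'remove_groups' method could behave. It uses standalone toy classes rather than River's actual ones: `RunningMeanByCase`, `ToyUnion`, the `remove_groups(keys)` signature, and the `case_id`/`amount`/`label` fields are all assumptions made for illustration, not existing River API.

```python
from collections import defaultdict


class RunningMeanByCase:
    """Toy grouped transformer (hypothetical, not a River class)."""

    def __init__(self, on, by):
        self.on = on
        self.by = by
        self._groups = defaultdict(lambda: [0, 0.0])  # case key -> [count, total]

    def learn_one(self, x):
        state = self._groups[x[self.by]]
        state[0] += 1
        state[1] += x[self.on]

    def transform_one(self, x):
        # .get() avoids creating state for cases that were never learned.
        count, total = self._groups.get(x[self.by], (0, 0.0))
        mean = total / count if count else None
        return {f"{self.on}_mean_by_{self.by}": mean}

    def remove_groups(self, keys):
        """Proposed method: drop the state of finished cases; unknown keys are ignored."""
        for key in keys:
            self._groups.pop(key, None)


class ToyUnion:
    """Toy union that forwards every call, including remove_groups, to its members."""

    def __init__(self, *transformers):
        self.transformers = transformers

    def learn_one(self, x):
        for t in self.transformers:
            t.learn_one(x)

    def transform_one(self, x):
        features = {}
        for t in self.transformers:
            features.update(t.transform_one(x))
        return features

    def remove_groups(self, keys):
        for t in self.transformers:
            t.remove_groups(keys)


events = [
    {"case_id": "A", "amount": 10.0, "label": None},
    {"case_id": "B", "amount": 4.0, "label": None},
    {"case_id": "A", "amount": 30.0, "label": "accepted"},  # final label for case A
]

mean_amount = RunningMeanByCase(on="amount", by="case_id")
union = ToyUnion(mean_amount)
closed_case_features = {}

for x in events:
    union.learn_one(x)
    features = union.transform_one(x)
    if x["label"] is not None:
        # The model would be trained on (features, label) here; afterwards the
        # case's state is no longer needed, so it is dropped from every member.
        closed_case_features[x["case_id"]] = features
        union.remove_groups([x["case_id"]])
```

With a method like this, the cleanup becomes a single call on the union instead of reaching into each transformer's internal state.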