Multi-label classification in river #1184
Hello, I'm a PhD student working on multi-label classification algorithms for data streams. The team I joined has already worked on multi-label classification with streaming data. We also plan to reimplement, on a continual learning platform, the ODM algorithm from the paper “Multi-Label kNN classifier with Online Dual Memory on data stream” by Xihui Wang et al. For all these reasons, we’re considering using River as our main tool for running the experiments. After reading the docs and looking through the GitHub repository, I saw that you have already made it possible for the number of features and labels in the data stream to vary over time. I saw that multi-output models and metrics are implemented as well, but I still have some questions about multi-label classification in River:
I wish you a great day and I'm looking forward to your answer.
Replies: 2 comments
Hello @Elanktar, it's a pleasure to e-meet you. Your PhD sounds interesting and very much aligned with how we do machine learning. We'd be glad to collaborate.
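There is a dedicated multioutput module. It's pretty basic, but it works quite well. I believe the models in there also support a varying number of outputs.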
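To make the idea of a varying number of outputs concrete, here is a minimal plain-Python sketch of binary relevance (one independent binary model per label) where a model is created the first time a label shows up in the stream. This is not River's API: the `OnlinePerceptron` and `BinaryRelevance` classes, the toy stream, and the label names are all hypothetical and only mimic the dict-based `learn_one` / `predict_one` style.

```python
from collections import defaultdict


class OnlinePerceptron:
    """Tiny binary perceptron over dict features (illustrative only)."""

    def __init__(self, lr=1.0):
        self.lr = lr
        self.weights = defaultdict(float)  # unseen features start at 0
        self.bias = 0.0

    def predict_one(self, x):
        score = self.bias + sum(self.weights[k] * v for k, v in x.items())
        return score > 0

    def learn_one(self, x, y):
        # Classic perceptron rule: update only when the prediction is wrong.
        error = int(y) - int(self.predict_one(x))
        if error:
            for k, v in x.items():
                self.weights[k] += self.lr * error * v
            self.bias += self.lr * error


class BinaryRelevance:
    """One independent binary model per label, created on first sight."""

    def __init__(self, make_model):
        self.make_model = make_model
        self.models = {}

    def predict_one(self, x):
        # Only labels seen so far can be predicted.
        return {label: m.predict_one(x) for label, m in self.models.items()}

    def learn_one(self, x, y):
        for label, value in y.items():
            if label not in self.models:
                # A new label appeared in the stream: start a fresh model.
                self.models[label] = self.make_model()
            self.models[label].learn_one(x, value)


# Hypothetical toy stream: the "electro" label only appears from the third sample.
stream = [
    ({"guitar": 1.0}, {"rock": True}),
    ({"synth": 1.0}, {"rock": False}),
    ({"guitar": 1.0, "synth": 1.0}, {"rock": True, "electro": True}),
    ({"guitar": 1.0}, {"rock": True, "electro": False}),
]

model = BinaryRelevance(OnlinePerceptron)
for x, y in stream:
    model.learn_one(x, y)

print(sorted(model.models))
print(model.predict_one({"guitar": 1.0}))
```

The design choice here is that labels are treated independently, so adding a label never disturbs the models already trained. Methods that model label dependencies, such as classifier chains, need more care when the label set changes.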
No, it isn't. I don't know much about it, but I thought it was mostly used for ranking/recsys tasks. I guess it can make sense for multi-label tasks too, especially for cases with a very large number of labels/classes.
Sadly not. We have a
I wouldn't say easy, no. River has a
I think there are a few methods we ported over from scikit-multiflow, for instance here.
Yes, they are.
Have a great day too! Again, we'd love to improve River's support for multi-output tasks. Your PhD is a great stepping stone to do so.
Hello @MaxHalford, thank you for your quick answer!