Multi-label classification in river #1184

Elanktar · 2023-02-17T09:21:25Z

Hello,

I'm a PhD student working on multi-label classification algorithms on data streams.
My goal is to design new algorithms that adapt as the data stream changes over time, accounting for phenomena such as concept drift, feature evolution, and label evolution.

The team I joined has already worked on multi-label classification with data streams. We also plan to reimplement, on a continual learning platform, the ODM algorithm from the paper “Multi-Label kNN classifier with Online Dual Memory on data stream” by Xihui Wang et al.

For all these reasons, we’re considering using River as our main tool for the experiments.

After reading the docs and looking through the GitHub repository, I saw that River already allows the number of features and labels in the data stream to vary over time. I also saw that multi-output models and metrics are implemented, but I still have some questions about multi-label classification in River:

Is multi-label classification well handled by river?
Is the precision@k already implemented in river?
Does river provide any multi-label benchmark one can launch?
Are multi-label datasets (20NewsGroups, Bookmarks or even bigger real world datasets) easy to load and use in river?
Is it possible to generate synthetic multi-label data streams like in MOA?
Are the labels also handled with dictionaries?

I wish you a great day and I'm looking forward to your answer.

MaxHalford (Maintainer) · 2023-02-17T09:41:32Z

Hello @Elanktar, it's a pleasure to e-meet you. Your PhD sounds interesting and very much aligned with how we do machine learning. We'd be glad to collaborate.

Is multi-label classification well handled by river?

There is a dedicated multioutput module. It's pretty basic, but it works quite well. I believe the models in there also support a varying number of outputs.

Is the precision@k already implemented in river?

No, it isn't. I don't know much about it, but I thought it was mostly used for ranking/recsys tasks. I guess it can make sense for multi-label tasks too, especially for cases with a very large number of labels/classes.
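For reference, here is a minimal sketch of what a streaming precision@k could look like for dict-based labels. It is plain Python and not part of River: the names `precision_at_k` and `RunningPrecisionAtK` are hypothetical, and it assumes the model outputs one score per label. It also assumes the convention of dividing by k even when fewer than k labels are scored.

```python
def precision_at_k(y_true, y_scores, k):
    """Fraction of the k highest-scored labels that are truly relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank labels by descending score; ties are broken by label name
    top_k = sorted(y_scores, key=lambda label: (-y_scores[label], str(label)))[:k]
    hits = sum(1 for label in top_k if y_true.get(label, False))
    return hits / k


class RunningPrecisionAtK:
    """Running mean of precision@k over a stream (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        self.total = 0.0
        self.n = 0

    def update(self, y_true, y_scores):
        self.total += precision_at_k(y_true, y_scores, self.k)
        self.n += 1

    def get(self):
        return self.total / self.n if self.n else 0.0


y_true = {"sports": True, "politics": False, "tech": True}
y_scores = {"sports": 0.9, "politics": 0.6, "tech": 0.2}

metric = RunningPrecisionAtK(k=2)
metric.update(y_true, y_scores)
print(metric.get())  # 0.5: of the top 2 labels, only "sports" is relevant
```

Keeping a running sum and count means the metric can be updated one sample at a time, which is the pattern streaming metrics generally follow.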

Does river provide any multi-label benchmark one can launch?

Sadly not. We have a benchmarks folder you could contribute to though. The first thing to do would be to implement a dedicated "track" here.

Are multi-label datasets (20NewsGroups, Bookmarks or even bigger real world datasets) easy to load and use in river?

I wouldn't say easy, no. River has a stream.iter_sklearn_dataset function. You can thus load a multi-label dataset with scikit-learn and iterate over it that way. Ideally, we could add dedicated wrappers to these datasets here.
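To illustrate the shape of data this produces, here is a small stand-in written in plain Python. It is not River's code: `iter_multilabel` is a hypothetical helper, and it assumes the dataset comes as a feature matrix `X` plus a binary label-indicator matrix `Y`, which is the usual layout for multi-label datasets loaded with scikit-learn.

```python
def iter_multilabel(X, Y, feature_names, label_names):
    """Yield one (x, y) pair per row, with features and labels as dicts."""
    for features, indicators in zip(X, Y):
        x = dict(zip(feature_names, features))
        # Each label maps to True/False depending on its indicator column
        y = {label: bool(flag) for label, flag in zip(label_names, indicators)}
        yield x, y


# Toy data standing in for a real dataset (values are made up)
X = [[0.1, 3.0], [1.2, 0.0]]
Y = [[1, 0, 1], [0, 1, 0]]

pairs = list(iter_multilabel(X, Y, ["f0", "f1"], ["sports", "politics", "tech"]))
for x, y in pairs:
    print(x, y)
```

A dedicated dataset wrapper would presumably do the same conversion while reading rows lazily from disk instead of from in-memory lists.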

Is it possible to generate synthetic multi-label data streams like in MOA?

I think there are a few methods we ported over from scikit-multiflow, for instance here.
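As a rough idea of what such a generator does, here is a toy version in plain Python. It is not one of the generators from MOA, scikit-multiflow, or River: `synthetic_multilabel_stream` and its parameters are hypothetical, and the labelling rule (a fixed random linear score per label) is only an assumption chosen for simplicity.

```python
import random


def synthetic_multilabel_stream(n_samples, n_features=4, n_labels=3, seed=42):
    """Yield (x, y) pairs; a label is on when its linear score is positive."""
    rng = random.Random(seed)
    # One fixed random weight vector per label plays the role of the concept
    weights = [
        [rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_labels)
    ]
    for _ in range(n_samples):
        values = [rng.gauss(0, 1) for _ in range(n_features)]
        x = {f"x{i}": v for i, v in enumerate(values)}
        y = {
            f"label_{j}": sum(w * v for w, v in zip(weights[j], values)) > 0
            for j in range(n_labels)
        }
        yield x, y


for x, y in synthetic_multilabel_stream(n_samples=3):
    print(x, y)
```

Redrawing `weights` partway through the loop would be one simple way to simulate concept drift in a sketch like this.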

Are the labels also handled with dictionaries?

Yes, they are.
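Dict-based labels are what make a varying label set easy to handle. The sketch below shows the idea with a toy binary relevance wrapper in plain Python that creates one model per label the first time the label is seen. It is not River's implementation: `BinaryRelevance` and `LabelFrequency` are hypothetical classes, and only the `learn_one`/`predict_one` naming follows River's convention.

```python
from collections import defaultdict


class LabelFrequency:
    """Toy base learner: predicts True if the label was on in most past samples."""

    def __init__(self):
        self.n = 0
        self.n_true = 0

    def learn_one(self, x, y):
        self.n += 1
        self.n_true += int(y)

    def predict_one(self, x):
        return self.n > 0 and self.n_true / self.n > 0.5


class BinaryRelevance:
    """One independent binary model per label, created lazily."""

    def __init__(self, base_factory):
        self.models = defaultdict(base_factory)

    def learn_one(self, x, y):
        for label, value in y.items():
            self.models[label].learn_one(x, value)

    def predict_one(self, x):
        return {label: m.predict_one(x) for label, m in self.models.items()}


model = BinaryRelevance(LabelFrequency)
model.learn_one({"f0": 1.0}, {"sports": True})
model.learn_one({"f0": 0.5}, {"sports": True, "tech": False})  # "tech" is new
print(model.predict_one({"f0": 0.2}))  # {'sports': True, 'tech': False}
```

Because the target is a dict, a label that appears mid-stream needs no schema change: it simply gets its own model on first sight.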

I wish you a great day and I'm looking forward to your answer.

Have a great day too! Again, we'd love to improve River's support for multi-output tasks. Your PhD is a great stepping stone to do so.

Elanktar (Author) · 2023-02-22T14:19:09Z

Hello @MaxHalford, thank you for your quick answer!
My supervisor and I are putting together some new questions, and thinking about how we could collaborate and contribute to River.
I'll contact you again as soon as possible!
Have a great day!
