HDDM: binary or real input #937

sam-data-guy · 2022-05-23T12:12:05Z


From what I have read, the HDDM drift detectors are supposed to work on real-valued inputs, but your documentation says the inputs must be binary. Is there a reason?

Answered by smastelini

Oct 11, 2022


MaxHalford (Maintainer) · 2022-05-23T12:44:50Z

Indeed the documentation is wrong. Not too sure why to be honest! Those are artifacts of the moment we merged scikit-multiflow with creme. Feel free to open a pull request and fix it! :)

1 reply

smastelini May 23, 2022
Maintainer

I will also ping @jacobmontiel as he owned that portion of the code.

+1 for a PR :)

Those detectors deserve more love. I did some aesthetic work in KSWIN, but my knowledge about drift detectors is limited, to be honest.

sam-data-guy (Author) · 2022-10-11T08:24:02Z

Sorry, I'm not sure exactly how to do that. Can you just take care of it? I'm not interested in being a contributor right now.


1 reply

MaxHalford Oct 11, 2022
Maintainer

Sure, we can take care of it. The drift detection module needs a bit of love anyway.

smastelini (Maintainer) · 2022-10-11T11:17:10Z

When I first checked this discussion, I assumed the documentation was wrong, but after refactoring the drift module and studying more about the detectors, I changed my mind.

Let me throw my two cents in: although the original paper states the input can be real-valued, all the math derivations and formulae assume a scenario using a 0-1 loss. In other words, the input is a stream of bits, such as the result of checking whether a classifier labels each instance correctly (0) or not (1).

The same goes for the MOA version (and, consequently, scikit-multiflow). The code expects data sampled from a Bernoulli distribution. So the documentation is not wrong, although it took me ages to understand why.
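To make the 0-1 loss stream concrete, here is a minimal sketch in plain Python. The helper name `zero_one_loss_stream` and the toy labels are made up for illustration and are not part of any library. Each bit it yields is the kind of value a binary detector such as HDDM would consume, one at a time.

```python
def zero_one_loss_stream(y_true, y_pred):
    """Yield 0 when a prediction is correct and 1 when it is wrong.

    Hypothetical helper for illustration only.
    """
    for truth, pred in zip(y_true, y_pred):
        yield int(truth != pred)


# Toy labels and predictions (invented for this example).
y_true = ["a", "b", "a", "a", "b"]
y_pred = ["a", "a", "a", "b", "b"]

errors = list(zero_one_loss_stream(y_true, y_pred))
print(errors)  # [0, 1, 0, 1, 0]
```

The resulting sequence is a sample from a Bernoulli distribution whose parameter is the classifier's error rate, which is the setting the derivations assume.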

The drift detectors deserve more love and probably a better way to organize the algorithms per type, but this is not trivial at the moment.

We could, of course, assume the data follows another type of distribution, as long as we know its range (as required by the Hoeffding bound). In that case, we would need to derive new formulae to handle it. But that means delving into the world of research.
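To illustrate why the range matters, here is a small sketch of the generic one-sided Hoeffding bound for the mean of `n` independent values confined to an interval of width `value_range`. This is the textbook inequality, not the exact statistic HDDM computes, and the function name and parameters are chosen here for illustration.

```python
import math


def hoeffding_epsilon(n, delta, value_range=1.0):
    """Deviation eps such that P(mean - E[mean] >= eps) <= delta.

    Textbook Hoeffding bound for n independent values that each lie in an
    interval of width value_range. Illustrative helper, not a library API.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Binary input (0-1 loss) has range 1.
eps_binary = hoeffding_epsilon(n=1000, delta=0.001)

# Real-valued input known to lie in an interval of width 10.
eps_wide = hoeffding_epsilon(n=1000, delta=0.001, value_range=10.0)

print(eps_binary, eps_wide)
```

The bound scales linearly with the range, so a detector built around a range of 1 would be miscalibrated on wider inputs unless the formulae are reworked.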
