HDDM: binary or real input #937
-
From what I have read, the HDDM drift detectors are supposed to work on real-valued inputs, but your documentation says the inputs must be binary. Is there a reason for that?
-
Indeed, the documentation is wrong. Not too sure why, to be honest! Those are artifacts from when we merged scikit-multiflow with creme. Feel free to open a pull request and fix it! :)
-
Sorry, I'm not sure exactly how to do that. Can you just take care of it?
I'm not interested in being a contributor right now.
On Mon, May 23, 2022 at 3:45 PM, Max Halford wrote:
> Indeed the documentation is wrong. Not too sure why to be honest! Those are artifacts of the moment we merged scikit-multiflow with creme. Feel free to open a pull request and fix it! :)
-
When I first checked this discussion, I assumed the documentation was wrong, but after refactoring the drift module and studying the detectors more closely, I changed my mind.

Let me throw my two cents in: although the original paper states that the input can be real-valued, all the math derivations and formulae assume a 0-1 loss. In other words, the input is a stream of bits, such as the result of checking whether a classifier labels each instance correctly (0) or not (1).

The same goes for the MOA version (and, consequently, scikit-multiflow). The code expects data sampled from a Bernoulli distribution. Thus, the documentation is not wrong, though it took me ages to understand why.

The drift detectors deserve more love, and probably a better way to organize the algorithms per type, but this is not trivial at the moment. We could, of course, assume the data follows another type of distribution, as long as we know its range (as required by the Hoeffding bound). In that case, we would need to derive new formulae. But that is delving into the world of research.
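To make the two ideas above concrete, here is a minimal sketch in plain Python. It is not the HDDM implementation from river, MOA, or scikit-multiflow. It only shows how predictions turn into the 0-1 error stream described above, and how the range of the values enters the standard one-sided Hoeffding bound. The function names `zero_one_loss` and `hoeffding_epsilon` and the sample labels are made up for illustration.

```python
import math


def zero_one_loss(y_true, y_pred):
    """Return 0 for a correct prediction and 1 for a mistake."""
    return 0 if y_true == y_pred else 1


def hoeffding_epsilon(n, delta, value_range=1.0):
    """Deviation bound for the mean of n independent bounded values.

    With probability at least 1 - delta, the sample mean exceeds its
    expectation by less than the returned epsilon (one-sided bound).
    value_range is (max - min) of the values; it is 1 for a 0-1 loss.
    """
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Hypothetical labels and predictions, made up for illustration.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# The stream of bits a detector for Bernoulli data would consume.
errors = [zero_one_loss(t, p) for t, p in zip(y_true, y_pred)]

error_rate = sum(errors) / len(errors)
epsilon = hoeffding_epsilon(n=len(errors), delta=0.05)
print(errors, error_rate, epsilon)
```

The bound scales linearly with `value_range`, which is why the range has to be known in advance. That alone does not make the existing detector formulae valid for real-valued input; it only shows where the range would enter.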