How many datapoints can this work with? #15

Closed
bharatkrishna opened this issue Feb 17, 2021 · 2 comments

@bharatkrishna

I am using this library to create bins on 1-D data with around 35 million datapoints. It ran for more than 4 hours and I had to kill it without getting any results. With around 10,000 datapoints it works fine and returns results in a few seconds.

Is this library only meant for smaller datasets?

@kevinjwalters

Performance is mentioned in #7 too.

@mthh mthh added the question label Aug 16, 2022
@mthh
Owner

mthh commented Aug 18, 2022

It depends on what is meant by "large array", but this is indeed a classification algorithm that becomes quite expensive as the size of the array and the number of requested classes increase.
So I would say that it is better suited to "medium arrays": it still works quite fast for tens or even hundreds of thousands of datapoints.
See my answer in #7, and let's continue the discussion there if necessary.
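
For arrays far beyond that range (such as the 35 million points in the original question), one possible workaround is to compute the breaks on a random sample and then apply those breaks to the full array. The sketch below is an assumption, not something confirmed for this library: `breaks_from_sample`, `assign_bins` and `quantile_breaks` are hypothetical helper names, and `quantile_breaks` is only a stand-in (equal-count bins via NumPy) for whatever break-finding function you actually use. The resulting breaks are approximate, since they only reflect the sample.

```python
import numpy as np


def quantile_breaks(values, n_classes):
    """Stand-in break finder (equal-count bins). Hypothetical placeholder:
    swap in the classification function you actually want to use."""
    return np.quantile(values, np.linspace(0.0, 1.0, n_classes + 1))


def breaks_from_sample(data, n_classes, compute_breaks,
                       sample_size=100_000, seed=0):
    """Compute class breaks on a random sample instead of the full array."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    if data.size > sample_size:
        # Sample without replacement so no point is counted twice.
        sample = rng.choice(data, size=sample_size, replace=False)
    else:
        sample = data
    breaks = np.array(compute_breaks(sample, n_classes), dtype=float)
    # Widen the outer edges so every point of the full array lands in a bin.
    breaks[0] = min(breaks[0], data.min())
    breaks[-1] = max(breaks[-1], data.max())
    return breaks


def assign_bins(data, breaks):
    """Label each point with a class index in 0 .. n_classes - 1."""
    # Only the interior edges are needed to split the range into classes.
    return np.digitize(data, breaks[1:-1], right=True)


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=1_000_000)
    breaks = breaks_from_sample(data, 5, quantile_breaks, sample_size=50_000)
    labels = assign_bins(data, breaks)
    print(breaks)
    print(np.bincount(labels))
```

The expensive step only ever sees `sample_size` points, while the assignment of the full array is a single vectorized lookup. How large the sample needs to be depends on the data, so it is worth comparing the breaks from a few different seeds before trusting them.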

@mthh mthh closed this as completed Aug 18, 2022