K-NN Classifier #263
Conversation
One option is to pass all options together. KNNClassifier then splits off (via Keyword.split) the options it uses and passes all other options to the underlying algorithm, which also uses NimbleOptions to validate and raise in case of unknown/bad options. FWIW, I'd also call it simply
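For illustration, a minimal sketch of that splitting approach, assuming hypothetical option names and that BruteKNN is the underlying algorithm (not the final API):

```elixir
# Hypothetical sketch: KNNClassifier keeps only the options it owns and forwards
# everything else to the underlying algorithm, which validates the remainder with
# its own NimbleOptions schema and raises on unknown or invalid options.
# `knn_schema` stands for KNNClassifier's own NimbleOptions schema (assumed name).
{knn_opts, algorithm_opts} = Keyword.split(opts, [:num_classes, :weights])
knn_opts = NimbleOptions.validate!(knn_opts, knn_schema)
model = Scholar.Neighbors.BruteKNN.fit(x, algorithm_opts)
```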
What do you want to edit? We should probably make it consistent and have it always return a two-arity function. Is this what you want?
@msluszniak could you please give a hand on this one? 🙌
Sorry to hear that, but also glad you are fine! Have a speedy recovery!
I am not sure if we want to specify the metric option as it is done in scholar/lib/scholar/neighbors/kd_tree.ex (line 35 in e0e92d0), or simply as type: {:in, [:minkowski, :cosine]}. Especially if we want to add more metrics, it might become an issue which of them are supported by the different k-NN algorithms. Another thing is, as you say, whether normalization should be performed inside Scholar.Options.metric or inside the modules where the metric can be specified (as mentioned here).
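To make the comparison concrete, here are the two declarations side by side as illustrative NimbleOptions schemas (the exact keys are assumptions; the custom validator is the Scholar.Options.metric mentioned above):

```elixir
# Approach 1 (assumed layout): reuse the shared Scholar.Options.metric validator.
schema_shared = [
  metric: [
    type: {:custom, Scholar.Options, :metric, []}
  ]
]

# Approach 2: each k-NN module lists only the metrics it actually supports.
schema_explicit = [
  metric: [
    type: {:in, [:minkowski, :cosine]}
  ]
]

# Either one can then be validated the usual way, e.g.:
# NimbleOptions.validate!(opts, NimbleOptions.new!(schema_shared))
```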
Yeah, this should be the way. :)
Let's open up a separate issue to normalize how the metric is handled. My suggestion would be to use
Sure, I'll work on that.
I'm sorry, I wish you a quick recovery.
lib/scholar/neighbors/kd_tree.ex (outdated)

    - defnp predict_n(tree, point, opts) do
    -   k = opts[:k]
    + defnp predict_n(tree, point) do
    +   k = tree.num_neighbors
As we now pass num_neighbors in fit, I think we may add a note that there is no need to compute the whole KDTree from scratch for a different number of nearest neighbors.
Perhaps, but then we do the same in BruteKNN and RandomProjectionForest. They all now take num_neighbors as an option to fit.
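For context, a rough usage sketch of the shape being discussed (function names and options are taken from this thread; the details are assumptions rather than the merged API):

```elixir
# num_neighbors is now supplied when the index structure is fitted rather than at
# query time, and the same applies to BruteKNN and RandomProjectionForest.
tree = Scholar.Neighbors.KDTree.fit(x_train, num_neighbors: 3)
result = Scholar.Neighbors.KDTree.predict(tree, x_query)
```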
I've sent a PR with an improvement.
Very quick, thanks. :)
Alright, almost done. I think there is a bug in ... Then ... while ... This doesn't seem right. I am having a look at it.
    ]
    )
    """
    defn predict_proba(model, x) do
Let's rename it to predict_probability, because that's what we call these functions everywhere else! However, if they are incorrect and we can't figure out why, we can remove this for now and add it in future PRs. :) Your call!
I would rather investigate it now. I don't expect it to take long, but then, you never know. :)
    indices =
      Nx.stack(
        [Nx.iota(Nx.shape(labels_pred), axis: 0), Nx.take(model.labels, labels_pred)],
I think this is what we want?

    - [Nx.iota(Nx.shape(labels_pred), axis: 0), Nx.take(model.labels, labels_pred)],
    + [Nx.iota(Nx.shape(labels_pred), axis: 0), labels_pred],
Yes, I think so. Let me rename labels_pred to neighbor_labels, I think it is a more suitable name.
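A hedged sketch of how those {row index, class label} pairs would typically be used (variable names follow the rename above; num_classes, the stacking axis, and the vote-counting step are assumptions, not the merged implementation):

```elixir
# neighbor_labels: the class label of each neighbor of each query row, obtained
# with a single Nx.take through model.labels (the suggestion above removes the
# accidental second take).
neighbor_labels = Nx.take(model.labels, neighbors)

# Pair every entry with its row index and flatten to {num_pairs, 2} coordinates...
indices =
  Nx.stack(
    [Nx.iota(Nx.shape(neighbor_labels), axis: 0), neighbor_labels],
    axis: -1
  )
  |> Nx.reshape({:auto, 2})

# ...then scatter-add one vote per neighbor into a {num_queries, num_classes} table.
votes =
  Nx.broadcast(0, {Nx.axis_size(neighbor_labels, 0), num_classes})
  |> Nx.indexed_add(indices, Nx.broadcast(1, {Nx.axis_size(indices, 0)}))
```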
Co-authored-by: José Valim <[email protected]>
There are several dilemmas I had while implementing this:

- How to provide the k-NN algorithm and its algorithm-specific options? The way it's currently done is either as :algorithm_name or {:algorithm_name, algorithm_opts} (someone correct me if I'm wrong, but the latter should be the idiomatic Elixir way of specifying both the module and the options to be passed to its initialization function). Another way would be having a separate option for algorithm_name and algorithm-specific options algorithm_opts.
- Literally every k-NN algorithm in Scholar takes num_neighbors as an option. I would, however, prefer passing it as a separate option to KNNClassifier instead of nesting it inside the algorithm-specific options. What I mean is doing Scholar.Neighbors.KNNClassifier.fit(x, y, num_neighbors: 3, num_classes: 2) instead of Scholar.Neighbors.KNNClassifier.fit(x, y, {:brute, [num_neighbors: 3]}, num_classes: 2). Similarly for the metric option. Also, currently it is possible to do Scholar.Neighbors.KNNClassifier.fit(x, y, {:brute, [num_neighbors: 5]}, num_neighbors: 3, num_classes: 2), and num_neighbors: 5 will override num_neighbors: 3. Perhaps an error should be raised to prevent this? (The call shapes are contrasted in the sketch right after this list.)
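The call shapes contrasted above, side by side (copied from the description; the final signature may differ):

```elixir
# num_neighbors as a top-level KNNClassifier option:
Scholar.Neighbors.KNNClassifier.fit(x, y, num_neighbors: 3, num_classes: 2)

# num_neighbors nested inside the algorithm-specific options:
Scholar.Neighbors.KNNClassifier.fit(x, y, {:brute, [num_neighbors: 3]}, num_classes: 2)

# currently both can be given at once, and the nested value silently wins:
Scholar.Neighbors.KNNClassifier.fit(x, y, {:brute, [num_neighbors: 5]}, num_neighbors: 3, num_classes: 2)
```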
TODO:

- I think KDTree.predict/2 should be updated to return {neighbors, distances} (currently it returns just neighbors; a unit test is failing because of this). I might need help with this one. A sketch of the proposed return shape is at the end of this description.
- Implement KNNClassifier.predict_proba/2.
- :euclidean. Scholar.Options.metric might also need to be edited. An alternative is removing it from the k-NN modules and specifying the metrics as atoms in the docs.
- Maybe a few more unit tests.

Last, I am sorry it took me slightly longer to implement this than I said. I suffered a horrible bike crash this week. I am fine, but still recovering, both physically and mentally. Started implementing KNNRegressor in parallel with this one - shouldn't take long.
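For reference, a minimal sketch of the return shape proposed in the first TODO item (variable names and layout are assumptions):

```elixir
# Assumed call, following the TODO above: predict/2 would return both tensors.
{neighbors, distances} = Scholar.Neighbors.KDTree.predict(tree, x_query)

# neighbors — for each query point, indices of its num_neighbors nearest training
#             points (the indices later mapped to labels via Nx.take)
# distances — the corresponding distances, in the same neighbor order
```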