
K-NN Classifier #263

Merged: krstopro merged 11 commits into elixir-nx:main from knn-classifier on May 14, 2024

Conversation

@krstopro (Member) commented May 12, 2024

There are several dilemmas I had while implementing this:

  • How to provide the k-NN algorithm and algorithm-specific options? The way it's currently done is either as :algorithm_name or as {:algorithm_name, algorithm_opts} (someone correct me if I'm wrong, but the latter should be the idiomatic Elixir way of specifying both the module and the options to pass to its initialization function). Another way would be to have a separate option for the algorithm name and another for the algorithm-specific options. A rough sketch of the tuple convention follows this list.
  • Literally every k-NN algorithm in Scholar takes num_neighbors as an option. However, I would prefer passing it as a separate option to KNNClassifier instead of nesting it inside the algorithm-specific options. That is, doing Scholar.Neighbors.KNNClassifier.fit(x, y, num_neighbors: 3, num_classes: 2) instead of Scholar.Neighbors.KNNClassifier.fit(x, y, {:brute, [num_neighbors: 3]}, num_classes: 2), and similarly for the metric option. Also, it is currently possible to do Scholar.Neighbors.KNNClassifier.fit(x, y, {:brute, [num_neighbors: 5]}, num_neighbors: 3, num_classes: 2), in which case num_neighbors: 5 silently overrides num_neighbors: 3. Perhaps an error should be raised to prevent this?
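
To make the first point concrete, here is a rough sketch of the {:algorithm_name, algorithm_opts} convention. The neighbor modules are the ones touched by this PR, but the helper itself and the exact fit arities are illustrative assumptions rather than the actual implementation:

defmodule AlgorithmOptionSketch do
  # Hypothetical helper: resolve the algorithm atom (or {atom, opts} tuple) to a
  # module and forward the nested options to that module's fit function.
  def fit_algorithm(algorithm, x) do
    {name, algorithm_opts} =
      case algorithm do
        {name, algorithm_opts} when is_atom(name) -> {name, algorithm_opts}
        name when is_atom(name) -> {name, []}
      end

    module =
      case name do
        :brute -> Scholar.Neighbors.BruteKNN
        :kd_tree -> Scholar.Neighbors.KDTree
        :random_projection_forest -> Scholar.Neighbors.RandomProjectionForest
      end

    module.fit(x, algorithm_opts)
  end
end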

TODO:

  • I think KDTree.predict/2 should be updated to return {neighbors, distances} (currently it returns just neighbors; a unit test is failing because of this). I might need help with this one.
  • Implement KNNClassifier.predict_proba/2.
  • Add more metrics, e.g. :euclidean.
  • Not sure, but Scholar.Options.metric might also need to be edited. An alternative is removing it from the k-NN modules and specifying the metrics as atoms in the docs.
  • Maybe a few more unit tests.

Lastly, I am sorry this took slightly longer to implement than I said it would. I suffered a horrible bike crash this week. I am fine, but still recovering, both physically and mentally. I have started implementing KNNRegressor in parallel with this one; it shouldn't take long.

@josevalim (Contributor)

How to provide k-NN algorithm and algorithm-specific options?

One option is to pass all options together. KNNClassifier then splits off (via Keyword.split) the options it uses and passes all remaining options to the underlying algorithm, which also uses NimbleOptions to validate them and raise in case of unknown or bad options.

FWIW, I'd also call it simply :algorithm.
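
A minimal sketch of that flow, assuming hypothetical module and schema names (not necessarily what this PR ends up with):

defmodule KNNClassifierOptionsSketch do
  @classifier_schema NimbleOptions.new!(
                       algorithm: [type: :atom, default: :brute],
                       num_classes: [type: :pos_integer, required: true]
                     )
  @classifier_keys [:algorithm, :num_classes]

  def fit(x, y, opts) do
    # Keep only the classifier's own keys; everything else (num_neighbors, metric, ...)
    # goes to the underlying algorithm, which validates it against its own
    # NimbleOptions schema and raises on unknown or bad options.
    {classifier_opts, algorithm_opts} = Keyword.split(opts, @classifier_keys)
    classifier_opts = NimbleOptions.validate!(classifier_opts, @classifier_schema)

    algorithm_module =
      case classifier_opts[:algorithm] do
        :brute -> Scholar.Neighbors.BruteKNN
        :kd_tree -> Scholar.Neighbors.KDTree
        other -> other
      end

    algorithm = algorithm_module.fit(x, algorithm_opts)
    %{algorithm: algorithm, labels: y, num_classes: classifier_opts[:num_classes]}
  end
end

With this shape, the call from the PR description becomes Scholar.Neighbors.KNNClassifier.fit(x, y, algorithm: :brute, num_neighbors: 3, num_classes: 2), and num_neighbors can no longer be given in two places at once.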

@josevalim (Contributor) commented May 12, 2024

Not sure, but Scholar.Options.metric might also need to be edited. An alternative is removing it from the k-NN modules and specifying the metrics as atoms in the docs.

What do you want to edit? We should probably make it consistent and make it always return a two-arity function. Is this what you want?

@josevalim (Contributor)

I think KDTree.predict/2 should be updated to return {neighbors, distances} (currently it returns just neighbors; a unit test is failing because of this). I might need help with this one.

@msluszniak could you please give a hand on this one? 🙌

Last, I am sorry it took me slightly longer to implement this than I said. I suffered a horrible bike crash this week. I am fine, but still recovering, both physically and mentally.

Sorry to hear that, but glad you are fine! Have a speedy recovery!

@krstopro (Member, Author)

Not sure, but Scholar.Options.metric might also need to be edited. An alternative is removing it from the k-NN modules and specifying the metrics as atoms in the docs.

What do you want to edit? We should probably make it consistent and make it always return a two-arity function. Is this what you want?

I am not sure if we want to specify the metric option as

type: {:custom, Scholar.Options, :metric, []},

or simply as type: {:in, [:minkowski, :cosine]}. Especially if we want to add more metrics, it might become an issue which of them are supported by the different k-NN algorithms. Another thing is, as you say, whether the normalization should be performed inside Scholar.Options.metric or inside the modules where the metric can be specified (as mentioned here).
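
For reference, here is one way such a custom validator could normalize everything to a two-arity function. This is only an illustration of the idea (the distance formulas are written inline with plain Nx calls), not the actual Scholar.Options.metric:

defmodule MetricOptionSketch do
  # Hypothetical validator for type: {:custom, MetricOptionSketch, :metric, []}.
  # Every accepted spelling of the option is normalized to a two-arity function,
  # so the k-NN modules never have to branch on metric atoms themselves.
  def metric(:minkowski), do: metric({:minkowski, 2})

  def metric({:minkowski, p}) when is_number(p) and p > 0 do
    {:ok,
     fn x, y ->
       Nx.pow(Nx.sum(Nx.pow(Nx.abs(Nx.subtract(x, y)), p), axes: [-1]), 1 / p)
     end}
  end

  def metric(:cosine) do
    {:ok,
     fn x, y ->
       num = Nx.sum(Nx.multiply(x, y), axes: [-1])

       den =
         Nx.multiply(
           Nx.sqrt(Nx.sum(Nx.pow(x, 2), axes: [-1])),
           Nx.sqrt(Nx.sum(Nx.pow(y, 2), axes: [-1]))
         )

       Nx.subtract(1, Nx.divide(num, den))
     end}
  end

  def metric(fun) when is_function(fun, 2), do: {:ok, fun}

  def metric(other) do
    {:error, "expected :minkowski, {:minkowski, p}, :cosine or a 2-arity function, got: #{inspect(other)}"}
  end
end

An {:in, [:minkowski, :cosine]} type would then only be needed by algorithms that genuinely cannot accept an arbitrary function.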

@krstopro (Member, Author) commented May 12, 2024

How to provide k-NN algorithm and algorithm-specific options?

One option is to pass all options together. KNNClassifier then splits off (via Keyword.split) the options it uses and passes all remaining options to the underlying algorithm, which also uses NimbleOptions to validate them and raise in case of unknown or bad options.

FWIW, I'd also call it simply :algorithm.

Yeah, this should be the way. :)

@josevalim (Contributor)

I am not sure if we want to specify the metric option as

Let's open up a separate issue to normalize how the metric is handled. My suggestion would be to use {:custom, Scholar.Options, :metric, []} everywhere and, if you cannot handle an arbitrary metric, explicitly opt in to only the ones you can.

@msluszniak (Contributor)

@msluszniak could you please give a hand on this one? 🙌

Sure, I'll work on that.

Last, I am sorry it took me slightly longer to implement this than I said. I suffered a horrible bike crash this week. I am fine, but still recovering, both physically and mentally.

I'm sorry, I wish you a quick recovery.

-defnp predict_n(tree, point, opts) do
-  k = opts[:k]
+defnp predict_n(tree, point) do
+  k = tree.num_neighbors
@msluszniak (Contributor) commented May 12, 2024

As we now pass num_neighbors in fit, I think we may add a note that there is no need to recompute the whole KDTree from scratch for a different number of nearest neighbors.

@krstopro (Member, Author)

Perhaps, but then we should do the same in BruteKNN and RandomProjectionForest; they all now take num_neighbors as an option to fit.

@msluszniak (Contributor)

I think KDTree.predict/2 should be updated to return {neighbors, distances} (currently it returns just neighbors; a unit test is failing because of this). I might need help with this one.

I've sent a PR with the improvement.

@krstopro (Member, Author)

I think KDTree.predict/2 should be updated to return {neighbors, distances} (currently it returns just neighbors; a unit test is failing because of this). I might need help with this one.

I've sent a PR with the improvement.

Very quick, thanks. :)
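
For reference, the contract being discussed is roughly the following (the shapes are my assumption, not something stated in the PR):

{neighbors, distances} = Scholar.Neighbors.KDTree.predict(tree, x)
# neighbors: {n, k} integer tensor with the indices of the k nearest training points
# distances: {n, k} float tensor with the corresponding distances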

@krstopro (Member, Author) commented May 14, 2024

Alright, almost done. I think there is a bug in predict_proba/2. For example, if we have

x_train = Nx.tensor([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y_train = Nx.tensor([0, 0, 0, 1, 1])
model = Scholar.Neighbors.KNNClassifier.fit(x_train, y_train, num_neighbors: 3, num_classes: 2)
x = Nx.tensor([[1, 3], [4, 2], [3, 6]])

Then Scholar.Neighbors.KNNClassifier.predict(model, x) gives

#Nx.Tensor<
  s64[3]
  [0, 0, 1]
>

while Scholar.Neighbors.KNNClassifier.predict_proba(model, x) gives

#Nx.Tensor<
  f32[3][2]
  [
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 0.0]
  ]
>

This doesn't seem right. I am having a look at it.

defn predict_proba(model, x) do
Contributor

Let's rename it to predict_probability, because that's what we call these functions everywhere else! However, if they are incorrect and we can't figure out why, we can remove this for now and add it in future PRs. :) Your call!

@krstopro (Member, Author)

I would rather investigate it now. I don't expect it to take long, but then, you never know. :)


indices =
  Nx.stack(
    [Nx.iota(Nx.shape(labels_pred), axis: 0), Nx.take(model.labels, labels_pred)],
Contributor

I think this is what we want?

Suggested change:
-    [Nx.iota(Nx.shape(labels_pred), axis: 0), Nx.take(model.labels, labels_pred)],
+    [Nx.iota(Nx.shape(labels_pred), axis: 0), labels_pred],

@krstopro (Member, Author) commented May 14, 2024

Yes, I think so. Let me rename labels_pred to neighbor_labels; I think it is a more suitable name.
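
For completeness, here is a standalone sketch (my own illustration, not necessarily the merged code) of what the fixed probability computation boils down to: scatter-add a count of one per neighbor into its class column, then divide by the number of neighbors.

defmodule PredictProbabilitySketch do
  # neighbor_labels has shape {n, k}: the class of each of the k nearest neighbors
  # of every one of the n query points.
  def predict_probability(neighbor_labels, num_classes) do
    {n, k} = Nx.shape(neighbor_labels)

    # Pair every (query row, neighbor class) into an index of the {n, num_classes} output.
    stacked = Nx.stack([Nx.iota({n, k}, axis: 0), neighbor_labels], axis: -1)
    indices = Nx.reshape(stacked, {n * k, 2})

    # Scatter-add one count per neighbor into its class column, then normalize by k.
    counts =
      Nx.indexed_add(
        Nx.broadcast(0.0, {n, num_classes}),
        indices,
        Nx.broadcast(1.0, {n * k})
      )

    Nx.divide(counts, k)
  end
end

For example, PredictProbabilitySketch.predict_probability(Nx.tensor([[0, 0, 1]]), 2) returns a {1, 2} tensor of roughly [[0.6667, 0.3333]].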

@krstopro merged commit ffaac87 into elixir-nx:main on May 14, 2024
2 checks passed
@krstopro deleted the knn-classifier branch on May 15, 2024 at 17:01