Do you have any suggestions on setting the max number of iteration in training som? #66
Comments
Hi, thanks for checking that!
However, the status is printed at every iteration via the generator defined
outside the class. Only the errors are printed at the end. It's
impractical to recompute the errors at each iteration.
…On Wed, Apr 15, 2020, 16:24 Zonglei Zhen ***@***.***> wrote:
Hi,
I just found an inconsistent statement about the verbose output of
the train method.
Line 347 states that if verbose is true, the status of the training will
be printed at each iteration. But in line 361, the status is only printed
after all iterations. I guess the code between 361-363 should be indented.
Thanks
|
Got it! Thanks very much. |
Hi again, the number of iterations required for convergence depends on many factors. The main ones are the size of the SOM and the shape of the data. The only way to know if you have reached convergence is to look at the learning curve and check whether it has reached a plateau (see the Iris example). If you have a 100-by-100 SOM, start with 10000 iterations so that each sample is observed at least once, then check the results. Increase the number of iterations if you think that the error is still on a downward trajectory. |
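The plateau check described above can also be automated. The following is only a minimal sketch: `has_plateaued`, `window`, and `rel_tol` are hypothetical names and thresholds, not part of MiniSom, and `errors` is assumed to be a list of quantization errors that you recorded yourself at regular checkpoints during training.

```python
def has_plateaued(errors, window=5, rel_tol=0.01):
    """Return True when the error improved by less than `rel_tol`
    (relative) over the last `window` recorded checkpoints.

    `window` and `rel_tol` are illustrative defaults, not recommendations.
    """
    if len(errors) < window + 1:
        return False  # not enough history to judge
    start = errors[-(window + 1)]
    end = errors[-1]
    if start == 0:
        return True  # the error cannot decrease any further
    return (start - end) / abs(start) < rel_tol


# Hypothetical learning curves recorded at checkpoints.
still_dropping = [1.0, 0.8, 0.65, 0.55, 0.47, 0.40, 0.35]
flat = [1.0, 0.6, 0.402, 0.401, 0.401, 0.400, 0.400, 0.400]

print(has_plateaued(still_dropping))
print(has_plateaued(flat))
```

A check like this does not replace looking at the plot; it only flags when adding iterations has stopped paying off under the chosen tolerance.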
Let me extend this question a little further, with some emphasis on the topographic error. I have a dataset with around 360 rows and small correlations between features. After plotting the learning curves as in the "Iris" example, I noticed that the quantization error does decrease and reach a plateau, while the topographic error fluctuates and tends to become stable as well. The problem is that it fluctuates around 0.8, which is too large. Since the t.e is an indication of how representative the SOM is, I believe it is an important issue. The question is whether there is a parameter that, if properly tuned, can decrease the t.e, or whether it is inevitable to get a non-representative SOM for low-correlated data? |
hi @V-for-Vaggelis, Have you tried inspecting the results visually? You want to check that the u-matrix (that you can get with the method You can obtain a smooth mapping no matter how the data is correlated. |
@V-for-Vaggelis also, to really understand whether the SOM has converged, you can check the weights step by step and stop when they don't change anymore ( |
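One way to make "the weights don't change anymore" concrete is to compare two snapshots of the weight array. This is a sketch under assumptions: `max_weight_change`, `weights_settled`, and the tolerance `tol` are hypothetical, and the snapshots are assumed to be NumPy arrays of shape (rows, cols, features), for example copies taken from `som.get_weights()` before and after a batch of training steps.

```python
import numpy as np


def max_weight_change(previous, current):
    """Largest absolute change of any single weight between two snapshots."""
    return float(np.max(np.abs(np.asarray(current) - np.asarray(previous))))


def weights_settled(previous, current, tol=1e-4):
    """True when no weight moved by `tol` or more (tol is an illustrative value)."""
    return max_weight_change(previous, current) < tol


# Hypothetical snapshots of a 3x3 map with 2 features.
before = np.zeros((3, 3, 2))
barely_moved = before + 1e-6
moved = before.copy()
moved[0, 0, 0] = 0.5

print(weights_settled(before, barely_moved))
print(weights_settled(before, moved))
```

Take the snapshots as copies; comparing an array with a reference to itself always reports zero change.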
@JustGlowing A weird thing happened. I updated minisom and it would not print the t.e anymore. So I printed it myself and got 0.09 for the same data. Could it be a bug you had fixed? I also got the distance map as you advised. In general it looks smooth, but there is a small red area (large distances). I guess it means this small area of the grid can't be trusted to draw conclusions. One more thing: is there a paper I can refer to in my thesis for minisom, or should I just link to the repo? |
@V-for-Vaggelis there was a bug fix released in December related to the quantization error. Can you please cite MiniSom as follows: G. Vettigli, "MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map,". Available: |
Hi, I am using MiniSom to cluster data and I find it very convenient, so thanks for your contributions. However, I am confused about how to properly select the initial parameters, e.g. sigma, learning rate and max_iteration. In this issue you said "The only way to know if you reached convergence is to look at the learning curve and check if it reached a plateau", but I want to know which indicator to use to plot the learning curve. The quantization error? Finally, I want to know whether there is any way to get the cluster number that each data point in the dataset belongs to. In the Cluster example you treat each neuron as a cluster, but that does not work well in my experiment. Thanks. |
hi @Yifeng-J, Here's an example of how to plot the learning curve: https://github.com/JustGlowing/minisom/blob/master/examples/BasicUsage.ipynb I'd recommend using the quantization error unless you're trying to optimize your own custom metric. Regarding the cluster index, the example you pointed out shows the most convenient way to solve the issue. However, you can do more complex stuff, like grouping different neurons and assigning the cluster index according to that. |
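As a rough illustration of both options (one cluster per neuron, or several neurons grouped into one cluster), here is a NumPy-only sketch. The `winners` array stands in for the (row, col) winning neuron of each sample, and `neuron_to_cluster` is a hypothetical grouping chosen purely for the example.

```python
import numpy as np

map_shape = (3, 3)

# Hypothetical winning neuron (row, col) for five samples.
winners = np.array([[0, 0], [0, 1], [2, 2], [0, 1], [1, 0]])

# Option 1: one cluster per neuron, using the flattened neuron index.
neuron_index = np.ravel_multi_index(winners.T, map_shape)

# Option 2: group neurons, here (arbitrarily) one group per map row,
# then look up the group of each sample's winning neuron.
neuron_to_cluster = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
cluster_index = neuron_to_cluster[neuron_index]

print(neuron_index)
print(cluster_index)
```

The lookup table is where a custom grouping would go, for example one obtained by clustering the neuron weight vectors themselves.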
@JustGlowing OK, thanks for your answer. I will try some other method to solve the cluster index problem. I hope you can give me some suggestions on how to choose the initial parameters, because I can't find any information on the Internet about how to choose them properly. |
I'd suggest you start with the default parameters and plot the results as shown in the documentation. Then you can start tweaking the parameters. You'll get a grasp once you try a couple of edge cases (e.g. set sigma too high or too low). Remember that there's no optimal set of parameters, but you can find a set that is good enough for you. |
@JustGlowing Ok, I get it. Thank you very much! |
Hello guys, I'm trying to use minisom for clustering 16-dimensional embeddings with 7 classes. If, for example, I set the map size to 7*7, I'd get 49 clusters. I read your rule of thumb, but it doesn't work for me, because I'd have to set it to 16*16 and by doing so I'd get 256 clusters! Would appreciate the help |
Hi @atheeraa , you have to set input_len to 16 and create a map of size 3x3. This will give you 9 clusters and you can merge two of the closest clusters to get the 8 that you need. |
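Merging the two closest clusters can be sketched as follows. This is an assumption-laden illustration, not MiniSom functionality: `merge_closest_pair` is a hypothetical helper, "closest" is taken to mean the smallest Euclidean distance between neuron weight vectors, and `weights` is assumed to have shape (rows, cols, features) like the array a trained map exposes.

```python
import numpy as np


def merge_closest_pair(weights):
    """Return one label per neuron (flattened order) in which the two
    neurons with the closest weight vectors share a label.

    Labels run from 0 to (number of neurons - 2).
    """
    codebook = weights.reshape(-1, weights.shape[-1])
    n = codebook.shape[0]
    # Pairwise Euclidean distances between all neuron weight vectors.
    diff = codebook[:, None, :] - codebook[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    a, b = np.unravel_index(np.argmin(dist), dist.shape)
    a, b = min(a, b), max(a, b)
    labels = np.arange(n)
    labels[b] = a             # neuron b joins neuron a's cluster
    labels[labels > b] -= 1   # close the gap left by the merged label
    return labels


# Hypothetical 3x3 map with 16 features: neuron i has all weights 10*i,
# except neuron (2, 1), which is placed almost on top of neuron (0, 2).
weights = (10.0 * np.arange(9)).reshape(3, 3, 1) * np.ones((1, 1, 16))
weights[2, 1] = weights[0, 2] + 0.001

labels = merge_closest_pair(weights)
print(labels)
```

Indexing `labels` with the flattened winner index of each sample then gives 8 cluster labels instead of 9.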
Thank you for your reply! Again, thank you for your replies, I appreciate your help. |
hi again @atheeraa , you want to have a look at this example https://github.com/JustGlowing/minisom/blob/master/examples/BasicUsage.ipynb |
Thank you so much for your wonderful work. I'm trying to use minisom for a clustering task, but in the cluster example "som.winner" processes the data one sample at a time, which takes a lot of time when the input is huge. If the input has shape (m, n), how can I process the whole array without a "for" loop? Thank you. |
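One possible approach, offered as a sketch rather than as the library's own method: compute the distances between all samples and all neurons in a single NumPy expression and take the argmin per sample. `winners_vectorized` is a hypothetical helper; it assumes Euclidean distance and a `weights` array of shape (rows, cols, n), such as the one returned by `som.get_weights()`.

```python
import numpy as np


def winners_vectorized(data, weights):
    """Return an (m, 2) array with the (row, col) of the closest neuron
    for each of the m samples, without a Python loop over samples.

    data: (m, n) array; weights: (rows, cols, n) array.
    """
    rows, cols, n = weights.shape
    codebook = weights.reshape(rows * cols, n)
    # Squared Euclidean distance via ||x||^2 - 2 x.w + ||w||^2,
    # giving an (m, rows * cols) matrix.
    d2 = (
        np.sum(data ** 2, axis=1)[:, None]
        - 2.0 * data @ codebook.T
        + np.sum(codebook ** 2, axis=1)[None, :]
    )
    flat = np.argmin(d2, axis=1)
    return np.column_stack(np.unravel_index(flat, (rows, cols)))


# Hypothetical 2x2 map whose neurons sit on the corners of the unit square.
weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [1.0, 1.0]]])
data = np.array([[0.1, 0.9], [0.9, 0.1], [0.8, 0.9]])

print(winners_vectorized(data, weights))
```

The intermediate distance matrix has m times rows*cols entries, so for very large inputs it may be necessary to process the data in chunks.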
Hi,
I just found an inconsistent statement about the verbose output of the train method.
Line 347 states that if verbose is true, the status of the training will be printed at each iteration. But in line 361, the status is only printed after all iterations. I guess the code between 361-363 should be indented.
Thanks