Added parallelism to train.jl and improved efficiency of types.jl #8
Conversation
Hi @Skylion007, this is neat, thanks for looking into parallelizing the training. I pulled your patch down and ran a sample training loop with it, something like:
That RMSE is pretty far from the ~0.92 that the existing code achieves with the same sequence of commands. Accordingly, if you run more of the sample commands in the README (like …), ….

This may be because there's an error in your patch? I noticed that your loop uses ….

It may also be that this loop just isn't amenable to a naive parallel execution: each iteration uses and updates values of ….

In either event, if you can update the patch to get the RMSE down to something reasonable, I'd be happy to take another look.
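As context, the failure mode being described here is a lost-update race: several workers read, modify, and write the same shared value, and some of the writes overwrite each other. Below is a minimal sketch of the effect; it is not the project's code, just a SharedArray counter standing in for a shared model parameter, written with the current `@distributed` macro (the successor to the older `@parallel`).

```julia
# Minimal illustration of lost updates in a naive distributed loop.
# The counter stands in for a shared model parameter that every
# iteration reads, modifies, and writes back.
using Distributed
addprocs(4)
@everywhere using SharedArrays

counter = SharedArray{Int}(1)
counter[1] = 0

@sync @distributed for i in 1:100_000
    counter[1] += 1          # read-modify-write is not atomic across workers
end

println(counter[1])          # typically far below 100_000: updates were lost
```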
Ah, the way I parallelized should have accounted for this and made sure that two ratings weren't being updated at the same time. The error probably has to do with the residual curr_error. That's interesting, though, because I tested it on a really massive dataset and got a pretty consistent RMSE; albeit that was over 200 million ratings, so perhaps that is why the residual error was minuscule. I'll be glad to try to fix it and test it out when I have the chance.
On second thought, you may be right about the parallelism issue. Oddly enough, it worked on my dataset because it was so massive. With approximately 200,000,000 ratings, the odds of two items overwriting the same value are relatively small, but on these smaller datasets the error is much more pronounced. I may have to figure out how to rewrite this if I want to add parallelism. I found an implementation that works by relying on a map-reduce style framework, but Julia's code just doesn't seem amenable to that. Perhaps if Julia adds read/write locks or another method of safely making these values concurrent. Actually, one way that might work is having each process calculate its own portion of the matrix and somehow combine them using SIMD operations. Alternatively, group-by algorithms would make it relatively easy to divide the work. I'll look into this issue. I'll also leave this code up, since it works well on large datasets with lots of values, where the error is relatively small.
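One way to get that map-reduce style combination with stock Distributed, rather than an external framework, is to let each iteration produce a worker-local partial result and merge the pieces with a reducing `@distributed` loop. A rough sketch under that assumption follows; the "gradient" computation is a placeholder, not the package's actual update rule.

```julia
# Each iteration returns a small contribution; the (+) reducer merges the
# workers' partial sums, so no process ever writes to shared state.
using Distributed
addprocs(4)

nfeatures = 25
ratings   = rand(100_000)                # stand-in for the real rating values

partial = @distributed (+) for i in 1:length(ratings)
    fill(ratings[i] / length(ratings), nfeatures)   # placeholder gradient term
end
```

Partitioning the ratings by user or item (the group-by idea) would go one step further: each worker would own the rows it updates, avoiding the reduction entirely.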
So apparently, Julia does have support for locking via mutexes. Since the model will be relatively small compared to the number of ratings, it might be possible to have a parallel array of mutexes to prevent collisions. The overhead should be relatively minor even on large datasets.
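A sketch of that "parallel array of mutexes" idea is below. One hedge: this version uses Julia threads and `ReentrantLock`, which only guards against collisions within a single process; coordinating writes across the worker processes the patch uses would need a different mechanism. All names here are illustrative.

```julia
nitems        = 1_000
item_features = rand(nitems, 25)
row_locks     = [ReentrantLock() for _ in 1:nitems]   # one mutex per item row

Threads.@threads for r in 1:100_000
    i = rand(1:nitems)                  # stand-in for the rating's item id
    lock(row_locks[i]) do
        item_features[i, :] .+= 0.001   # guarded read-modify-write
    end
end
```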
I added parallelism so that you can train it with several processes. I felt conflicted about whether or not I should force types.jl to be included using the "@everywhere" notation, but I left that up to you. I also utilized shared arrays to speed up execution, and finally I had to make some types immutable so that they are bits types. This allows them to be used in SharedArrays and should improve the overall speed of execution, since it's more efficient.
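A minimal sketch of the ingredients described here, with illustrative names rather than the actual definitions in types.jl: an immutable, bits-type struct, defined on every worker with `@everywhere`, which is what allows it to serve as the element type of a SharedArray.

```julia
using Distributed
addprocs(2)
@everywhere using SharedArrays

# In the patch this definition would live in types.jl and be pulled in with
# `@everywhere include("types.jl")`; the struct here is only an illustration.
@everywhere struct Rating
    user::Int32
    item::Int32
    value::Float32
end

@assert isbitstype(Rating)            # SharedArray element types must be bits types

ratings = SharedArray{Rating}(10)     # backed by shared memory, visible to all workers
ratings[1] = Rating(1, 42, 3.5f0)
```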