comparison with SGDClassifier #1
I will compare. The original paper did some comparisons with SGD (not sklearn's implementation) and found that the projection step and the adaptive learning rate improved performance.
The SGD in scikit-learn actually has an adaptive learning rate; I believe it can even be set to be the same as Pegasos.
After looking it up again, I think you need to set …
Here are some benchmarks with identical learning rates: https://raw.github.com/ejlb/pegasos/master/benchmarks/benchmarks.png

Pegasos seems to be slightly more accurate (~1%). The only two differences I know of are:

1) the projection step
2) Pegasos trains on a single random sample per iteration rather than on the whole data set

Due to point 2) it is hard to compare speed across iterations.
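For context, the projection step referred to above rescales the weight vector back into the L2 ball of radius 1/sqrt(lambda) after each update. A minimal NumPy sketch (function and variable names are my own, not from the repo):

```python
import numpy as np

def project(w, lam):
    """Pegasos projection step: rescale w onto the L2 ball of
    radius 1/sqrt(lam) whenever it falls outside that ball."""
    radius = 1.0 / np.sqrt(lam)
    norm = np.linalg.norm(w)
    if norm > radius:
        w = w * (radius / norm)
    return w

w = np.array([3.0, 4.0])   # ||w|| = 5
w = project(w, 1.0)        # rescaled to norm 1
```

Vectors already inside the ball are left untouched, so the step only kicks in when an update overshoots.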
You say that training on random samples makes it hard to compare speeds. How so? One iteration of SGD is a full pass over the data, i.e. `n_samples` weight updates.
@amueller SGDClassifier trains on the whole data set at each iteration, I assume? That is probably where the speed increase comes from.

edit: yes, true, that would be a good comparison. Will upload the benchmark script.
Ok, but then the plot doesn't make sense. You should rescale it so that the number of weight updates is the same.
Yeah, I will run some with equal weight updates.
Yes. It also wastes a little bit of time in each update, checking whether it should do a PA update or a vanilla additive one.
This makes much more sense: https://raw.github.com/ejlb/pegasos/master/benchmarks/weight_updates/benchmarks.png

Perhaps batching the Pegasos weight updates would retain the slight accuracy boost and improve the training time.
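The mini-batch variant in the original Pegasos paper averages the hinge-loss subgradient over a batch before applying one scaled update and the projection. A rough NumPy sketch, assuming labels in {-1, +1} (names are mine):

```python
import numpy as np

def minibatch_pegasos_step(w, X_batch, y_batch, lam, t):
    """One mini-batch Pegasos update: average the hinge-loss
    subgradient over the batch, take one step with the Pegasos
    rate 1/(lam*t), then project onto the 1/sqrt(lam) ball."""
    eta = 1.0 / (lam * t)
    margins = y_batch * (X_batch @ w)
    active = margins < 1.0                 # samples violating the margin
    grad = lam * w - (X_batch[active].T @ y_batch[active]) / len(y_batch)
    w = w - eta * grad
    radius = 1.0 / np.sqrt(lam)
    norm = np.linalg.norm(w)
    if norm > radius:
        w = w * (radius / norm)
    return w
```

One update per batch instead of one per sample is where the hoped-for speedup would come from, at the cost of a slightly staler gradient.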
Yeah, that looks more realistic ;)
I used this: `SGDClassifier(power_t=1, learning_rate='invscaling', n_iter=sample_coef, eta0=0.01)`. The full benchmark is here: https://github.com/ejlb/pegasos/blob/master/benchmarks/weight_updates/benchmark.py
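For reference, scikit-learn's `'invscaling'` schedule is eta = eta0 / t^power_t, so `power_t=1` gives an eta0/t decay, the same 1/t shape as the Pegasos rate eta_t = 1/(lambda*t); the two coincide when eta0 = 1/lambda. A quick sketch of the two schedules (helper names are mine):

```python
def invscaling_eta(eta0, power_t, t):
    """scikit-learn 'invscaling' schedule: eta0 / t**power_t."""
    return eta0 / t ** power_t

def pegasos_eta(lam, t):
    """Pegasos learning rate: 1 / (lam * t)."""
    return 1.0 / (lam * t)

# With power_t=1 and eta0 = 1/lam the schedules are identical.
lam = 100.0
for t in (1, 10, 1000):
    assert abs(invscaling_eta(1.0 / lam, 1, t) - pegasos_eta(lam, t)) < 1e-12
```

With `eta0=0.01` as in the benchmark, this matches Pegasos exactly for lambda = 100.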
Hey. Did you compare with SGDClassifier?
The results should be quite close to yours.