Add readme for VW module #34

Open
vsuthichai opened this issue Sep 14, 2016 · 8 comments

@vsuthichai
Contributor

No description provided.

@vsuthichai vsuthichai self-assigned this Sep 14, 2016
@mostafa-zefr

Hi vsuthichai!

I wanted to use spotz with VW but did not know where to start and then I found this ticket. Could you please add just one very basic example with VW? Much appreciated.

@vsuthichai
Contributor Author

vsuthichai commented Feb 13, 2017

Hi @mostafa-zefr, apologies for the lack of VW documentation. I will try to get to that ASAP. May I ask what you're trying to use the VW integration for? Thanks.

@vsuthichai
Contributor Author

@mostafa-zefr, there is documentation here: https://github.com/eHarmony/spotz/tree/branch-1.0.1/vw

It's on branch-1.0.1

Let me know if that can get you started with what you're trying to do.

@mostafa-zefr

Oh, I see the doc now! I am using it for a classification problem. Let me know exactly what you need to know about the application and I will share it with you, if possible under ZEFR policies.
I have a question though: I need more control over the training and test sets. I can't use k-fold CV because the order of my data matters. At first glance, I did not see any way to specify the holdout dataset directly, and it seems k-fold CV decides the test set at each iteration. Can I directly specify a train and a test set in the current version of the code?
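
For data where order matters, the train and holdout sets can be prepared ahead of time by cutting the file at a fixed position instead of sampling folds. The following is a minimal Python sketch of that idea, not part of spotz: the function name `ordered_holdout_split`, the `train_fraction` parameter, and the sample rows are all hypothetical, and it assumes one VW-format example per line, oldest first.

```python
from typing import Iterable, List, Tuple


def ordered_holdout_split(
    lines: Iterable[str], train_fraction: float = 0.8
) -> Tuple[List[str], List[str]]:
    """Split examples into (train, holdout) without shuffling.

    The first `train_fraction` of the examples become the training set and
    the remainder become the holdout set, so the original order is preserved.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Drop blank lines and trailing newlines; keep everything else verbatim.
    examples = [line.rstrip("\n") for line in lines if line.strip()]
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]


# Hypothetical VW-format examples, oldest first.
rows = [
    "1 | price:0.23 sqft:0.25",
    "-1 | price:0.18 sqft:0.15",
    "1 | price:0.53 sqft:0.32",
    "-1 | price:0.11 sqft:0.09",
    "1 | price:0.61 sqft:0.40",
]

train, holdout = ordered_holdout_split(rows, train_fraction=0.8)
```

The two lists can then be written to separate files, or passed on as iterators, to whatever objective accepts an explicit train and test set.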

@vsuthichai
Contributor Author

vsuthichai commented Feb 13, 2017

@mostafa-zefr Have a look at this class:

https://github.com/eHarmony/spotz/blob/branch-1.0.1/vw/src/main/scala/com/eharmony/spotz/objective/vw/VwHoldoutObjective.scala

You can supply the VW dataset through the constructor as an Iterator, an Iterable, or a path. If the VW dataset is being loaded from an RDD, you can call rdd.toLocalIterator.

vwTrainParamsString lets you specify the VW parameters used during training. Note that certain VW arguments, such as -d or anything related to caching, will not work, because spotz manages those internally before calling VW.
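
One way to catch such conflicts early is to scan the parameter string before handing it over. The sketch below is a standalone Python check, not spotz code: the set `MANAGED_FLAGS` is an assumption based on the comment above (`-d` and cache-related options) and lists standard VW flags, and the helper name `conflicting_flags` is hypothetical. The exact set of flags spotz overrides may differ.

```python
import shlex
from typing import List

# Assumption: VW flags for the data file and the cache that a wrapper is
# likely to set itself. The real list handled by spotz is not confirmed here.
MANAGED_FLAGS = {"-d", "--data", "-c", "--cache", "--cache_file", "-k", "--kill_cache"}


def conflicting_flags(vw_params: str) -> List[str]:
    """Return the managed flags that appear in a VW parameter string."""
    found: List[str] = []
    for token in shlex.split(vw_params):
        # Handle the "--flag=value" form by comparing only the flag name.
        flag = token.split("=", 1)[0]
        if flag in MANAGED_FLAGS and flag not in found:
            found.append(flag)
    return found


# Hypothetical parameter string containing two flags that should be dropped.
params = "--loss_function logistic -d train.vw -c --passes 10"
conflicts = conflicting_flags(params)
print(conflicts)
```

An empty result means the string contains none of the assumed managed flags.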

@mostafa-zefr

Thanks Victor!

I figured those out.
One thing I noticed is that the documentation here https://github.com/eHarmony/spotz/tree/branch-1.0.1/vw requires version 1.0.1, while the latest version in the Maven repo is 1.0.0. Is there a reason 1.0.1 is not available in the Maven repo?

@vsuthichai
Contributor Author

vsuthichai commented Feb 14, 2017

@mostafa-zefr After the initial 1.0.0 release, I began working on some documentation and wanted to integrate other important features into a 1.0.1 release. The branch is where the ongoing 1.0.1 work happens, and I haven't released 1.0.1 yet. There's still a lot to be done, but I don't have much time for it right now because I've been assigned to another project.

I appreciate any feedback you can provide about your experiences using it. Good, bad, recommendations for improvement, etc.

@mostafa-zefr

@vsuthichai No worries! I'll be sure to get back to you with feedback.
