Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tweaks to the SparkLR example #872

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

wannabeast
Copy link
Contributor

I've been using spark.examples.SparkLR for performance testing. Just passing along some enhancements to the class.

…tions.

Also add an example of Kryo usage, and an optional "points" parameter.
@AmplabJenkins
Copy link

Thank you for your pull request. An admin will review this request soon.

@mateiz
Copy link
Member

mateiz commented Aug 29, 2013

Jenkins, this is ok to test

@AmplabJenkins
Copy link

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/764/

kryo.setRegistrationRequired(true)

kryo.register(classOf[scala.collection.mutable.WrappedArray.ofRef[_]])
kryo.register(classOf[java.lang.Class[_]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Mike, why do you need to register WrappedArray and Class? Doesn't seem like they'll occur in our data here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up with that list by adding setRegistrationRequired, then fixing each exception that was thrown. So Kryo said they were used; I haven't considered why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Did Kryo actually make a performance difference for you at all? We are not caching data in serialized form here, so it would only be used to send back task results, but that's just one Vector per partition. I think the WrappedArrays are because we somehow send that back within an array.

Replace fold() with aggregate() in SparkLR, to avoid Vector instatiations.
@AmplabJenkins
Copy link

Thank you for submitting this pull request.

All automated tests for this request have passed.

Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/809/

(1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
}.reduce(_ + _)
((1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y, p.x)
}.aggregate(zero)((sum,v) => sum saxpy (v._1,v._2), _ += _)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the second argument to sum.saxpy(v._1, v._2); we don't use infix notation for methods unless they're operators.

@mateiz
Copy link
Member

mateiz commented Oct 3, 2013

Sorry for the long delay, just made a couple more comments. I'm happy to merge this otherwise. You can either update it here or send a new PR against the apache/incubator-spark repo (the latter would be slightly easier).

xiajunluan pushed a commit to xiajunluan/spark that referenced this pull request May 30, 2014
This sets the max line length to 100 as a PEP8 exception.

Author: Reynold Xin <[email protected]>

Closes mesos#872 from rxin/pep8 and squashes the following commits:

2f26029 [Reynold Xin] Added PEP8 style configuration file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants