Upgrade to modern spark (3.4) #166
Conversation
…pdated since the SQL pretty printer has changed in Spark 3.
I really appreciate the contribution — first outside contribution in some time! For our purposes, DV must continue working with Spark 2.3 and JDK8, so we can't bump versions as you have proposed. Can you think of a way to test/qualify this against both Spark 2.3 and 3.4? As a potential starting point, I started on crossbuilding last year but haven't had the bandwidth to complete the work.
So one lazy option would be to make a Spark 3 branch and a Spark 2 branch and publish a "-3" version of the artifact rather than a "proper" cross-build.
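For illustration, a minimal sketch of that lazy option; the module name here is assumed, not copied from the repo. On a dedicated Spark 3 branch, the published artifact gets a "-3" suffix so it can't collide with the Spark 2 line:

```scala
// Hypothetical build.sbt fragment for a dedicated spark-3 branch.
// The module name "data-validator" is assumed; suffixing it keeps the
// Spark 3 artifact distinct from the Spark 2 one in the repository.
name := "data-validator-3"
```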
I could potentially take a look at finishing the cross-build work, @colindean. Also, many thanks @holdenk for keeping this project and others in the Spark ecosystem in mind.
So I was thinking the main "annoying" thing in a cross build for this is probably the tests and maybe the quoted identifiers. We could probably make it cross build easily, although my default is lazy (branches). |
That's a good idea, but it introduces some operational complexity that must be recalled at release time. I prefer the technical complexity of an sbt crossbuild, especially if…
❤️ Yes, that would be awesome. I don't remember what was going awry when I left off with it. I welcome the contribution, including pairing if either of you would like (or the two of you could pair with each other).
I've not seen pressure from DV users to support Spark 3, but most DV users who have identified themselves are internal to Target and stuck on our Spark 2.3 cluster while we await the production launch of our internal platform enabling use of Spark 3 + Docker.
Yes, indeed! I'm very happy to see this and your continued work in the space, and to see another former IBMer around!
So my other project where I do more extensive cross-building is https://github.com/holdenk/spark-testing-base/blob/main/build.sbt; it puts the actual Spark versions outside of sbt (passed in as a parameter) and does a "classic" sbt cross-build for the Scala versions. What do folks think of something like that? Also, thanks for being so friendly, y'all :)
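As a hedged sketch of that pattern (the property name, versions, and defaults here are illustrative, not copied from spark-testing-base):

```scala
// Hypothetical build.sbt fragment: the Spark version comes from outside sbt,
// e.g. `sbt -Dspark.version=2.3.4 +test`, while Scala versions use the
// classic sbt cross-build. Which Scala versions apply depends on the Spark
// line (Spark 2.3 builds against 2.11; Spark 3.4 against 2.12/2.13).
val sparkVersion = sys.props.getOrElse("spark.version", "3.4.1")

crossScalaVersions := Seq("2.11.12", "2.12.18")

libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
```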
Also, slightly off topic (in the context of having a large pre-Spark 3 codebase): some of my coworkers and I have been working on some tooling at https://github.com/holdenk/spark-upgrade that might be useful once y'all have Spark 3 ready for use :)
This is a bit of work to see if folks agree with the direction; if so, I'll update the tests and such :)
…user classpath first) and also pick a better Scala version.
Hey, just following up: @c-horn, do you have some bandwidth to review this?
Hi, sorry, my free-time plans got ahead of me; I will be out of the country for a few weeks starting Wednesday. I will try to take a look at this tonight or tomorrow.
It's been a while, but I'm hoping to revisit this mid-February. I shook out some backed-up update PRs and will take a close look at this when I return from FOSDEM. @holdenk, I noted that you're speaking there, and I hope to attend your talk and meet you!
Sounds awesome, maybe we can grab some coffee :)
Good news! The requirement to support Spark 2 is now gone, as my org is moving to Spark 3 by the end of May. I'm going to return to this in the coming days/weeks, as I've gotten a few requests for it.
We're going into Spark 3 at v3.5.1 and still supporting Scala 2.12 and 2.13. |
#186 sets a new baseline. I think I want to get this merged (probably with the suggested move to Spark 3.5.1 included) and then continue from there.
@colindean @holdenk - Here is the change required in build.sbt to make DV work with Spark 3.x. I tested it and it's working. The current circe-yaml version is 0.14.2; change line 35 in build.sbt from 0.14.2 to 1.15.0. The other thing I changed was line 6, as below (this may not be needed; I wasn't sure how we are getting the Spark version from the env): sparkVersion := "3.5.1"
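As a sketch, the two edits described above would look roughly like this in build.sbt (surrounding settings omitted; the sparkVersion key is the project's own, per the comment above):

```scala
// build.sbt (sketch; only the two changed settings shown)
// Line 6: pin the Spark version directly instead of reading it from the env.
sparkVersion := "3.5.1"

// Line 35: bump circe-yaml. Per the thread, there is no 0.14.6; the
// published version line jumps from 0.14.2 to 1.15.0.
libraryDependencies += "io.circe" %% "circe-yaml" % "1.15.0"
```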
There was no 0.14.6 for circe-yaml.
Thank you so much, @holdenk, for submitting this. I'm sorry it took a while for priorities to bubble up. I'm glad that we were able to merge it eventually.
looks good!
Hey y'all, I poked around in the project. I don't know if it's still being actively developed, but in case folks are interested, I upgraded it to Spark 3.4. I've only done the tests with the built-in test suite, and it's my first time poking around in the project, so you know, there could be some mistakes.