4+ AgnosticEncoder support & Spark Connect #701

Open
chris-twiner opened this issue Apr 11, 2023 · 2 comments

chris-twiner commented Apr 11, 2023

Spark Connect adds another way to interact with Spark, separating the client application from the Spark driver. Prior to Spark 4 the Classic and Connect interfaces were separate; with the introduction of a common base class, supporting both is now possible.

Pros:

  • works on Spark 4
  • opens up the possibility of connect usage

Cons:

  • possible performance hit: TransformingEncoders are required, so whole-stage codegen is no longer possible (extent unknown and probably not relevant)
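
To make the TransformingEncoder idea in the list above concrete, here is a minimal, Spark-free sketch of the pattern: a custom type is supported by pairing a codec with an encoder the engine already understands. Every name in it (`Enc`, `Codec`, `Transforming`, `UserId`) is a hypothetical stand-in chosen for illustration; it is not Spark's or Frameless's actual API, and it only models the shape of the approach.

```scala
// Spark-free model of a "transforming" encoder.
// All names are hypothetical stand-ins, NOT Spark's or Frameless's API.
object EncoderSketch {

  // Two-way conversion between a custom type I and a supported type O.
  trait Codec[I, O] {
    def encode(in: I): O
    def decode(out: O): I
  }

  // Closed (sealed) set of encoders the engine understands.
  sealed trait Enc[T] {
    def write(value: T): Any // value -> engine representation
    def read(raw: Any): T    // engine representation -> value
  }

  case object StringEnc extends Enc[String] {
    def write(value: String): Any = value
    def read(raw: Any): String = raw.asInstanceOf[String]
  }

  case object LongEnc extends Enc[Long] {
    def write(value: Long): Any = value
    def read(raw: Any): Long = raw.asInstanceOf[Long]
  }

  // The single extension point: wrap a supported encoder with a codec.
  final case class Transforming[I, O](underlying: Enc[O], codec: Codec[I, O])
      extends Enc[I] {
    def write(value: I): Any = underlying.write(codec.encode(value))
    def read(raw: Any): I = codec.decode(underlying.read(raw))
  }

  // A custom type with no built-in encoder of its own.
  final case class UserId(value: Long)

  val userIdCodec: Codec[UserId, Long] = new Codec[UserId, Long] {
    def encode(in: UserId): Long = in.value
    def decode(out: Long): UserId = UserId(out)
  }

  // UserId is stored as a Long and converted at the boundary.
  val userIdEnc: Enc[UserId] = Transforming[UserId, Long](LongEnc, userIdCodec)
}
```

In this model `EncoderSketch.userIdEnc.write(EncoderSketch.UserId(42L))` passes through the codec before reaching `LongEnc`. Each value takes an extra pair of ordinary method calls, which is the kind of per-value overhead the performance concern above would come from, if it materialises at all.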

chris-twiner commented Apr 14, 2023

The comment below is no longer relevant now that this issue has been repurposed to focus on Spark 4.

Based on the source code, ConvertToArrow (called by Connect's SparkSession) and ExpressionEncoder only support the built-in AgnosticEncoders (the logic lives in ScalaReflection). It's a locked-in system, with no way to inject behaviour without classpath hackery.

So custom types, typed datasets (a different API) and injections, i.e. all the cool stuff, don't seem to be possible with Spark Connect as it stands in 3.4.

@chris-twiner

NB: per the SPARK-49960 discussion, with these Spark fixes * and this Frameless branch, using AgnosticEncoders is possible in Spark Classic.

Frameless cannot support multiple Spark builds with this approach, so a clean cut is required.

* I'll update if the extra changes are possible to include.
