
[BUG] Spark connector opens and closes multiple connections per operation #324

Open
jeremyprime opened this issue Feb 10, 2022 · 1 comment
Labels: bug (Something isn't working), Low Priority

Comments

@jeremyprime
Collaborator

Environment

  • Vertica Spark Connector version: 3.0.2

Problem Description

#314 helped address the original issue by making the JDBC layer a singleton per operation, so there is at most one connection per operation at a time, and that single connection eventually closes, releasing the session in Vertica.

However, this hides the fact that the JDBC layer object is still recreated multiple times within an operation, and thus the underlying connection and session are recreated as well. This adds overhead and can lead to other functional issues.

We need to refactor our use of the read and write pipes so that the JDBC layer and its underlying connection are not recreated multiple times. For example, the following (non-exhaustive) call chains show how read and write each create multiple connections, one per call to getReadPipe or getWritePipe; a simplified sketch of the pattern follows the two lists:

Read:

  1. DSReader.DSReader(), DSConfigSetup.validateAndGetConfig(), DSConfigSetup.performInitialSetup(ReadConfig)
  2. VerticaPipeFactory.getReadPipe()
  3. VerticaJdbcLayer.VerticaJdbcLayer()

Write:

  1. DSWriter.DSWriter(), DSConfigSetup.performInitialSetup(WriteConfig)
  2. VerticaPipeFactory.getWritePipe()
  3. VerticaJdbcLayer.VerticaJdbcLayer()
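
Below is a simplified, hypothetical sketch of the pattern described above. The class and method names only approximate the connector's code and the JDBC details are stubbed for illustration; the point is that every call to getReadPipe constructs a new VerticaJdbcLayer, which opens a new connection and Vertica session.

```scala
// Illustrative sketch only: names and signatures approximate the connector
// and are not its exact API.
class VerticaJdbcLayer(jdbcUrl: String) {
  // Constructing the layer opens a JDBC connection, i.e. a Vertica session.
  private val connection = java.sql.DriverManager.getConnection(jdbcUrl)
  def close(): Unit = connection.close()
}

class VerticaReadPipe(jdbcLayer: VerticaJdbcLayer)

object VerticaPipeFactory {
  // Every call builds a fresh JDBC layer, so every caller listed above
  // (DSReader, DSConfigSetup, ...) pays for a new connection and session.
  def getReadPipe(jdbcUrl: String): VerticaReadPipe =
    new VerticaReadPipe(new VerticaJdbcLayer(jdbcUrl))
}
```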

Ideally we would create one connection for read and one for write (since they have different configs), and each connection would be closed after the final commit. This will require ensuring we create only a single JDBC layer object and close the underlying connection only once, at the end of the operation; a rough sketch of this is below.
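
A minimal sketch of that direction, assuming a cached layer per operation (the names here are hypothetical, not the connector's actual API): the factory lazily creates one JDBC layer for the read operation, hands the same instance back on every subsequent call, and closes it exactly once after the final commit. Write would mirror this with its own cached layer, since its config differs.

```scala
// Hypothetical sketch of the proposed direction, not the connector's actual code.
class VerticaJdbcLayer(jdbcUrl: String) {
  private val connection = java.sql.DriverManager.getConnection(jdbcUrl)
  def close(): Unit = connection.close()
}

object VerticaPipeFactory {
  private var readLayer: Option[VerticaJdbcLayer] = None

  // Reuse the cached layer so repeated callers (DSReader, DSConfigSetup, ...)
  // share one connection instead of opening a new one each time.
  def getReadJdbcLayer(jdbcUrl: String): VerticaJdbcLayer = synchronized {
    readLayer.getOrElse {
      val layer = new VerticaJdbcLayer(jdbcUrl)
      readLayer = Some(layer)
      layer
    }
  }

  // Called once at the end of the operation, after the final commit.
  def closeReadJdbcLayer(): Unit = synchronized {
    readLayer.foreach(_.close())
    readLayer = None
  }
}
```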

@alexey-temnikov
Collaborator

This is an optimization for connection handling. It is expected that this does not significantly impact performance and has no functional impact.
