External Tables: Remove creation of job status table #290

Closed
ravjotbrar opened this issue Dec 8, 2021 · 5 comments · Fixed by #309

@ravjotbrar
Collaborator

Remove the creation of the job status table when creating an external table.

@ravjotbrar ravjotbrar added this to the 3.0.2 milestone Dec 17, 2021
@KevinAppelBofa

The S2V_JOB_STATUS_USER_ table that gets created inside Vertica has a comment, i.e.:
Persistent job status table showing all jobs, serving as permanent record of data loaded from Spark to Vertica. Creation time:Thu Dec 23 18:50:29 GMT 2021

This occurs both when creating external tables and when writing a Spark DataFrame directly to Vertica.

Once this is set the first time, a follow-up write of either type does not change it; the table keeps the comment associated with it, and getting rid of it requires manually dropping the table.

I also tested what happens if I drop this table while the external table is being written, and that throws an exception.

Since this doesn't seem to handle overwriting the comment, what happens if someone submits two Spark jobs at the same time to write two different things into Vertica?

It would be great if this table could be dropped in both cases once the write has completed; this will make the DBAs happy that we are not leaving leftover temp tables.

@alexey-temnikov alexey-temnikov added the enhancement (New feature or request) label Dec 23, 2021
@alexey-temnikov
Collaborator

@KevinAppelBofa, thank you for the feedback! I understand the inconvenience of having to delete these tables manually. We are planning to work on this issue early next year.

@jeremyprime
Collaborator

The status table can be useful to audit or debug when/what data was written to Vertica. Currently the table is always created if it doesn't exist, and subsequent writes will add another row to the table. However, as Kevin notes, there are cases when it is not needed and it only clutters the database. We can create another ticket to provide a flag, such as save_metadata_tables, so that creation of the status table can be enabled/disabled (I assume we'll want to disable creation by default, which also aligns with the existing prevent_cleanup flag for temporary HDFS data). This flag can also cover the creation of the rejected rows table in the future (see #293), which is also useful for debugging but could clutter the database.

So the scope of this ticket will be to not create the status table when creating an external table, while the other ticket will disable creation of the status table by default but provide an option to enable its creation (mostly for development/debugging).
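
For illustration only, the combined behaviour of the two tickets could be sketched roughly as below. This is not the connector's code: `WriteOptions` and `should_create_status_table` are hypothetical names, `save_metadata_tables` is only the flag name floated above, and its default of off is the assumption stated above rather than a decided design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteOptions:
    """Hypothetical subset of write options, for illustration only."""
    # True when the job only creates an external table.
    create_external_table: bool = False
    # Proposed flag name from this thread; defaulting to off is an assumption.
    save_metadata_tables: bool = False


def should_create_status_table(opts: WriteOptions) -> bool:
    """Return True if the job status table should be created for this write."""
    # This ticket: creating an external table never creates the status table.
    if opts.create_external_table:
        return False
    # Follow-up ticket: regular writes create it only when explicitly enabled.
    return opts.save_metadata_tables
```

In this sketch the external table check takes precedence, so setting the flag would not bring the status table back for external tables; whether that is the desired interaction is left open here.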

@KevinAppelBofa, would that work for you?

@KevinAppelBofa

@jeremyp-bq that sounds great

@jeremyprime
Collaborator

See #308 for the ticket to add the flag to enable/disable creation of the job status table.

4 participants