-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save job status tables option #311
Save job status tables option #311
Conversation
write_status_table added to the configuration process.
_ <- if (config.saveMetadataTables) { | ||
tableUtils.updateJobStatusTable(config.tablename, config.jdbcConfig.auth.user, 0.0, config.sessionId, success = true) | ||
} else { | ||
Right(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we should create the status table for external tables, even if the save_metadata_tables
option is true
. Since there is no fault tolerance information saved, the only benefit is to track the number of times an external table is created and the session id it was created under.
@alexey-temnikov, thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We should have it documented and highlighted in the changelog, that this behaviour will be changed.
@jeremyp-bq, @Aryex, minor nit - should we consider another name instead of save_metadata_tables
since it is specific to the rejected rows? Perhaps save_rejected_rows
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexey-temnikov This PR is to turn off the status table by default and add an option to enable it, but I was thinking we could lump together the creation of the rejected rows table in the future under the same option, which is why it has the generic name save_metadata_tables
.
Unless you think it is better to have an option for creation of each (e.g. save_status_table
and later save_rejected_rows_table
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we should create the status table for external tables, even if the
save_metadata_tables
option istrue
. Since there is no fault tolerance information saved, the only benefit is to track the number of times an external table is created and the session id it was created under.
This was purely to mimic the old behavior. That said, it could be extended to provide an option to not record jobs of external tables. Something like: save_metadata_tables: all | none | vertica-only
.
This PR is to turn off the status table by default and add an option to enable it, but I was thinking we could lump together the creation of the rejected rows table in the future under the same option.
I think using specific options would be better in the long run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the logic would be like this, unless we see value in creating the status table for external tables:
if(config.saveMetadataTables && !config.createExternalTable.isDefined) {
// create table
} else {
Right(())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not following the condition. What do you mean by createExternalTable is defined AND not defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo, fixed it.
To summarize my questions:
- Should we create the status table for the external table case, or not since it doesn't provide much value for the external tables case (even if the option is set to true)?
- Should we have separate options for saving the status table and rejected rows table, or use the same option for both?
@Aryex We should also update the README.md to list the new option. Also note that one of the functional tests is failing (click the details link for the integration tests on the PR, then search the logs for "TEST FAILED" to see the exact one - looks to be an integration test that is looking for the job status table). |
connector/src/main/scala/com/vertica/spark/config/WriteConfig.scala
Outdated
Show resolved
Hide resolved
val query = "SELECT COUNT (1) FROM S2V_JOB_STATUS_USER_" + writeOpts("user") + | ||
" WHERE target_table_name = '" + targetTableName + "'" | ||
val rs = stmt.executeQuery(query) | ||
var count:Int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple notes here:
- General style recommendation is to put a space after the colon in type annotations. This is probably something that we should be using an automated code formatter for, but we haven't spent the time to do this yet.
- Avoid var. In this case, the mutation is localized and probably not a problem, especially since it's in test code, but it helps to get out of the mindset of reaching for var when it isn't necessary. Here, we could use the following snippet instead:
val count: Int = if (rs.next) rs.getInt(1) else 0
rs.close()
count
Summary
Added save_job_tatus_tables:Boolean option for jobs status table.
Description
With save_job_tatus_tables = true, a job status table will be initialized. Commits to vertica or external tables are recorded to vertica.
Related Issue
Closes #308
Additional Reviewers
@alexey-temnikov
@alexr-bq
@jonathanl-bq
@jeremyp-bq