Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(outputs.sql): Add option to automate table schema updates #16544

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hautecodure
Copy link

@hautecodure hautecodure commented Feb 24, 2025

Automate table updates, when the table does not contain a field

Summary

Currently, updates to the Table are left to the administrator and whenever a field is not present (in the table), the entire Telegraf flush is discarded, losing data in the process.

The new approach exposes two templates, one to update a table and another on to query a table for it's columns.

To avoid unnecessary (and expensive) requests, we cache the table columns and update it, whenever there is a cache miss from a field.

Checklist

  • No AI generated code was used in this PR

Related issues

N/A

@telegraf-tiger
Copy link
Contributor

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

@telegraf-tiger telegraf-tiger bot added area/sql feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins labels Feb 24, 2025
@hautecodure hautecodure force-pushed the master branch 3 times, most recently from 30b5ab0 to ae1da3c Compare February 24, 2025 17:21
@hautecodure
Copy link
Author

hautecodure commented Feb 24, 2025

Pushed a new version to address the linter complaints and took the opportunity to:

  • Get the defer outside of the for loop
  • Move the table creation in a separate function to make the Write loop to improve readability

Note: left the table_exists_template option, even though nothing uses it, as removing it might break existing configs

Update:

It seems the Integration tests failing, are addressed in another PR. Will rebase, once merged.

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hautecodure thank you very much for your contribution! Much appreciated! I do have some comments in the code, the biggest work items are

  1. Strip out the column sorting to another PR to only have one thing in each PR!
  2. I would rather like to have table-altering as an opt-in to not accidentally break existing configurations.

Looking forward to the next round!

@hautecodure hautecodure reopened this Feb 25, 2025
@hautecodure hautecodure changed the title feat(outputs.sql): Add column (if missing), use Prepared statements feat(outputs.sql): Add column (if missing) from the database Feb 25, 2025
@hautecodure hautecodure requested a review from srebhan February 25, 2025 13:34
@hautecodure
Copy link
Author

Pushed a new update:

  • Added tests
  • Reworked the table query, as it did not work under Clickhouse.
  • The column query has been removed
  • Changed a few if/else expression to match the style convention

While a bit unpleasant with the per-driver overrides, there was not much of a choice, unless we want to pester users with writing queries for each and every edge-case.

@hautecodure hautecodure requested a review from srebhan February 27, 2025 19:13
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hautecodure for the update and your patience! I do have some styling comments in the code. Additionally, I have the following items:

  1. Please do not remove existing options (table_exists_template) or render them non-functional! This might break some users! I agree that most likely this option is useless and should be removed, but to do so, you need to deprecate it first and keep it for a certain time.
  2. The default value of table_column_template is empty and therefore, the line in the sample.conf file must reflect this! You can add a line
  ## Use the following setting for automatically adding columns:
  ## table_update_template = "ALTER TABLE {TABLE} ADD COLUMN {COLUMN}"
  1. Using tables map[string]map[string]bool might simplify the code and improve lookups. What do you think?

Automate table schema updates, when the table does not contain a
field (or tag) by using template, provided by the user.

Note: this feature is opt-in, it requires the user to explicitly
modify their configuration files.
@hautecodure hautecodure reopened this Mar 4, 2025
@hautecodure
Copy link
Author

Hi, thanks for the feedback, pushed a new iteration:

  • Reverted tableExists back to its current form - not in the scope for this PR, might open a new one, to discuss / update (if needed).
  • Update the SQL struct names, to closely match the config options
  • Switch table cache to map[string]map[string]bool

@hautecodure hautecodure requested a review from srebhan March 4, 2025 08:12
@hautecodure hautecodure changed the title feat(outputs.sql): Add column (if missing) from the database feat(outputs.sql): Add option to automate table schema updates Mar 4, 2025
@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Mar 4, 2025

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hautecodure. Some more cosmetic things like moving the conditions to the caller side so that the reader directly sees that the create/update functions are only called on purpose or opt-in.

Comment on lines -218 to +301
if !p.tables[tablename] && !p.tableExists(tablename) {
createStmt := p.generateCreateTable(metric)
_, err := p.db.Exec(createStmt)
if err != nil {
return err
}
if err := p.createTable(metric); err != nil {
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I would like to keep the condition here as otherwise this reads like "always create the table"!

Copy link
Author

@hautecodure hautecodure Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to keep the Write loop easier to read and handle the minutia in the respective function.

Should i revert back to the previous statement?

@@ -288,7 +337,6 @@ func TestPostgresIntegration(t *testing.T) {
require.NoError(t, p.Connect())
defer p.Close()
require.NoError(t, p.Write(testMetrics))
require.NoError(t, p.Close())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you remove this? This line makes sure all metrics are written...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It closes the connection and i want to re-use the existing connection, instead of setting up a new one.

The tests seem to pass fine. Are you sure Close is required to flush the metrics? Is this a pgx quirk?

Comment on lines +367 to +400

p.TableUpdateTemplate = "ALTER TABLE {TABLE} ADD COLUMN {COLUMN}"

require.NoError(t, p.Write(postCreateMetrics))

fields := []string{
"tag_add_after_create text",
"bool_add_after_create boolean",
}
for _, column := range fields {
require.Eventually(t, func() bool {
rc, out, err := container.Exec([]string{
"bash",
"-c",
"pg_dump" +
" --username=" + username +
" --no-comments" +
" " + dbname +
// pg_dump's output has comments that include build info
// of postgres and pg_dump. The build info changes with
// each release. To prevent these changes from causing the
// test to fail, we strip out comments. Also strip out
// blank lines.
"|grep -E -v '(^--|^$|^SET )'",
})
require.NoError(t, err)
require.Equal(t, 0, rc)

b, err := io.ReadAll(out)
require.NoError(t, err)

return bytes.Contains(b, []byte(column))
}, 5*time.Second, 500*time.Millisecond, column)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a new testcase for this or directly set the update template! Furthermore, a comment would be nice saying something like

	// Write a metric that targets the same table but has additional fields to test the automatic
	// column update functionality.
	require.NoError(t, p.Write(postCreateMetrics))

	fields := []string{
		"tag_add_after_create text",
		"bool_add_after_create boolean",
	}
	for _, column := range fields {

Same for the other spots below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of creating new cases, but i have to duplicate the existing ones as i have to create a table, insert metrics, add new fields.

I'm not familiar with the test infrastructure, is there a way to keep the containers running so i don't have to re-create the database?

Or should i split the cases into separate functions and re-use them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sql feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants