
Add pkg/pg with dialects.go & txdb.go #910

Open · wants to merge 13 commits into base: main
Conversation

@reductionista reductionista commented Nov 1, 2024

NONEVM-739

Supports

smartcontractkit/chainlink#15064
smartcontractkit/chainlink-solana#921

Description

This mostly just moves chainlink/core/internal/testutils/pgtest/txdb.go and chainlink/core/store/dialects/dialects.go into a new common package, chainlink-common/pkg/pg, updating imports accordingly.
The purpose is to let chainlink-solana import and use this code for unit testing Solana ORMs (for an example of actual use, see the linked chainlink-solana PR this supports).

Since txdb.go was a bit of a mess, it has been cleaned up a bit.

  • The global init() function invoked on package load has been replaced with RegisterTxDb(), which accepts a database URL as a parameter instead of automatically reading it from the CL_DATABASE_URL env var. Since only two places in the codebase depend on our custom txdb sql dialect being registered, it can simply be called in both of those places (inside NewSqlxDB() and NewConnection()) instead of relying on the global init().
  • RegisterTxDb() now guards against invoking sql.Register() more than once, even if RegisterTxDb() itself is called repeatedly (calling sql.Register() twice with the same driver name panics).
  • Queries are now passed a new context which gets cancelled when the db is closed, instead of context.Background(). (Note: db here refers to the txdb, which is just a single connection in the lower-level pg db, corresponding to the dsn passed in, which is a randomly generated UUID.)
  • Corrected spelling errors

This also moves some unit tests of txdb itself from chainlink into chainlink-common. Since they need an actual database but don't need any fixtures, this is an ideal use case for an in-memory test db: something in between the empty interface added earlier and spinning up a full docker PostgreSQL container with all the schemas and tables set up. go-duckdb has been used for this, which is now easy to add to our dependencies since we're already on go 1.23. If the CL_DATABASE_URL env var is set locally, txdb can be tested with a full ordinary PostgreSQL database backing it... otherwise it falls back to testing it with the in-memory db backing it.

core ref: c7728a4eced5de67bad1cafdd21b4fab276e0e96
solana ref: 88cca3779525873e45f6d9f54bc8e0d0f2c9af26

pkg/pg/txdb_test.go (outdated)
Comment on lines 23 to 43
dbURL, ok := os.LookupEnv("CL_DATABASE_URL")
if !ok {
t.Log("CL_DATABASE_URL not set--falling back to testing txdb backed by an in-memory db")
dbURL = string(InMemoryPostgres)
}
db := NewSqlxDB(t, dbURL)
Collaborator:

Would it make sense to package this up as a test helper?

Contributor Author (@reductionista) commented Nov 27, 2024:

We could, although I can't think of a good use case for it other than testing txdb itself (i.e., this test). So making it a test helper, especially in a place where it can be imported from, might confuse someone.

Tests that depend on the db migration scripts being run (i.e., require the standard tables and indexes to be set up) should call NewSqlxDB() to connect to an actual db and just fail if they can't. Tests that don't depend on those scripts should call NewInMemoryDataSource(). Usually, if testing with NewInMemoryDataSource works at all, you have no reason to also test with an actual db. In this case it makes sense because the Open() method of the txdb driver contains code that parses the db url and validates that it's a properly formatted database url. That gets skipped when testing with an in-memory data source, but everything else is exercised. So using a real db tests everything, while the in-memory db in its place tests only about 95%.

Collaborator:

Tests that depend on the db migration scripts being run

This is part of the pattern that we are trying to get away from though. It should not be the default.

Usually if testing with NewInMemoryDataSource is something that can work, then you wouldn't have any reason to attempt testing it with an actual db

Why do you say this? If we have the ability to trivially integration test against real postgres, then we should use it.

Contributor Author (@reductionista) commented Nov 27, 2024:

Why do you say this? If we have the ability to trivially integration test against real postgres, then we should use it.

I was mostly saying that because it seems unlikely anything could pass in the in-memory db but fail in a real db. (Unless we're testing the connection to the db itself, or that the db is set up properly... in which case I wouldn't expect an in-memory db to be useful.) I suppose there are probably some ways, but in general I would assume that the set of sql commands supported by an in-memory db would be a small subset of those supported by a fully fledged database.

This is part of the pattern that we are trying to get away from though. It should not be the default.

I'm not sure what the motivation for wanting to change existing patterns is, but if the reason is because we want to minimize use of the external db to conserve resources then making this a helper function and encouraging its use would be diametrically opposed to that. If we always try to hit the external db first, and only use the in-memory one as a backup if that fails then we'd waste a whole lot of resources in CI since 100% of them will go to the db. I'm not sure what the purpose of an in-memory db would even be then. In the short term, it could help us with local testing before we get the db isolation work Keith planned out done. But once each relay has its own migration scripts, it won't even be useful for that.

Contributor Author:

To clarify:

In my last paragraph I was assuming this meant that you think we should always call this helper function instead of using an in-memory db by itself:

If we have the ability to trivially integration test against real postgres, then we should use it.

So that's why I say "100% of them will go to the db". If instead by that, you just mean that we ought to leave the option open to do that for some tests, rather than an all-or-nothing approach then my objection above wouldn't apply.

Contributor Author:

Update: in light of having read your other thread, I understand now that the motivation is not conserving CI resources but making it easier for devs to run tests locally. At least for some tests, it may still not be worth using the extra CI resources just to get that slight bit of extra certainty that the function you're testing behaves as expected. But given your motivations, it does seem like you're recommending an all-or-nothing approach. Since unless all of the tests were able to run in-memory you wouldn't get rid of the dependency... right?

Collaborator:

It is partially about local vs. CI, but it is also about whether a test can bootstrap itself, or depends on particularly configured dependencies, like fixture data in an external DB. I'm not sure what you mean by all-or-nothing. If something doesn't work with in memory, then it would be skipped 🤷 This is just like how we hide tests behind the integration build tag. Most devs don't run those locally, but they always run in CI on every PR.

Contributor Author (@reductionista) commented Nov 27, 2024:

If something doesn't work with in memory, then it would be skipped 🤷

Doesn't that imply that we do need to have a common SkipShortDB() method for tests requiring a db dependency to import... so that they can get skipped? Or is there a different way to skip them you're suggesting we replace that with?

Collaborator:

No, it does not imply that we need to use short at all:
#910 (comment)

Contributor Author:

Ah, I guess this is what you had in mind...

If there are dependencies required, then typically you opt-in to those tests by e.g. including a build tag, setting a flag/env-var, etc.

So you're just recommending we move to using one of these methods instead of --short... that makes sense

pkg/pg/txdb_test.go (outdated)
"github.com/smartcontractkit/chainlink-common/pkg/utils"
)

// txdb is a simplified version of https://github.com/DATA-DOG/go-txdb
Contributor:

Since you are introducing our own custom txdb here (which is a copy from chainlink), have you considered just using the DATA-DOG one (which is at least regularly updated)?

Contributor (@samsondav) commented Nov 27, 2024:

We originally did use the DATA-DOG version, it was terrible. Full of bugs and overly complicated. After several PRs trying to fix the upstream, I just wrote our own streamlined version and we've pretty much never had any problems with it, which is why it has never had to be updated.

Contributor:


ok, this is outside the scope of this CL, but we do have a problem with it: we essentially maintain a fork of txdb inside chainlink (and its forked repos), and we will now do the same in chainlink-common. My hope is that in the last 5 years DATA-DOG has made enough improvements to their library.

Contributor:


It's not a fork, it's a rewrite, and it's relatively small. Just put it here and import it wherever you need it; then it's only needed once.

If it ain't broke, don't fix it, and this ain't broke.

Contributor Author:


and we will now do the same in the chainlink-common

Just to be clear, this is a move, not a copy... the linked PR in the chainlink repo changes all of the imports to point here and removes it from that repo.

Neither of these was in the actual pg package in the chainlink repo.
dialects.go came from core/store/dialects and txdb.go from
core/internal/testutils/pgtest, but neither seems to deserve its own
package in chainlink-common--we can lump all the postgres-specific
common utilities under pkg/pg.
Also: convert the rest of the panics into ordinary errors
All txdb connections share the same underlying connection to the
postgres db. Calling NewSqlxDB() or NewConnection() with dialect=txdb
doesn't create a new pg connection; it just creates a new tx with
BEGIN. Closing the connection with db.Close() issues ROLLBACK.

Both NewSqlxDB() and NewConnection() choose random UUIDs for their
dsn string, so we shouldn't have a case where the same dsn is opened
more than once. If that did happen, then these two different txdb
"connections" would be sharing the same transaction, which would
mean closing the abort channel due to a query sent over one of them
would affect the other. Hopefully that's not a problem? If it is,
I think our only option will be to go back to using context.Background
for all queries.

Before this commit, there was only one abort channel for the entire
txdb driver, meaning that even two entirely different connections
opened with different dsns could interfere with each other's queries.
This should fix that case, which is presumably the only case we
care about. Since each dsn corresponds to a different call to
NewSqlxDB() and the UUIDs are generated randomly, there should no
longer be a conflict. Each txdb connection will have its own abort
channel.
… is not set

This allows us to test most of it in CI, and all of it locally.
This showed up in some of the unit tests in the linked PR in the chainlink repo.