Migrations: use a txn + commit for each migration, deprecate MigrateTx #600

bgentry · 2024-09-18T02:50:31Z

This is partly extracted from #590. The migration in that PR exposes an issue with our migration framework, which is that there are certain schema changes which must be committed before they can be referenced by subsequent schema changes. The example of this I found is that if you add a new value to an enum, you cannot later create a function that relies on that new enum value unless the new value has first been committed. Committing it within a DDL transaction does not count—it must be a full commit.

IMO this might mean that the entire idea of a MigrateTx API is not workable with certain schema change combinations. At worst, it can result in unpredictable failures depending on the exact sequence of changes and how many migrations are being run at once.

As such, in this PR I've deprecated MigrateTx and adjusted the migrator so that it opens a new transaction for each individual migration with a commit between them. Migrator tests were changed to move away from MigrateTx and to a setup where they get a new clean schema for each test that's disposed at the end. This makes it possible to test the full sequence of database migrations with a clean slate each time.

I believe this is the right long-term direction because it's the approach that other migration libraries use (Rails/ActiveRecord, Sequel, Goose, etc). It also enables the potential for us to add the ability for individual migrations to opt out of a having a wrapping transaction, which is essential if we ever want our default to be CREATE INDEX CONCURRENTLY rather than synchronous indexing (as it really should be for any at-scale system).

brandur · 2024-09-18T15:45:04Z

@bgentry Hm, this has a fairly sizable downside in that it's not clear what's supposed to happen with MigrateTx, which I guess just won't work anymore? Also, like you said, the tests are still broken in #590.

This really feels like an enum problem rather than one with the migrator's design. As an alternative, what do you think about working around this by just adding pending to version 002 retroactively?

diff --git a/riverdriver/riverpgxv5/migration/main/002_initial_schema.up.sql b/riverdriver/riverpgxv5/migration/main/002_initial_schema.up.sql
index 074e562..5a5683d 100644
--- a/riverdriver/riverpgxv5/migration/main/002_initial_schema.up.sql
+++ b/riverdriver/riverpgxv5/migration/main/002_initial_schema.up.sql
@@ -3,6 +3,12 @@ CREATE TYPE river_job_state AS ENUM(
   'cancelled',
   'completed',
   'discarded',
+  -- `pending` wasn't originally in this migration, but has been added
+  -- retroactively to enable migrating in a transaction, where otherwise use of
+  -- enums have some sharp edges. If a system ran this migration before
+  -- `pending` was added, that's okay. `pending` will be brought in with version
+  -- 004 with an `ALTER TYPE river_job_state ADD VALUE IF NOT EXISTS ...`.
+  'pending',
   'retryable',
   'running',
   'scheduled'

Version 004 brings in pending using IF NOT EXISTS already, so all existing users and new users are safe — brand new users will have pending come in with version 002, and users that already ran 002 but not yet 004 will get pending when they eventually run 004.

ALTER TYPE river_job_state ADD VALUE IF NOT EXISTS 'pending' AFTER 'discarded';

brandur · 2024-09-18T15:48:23Z

I guess the one corner case might be users who haven't yet run 004, and would be running a prospective 006 including a new enum-based function. We'd tell them to run river migrate with one step at a time, or program the migrate CLI to be aware of the situation and compensate for it.

bgentry · 2024-09-18T15:52:47Z

Hm, this has a fairly sizable downside in that it's not clear what's supposed to happen with MigrateTx, which I guess just won't work anymore?

Right so that's what I was trying to convey here:

This might mean that the entire idea of a MigrateTx API is unworkable with certain schema change combinations. At worst, it can result in unpredictable failures depending on the exact sequence of changes and how many migrations are being run at once. Maybe we can safely deprecate and/or get rid of this? (though it will require some changes to our rivermigrate tests)

And as far as:

As an alternative, what do you think about working around this by just adding pending to version 002 retroactively?

I'm generally pretty 👎 on trying to backdate or modify historical migrations unless there is truly no other decent option. I think that would potentially confuse a lot of users if they discover that their River migrations don't match what's in the repo.

And to me the problem here is clear: there are certain combinations of schema changes which are impossible to execute within the same transaction in Postgres. The solution is also clear IMO, and it's what other established migration frameworks landed on a long time ago: individual migrations may run in a transaction (with a per-migration option to not do so for i.e. CREATE INDEX CONCURRENTLY), but transactions are not shared between migrations.

I think we need to fix the migrator framework because some of its design choices are unworkable in practice as we've learned here. It doesn't seem like a big deal to me either, we took an initial design pass at it but in the course of using it learned new information that invalidated past assumptions. So I'd prefer to just fix the framework, because who knows what future migrations we'll run into with similar issues.

brandur · 2024-09-18T16:00:51Z

I'm generally pretty 👎 on trying to backdate or modify historical migrations unless there is truly no other decent option. I think that would potentially confuse a lot of users if they discover that their River migrations don't match what's in the repo.

Can you think of any realistic examples of problems/confusion this would actually cause? I think this is a theoretical concern that'd likely go totally unnoticed in practice.

On the other hand, trying to post-hoc deprecate long standing APIs like MigrateTx ... that'll cause very real pain.

And to me the problem here is clear: there are certain combinations of schema changes which are impossible to execute within the same transaction in Postgres. The solution is also clear IMO, and it's what other established migration frameworks landed on a long time ago: individual migrations may run in a transaction (with a per-migration option to not do so for i.e. CREATE INDEX CONCURRENTLY), but transactions are not shared between migrations.

Besides this one case with enums, are there other DDL changes that don't work in transactions that you know of?

bgentry · 2024-09-18T16:17:58Z

Besides this one case with enums, are there other DDL changes that don't work in transactions that you know of?

I know of none, but I didn't know of this one either until I ran into it. Again, I think the fundamental issue is that the assumption that "schema changes can always be batched into a single transaction" is clearly untrue. How frequently that pops up, I'm not sure.

I know we might have some people using MigrateTx in practice, but they too will run into problems here unless we rewrite the history of our migrations. I'd much rather rip off that bandaid than add user confusion (potentially forever) and future problems if we encounter yet another reason this API isn't viable, because it will be much harder to rationalize removing it at that later time (especially post v1).

I guess I'm also seeing the breakage as pretty minor, because it should be pretty easy to switch an executable from calling MigrateTx with a tx to calling Migrate with no tx. What I would do is:

Make the changes in this PR
Mark MigrateTx as deprecated to discourage its use
Update our test suites and internal code to no longer rely on MigrateTx. We will need to rely on full databases for these tests, but I think that shouldn't have too much impact. It's not surprising to me if migrations are a bit of a special case in that regard.

Maybe I'm missing a reason this would be more difficult or impactful? Happy to vid call to sort it out, but also keep in mind solving this is a straight blocker to #590 which I really need to move forward on so I can finish some new stuff before my leave is up 😬

As detailed in #600, there are certain combinations of schema changes which are not allowed to be run within the same transaction. The example we encountered with #590 is adding a new enum value, then using it in an immutable function during a subsequent migration. In Postgres, these must be separated by a commit. There are other examples of things which cannot be run in a transaction, such as `CREATE INDEX CONCURRENTLY`. While that specific one isn't solved here, moving away from a migrator that bundles migrations into a single transaction will also allow us to update our migration system to exclude certain migrations from transactions and i.e. add indexes concurrently.

bgentry · 2024-09-18T20:28:28Z

I've made the additional changes I described above. Notably, our migrator tests now create a dedicated schema to use for each individual test and then drop the schema at the end of the test. This gives a clean slate to the migrator tests and allows testing of the non-transaction variants.

I did mark MigrateTx as deprecated, though we can keep it around for while so long as we're ok with some people running into an issue if they try to run all migrations post-#590 in a single txn.

As detailed in #600, there are certain combinations of schema changes which are not allowed to be run within the same transaction. The example we encountered with #590 is adding a new enum value, then using it in an immutable function during a subsequent migration. In Postgres, these must be separated by a commit. There are other examples of things which cannot be run in a transaction, such as `CREATE INDEX CONCURRENTLY`. While that specific one isn't solved here, moving away from a migrator that bundles migrations into a single transaction will also allow us to update our migration system to exclude certain migrations from transactions and i.e. add indexes concurrently.

bgentry · 2024-09-22T19:31:55Z

Moving forward on this and #590 to keep things moving, but obviously none of these choices are permanent so let's keep iterating!

bgentry requested a review from brandur September 18, 2024 02:50

bgentry force-pushed the bg-migration-framework-ddl-tx-per-migration branch from bdc33ff to 2ceb8bb Compare September 18, 2024 20:28

bgentry force-pushed the bg-migration-framework-ddl-tx-per-migration branch from 7252fcf to 37fd615 Compare September 18, 2024 20:40

bgentry force-pushed the bg-migration-framework-ddl-tx-per-migration branch from 37fd615 to 3c932f6 Compare September 18, 2024 21:17

bgentry changed the title ~~fix non-tx Migrate using a txn + commit for each migration~~ Migrations: use a txn + commit for each migration, deprecate MigrateTx Sep 18, 2024

bgentry mentioned this pull request Sep 19, 2024

Bulk unique insertion, uniqueness with subset of args #590

Merged

9 tasks

bgentry added 3 commits September 20, 2024 13:27

illustrate migrator issue with using uncommitted enum value

e3b3b55

fix non-tx Migrate using a txn + commit for each migration

30d0691

bgentry force-pushed the bg-migration-framework-ddl-tx-per-migration branch from 3c932f6 to c0de0ae Compare September 20, 2024 18:27

bgentry merged commit 3ed1d29 into master Sep 22, 2024
14 checks passed

bgentry deleted the bg-migration-framework-ddl-tx-per-migration branch September 22, 2024 19:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrations: use a txn + commit for each migration, deprecate MigrateTx #600

Migrations: use a txn + commit for each migration, deprecate MigrateTx #600

bgentry commented Sep 18, 2024 •

edited

Loading

brandur commented Sep 18, 2024

brandur commented Sep 18, 2024

bgentry commented Sep 18, 2024

brandur commented Sep 18, 2024

bgentry commented Sep 18, 2024

bgentry commented Sep 18, 2024

bgentry commented Sep 22, 2024

Migrations: use a txn + commit for each migration, deprecate MigrateTx #600

Migrations: use a txn + commit for each migration, deprecate MigrateTx #600

Conversation

bgentry commented Sep 18, 2024 • edited Loading

brandur commented Sep 18, 2024

brandur commented Sep 18, 2024

bgentry commented Sep 18, 2024

brandur commented Sep 18, 2024

bgentry commented Sep 18, 2024

bgentry commented Sep 18, 2024

bgentry commented Sep 22, 2024

bgentry commented Sep 18, 2024 •

edited

Loading