Migrations: use a txn + commit for each migration, deprecate MigrateTx #600
Conversation
@bgentry Hm, this has a fairly sizable downside in that it's not clear what's supposed to happen with `MigrateTx`. This really feels like an enum problem rather than one with the migrator's design. As an alternative, what do you think about working around this by just adding `pending` to the original schema migration?

```diff
diff --git a/riverdriver/riverpgxv5/migration/main/002_initial_schema.up.sql b/riverdriver/riverpgxv5/migration/main/002_initial_schema.up.sql
index 074e562..5a5683d 100644
--- a/riverdriver/riverpgxv5/migration/main/002_initial_schema.up.sql
+++ b/riverdriver/riverpgxv5/migration/main/002_initial_schema.up.sql
@@ -3,6 +3,12 @@ CREATE TYPE river_job_state AS ENUM(
     'cancelled',
     'completed',
     'discarded',
+    -- `pending` wasn't originally in this migration, but has been added
+    -- retroactively to enable migrating in a transaction, where otherwise use
+    -- of enums has some sharp edges. If a system ran this migration before
+    -- `pending` was added, that's okay. `pending` will be brought in with
+    -- version 004 via `ALTER TYPE river_job_state ADD VALUE IF NOT EXISTS ...`.
+    'pending',
     'retryable',
     'running',
     'scheduled'
```

Version 004 brings in:

```sql
ALTER TYPE river_job_state ADD VALUE IF NOT EXISTS 'pending' AFTER 'discarded';
```
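For reference, the sharp edge alluded to above is that Postgres refuses to use an enum value added earlier in the same, still-uncommitted transaction. A rough illustration (the exact error and hint text is from recent Postgres versions):

```sql
BEGIN;

ALTER TYPE river_job_state ADD VALUE IF NOT EXISTS 'pending';

-- Any use of the new value before COMMIT fails:
SELECT 'pending'::river_job_state;
-- ERROR:  unsafe use of new value "pending" of enum type river_job_state
-- HINT:  New enum values must be committed before they can be used.
```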
I guess the one corner case might be users who haven't yet run 004 and would be running a prospective 006 including a new enum-based function. We'd tell them to run 004 first.
Right, so that's what I was trying to convey here:
And as far as:
I'm generally pretty 👎 on trying to backdate or modify historical migrations unless there is truly no other decent option. I think that would potentially confuse a lot of users if they discover that their River migrations don't match what's in the repo. And to me the problem here is clear: there are certain combinations of schema changes which are impossible to execute within the same transaction in Postgres. The solution is also clear IMO, and it's what other established migration frameworks landed on a long time ago: individual migrations may run in a transaction (with a per-migration option to not do so, e.g. for `CREATE INDEX CONCURRENTLY`), with each migration committed before the next begins.

I think we need to fix the migrator framework because some of its design choices are unworkable in practice, as we've learned here. It doesn't seem like a big deal to me either; we took an initial design pass at it, but in the course of using it learned new information that invalidated past assumptions. So I'd prefer to just fix the framework, because who knows what future migrations we'll run into with similar issues.
Can you think of any realistic examples of problems/confusion this would actually cause? I think this is a theoretical concern that'd likely go totally unnoticed in practice. On the other hand, trying to post-hoc deprecate long-standing APIs like `MigrateTx` seems like the more disruptive option.
Besides this one case with enums, are there other DDL changes that don't work in transactions that you know of?
I know of none, but I didn't know of this one either until I ran into it. Again, I think the fundamental issue is that the assumption that "schema changes can always be batched into a single transaction" is clearly untrue. How frequently that pops up, I'm not sure. I know we might have some people using `MigrateTx` directly. I guess I'm also seeing the breakage as pretty minor, because it should be pretty easy to switch an executable from calling `MigrateTx` to `Migrate`.
Maybe I'm missing a reason this would be more difficult or impactful? Happy to video call to sort it out, but also keep in mind that solving this is a straight blocker to #590, which I really need to move forward on so I can finish some new stuff before my leave is up 😬
As detailed in #600, there are certain combinations of schema changes which are not allowed to run within the same transaction. The example we encountered with #590 is adding a new enum value, then using it in an immutable function during a subsequent migration. In Postgres, these must be separated by a commit. There are other examples of things which cannot be run in a transaction, such as `CREATE INDEX CONCURRENTLY`. While that specific one isn't solved here, moving away from a migrator that bundles migrations into a single transaction will also allow us to update our migration system to exclude certain migrations from transactions and, for example, add indexes concurrently.
I've made the additional changes I described above. Notably, our migrator tests now create a dedicated schema to use for each individual test and then drop the schema at the end of the test. This gives a clean slate to the migrator tests and allows testing of the non-transaction variants. I did mark `MigrateTx` as deprecated.
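The per-test isolation described here follows a standard Postgres pattern, roughly like the following (the schema name is illustrative):

```sql
-- Test setup: a throwaway schema, with the session pointed at it.
CREATE SCHEMA river_migrate_test_1;
SET search_path TO river_migrate_test_1;

-- ...run migrations and assertions against the clean schema...

-- Test teardown: drop the schema and everything in it.
DROP SCHEMA river_migrate_test_1 CASCADE;
```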
Force-pushed from bdc33ff to 2ceb8bb
Force-pushed from 7252fcf to 37fd615
Force-pushed from 37fd615 to 3c932f6
Force-pushed from 3c932f6 to c0de0ae
Moving forward with this and #590 to keep things unblocked, but obviously none of these choices are permanent, so let's keep iterating!
This is partly extracted from #590. The migration in that PR exposes an issue with our migration framework: there are certain schema changes which must be committed before they can be referenced by subsequent schema changes. The example I found is that if you add a new value to an enum, you cannot later create a function that relies on that new enum value unless the new value has first been committed. Committing it within a DDL transaction does not count; it must be a full commit.

IMO this might mean that the entire idea of a `MigrateTx` API is not workable with certain schema change combinations. At worst, it can result in unpredictable failures depending on the exact sequence of changes and how many migrations are being run at once.

As such, in this PR I've deprecated `MigrateTx` and adjusted the migrator so that it opens a new transaction for each individual migration, with a commit between them. Migrator tests were changed to move away from `MigrateTx` and to a setup where they get a new clean schema for each test that's disposed of at the end. This makes it possible to test the full sequence of database migrations with a clean slate each time.

I believe this is the right long-term direction because it's the approach that other migration libraries use (Rails/ActiveRecord, Sequel, Goose, etc.). It also enables the potential for us to let individual migrations opt out of having a wrapping transaction, which is essential if we ever want our default to be `CREATE INDEX CONCURRENTLY` rather than synchronous indexing (as it really should be for any at-scale system).
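As a concrete example of why the opt-out matters: Postgres flatly refuses to run a concurrent index build inside a transaction block, so such a migration only works if the migrator skips the wrapping transaction (index and column names below are illustrative):

```sql
BEGIN;
CREATE INDEX CONCURRENTLY river_job_kind_idx ON river_job (kind);
-- ERROR:  CREATE INDEX CONCURRENTLY cannot run inside a transaction block
ROLLBACK;

-- Run outside any transaction, it builds the index without holding the
-- long write-blocking locks of a plain CREATE INDEX:
CREATE INDEX CONCURRENTLY river_job_kind_idx ON river_job (kind);
```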