Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make cost surface data migration idempotent [MRXN23-478] #1540

Draft
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

hotzevzl
Copy link
Member

@hotzevzl hotzevzl commented Oct 6, 2023

@vercel
Copy link

vercel bot commented Oct 6, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
marxan ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 6, 2023 2:44pm

@hotzevzl
Copy link
Member Author

hotzevzl commented Oct 6, 2023

just an initial draft for now, right after running the data migration in my local dev env and on marxan23:

  • I think that using the scenario_id of an existing scenario as id of the corresponding cost surface created by copying over the scenario's current scenarios_pu_cost_data records should not do any harm, and actually can help us guarantee that if we were to re-run the data migration script again (with some further tweaks over the current draft of my changes: we'd need to turn some inserts into upserts), we would not scatter new cost surfaces all over the place (as we would if using gen_random_uuid()) but we'd "just" copy over the scenarios_pu_cost_data costs again
  • likewise for creating (or upserting, in a later commit) default cost surfaces using the project's project_id as the cost surface's id

Both these things should maybe handled by using a function to pick a UUIDv4 as a sort of hash (sort of what can be done in Java with this library method: https://stackoverflow.com/a/29059595), but I don't think it's worth the effort (of finding a Python equivalent and using it) over "just" syncing the ids

@hotzevzl hotzevzl requested review from KevSanchez and alexeh October 6, 2023 17:20
@hotzevzl
Copy link
Member Author

hotzevzl commented Oct 6, 2023

The other thing I added in this initial draft is to avoid deleting orphaned scenarios - yes, scenarios with cost data in such a bad shape (=likely not even existing, for whatever reason) are going to be useless in practice, but thinking about doing this in staging and production, I think that by keeping around any scenarios with messed-up data, we wouldn't make things worse than they are, and on the other hand we get a chance to inspect and see what's up.

I've seen one such scenario when doing the data migration on m23 and that was a legacy project yadda yadda, plus there were 4 default cost surfaces for which we didn't have min/max data, which was likely because of some mess-up of the project grid (no PUs? some bug in some specific way of creating the grid so that no project_pus were created? I'll need to check more carefully next week).

… cost surface

Avoid automatically deleting the scenario: we log the anomalous
situation and arbitrarily set the cost surface's range to [0,0]. In
practice, the scenario that uses such cost surface may have contained
invalid or incomplete data to start with, so setting an arbitrary range
may not really matter here (as in, it should not make things worse).
@hotzevzl hotzevzl force-pushed the chore/data/MRXN23-478_make-cost-surface-data-migration-idempotent branch from fb78313 to 8111fd0 Compare December 6, 2023 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant