-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Database defined UNIQUE or FOREIGN KEY constraint causes rebase to fail #210
Comments
Thanks for the detailed bug report @leorudczenko! I have looked into this and the reason for things falling apart is this:
Now how to solve this issue: Ideally we would want to tell sqlite to temporarily disable constraints while we're doing updates (like what postgres has with I can see two ways out:
The question is what to do when there would be simultaneous edits that actually break the unique constraint - when geodiff is used in context of Mergin Maps, this would mean that a user would end up being unable to sync their changes, due to the unique constraint error, and no way to fix it. This would be quite tricky to handle, to give mobile users enough information what has happened and offer some ways out - for the typical use case with field surveys, it is typically easier to let some data admin fix such issues in the data afterwards... I am keen to hear your thoughts! |
Thanks so much for all the information and insightful discussion @wonder-sk! It's great to finally understand what's going on with this bug 😄 I personally think the first of your two potential solutions sounds like the more complex but ideal one, where you try to apply the difference and then create a buffer of changes on a failed constraint. It seems like it would better fit the current process used to apply differences, whilst avoiding potential issues with directly modifying the database schema. I think when there are simultaneous edits that break a constraint, it would make sense then for it to create a conflict version of the database, like Mergin Maps already does. In an ideal world, the user would also be informed that this has been done, but it isn't required. Then again, Mergin Maps currently does not display any message when a conflict occurs. However, the important difference between these 2 scenarios would be the reason for the conflict, which would be very helpful to know as a developer. Especially if it could include the exact constraint that failed. |
Hi @wonder-sk, Thanks for picking this up. Although it is a bit awkward that we have constraints on the GeoPackage, I think that it is likely that others will take advantage of these features of SQLite, too. Without being able to switch off constraints, applying the diff is a tricky problem. I have two comments:
DROP TABLE IF EXISTS test;
DROP TABLE IF EXISTS geodiff_working;
CREATE TABLE test (
"fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
"name" TEXT UNIQUE
);
INSERT INTO test (fid, name) VALUES (1, 'our_change');
CREATE TEMPORARY TABLE geodiff_working AS SELECT * FROM test;
INSERT INTO geodiff_working (fid, name) VALUES (2, 'our_change');
UPDATE geodiff_working SET name = 'their_change' WHERE fid = 1;
-- Getting the data back isn't so easy...
-- We can't do this in real life because wiping the table will break foreign key
-- constraints and inserting all the records will fire insert triggers and
-- increment sequences.
DELETE FROM test;
INSERT INTO test SELECT * FROM geodiff_working;
SELECT * FROM test; Please let us know if there is anything that we can do to help. We can arrange a call if you would like to discuss options (and it would be good for us to learn how geodiff works internally) and we can do also test different versions of code. |
I had another thought for a workaround. In this case you create a stash of failed values inserting a placeholder in the failed column, then update them into the correct place at the end.
|
We had another discussion about workarounds for this. One question that we had was why the UNIQUE constraint issue only affects us in the mobile app and not in the QGIS plugin. |
It seems like GDAL and QGIS are both adding support for auto-detection of UNIQUE constraints in GeoPackages, so there must be other people than us that use them. They already detect foreign key relationships. |
We tried a workaround for this issue by dropping all of the The script below will fail when """
geodiff_fk_rebase_bug.py
"""
import sqlite3
from pathlib import Path
import pygeodiff
import etlhelper as etl
def main() -> None:
# Define GeoPackage filepaths
gpkg_dir = Path(__file__).parent
base = gpkg_dir / "base.gpkg"
ours = gpkg_dir / "ours.gpkg"
theirs = gpkg_dir / "theirs.gpkg"
reset_gpkg_files(base, ours, theirs)
create_gpkg_files(base, ours, theirs)
geodiff_rebase(gpkg_dir, base, ours, theirs)
def reset_gpkg_files(base: Path, ours: Path, theirs: Path) -> None:
"""
Delete the GeoPackage files if they already exist.
"""
for gpkg in [base, ours, theirs]:
gpkg.unlink(missing_ok=True)
def create_gpkg_files(base: Path, ours: Path, theirs: Path) -> None:
"""
Create 3 GeoPackage files at the given filepaths.
All 3 of the files will have the same tables created in them.
But, for the file 'ours' and 'theirs', they will each have 1 row inserted into each table too.
These rows will abide by the database table constraints.
"""
# We have to setup the "uuid" column with a UNIQUE constraint to comply with SQLite database constraints for foreign key columns
# It is not possible to define foreign key relationships in SQLite without UNIQUE constraints on the parent key column
# Typically, the PRIMARY KEY column is used as the parent key column, where the PRIMARY KEY by default also applies the UNIQUE constraint
# If you remove the UNIQUE constraint from the "uuid" column in the table "test_parent", it will result in the following SQLite error:
# foreign key mismatch - "test_child" referencing "test_parent"
create_test_table = """
CREATE TABLE test_parent (
"fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, -- This is the PRIMARY KEY for the GeoPackage
"uuid" TEXT UNIQUE, -- This is the FOREIGN KEY for parent/child relationships
"name" TEXT
)
"""
create_test_child_table = """
CREATE TABLE test_child (
"fid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
"parent_fuid" TEXT,
"name" TEXT,
FOREIGN KEY("parent_fuid") REFERENCES "test_parent"("uuid")
)
"""
# Add both tables to all 3 databases
for gpkg in [base, ours, theirs]:
with sqlite3.connect(gpkg) as conn:
etl.execute("PRAGMA foreign_keys = ON", conn)
# Create the 2 tables
for create_sql in [create_test_table, create_test_child_table]:
etl.execute(create_sql, conn)
# Add 1 row to ours and theirs
if gpkg.stem == "ours":
etl.load("test_parent", conn, [{"name": "our_change", "uuid": "{ac267d95-db70-4299-b019-c52599ca1e5f}"}])
etl.load("test_child", conn, [{"parent_fuid": "{ac267d95-db70-4299-b019-c52599ca1e5f}", "name": "our_child"}])
elif gpkg.stem == "theirs":
etl.load("test_parent", conn, [{"name": "their_change", "uuid": "{abc43098-fe9b-4da0-b008-7518694466bb}"}])
etl.load("test_child", conn, [{"parent_fuid": "{abc43098-fe9b-4da0-b008-7518694466bb}", "name": "their_child"}])
def geodiff_rebase(gpkg_dir: Path, base: Path, ours: Path, theirs: Path) -> None:
"""
Call geodiff.rebase.
When it works, you should end up with both of the inserted rows in the file 'ours'.
When it fails, you should see 'pygeodiff.geodifflib.GeoDiffLibError: rebase' and no conflict file is created.
"""
geodiff = pygeodiff.GeoDiff()
geodiff.rebase(str(base), str(theirs), str(ours), str(gpkg_dir / "conflict.gpkg"))
print(f"Rebased files successfully into: {ours}")
if __name__ == "__main__":
main() We require the foreign key relationships to be defined in the database because our tool depends on QGIS auto-detecting the relationships. This issue may relate to these other issues concerning foreign key relationships: |
I have looked into this again and the case with foreign keys boil down again to the violation of the UNIQUE constraint on Looking further into the sqlite3 session extension (which inspired geodiff implementation), it supports the approach with accumulation of failed insert/update/delete statements due to failed constraints, and re-applying them later - I am convinced now that if we want to solve these issues, we should take that route. I have done a quick&dirty proof of concept (https://gist.github.com/wonder-sk/60fa63a7b0e51e7276f5773dda6142a3) and verified this would help to fix the issue. A proper fix would need to be of course more elaborate and auto-tested... Simply reversing the order of statements (as suggested in an earlier comment) is likely to be only a partial solution - sometimes it could work, but probably in more complex scenarios we would have failures again. Even with the solution of accumulating failed statements and retrying them, this may need to be done in multiple passes (as long as we were able to apply at least one failed statement). By the way, as for why this issue only happens on mobile app, and not in QGIS, this is most likely simply a matter of luck, that relevant changes get written in different order, so the unique constraint does not fail. |
Thank you for looking at this, @wonder-sk and for taking the time to make a proof-of-concept. Repeatedly looping through and retrying failed statements seems like a good approach then, particularly if that is the way that SQLite themselves does it. That's interesting about the difference between mobile app and QGIS. I will keep an eye out for conflicts in other Mergin projects. |
I just remembered that |
This is a follow-up from MerginMaps/mobile#2933. After further investigation into this bug, we have found that the bug exists in the
rebase
function fromgeodiff
.Summary
Our database/GeoPackage structure requires the use of the
UNIQUE
constraint on certain fields, specifically for primary/foreign keys. When trying to rebase a GeoPackage file with aUNIQUE
constraint on any column, it fails with this error:pygeodiff.geodifflib.GeoDiffLibError: rebase
. This error is then what leads to the strange conflict bug here: MerginMaps/mobile#2933.To Reproduce
We have created a Python script to re-produce this error repeatedly, using a minimal table definition with 2 fields:
Running the Script
To run the script, you need to install the following Python libraries:
pygeodiff
etlhelper
Run the script with:
python geodiff_unique_rebase_bug.py
Results
The script will fail at the
rebase
step, demonstrating the bug.If you then remove the
UNIQUE
constraint from thetest
table definition, it will allow the script to run to completion. In this circumstance, the GeoPackageours
will end up with 2 rows as expected.Software Versions
python
version3.9.18
pygeodiff
version2.0.2
The text was updated successfully, but these errors were encountered: