Why does DOLT_DIFF_SUMMARY show a record with no data change and no schema change? #8907

Open
mateusz-lachowski-codilime opened this issue Feb 28, 2025 · 2 comments
@mateusz-lachowski-codilime

Dolt version: v1.47.1

I've encountered the following issue when using select * from dolt_diff_summary(...): in some cases it returns records with both data_change and schema_change set to false. For example:

SELECT * FROM DOLT_DIFF_SUMMARY('5ps4828ppa912injf4vj89h9meaarepu', '76q0rsmorlg35o64uf87d37n2hbiae43')

|from_table_name              |to_table_name                |diff_type|data_change|schema_change|
|-----------------------------|-----------------------------|---------|-----------|-------------|
|dcim_location                |dcim_location                |modified |1          |0            |
|extras_configcontext_clusters|extras_configcontext_clusters|modified |0          |0            |

But other methods return the expected data. DOLT_DIFF_STAT lists only dcim_location:

SELECT * FROM DOLT_DIFF_STAT('5ps4828ppa912injf4vj89h9meaarepu', '76q0rsmorlg35o64uf87d37n2hbiae43')

|table_name         |rows_unmodified|rows_added|rows_deleted|rows_modified|cells_added|cells_deleted|cells_modified|old_row_count|new_row_count|old_cell_count|new_cell_count|
|-------------------|---------------|----------|------------|-------------|-----------|-------------|--------------|-------------|-------------|--------------|--------------|
|dcim_location      |0              |0         |0           |1            |0          |0            |3             |1            |1            |22            |22            |

And DOLT_DIFF returns nothing for the specified table, whether the main branch or the new branch is checked out:

SELECT * FROM DOLT_DIFF('5ps4828ppa912injf4vj89h9meaarepu', '76q0rsmorlg35o64uf87d37n2hbiae43', 'extras_configcontext_clusters')

select * from dolt_diff_extras_configcontext_clusters

I've found that this happens mainly when comparing commit hashes from two different branches. In the case above, 5ps is the last commit on the main branch and 76q is the first and only commit on a branch newly created from main. When I compare two commits from the same branch, the results are correct.

Expected results: every row returned by dolt_diff_summary should have either data_change or schema_change set to true; a table with neither kind of change should not be listed at all.

@timsehn
Contributor

timsehn commented Feb 28, 2025

Interesting. This means the actual hash of the table is different, but Dolt can't display the change. There are some "hidden" elements in a table's schema called tags.

Is your database able to be shared?

@timsehn added the bug (Something isn't working) and version control labels on Feb 28, 2025
@nicktobey self-assigned this on Feb 28, 2025
@nicktobey
Contributor

Hey Mateusz!

I agree this is confusing, and we should figure out what's happening.

I have a theory about what might be causing this. Does your extras_configcontext_clusters table use an auto-incrementing primary key? Tables in Dolt store the highest auto-incrementing primary key they've generated, to guarantee that they never generate the same primary key twice, even if the original row gets deleted. This stored value affects the table's hash despite not being part of either the schema or the data. If this value and only this value were to change (for instance, by inserting a new row and then immediately deleting it), the table would be reported as modified despite there being no schema change and no data change.
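To make the theory above concrete, here is a minimal toy model in Python. It is not Dolt's storage format or hashing scheme. ToyTable, table_hash, and diff_summary are hypothetical names invented for illustration, and the only assumption carried over from the comment is that the table hash covers the auto-increment counter as well as the schema and the rows.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for a table; NOT how Dolt actually stores tables."""
    schema: dict
    rows: dict = field(default_factory=dict)
    next_auto_id: int = 1  # bookkeeping so a key is never handed out twice

    def insert(self, value):
        key = self.next_auto_id
        self.rows[key] = value
        self.next_auto_id += 1  # never decremented, even after a delete
        return key

    def delete(self, key):
        del self.rows[key]

    def table_hash(self):
        # Assumption: the hash covers schema, rows AND the counter.
        payload = json.dumps(
            {"schema": self.schema, "rows": self.rows, "auto": self.next_auto_id},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


def diff_summary(a, b):
    """Report a table as modified whenever the hashes differ."""
    if a.table_hash() == b.table_hash():
        return None  # identical tables are not listed
    return {
        "diff_type": "modified",
        "data_change": a.rows != b.rows,
        "schema_change": a.schema != b.schema,
    }


# "main" has one row; the branch inserts a row and deletes it again.
main = ToyTable(schema={"id": "int auto_increment", "v": "text"})
main.insert("a")
branch = copy.deepcopy(main)
tmp_key = branch.insert("tmp")
branch.delete(tmp_key)

summary = diff_summary(main, branch)
print(summary)
```

In this model the branch ends up with the same rows and schema as main but a different counter, so the table is listed as modified with both flags false, which matches the shape of the extras_configcontext_clusters row in the report. Whether that is what happened in the reporter's database still needs to be confirmed.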
