Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dolt diff shows NULL values for virtual generated columns #8879

Open
nicktobey opened this issue Feb 21, 2025 · 2 comments
Open

dolt diff shows NULL values for virtual generated columns #8879

nicktobey opened this issue Feb 21, 2025 · 2 comments
Assignees
Labels
cli good first issue Good for newcomers

Comments

@nicktobey
Copy link
Contributor

Repro steps:

create table test(pk int primary key, c0 int generated always as (pk + 1));
insert into test(pk) values (1), (2);

dolt diff will produce the following output:

diff --dolt a/test b/test
added table
+CREATE TABLE `test` (
+  `pk` int NOT NULL,
+  `c0` int GENERATED ALWAYS AS ((`pk` + 1)),
+  PRIMARY KEY (`pk`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
+---+----+------+
|   | pk | c0   |
+---+----+------+
| + | 1  | NULL |
| + | 2  | NULL |
+---+----+------+

This is likely because c0 is virtual, so the tuples in the table don't actually contain a value for it. Displaying the correct values would require evaluating the stored expression for that column, which requires a SQL engine.

As for the correct behavior here, I'm leaning towards excluding virtual generated columns from dolt diff entirely, since they aren't actually part of the data in the diff.

@nicktobey nicktobey added cli good first issue Good for newcomers labels Feb 21, 2025
@astorig
Copy link

astorig commented Feb 22, 2025

hi, I would like to take this issue. could you assign it to me?

@nicktobey
Copy link
Contributor Author

Sure! Thank you for your time!

This likely happened because Dolt previously didn't support virtual columns. And adding virtual columns complicates things.

When there are no virtual columns, there's no difference between "table rows stored on disk" and "table rows seen by the engine". But once a table has virtual columns, that distinction matters, and any code that assumes that the two are identical may get incorrect results.

In this case, it's likely that the dolt diff code is reading the rows directly out of storage and assuming that they match the table schema. Due to the way we store rows, adding a NULL to the end of a row doesn't change the bytes on disk: so for example, the row (1, NULL) and the row (1,) have the exact same bytes. So dolt diff is likely reading the latter and treating it like it's the former.

I think that the best solution here is to simply omit any virtual generated columns from the output, since computing the correct value for the column would be more complicated and potentially slow.

The code for the diff command is in go/cmd/dolt/commands/diff.go. Most of the logic for the command works by writing SQL queries against system tables. Most likely this will involve changing the behavior of the dolt_diff() system table function to not have columns for virtual generated columns. That table function is defined in go/libraries/doltcore/sqle/dtablefunctions/dolt_diff_table_function.go.

Let me know if you have any questions! There's a lot going on in these files because they interact with the SQL engine, but stepping through a sample dolt diff command in the debugger should help you figure out what each part is doing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants