"address" values are unnecessarily deserialized/reserialized. #8840

Open
nicktobey opened this issue Feb 9, 2025 · 0 comments

Some types in Dolt, such as TEXT and BLOB, are stored in the table as a content address that points to other content in the chunk store. An operation on that table that doesn't care about the values of those columns shouldn't need to deserialize them in order to work. However, because go-mysql-server abstracts away storage details and may not have access to the content address, it may end up fully loading these values unnecessarily.

For example, consider making a table by transforming an existing one:

create table input(pk int primary key, c0 int, b longblob);
insert into input values (1, 1, load_file("large_blob.bin"));

create table output(pk int primary key, c0 int, b longblob);
insert into output select pk, c0 + 1, b from input;

In this case, we could simply copy the content address from input to output, but instead we fully load the blob into memory and rechunk it when writing it to output. Depending on the size of the blob, this can make the operation dramatically slower.
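To make the difference concrete, here is a minimal toy sketch of the two copy strategies. It is not Dolt or go-mysql-server code: `Addr`, `ChunkStore`, `Row`, `CopyByValue`, and `CopyByAddress` are all hypothetical names invented for illustration, and a single SHA-256 hash stands in for whatever chunking and addressing the real chunk store does.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// Addr is a hypothetical content address: the hash of the stored bytes.
type Addr [sha256.Size]byte

// ChunkStore is a toy in-memory stand-in for a content-addressed store.
// Reads and Writes count how often blob bytes are actually touched.
type ChunkStore struct {
	chunks map[Addr][]byte
	Reads  int
	Writes int
}

func NewChunkStore() *ChunkStore {
	return &ChunkStore{chunks: make(map[Addr][]byte)}
}

// Put hashes the data, stores a copy of it, and returns its address.
func (s *ChunkStore) Put(data []byte) Addr {
	s.Writes++
	a := Addr(sha256.Sum256(data))
	s.chunks[a] = append([]byte(nil), data...)
	return a
}

// Get loads the full blob for an address.
func (s *ChunkStore) Get(a Addr) []byte {
	s.Reads++
	return s.chunks[a]
}

// Row mirrors the example schema: pk, c0, and b stored as an address.
type Row struct {
	PK, C0 int
	B      Addr
}

// CopyByValue models the behavior described above: the blob is fully
// loaded and then hashed and written again, even though it is unchanged.
func CopyByValue(s *ChunkStore, in Row) Row {
	blob := s.Get(in.B)
	return Row{PK: in.PK, C0: in.C0 + 1, B: s.Put(blob)}
}

// CopyByAddress models the desired behavior: only the address is copied,
// so the store is never touched.
func CopyByAddress(in Row) Row {
	return Row{PK: in.PK, C0: in.C0 + 1, B: in.B}
}

func main() {
	s := NewChunkStore()
	blob := bytes.Repeat([]byte{0xAB}, 1<<20) // 1 MiB stand-in for large_blob.bin
	in := Row{PK: 1, C0: 1, B: s.Put(blob)}

	// Counters are cumulative, so compare them before and after each copy.
	byValue := CopyByValue(s, in)
	fmt.Printf("after by-value copy:   reads=%d writes=%d\n", s.Reads, s.Writes)

	byAddr := CopyByAddress(in)
	fmt.Printf("after by-address copy: reads=%d writes=%d\n", s.Reads, s.Writes)

	fmt.Println("rows identical:", byValue == byAddr)
}
```

Both strategies produce the same output row, because the blob's content (and therefore its address) never changes. Only the by-value path pays for reading and rehashing the blob, and that cost grows with blob size.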
