feat: support vacuum leaked table data #17022

Merged: 3 commits into databendlabs:main on Dec 13, 2024

@SkyFan2002 SkyFan2002 commented Dec 9, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

A leaked table refers to table data that is no longer recorded in the metadata but still occupies storage space, possibly because of a bug in the vacuum process. This PR allows vacuum drop table [from database] force to be used to clean up leaked tables.
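
As a rough illustration of the definition above, leaked data can be thought of as the table directories present in storage minus the tables still known to the metadata. The sketch below is not the PR's implementation; the function name find_leaked_dirs, the "<db_id>/<table_id>" directory layout, and the ID 121470 are assumptions made only for illustration.

```python
def find_leaked_dirs(storage_dirs, metadata_table_paths):
    """Return storage directories that have no matching metadata record.

    Both arguments are iterables of "<db_id>/<table_id>" strings
    (an assumed layout, used here only for illustration).
    """
    known = set(metadata_table_paths)
    # Anything in storage that the metadata does not know about is "leaked".
    return sorted(d for d in storage_dirs if d not in known)


# Hypothetical IDs: one directory still has a metadata record, one does not.
storage = ["121460/121467", "121460/121470"]
metadata = ["121460/121470"]
leaked = find_leaked_dirs(storage, metadata)
print(leaked)  # ['121460/121467']
```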

For example:

Construct leaked data

  1. create a table
root@localhost:8000/default> create database test_database;
processed in (0.061 sec)

root@localhost:8000/default> create table test_database.t(c int);
processed in (0.075 sec)

root@localhost:8000/default> insert into test_database.t values(1);
1 rows affected in (0.076 sec)

root@localhost:8000/default> select * from fuse_snapshot('test_database','t');

SELECT * FROM fuse_snapshot('test_database', 't')

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│            snapshot_id           │                     snapshot_location                     │ format_version │ previous_snapshot_id │ segment_count │ block_count │ row_count │ bytes_uncompressed │ bytes_compressed │ index_size │          timestamp         │
│              String              │                           String                          │     UInt64     │   Nullable(String)   │     UInt64    │    UInt64   │   UInt64  │       UInt64       │      UInt64      │   UInt64   │     Nullable(Timestamp)    │
├──────────────────────────────────┼───────────────────────────────────────────────────────────┼────────────────┼──────────────────────┼───────────────┼─────────────┼───────────┼────────────────────┼──────────────────┼────────────┼────────────────────────────┤
│ 583a15becadf4492b79c691a60884635 │ 121460/121467/_ss/583a15becadf4492b79c691a60884635_v4.mpk │              4 │ NULL                 │             1 │           1 │         1 │                  5 │              358 │        425 │ 2024-12-09 11:07:12.574054 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1 row read in 0.024 sec. Processed 1 row, 191 B (41.67 rows/s, 7.77 KiB/s)
  2. back up the data:
cp -r .databend/stateless_test_data/121460/121467/ ../
  3. drop the table and execute vacuum:
root@localhost:8000/default> drop table test_database.t;
processed in (0.083 sec)

root@localhost:8000/default> set data_retention_time_in_days=0;

SET data_retention_time_in_days = 0

0 row read in 0.005 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

root@localhost:8000/default> vacuum drop table from test_database;

VACUUM DROP TABLE FROM test_database 

0 row read in 0.054 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)
  4. bring back the data:
cp -r ../121467/ .databend/stateless_test_data/121460/
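
The directory copied in the backup and restore steps is the prefix of the snapshot_location value returned by fuse_snapshot. A small sketch of that derivation follows. It assumes the location has the shape "<table dir>/_ss/<snapshot file>", as in the output above, and the helper name table_dir_from_snapshot_location is hypothetical.

```python
def table_dir_from_snapshot_location(location):
    """Return the part of a snapshot location that precedes "/_ss/".

    Assumes the "<table dir>/_ss/<snapshot file>" shape seen in the
    fuse_snapshot output; raises ValueError for any other shape.
    """
    prefix, sep, _snapshot_file = location.partition("/_ss/")
    if not sep or not prefix:
        raise ValueError(f"unexpected snapshot location: {location!r}")
    return prefix


location = "121460/121467/_ss/583a15becadf4492b79c691a60884635_v4.mpk"
table_dir = table_dir_from_snapshot_location(location)
print(table_dir)  # 121460/121467
```

The result matches the path used with cp under .databend/stateless_test_data/ in the steps above.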

Execute vacuum without force

sky@hp:~/databend$ bendsql
Welcome to BendSQL 0.21.0-unknown.
Connecting to localhost:8000 as user root.
Connected to Databend Query v1.2.670-nightly-9700a3b6ca(rust-1.81.0-nightly-2024-12-09T06:50:31.561198402Z)
Loaded 1394 auto complete keywords from server.

root@localhost:8000/default> set data_retention_time_in_days=0;

SET data_retention_time_in_days = 0

0 row read in 0.003 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

root@localhost:8000/default> vacuum drop table from test_database;

VACUUM DROP TABLE FROM test_database 

0 row read in 0.019 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

root@localhost:8000/default> quit;
Bye~

Leaked data is not vacuumed:

sky@hp:~/databend$ ls .databend/stateless_test_data/121460/
121467

Execute vacuum with force

sky@hp:~/databend$ bendsql
Welcome to BendSQL 0.21.0-unknown.
Connecting to localhost:8000 as user root.
Connected to Databend Query v1.2.670-nightly-9700a3b6ca(rust-1.81.0-nightly-2024-12-09T06:50:31.561198402Z)
Loaded 1394 auto complete keywords from server.

root@localhost:8000/default> set data_retention_time_in_days=0;

SET data_retention_time_in_days = 0

0 row read in 0.003 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

root@localhost:8000/default> vacuum drop table from test_database force;

vacuum drop table from test_database force

0 row read in 6.784 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

root@localhost:8000/default> quit;

Leaked data is vacuumed:

sky@hp:~/databend$ ls .databend/stateless_test_data/121460/
sky@hp:~/databend$ 

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test: it is not easy to construct a leaked table in a test.

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Dec 9, 2024

@zhyass zhyass left a comment

It doesn't seem like a safe operation? Will it cause the newly created tables to be cleaned up incorrectly?

@SkyFan2002

It doesn't seem like a safe operation? Will it cause the newly created tables to be cleaned up incorrectly?

When a table is created, its metadata is written first, and at that point there are no corresponding files in storage. The corresponding directory is created only after DML operations are executed. Therefore, if a file belongs to a valid table, there must be a corresponding record in the metadata, so files without a corresponding metadata record can be safely deleted.

@zhyass zhyass commented Dec 10, 2024

I noticed that listing the paths comes before listing the tables, so that should be fine.
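
To make the ordering argument concrete, here is a minimal in-memory simulation. It reads the remark above as "storage paths are listed before metadata tables", which is an interpretation rather than something the thread spells out. FakeCatalog and both functions are hypothetical stand-ins and not the PR's code.

```python
class FakeCatalog:
    """In-memory stand-in for metadata plus storage (illustration only)."""

    def __init__(self):
        self.metadata = set()  # table paths recorded in metadata
        self.storage = set()   # table directories present in storage

    def create_table_and_insert(self, path):
        # Metadata is written first; the directory appears only after DML.
        self.metadata.add(path)
        self.storage.add(path)


def leaked_paths_first(catalog, concurrent_create=None):
    """List storage paths first, then metadata tables."""
    paths = set(catalog.storage)
    if concurrent_create:
        # A table created between the two listings.
        catalog.create_table_and_insert(concurrent_create)
    tables = set(catalog.metadata)
    return paths - tables


def leaked_tables_first(catalog, concurrent_create=None):
    """List metadata tables first, then storage paths (the risky order)."""
    tables = set(catalog.metadata)
    if concurrent_create:
        catalog.create_table_and_insert(concurrent_create)
    paths = set(catalog.storage)
    return paths - tables


safe = FakeCatalog()
safe.storage.add("db/leaked")
safe_result = sorted(leaked_paths_first(safe, concurrent_create="db/new"))
print(safe_result)  # ['db/leaked']

risky = FakeCatalog()
risky.storage.add("db/leaked")
risky_result = sorted(leaked_tables_first(risky, concurrent_create="db/new"))
print(risky_result)  # ['db/leaked', 'db/new']
```

With paths listed first, a table created during the scan can only appear in the later metadata listing, so it is never reported as leaked. In the reverse order, the new table's directory shows up without a metadata entry and would be misclassified.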

@BohuTANG BohuTANG merged commit 3a9f404 into databendlabs:main Dec 13, 2024
72 checks passed
dantengsky added a commit to dantengsky/fuse-query that referenced this pull request Dec 26, 2024
BohuTANG pushed a commit that referenced this pull request Dec 26, 2024
* Revert "chore: add more log (#17110)"

This reverts commit 0e05b43.

* Revert "feat: support vacuum leaked table data (#17022)"

This reverts commit 3a9f404.