feat: recluster unclustered blocks #15623

Merged 4 commits on Jun 4, 2024

@zhyass zhyass commented May 23, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Recluster Unclustered Blocks:

  • The recluster operation can now target unclustered blocks.
  • When building recluster tasks, the system selects unclustered blocks first for re-sorting.
  • For a table that already contains data, you can add or modify the cluster key and then recluster the existing data directly with the ALTER TABLE xx RECLUSTER FINAL command.

mysql> create table t14(a int not null) row_per_block=3;
Query OK, 0 rows affected (0.06 sec)

mysql> insert into t14 values(0),(1),(4);
Query OK, 3 rows affected (0.07 sec)

mysql> insert into t14 values(3);
Query OK, 1 row affected (0.06 sec)

mysql> insert into t14 values(-6),(-8);
Query OK, 2 rows affected (0.09 sec)

mysql> ALTER TABLE t14 cluster by(abs(a));
Query OK, 0 rows affected (0.10 sec)

mysql> insert into t14 values(2),(5),(-7);
Query OK, 3 rows affected (0.11 sec)

mysql> select * from clustering_information('default','t14');
+-------------+-------------------+----------------------+-------------------------+------------------+---------------+-----------------------+
| cluster_key | total_block_count | constant_block_count | unclustered_block_count | average_overlaps | average_depth | block_depth_histogram |
+-------------+-------------------+----------------------+-------------------------+------------------+---------------+-----------------------+
| (abs(a))    |                 4 |                    0 |                       3 |              0.0 |           1.0 | {"00001":1}           |
+-------------+-------------------+----------------------+-------------------------+------------------+---------------+-----------------------+
1 row in set (0.05 sec)
Read 1 rows, 448.00 B in 0.039 sec., 25.55 rows/sec., 11.18 KiB/sec.

mysql> alter table t14 recluster;
Query OK, 6 rows affected (0.19 sec)

mysql> select * from clustering_information('default','t14');
+-------------+-------------------+----------------------+-------------------------+------------------+---------------+-----------------------+
| cluster_key | total_block_count | constant_block_count | unclustered_block_count | average_overlaps | average_depth | block_depth_histogram |
+-------------+-------------------+----------------------+-------------------------+------------------+---------------+-----------------------+
| (abs(a))    |                 3 |                    0 |                       0 |              2.0 |           3.0 | {"00003":3}           |
+-------------+-------------------+----------------------+-------------------------+------------------+---------------+-----------------------+
1 row in set (0.04 sec)
Read 1 rows, 448.00 B in 0.025 sec., 39.27 rows/sec., 17.18 KiB/sec.

mysql> alter table t14 recluster;
Query OK, 9 rows affected (0.32 sec)

mysql> select * from clustering_information('default','t14');
+-------------+-------------------+----------------------+-------------------------+------------------+---------------+-----------------------+
| cluster_key | total_block_count | constant_block_count | unclustered_block_count | average_overlaps | average_depth | block_depth_histogram |
+-------------+-------------------+----------------------+-------------------------+------------------+---------------+-----------------------+
| (abs(a))    |                 3 |                    0 |                       0 |              0.0 |           1.0 | {"00001":3}           |
+-------------+-------------------+----------------------+-------------------------+------------------+---------------+-----------------------+
1 row in set (0.08 sec)
Read 1 rows, 448.00 B in 0.056 sec., 17.94 rows/sec., 7.85 KiB/sec.
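
The session above runs plain RECLUSTER twice, while the summary also mentions RECLUSTER FINAL. As a minimal sketch, assuming the same t14 table and assuming FINAL keeps reclustering until the table is sufficiently clustered (so a single statement should stand in for the repeated calls), the equivalent would look roughly like this. No output is shown because it was not captured in this PR:

```sql
-- Hypothetical continuation of the session above: t14 and its cluster key
-- (abs(a)) come from the example; results are intentionally omitted.
ALTER TABLE t14 RECLUSTER FINAL;

-- Re-check the clustering state afterwards.
SELECT * FROM clustering_information('default', 't14');
```

After such a run, unclustered_block_count would be the column to check to confirm that no unclustered blocks remain.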
  • Fixes #[Link the issue here]

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label May 23, 2024
@zhyass zhyass marked this pull request as draft May 23, 2024 06:03
@zhyass zhyass force-pushed the feature_fix branch 2 times, most recently from b5442c3 to 41952fc Compare May 25, 2024 06:48
@zhyass zhyass marked this pull request as ready for review May 25, 2024 08:02
@zhyass zhyass marked this pull request as draft May 31, 2024 16:03
@zhyass zhyass marked this pull request as ready for review May 31, 2024 16:03
@everpcpc everpcpc marked this pull request as draft May 31, 2024 16:22
@everpcpc everpcpc marked this pull request as ready for review May 31, 2024 16:22
@zhyass zhyass requested a review from sundy-li May 31, 2024 17:19
@dantengsky dantengsky commented Jun 4, 2024

@zhyass

09_0008_fuse_optimize_table.test failed on my local dev box. Any clue?

./target/debug/databend-sqllogictests  --handlers mysql --run_dir  base -f 09_0008_fuse_optimize_table.test  --skip_dir  management,explain_native,ee
MySQL client starts to run with: SqlLogicTestArgs { dir: Some("base"), file: Some("09_0008_fuse_optimize_table.test"), skipped_dir: Some("management,explain_native,ee"), skipped_file: None, handlers: Some(["mysql"]), suites: "tests/sqllogictests/suites", complete: false, no_fail_fast: false, parallel: 1, enable_sandbox: false, debug: false, bench: false, database: "default" }
Running MySQL test for file: tests/sqllogictests/suites/base/09_fuse_engine/09_0008_fuse_optimize_table.test ...
Test finished, fail fast enabled, 1 out of 241 records failed to run
0: query result mismatch:
[SQL] select * from clustering_information('db_09_0008','t15')
[Diff] (-expected|+actual)
-   (abs(a)) 3 0 0 2.0 3.0 {"00003":3}
+   (abs(a)) 4 0 3 0.0 1.0 {"00001":1}
at tests/sqllogictests/suites/base/09_fuse_engine/09_0008_fuse_optimize_table.test:810

UPDATE:

Sorry, false alarm. I was testing the wrong PR.

@dantengsky dantengsky added this pull request to the merge queue Jun 4, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Jun 4, 2024
@BohuTANG BohuTANG merged commit e980c27 into databendlabs:main Jun 4, 2024
71 checks passed