Cherry-pick VACUUM stats & Vacuum progress related commits. #611
Conversation
Force-pushed from 20258ff to c983c19.
UPD: cherry-pick a6fdc27246e46bf3b72d43420c1454a37e68e273 from gp7 to fix incorrect regression test results.
Force-pushed from 2791299 to 35470b1.
Applied all review comments as suggested.
As GPDB vacuums auxiliary tables together with the base relation in one distributed transaction, unlike PostgreSQL, where a transaction is opened and closed for each auxiliary relation, vacuuming the auxiliary TOAST table should not be dispatched.
This enables reporting for the table scan phase of index builds on AOCO tables. This directly affects:

(1) pg_stat_progress_create_index.blocks_total
(2) pg_stat_progress_create_index.blocks_done

Sample: connect in utility mode to a segment while CREATE INDEX is running on an AOCO table:

```
postgres=# select pid, command, phase, blocks_total, blocks_done from pg_stat_progress_create_index;
  pid   |   command    |             phase              | blocks_total | blocks_done
--------+--------------+--------------------------------+--------------+-------------
 644771 | CREATE INDEX | building index: scanning table |         9779 |        6832
```

This commit introduces:

(1) Accounting in AOCSScanDescData to track the total number of bytes read across all segment files. It is updated whenever a new block is read. If the block is compressed, its compressed length is added (uncompressed length otherwise).

(2) open_all_datumstreamread_segfiles() is touched since it also reads blocks. Its interface is simplified by passing the scan descriptor, which is also used to get at the total bytes counter.

(3) Reporting code in aoco_index_build_range_scan() to report, in increments of BLCKSZ, the number of blocks scanned so far. The total number of blocks to be scanned is also published up front and is determined by summing the compressed EOF values of all segfiles. The total number of blocks depends on whether a block directory is being built as part of the scan: if it is, the whole table needs to be scanned; otherwise only the index columns need to be scanned. A new function has been introduced to summarize the filesegtotals, which accepts column projection info: GetAOCSSSegFilesTotalsWithProj().

Note: We currently only support reporting for non-concurrent and serial builds.
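As a hedged follow-up to the sample above, here is one way to present the same two columns as a completion percentage. The view and its columns are standard; the rounding expression is just a presentation choice, and NULLIF guards against division by zero before blocks_total has been published.

```
SELECT pid, command, phase,
       round(100.0 * blocks_done / NULLIF(blocks_total, 0), 1) AS pct_done
FROM pg_stat_progress_create_index;
```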
* IndexVacuumInfo.analyze_only was not initialized, leading to possible no-ops in index_vacuum_cleanup() if the garbage value for it was non-zero.

* IndexVacuumInfo.num_heap_tuples should be set to the AO table's old reltuples value (the value prior to kicking off the vacuum). Please see IndexVacuumInfo and lazy_vacuum_index() for details. As explained in the code comments, num_heap_tuples will always be an estimate during ambulkdelete(). Since dedd4075c9f, we have been using the index rel's reltuples instead. This is wrong (e.g. in scenarios where the index has been created after data manipulation). So, correctly use the table's reltuples value to provide the estimate.
Given the scenario of updating stats during VACUUM:

1) the coordinator vacuums and updates its own stats;
2) the coordinator then dispatches the vacuum to segments;
3) the coordinator combines stats received from segments to overwrite its own.

Because upstream introduced a feature that can skip the full index scan during cleanup of B-tree indexes when possible (refer to: postgres/postgres@857f9c3), there was a case in a QD-QEs distributed deployment where some QEs could skip the full index scan and stop updating statistics. The QD was then unable to collect all QEs' stats and overwrote index->reltuples with a partially accumulated value. More interestingly, this usually happened starting from the 3rd consecutive VACUUM after fresh inserts, due to the above index-scan-skipping criteria.

This patch prevents the above issue in two ways: on the QE, do not skip the full index scan, and report current statistics to the QD as requested; on the QD, update statistics only when all QEs' data has been collected completely. If the number of reporting QEs doesn't match the total number of dispatched QEs, no stats update happens.
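As a hedged illustration of the failure mode (the index name is a placeholder; gp_dist_random() is the usual GPDB device for reading a catalog table from every segment): comparing the coordinator's reltuples for an index against the per-segment values it should have been aggregated from would expose a partially accumulated overwrite.

```
-- Coordinator's view of the index's reltuples ...
SELECT -1 AS seg, reltuples FROM pg_class WHERE relname = 'my_ao_table_idx'
UNION ALL
-- ... versus each segment's own value.
SELECT gp_segment_id, reltuples FROM gp_dist_random('pg_class')
WHERE relname = 'my_ao_table_idx'
ORDER BY seg;
```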
Since we have already checked visibility while scanning tuples in AOCS compaction, the visibility check code in AOCS compaction is unnecessary and can be removed.

Co-authored-by: linxu.hlx <[email protected]>
When this option is passed, instead of vacuuming the main tables, look up the auxiliary tables for all indicated AO tables and vacuum those. If this option is given without a table name, vacuum ONLY all AO auxiliary tables. Also add regress test coverage of this behavior.
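A minimal usage sketch, assuming the option is exposed in the VACUUM grammar as AO_AUX_ONLY (the message above does not name it, so treat the spelling as an assumption; the table name is a placeholder):

```
-- Vacuum only the auxiliary tables of one AO table.
VACUUM AO_AUX_ONLY my_ao_table;

-- Without a table name: vacuum ONLY all AO auxiliary tables.
VACUUM AO_AUX_ONLY;
```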
We used to have the pg_appendonly row for the AO/CO table stored in its relcache entry, but that somehow got lost during the PG12 merge (it is still in 6X). Now we bring it back and utilize it wherever possible. This commit is basically a partial cherry-pick of ef92bcf151f99c9152a10c8750eee0d4dde39bc1, but due to a lot of changes in the code base, and because we don't need the other changes in that commit, we are not cherry-picking it wholesale.
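For orientation, the catalog row in question can be inspected directly; 'my_ao_table' is a placeholder name:

```
-- The pg_appendonly row that is now cached in the table's relcache entry.
SELECT * FROM pg_appendonly WHERE relid = 'my_ao_table'::regclass;
```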
…nsert() to reduce unnecessary catalog scans, as the required segrelid was already obtained from appendonly_insert_init().
Enable the system view "pg_stat_progress_vacuum" to report VACUUM progress for ao_row and ao_column tables. Here are details of how each field in this view is interpreted/updated for append-optimized tables:

-----
phase
-----
Since the mechanism of VACUUM for append-optimized tables is significantly different from VACUUM for heap tables, this commit adds three additional vacuum progress phases dedicated to append-optimized tables. Possible VACUUM phases for append-optimized tables include the following:
- PROGRESS_VACUUM_PHASE_AO_PRE_CLEANUP (append-optimized pre-cleanup)
- PROGRESS_VACUUM_PHASE_AO_COMPACT (append-optimized compact)
- PROGRESS_VACUUM_PHASE_AO_POST_CLEANUP (append-optimized post-cleanup)
- PROGRESS_VACUUM_PHASE_VACUUM_INDEX (vacuuming indexes)

Phases for VACUUM of an append-optimized table progress in the following order:
- append-optimized pre-cleanup
- append-optimized compact
- append-optimized post-cleanup

"vacuuming indexes" is a sub-phase that can happen as part of the pre-cleanup phase and/or the post-cleanup phase. It happens if:
- the relation has indexes, AND
- there are awaiting-drop segment files that are invisible to all.

While a segment can only be marked as awaiting-drop during the VACUUM compact phase, it can be recycled during either the post-cleanup phase or the pre-cleanup phase of a later vacuum run.

-----------
heap_blks_*
-----------
We divide byte size by the heap block size to get heap_blks_* numbers for AO tables. Since append-optimized tables have variable-length block sizes, measuring heap-equivalent blocks instead of using the number of varblocks better helps users compare and calibrate.

---------------
heap_blks_total
---------------
- Collected at the beginning of the pre-cleanup phase by adding up the on-disk file sizes of all segment files for the append-optimized relation and converting that size into a number of heap-equivalent blocks.
- The value should not change while VACUUM progresses.

-----------------
heap_blks_scanned
-----------------
- Collected during the compact phase.
- ao_row: updated every time we finish scanning a segment file.
- ao_column: updated every time the number advances.
- SPECIAL NOTE for AO: heap_blks_scanned can be smaller than or equal to heap_blks_total at the end of VACUUM, because for AO tables we need not scan blocks after the logical EOF of a segment file.

------------------
heap_blks_vacuumed
------------------
- Collected whenever we truncate a segment file, which can happen during either/both/neither of the pre-cleanup and post-cleanup phases.
- SPECIAL NOTE for AO: heap_blks_vacuumed can be either smaller or larger than heap_blks_scanned. For AO tables the bottom line is that, at the end of VACUUM, the sum of heap_blks_vacuumed and heap_blks_scanned must be smaller than or equal to heap_blks_total.

------------------
index_vacuum_count
------------------
- Collected whenever we recycle a dead segment file, which can happen during either/both/neither of the pre-cleanup and post-cleanup phases.

---------------
max_dead_tuples
---------------
- Collected at the beginning of the pre-cleanup phase. This is the total number of tuples before the logical EOF of all segment files.
- The value should not change while VACUUM progresses.

---------------
num_dead_tuples
---------------
- Collected during the compact phase.
- ao_row: updated every time we throw away a dead tuple.
- ao_column: updated every time we move a tuple and the number of dead tuples advances.

Reviewed-by: Huansong Fu <[email protected]>

Remove tests that are not here
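A minimal monitoring sketch against the view described above (run in utility mode on a segment, as in the CREATE INDEX example earlier; the columns are the standard pg_stat_progress_vacuum ones):

```
SELECT pid, phase,
       heap_blks_total, heap_blks_scanned, heap_blks_vacuumed,
       index_vacuum_count, max_dead_tuples, num_dead_tuples
FROM pg_stat_progress_vacuum;
```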
This commit fixes the following compiler warnings:

```
vacuum_ao.c:738:1: warning: ‘init_vacrelstats’ was used with no prototype before its definition [-Wmissing-prototypes]
  738 | init_vacrelstats()
      | ^~~~~~~~~~~~~~~~

CPartPruneStepsBuilder.cpp: In member function ‘PartitionedRelPruneInfo* gpdxl::CPartPruneStepsBuilder::CreatePartPruneInfoForOneLevel(gpdxl::CDXLNode*)’:
CPartPruneStepsBuilder.cpp:83:29: warning: comparison of integer expressions of different signedness: ‘gpos::ULONG’ {aka ‘unsigned int’} and ‘int’ [-Wsign-compare]
   83 |         for (ULONG i = 0; i < pinfo->nparts; ++i)
      |                           ~~^~~~~~~~~~~~~~~
```
The following fields of the pg_stat_all_tables view are now available for append-optimized tables:

- n_live_tup
- n_dead_tup
- last_vacuum
- vacuum_count

Note that n_dead_tup only accounts for dead tuples in non-awaiting-drop segment files.

Reviewed-by: Huansong Fu <[email protected]>
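A quick way to see the newly available fields for a given AO table ('my_ao_table' is a placeholder):

```
SELECT n_live_tup, n_dead_tup, last_vacuum, vacuum_count
FROM pg_stat_all_tables
WHERE relname = 'my_ao_table';
```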
Calls to FaultInjector_InjectFaultIfSet that provide tableName as an argument currently break early if the tableName does not match the hash entry (set by gp_inject_fault) for the fault. This is problematic because the fault must be set with a specific table name to be triggered at all. This commit modifies the behavior to only check for a matching tableName if the hash entry has a tableName set. This allows for more targeted testing.

Example: Setting the fault without a tableName will trigger for any table:

```
postgres=# SELECT gp_inject_fault('AppendOnlyStorageRead_ReadNextBlock_success', 'skip', '', '', '', 1, 100, 0, dbid) FROM gp_segment_configuration WHERE content = 1 AND role = 'p';
 gp_inject_fault
-----------------
 Success:
(1 row)

postgres=# copy alter_attach_t1 to '/dev/null';
COPY 100000
postgres=# copy alter_attach_t2 to '/dev/null';
COPY 100000
postgres=# SELECT gp_inject_fault('AppendOnlyStorageRead_ReadNextBlock_success', 'status', dbid) FROM gp_segment_configuration WHERE content = 1 AND role = 'p';
 gp_inject_fault
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Success: fault name:'AppendOnlyStorageRead_ReadNextBlock_success' fault type:'skip' ddl statement:'' database name:'' table name:'' start occurrence:'1' end occurrence:'100' extra arg:'0' fault injection state:'completed' num times hit:'100' +
```

Setting the fault with a given tableName will trigger only for that table:

```
postgres=# SELECT gp_inject_fault('AppendOnlyStorageRead_ReadNextBlock_success', 'skip', '', '', 'alter_attach_t1', 1, 100, 0, dbid) FROM gp_segment_configuration WHERE content = 1 AND role = 'p';
 gp_inject_fault
-----------------
 Success:
(1 row)

postgres=# copy alter_attach_t1 to '/dev/null';
COPY 100000
postgres=# copy alter_attach_t2 to '/dev/null';
COPY 100000
postgres=# SELECT gp_inject_fault('AppendOnlyStorageRead_ReadNextBlock_success', 'status', dbid) FROM gp_segment_configuration WHERE content = 1 AND role = 'p';
 gp_inject_fault
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Success: fault name:'AppendOnlyStorageRead_ReadNextBlock_success' fault type:'skip' ddl statement:'' database name:'' table name:'alter_attach_t1' start occurrence:'1' end occurrence:'100' extra arg:'0' fault injection state:'triggered' num times hit:'51' +
```

Authored-by: Brent Doil <[email protected]>
Change vacuum row progress expected output. Fix vacuum_progress_column expected output. Also some minor refactoring was done around src/backend/catalog/pg_attribute_encoding.c.
hm... it seems that
Thanks for your feedback! Will take a look soon.
Does this PR fix #674 for you?
Thanks! Build & test from
Cherry-pick some GPDB commits.
This is part 1 of my plan for native Yezzey support in CBDB: open-gpdb/yezzey#40 (comment)