ADBDEV-6443: Refactor diskquota local hashmap with active tables #46

RekGRpth · 2024-12-17T04:19:04Z

Refactor diskquota local hashmap with active tables

diskquota used a local hashmap, local_table_stats_map, in the
gp_fetch_active_tables function. During initialization, the load_table_size
function loaded the entire contents of the diskquota.table_size table into it.
During normal operation, diskquota loaded the sizes of active tables from the
segments into this local hashmap using the pull_active_list_from_seg,
convert_map_to_string, and pull_active_table_size_from_seg functions. This led
to increased memory consumption, especially during initialization: all tables
are considered active at that point, so with a large number of tables
local_table_stats_map became quite large. The hashmap was then used in the
calculate_table_disk_usage function to calculate the sizes of active tables,
and in the dispatch_rejectmap function to dispatch active tables to segments.

This patch removes the local_table_stats_map local hashmap entirely. The sizes
of active tables are now calculated directly as results are received in the
load_table_size and pull_active_table_size_from_seg functions. The same
functions fill in active_oids, a text list of active tables, which is then
passed to the dispatch_rejectmap function. The new get_table_size_map_entry and
update_active_table_size functions are the parts of the
calculate_table_disk_usage function code related to active tables, extracted
into separate functions. The initialization-time invalidation logic from
14b861d has been completely reverted because the current changes already cover
it.
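
The idea of consuming each result row as it arrives, instead of buffering all rows in an intermediate hashmap first, can be illustrated with a minimal standalone sketch. It is not the diskquota code: the Accumulator type, the fixed capacities, and the get_entry and consume_row names are assumptions made for illustration, and a linear search stands in for the real size map lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from diskquota. */
#define MAX_TABLES   16
#define OID_LIST_CAP 256

typedef struct
{
    uint32_t oid;        /* table identifier */
    int64_t  total_size; /* bytes summed over all rows seen so far */
} SizeEntry;

typedef struct
{
    SizeEntry entries[MAX_TABLES];       /* plays the role of the long-lived size map */
    int       count;
    char      active_oids[OID_LIST_CAP]; /* text list such as "16384,16390" */
    size_t    oids_len;
} Accumulator;

/*
 * Return the entry for oid. On first sight the entry is created and the oid
 * is appended to the text list. Returns NULL when either buffer is full.
 */
static SizeEntry *
get_entry(Accumulator *acc, uint32_t oid)
{
    for (int i = 0; i < acc->count; i++)
        if (acc->entries[i].oid == oid)
            return &acc->entries[i];

    if (acc->count == MAX_TABLES)
        return NULL;

    size_t room = OID_LIST_CAP - acc->oids_len;
    int    written = snprintf(acc->active_oids + acc->oids_len, room,
                              acc->oids_len ? ",%u" : "%u", (unsigned) oid);

    if (written < 0 || (size_t) written >= room)
    {
        acc->active_oids[acc->oids_len] = '\0'; /* drop the truncated tail */
        return NULL;
    }
    acc->oids_len += (size_t) written;

    SizeEntry *entry = &acc->entries[acc->count++];
    entry->oid = oid;
    entry->total_size = 0;
    return entry;
}

/* Fold one result row into the totals immediately; nothing is buffered. */
static int
consume_row(Accumulator *acc, uint32_t oid, int64_t segment_size)
{
    SizeEntry *entry = get_entry(acc, oid);

    if (entry == NULL)
        return -1;
    entry->total_size += segment_size;
    return 0;
}
```

A caller would start from a zero-initialized Accumulator and call consume_row once per received row; the only per-row state kept is the running total and the growing text list.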

It is easier to view the changes with the "Hide whitespace" option enabled.

src/diskquota_utility.c

src/gp_activetable.c

dkovalev1 · 2025-01-31T14:11:25Z

The original problem you are resolving is the high memory usage of the hash table, right?
In this commit you replace a long-lived hash map with a string of serialized oids that exists only at initialization time, is that correct?
Also, this string still has to hold all oids at once, which probably results in a huge query being passed to the diskquota.refresh_rejectmap UDF. So it does not resolve the problem completely; it only mitigates it.
Have you considered splitting the oids into chunks instead of keeping them in one string?
Have you considered passing the oids in binary format? It could save even more bytes and CPU cycles on parsing.

Also, this huge array (string) is passed to the diskquota.refresh_rejectmap UDF, which is executed on segments to update the rejectmap in shared memory. This string, active_oids, is loaded from the diskquota.table_size table, which is (or could be) distributed. Have you considered performing local initialization on each segment, to avoid passing all oids to the QD and back to the segments at all?
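
For illustration only, the chunking idea raised in this comment could look roughly like the sketch below. It is not part of the patch: format_oid_chunk is a hypothetical helper, and how each chunk would be dispatched is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: serialize at most chunk_size oids, starting at
 * oids[start], into buf as "a,b,c". Returns how many oids were written,
 * which is 0 when start is past the end or buf cannot hold even one oid.
 */
static size_t
format_oid_chunk(const uint32_t *oids, size_t n, size_t start,
                 size_t chunk_size, char *buf, size_t cap)
{
    size_t used = 0;
    size_t taken = 0;

    assert(buf != NULL && cap > 0);
    buf[0] = '\0';

    while (start + taken < n && taken < chunk_size)
    {
        int written = snprintf(buf + used, cap - used,
                               taken ? ",%u" : "%u",
                               (unsigned) oids[start + taken]);

        if (written < 0 || (size_t) written >= cap - used)
        {
            buf[used] = '\0'; /* drop the truncated tail */
            break;
        }
        used += (size_t) written;
        taken++;
    }
    return taken;
}
```

A caller would loop, advancing start by the returned count until it reaches n, so that only one bounded string is held in memory at a time.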

RekGRpth · 2025-01-31T14:59:06Z

> The original problem you are resolving is the high memory usage of the hash table, right?

Yes

> In this commit you replace a long-lived hash map with a string of serialized oids that exists only at initialization time, is that correct?

No! This string of serialized oids was there before the patch, including during initialization!

> Have you considered splitting the oids into chunks instead of keeping them in one string?
> Have you considered passing the oids in binary format? It could save even more bytes and CPU cycles on parsing.

Unfortunately, the CdbDispatchCommand function does not support passing parameters, so the oids cannot be passed as a binary array of oids instead of a serialized string, the way I did earlier in the delete_from_table_size_map and update_table_size_map functions.

> Also, this huge array (string) is passed to the diskquota.refresh_rejectmap UDF, which is executed on segments to update the rejectmap in shared memory. This string, active_oids, is loaded from the diskquota.table_size table, which is (or could be) distributed. Have you considered performing local initialization on each segment, to avoid passing all oids to the QD and back to the segments at all?

This is beyond the scope of this patch. The fact that the diskquota.table_size table is distributed across segments does not mean that the required sizes are on the required segment!
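
To put rough numbers on the text versus binary question, the sketch below compares the bytes needed for a comma-separated decimal list with those for a raw array of 4-byte oids. It is an illustrative estimate only: the helper names are hypothetical, and array headers and the surrounding SQL text are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of decimal digits needed to print value. */
static size_t
decimal_digits(uint32_t value)
{
    size_t digits = 1;

    while (value >= 10)
    {
        value /= 10;
        digits++;
    }
    return digits;
}

/* Bytes in the text form "a,b,c", without the terminating NUL. */
static size_t
text_list_bytes(const uint32_t *oids, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += decimal_digits(oids[i]) + (i ? 1 : 0); /* digits plus comma */
    return total;
}

/* Bytes in a raw array of n oids, assuming 4 bytes per oid. */
static size_t
binary_list_bytes(size_t n)
{
    return n * sizeof(uint32_t);
}
```

For five-digit oids the text form costs about six bytes per oid against four in binary, so under these assumptions the saving from a binary format would be bounded rather than dramatic.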

dkovalev1 · 2025-01-31T15:06:38Z

> > In this commit you replace a long-lived hash map with a string of serialized oids that exists only at initialization time, is that correct?

> No! This string of serialized oids was there before the patch, including during initialization!

Yes, it was, so this patch does not resolve the problem completely.

> Have you considered splitting the oids into chunks instead of keeping them in one string?

?

> > Also, this huge array (string) is passed to the diskquota.refresh_rejectmap UDF, which is executed on segments to update the rejectmap in shared memory. This string, active_oids, is loaded from the diskquota.table_size table, which is (or could be) distributed. Have you considered performing local initialization on each segment, to avoid passing all oids to the QD and back to the segments at all?

> This is beyond the scope of this patch. The fact that the diskquota.table_size table is distributed across segments does not mean that the required sizes are on the required segment!

Perhaps so, but elaborating on this approach could bring even more improvement and resolve the problem completely.

src/quotamodel.c

RekGRpth · 2025-02-03T17:12:15Z

> this patch does not resolve the problem completely

This patch addresses a specific issue of increased memory consumption by the local hashmap during initialization.

RekGRpth · 2025-02-03T17:15:58Z

> elaborating this approach can bring even more improvement and resolve the problem completely.

What problem are you talking about? This patch solves only one specific problem and nothing else. There are many other problems in diskquota, but this patch does not solve them. Solving all the problems with one patch is not a good idea; such a patch would be very difficult to review.

RekGRpth · 2025-02-03T17:18:00Z

> > Have you considered to have oids in chunks instead of keeping them in one string?

> ?

Yes, I think I already wrote that the current implementation of direct dispatching allows dispatching only string commands without parameters.

RekGRpth added 30 commits December 17, 2024 09:18

b50db10 test
38ada96 optimize
e8da52a restore
0d84a81 fix
7fbdf51 context
4347e64 format
5c7ae33 simplify
2a87172 optimize
db98950 rename
261d57c rename
9a74ae5 optimize
87e67e1 optimize
4b901ed rm
c451cb3 revert
ea534b0 revert
551f45b optimize
7827ab7 optimize
621cbe0 rename
de7bd9c optimize
ae3edc1 simplify
5031028 optimize
c0346cb optimize
7f1f0da rm
29bff98 rm
3fa3cd3 optimize
68af62b optimize
d1cee79 rename
3d491ab rename
6be167d optimize
884b554 auto

RekGRpth added 10 commits January 13, 2025 12:57

84c41b9 remove extra declarations
56ec176 rename and move declaration
9fcf837 comment and revert
5a7dcc8 rename
2f7a2b7 comment
d76f2d6 comment
9ef5a86 remove assert
4b5176c comment
d4d56b8 rename and comment
8558ddf rm

bimboterminator1 previously approved these changes Jan 16, 2025

dkovalev1 reviewed Jan 28, 2025
src/diskquota_utility.c

dkovalev1 reviewed Jan 31, 2025
src/gp_activetable.c (outdated)

dkovalev1 reviewed Jan 31, 2025
src/gp_activetable.c (outdated)

6b3b153 revert and move

RekGRpth dismissed bimboterminator1’s stale review via 6b3b153 January 31, 2025 13:21

dkovalev1 reviewed Feb 3, 2025
src/quotamodel.c

dkovalev1 reviewed Feb 3, 2025
src/quotamodel.c

dkovalev1 reviewed Feb 3, 2025
src/quotamodel.c (outdated)

88dc181 rename and comments

dkovalev1 approved these changes Feb 3, 2025

bimboterminator1 approved these changes Feb 3, 2025

RekGRpth merged commit 0b6d959 into gpdb Feb 4, 2025
2 checks passed

RekGRpth deleted the ADBDEV-6443-3 branch February 4, 2025 03:55
