[SUPPORT] Should we introduce partition-level metrics ? #12197

TheR1sing3un · 2024-11-04T12:36:01Z

Can we provide partition-level metrics? In many scenarios tables are partitioned by time or type, for example by p_date or p_product, and the data differs considerably from one partition to another. Could we add a partition dimension to some metrics, for example the p99 latency of the compaction operation for a specified partition? This would help a lot with performance optimization.


vinothchandar · 2024-11-04T21:40:52Z

The only downside is users partitioning too granularly, which would bombard the downstream metrics systems.

I see how it's useful, though.

danny0405 · 2024-11-05T00:50:49Z

> for example, p99 latency of compaction operation for specified partition

Is it feasible to extend the compaction metrics a little, perhaps by reporting the latency metrics at another level, aggregated by partition?

TheR1sing3un · 2024-11-05T02:36:47Z

> Only downside is - users partitioning things too granular, leading to bombardment of metrics systems downstream..

Yes, we also need to consider the case of too many partitions. I think we can provide this capability and let each user decide whether to turn it on.
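
To make the opt-in idea concrete, here is a minimal sketch of one way to gate per-partition reporting. Everything in it is an assumption for illustration: `PartitionMetricGate`, the `enabled` flag, the `maxPartitions` cap, and the `_other` overflow label are hypothetical names, not existing Hudi classes or configs. The cap is one possible guard against the metric-cardinality concern raised above.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper (not a Hudi API): picks the label a per-partition metric is reported under. */
final class PartitionMetricGate {
    /** Hypothetical shared bucket for partitions beyond the cap. */
    static final String OVERFLOW_LABEL = "_other";

    private final boolean enabled;      // assumed user-facing switch, off by default in practice
    private final int maxPartitions;    // assumed cap on distinct partition labels
    private final Set<String> tracked = new HashSet<>();

    PartitionMetricGate(boolean enabled, int maxPartitions) {
        this.enabled = enabled;
        this.maxPartitions = maxPartitions;
    }

    /**
     * Returns empty when partition-level metrics are disabled, the partition path
     * while the cap is not exceeded, and the overflow label for any further partitions.
     */
    synchronized Optional<String> labelFor(String partitionPath) {
        if (!enabled) {
            return Optional.empty();
        }
        if (tracked.contains(partitionPath)) {
            return Optional.of(partitionPath);
        }
        if (tracked.size() < maxPartitions) {
            tracked.add(partitionPath);
            return Optional.of(partitionPath);
        }
        return Optional.of(OVERFLOW_LABEL);
    }

    public static void main(String[] args) {
        PartitionMetricGate gate = new PartitionMetricGate(true, 2);
        for (String p : new String[] {"p_date=2024-11-03", "p_date=2024-11-04", "p_date=2024-11-05"}) {
            System.out.println(p + " -> " + gate.labelFor(p).orElse("(not reported)"));
        }
    }
}
```

With a cap in place, the number of distinct metric names stays bounded even when a table has many partitions, at the cost of losing per-partition detail for the partitions that arrive after the cap is reached.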

TheR1sing3un · 2024-11-05T02:39:02Z

> > for example, p99 latency of compaction operation for specified partition
>
> Is it feasible to extend the compaction metrics a little bit, maybe just represent the latency metrics in another level: aggregated by partitions.

I plan to provide a Histogram aggregated by partition. It will record compaction stats such as:

- log records scan time cost
- base scan and merge to log records time cost
- records spilled to disk ratio
- memory cost, and so on

We can analyze the performance problems and health of the system from these metrics.
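
As a rough illustration of a histogram aggregated by partition, the sketch below keeps raw samples per (partition, stat) pair and computes a nearest-rank percentile. It is only a sketch under stated assumptions: `PartitionCompactionStats` and the stat name `logScanMs` are hypothetical, the class does not use Hudi's metrics classes or any metrics library, and a real implementation would likely use a bounded reservoir rather than storing every sample.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-partition histogram store (not a Hudi API). */
final class PartitionCompactionStats {
    // Key is "<partition path>.<stat name>"; value holds every recorded sample.
    private final Map<String, List<Long>> samples = new ConcurrentHashMap<>();

    private static String key(String partition, String stat) {
        return partition + "." + stat;
    }

    /** Records one observation, e.g. a log-record scan time in milliseconds. */
    void record(String partition, String stat, long value) {
        samples.computeIfAbsent(key(partition, stat),
                k -> Collections.synchronizedList(new ArrayList<>())).add(value);
    }

    /** Nearest-rank percentile (p in (0, 100]); empty if nothing was recorded. */
    OptionalLong percentile(String partition, String stat, double p) {
        List<Long> values = samples.get(key(partition, stat));
        if (values == null) {
            return OptionalLong.empty();
        }
        List<Long> sorted;
        synchronized (values) {           // copy under the list's lock before sorting
            sorted = new ArrayList<>(values);
        }
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return OptionalLong.of(sorted.get(Math.max(rank, 1) - 1));
    }

    public static void main(String[] args) {
        PartitionCompactionStats stats = new PartitionCompactionStats();
        for (long ms = 1; ms <= 100; ms++) {
            stats.record("p_date=2024-11-04", "logScanMs", ms);
        }
        System.out.println(stats.percentile("p_date=2024-11-04", "logScanMs", 99.0));
    }
}
```

Keying by partition and stat name lets the same structure hold all of the stats listed above, so a slow partition shows up as a high p99 under its own key instead of being averaged into a table-wide number.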

TheR1sing3un mentioned this issue Nov 5, 2024

[HUDI-8482] Introduce partition-level metrics and add more compaction metrics #12208


ad1happy2go added the feature-enquiry and metrics labels Nov 5, 2024

