[SUPPORT] Should we introduce partition-level metrics ? #12197

Open
TheR1sing3un opened this issue Nov 4, 2024 · 4 comments
Labels
feature-enquiry issue contains feature enquiries/requests or great improvement ideas metrics

Comments

@TheR1sing3un
Member

Can we provide partition-level metrics? In many scenarios tables are partitioned by time or type, for example by p_date or p_product, and the data can differ considerably from one partition to another. Could we expose metrics along the partition dimension, for example the p99 latency of the compaction operation for a specified partition? This would help a lot with performance optimization.


@vinothchandar
Member

The only downside is users partitioning things too granularly, leading to a bombardment of the metrics systems downstream.

I see how it's useful though.

@danny0405
Contributor

for example, p99 latency of compaction operation for specified partition

Is it feasible to extend the compaction metrics a little, maybe by reporting the latency metrics at another level: aggregated by partition?

@TheR1sing3un
Member Author

TheR1sing3un commented Nov 5, 2024

The only downside is users partitioning things too granularly, leading to a bombardment of the metrics systems downstream.

Yes, we also need to consider the case of too many partitions. I think we can provide this capability and let users decide whether to turn it on.
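
As a rough illustration of the opt-in idea, here is a minimal sketch in Python (chosen only for brevity; it is not Hudi code). The class `PartitionMetricsGate`, the `enabled` switch, the `max_partitions` cap, and the `__other__` overflow label are all hypothetical names. The cap in particular is an extra assumption, not something proposed in this thread, added to show one possible way to bound the number of metric series when a table has very many partitions.

```python
class PartitionMetricsGate:
    """Hypothetical helper: picks the metric label a partition reports under."""

    OVERFLOW_LABEL = "__other__"  # assumed name for the shared overflow bucket

    def __init__(self, enabled=False, max_partitions=100):
        # Off by default, so users must explicitly opt in.
        self.enabled = enabled
        # Assumed guard: at most this many distinct partition labels.
        self.max_partitions = max_partitions
        self._seen = set()

    def label_for(self, partition_path):
        """Return the label to emit under, or None for table-level only."""
        if not self.enabled:
            return None
        if partition_path in self._seen:
            return partition_path
        if len(self._seen) < self.max_partitions:
            self._seen.add(partition_path)
            return partition_path
        # Too many partitions: fold the rest into one shared series.
        return self.OVERFLOW_LABEL


gate = PartitionMetricsGate(enabled=True, max_partitions=2)
for path in ["p_date=2024-11-01", "p_date=2024-11-02", "p_date=2024-11-03"]:
    print(path, "->", gate.label_for(path))
```

With the switch left off, `label_for` returns `None` for every partition, so the caller would keep emitting only the existing table-level metrics.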

@TheR1sing3un
Member Author

TheR1sing3un commented Nov 5, 2024

for example, p99 latency of compaction operation for specified partition

Is it feasible to extend the compaction metrics a little, maybe by reporting the latency metrics at another level: aggregated by partition?

I plan to provide a Histogram aggregated by partition. It will record compaction stats such as:

  • time cost of scanning log records
  • time cost of the base scan and merge with log records
  • ratio of records spilled to disk
  • memory cost, and so on

We can analyze performance problems and the health of the system from these metrics.
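
To make the idea of a histogram aggregated by partition concrete, here is a minimal sketch in Python (again for brevity only; it is not the planned implementation). `PartitionHistogram`, the metric name `log_scan_ms`, and the nearest-rank percentile method are assumptions made for illustration. The sketch stores every sample, whereas a production implementation would likely use a bounded reservoir.

```python
import math
from collections import defaultdict


class PartitionHistogram:
    """Hypothetical per-partition histogram keyed by (partition, metric)."""

    def __init__(self):
        self._samples = defaultdict(list)

    def update(self, partition_path, metric, value):
        # Record one observation, e.g. the log scan time of one compaction.
        self._samples[(partition_path, metric)].append(value)

    def percentile(self, partition_path, metric, q):
        """Nearest-rank percentile (q in 0..100); None if there is no data."""
        values = sorted(self._samples.get((partition_path, metric), []))
        if not values:
            return None
        rank = max(1, math.ceil(q * len(values) / 100))
        return values[rank - 1]


hist = PartitionHistogram()
for ms in range(1, 101):  # 100 made-up samples: 1 ms .. 100 ms
    hist.update("p_date=2024-11-04", "log_scan_ms", ms)

print(hist.percentile("p_date=2024-11-04", "log_scan_ms", 99))
```

The other stats in the list (merge time, spill ratio, memory cost) would be recorded the same way under their own metric names, so each can be compared across partitions.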

@ad1happy2go ad1happy2go added feature-enquiry issue contains feature enquiries/requests or great improvement ideas metrics labels Nov 5, 2024
@github-project-automation github-project-automation bot moved this to ⏳ Awaiting Triage in Hudi Issue Support Nov 5, 2024