Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve] [broker] Make the estimated entry size more accurate #23931

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Feb 5, 2025

Motivation

Background

  • B1: Broker assumes the entry size is 1 when no publishing after the topic was loaded up at the latest time. Otherwise, it uses the average entries size published during the lastest time-range of the topic loaded up.
    • Issue: the average entry size is not prectice for the catch-up read after a topic is reloaded.
  • B2: The feature managedLedgerMaxReadsInFlightSize assumes the entry size is 10kb when it read nothing. Otherwise, it uses the average entries size of the entries read out before.

Issue
Regarding scenario B2, it may cause an error Time-out elapsed while acquiring enough permits on the memory limiter to read from ledger 33, {tenant}/{ns}/persistent/{topic}-partition-0, estimated read size 1937 bytes for 1 entries (check managedLedgerMaxReadsInFlightSizeInMB) when there is hundreds topic being read and the exact entry size is very little.

More accurate solutions

  • P0: Use the exact size of the entries that we attempt to read.
  • P1: Use the exact size of the ledger that the entries were stored at to estimate the entries' size.
  • P2: Use the size of the topic that the entries were stored at to estimate the entries' size.

Modifications

  • Rather than use the P2 solution, change the solution to P1, and it will never return an inaccurate value such as 1 or 10k

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode self-assigned this Feb 5, 2025
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 5, 2025
@poorbarcode poorbarcode added release/3.0.10 release/3.3.5 release/4.0.3 and removed doc-not-needed Your PR changes do not impact docs labels Feb 5, 2025
@poorbarcode poorbarcode added this to the 4.1.0 milestone Feb 5, 2025
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 5, 2025
@lhotari
Copy link
Member

lhotari commented Feb 5, 2025

  • The feature managedLedgerMaxReadsInFlightSize assumes the entry size is 10kb when it read nothing. Otherwise, it uses the average entries size of the entries read out before.

After #23901 changes this case would be very rare since it would apply only for the first read in case no entries were produced to the topic.

@poorbarcode
Copy link
Contributor Author

poorbarcode commented Feb 5, 2025

@lhotari

The feature managedLedgerMaxReadsInFlightSize assumes the entry size is 10kb when it read nothing. Otherwise, it uses the average entries size of the entries read out before.

After #23901 changes this case would be very rare since it would apply only for the first read in case no entries were produced to the topic.

Agree, but for the following case, the current solution is better

  • There are 50k entries in the ledger
    • 1 entry was read out, the average size estimated is less accurate than the current implementation.
  • The topic was sent for 30 days, and the customer changed the schema of the topic.
    • total size/ total count is less accurate than the current implementation
  • Additionally, the current solution will never get an inaccurate value such as 1 or 10k, because it always calculates the exact average size for the entries and it will read a unique entry if no entries to read.

@poorbarcode poorbarcode changed the title [draft] [improve] [broker] Make the estimated entry size more accurate [improve] [broker] Make the estimated entry size more accurate Feb 6, 2025
@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants