[Store Gateway] Token bucket limiter #6016

Merged
merged 28 commits into from
Jul 24, 2024

@justinjung04 justinjung04 commented Jun 14, 2024

What this PR does:

This PR introduces a token bucket limiter to enhance store gateway throttling. The limiter contains three different token buckets:

  • request token bucket
  • user token bucket (shared by multiple requests)
  • pod token bucket (shared by multiple users, multiple requests)

The token bucket limiter is passed to the Thanos store, which asks it for tokens as the store gateway fetches or touches data bytes. The limiter allows or rejects ongoing requests based on the following conditions:

  1. If the ongoing request does not exceed the request token bucket, it is allowed.
  2. If the ongoing request exceeds the request token bucket, the limiter checks whether the user and pod token buckets have enough tokens. If they do, the request is allowed. If they don't, it is rejected.
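
The two conditions above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `bucket` and `limiter` types, their field names, and the `reserve` method are all hypothetical, refill is omitted, and how the real limiter deducts or refunds tokens across the three buckets is not described here.

```go
package main

import "fmt"

// bucket is a hypothetical, simplified token bucket with no refill logic.
type bucket struct {
	remaining int64
}

// limiter is a hypothetical stand-in for the limiter described above.
type limiter struct {
	request  *bucket // allowance of a single request
	user     *bucket // shared by all requests of one user
	instance *bucket // shared by all users on one store gateway pod
}

// reserve reports whether the request may proceed after asking for n tokens.
func (l *limiter) reserve(n int64) bool {
	// Condition 1: the request is still within its own bucket.
	if l.request.remaining >= n {
		l.request.remaining -= n
		return true
	}
	// Condition 2: the request bucket is exceeded, so both shared
	// buckets must have enough tokens, otherwise the request is rejected.
	if l.user.remaining >= n && l.instance.remaining >= n {
		l.user.remaining -= n
		l.instance.remaining -= n
		return true
	}
	return false
}

func main() {
	l := &limiter{request: &bucket{10}, user: &bucket{15}, instance: &bucket{100}}
	fmt.Println(l.reserve(8)) // true: within the request bucket
	fmt.Println(l.reserve(8)) // true: covered by the user and instance buckets
	fmt.Println(l.reserve(8)) // false: the user bucket only has 7 tokens left
}
```

In this sketch the shared buckets are only touched once a request outgrows its own bucket, which keeps small requests from being rejected because of a noisy neighbor.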

You can also specify a weight for each data type (postings, series, and chunks, fetched or touched) so that certain data types ask for more tokens from the token bucket. The default weights are based on tests we ran to find each data type's correlation with CPU usage. As a result, the number of tokens retrieved by a Thanos store operation is data_bytes * data_type_token_factor.
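
As a rough illustration of data_bytes * data_type_token_factor, the sketch below weights bytes by data type. The `dataType` constants, the `tokenFactors` map, and its values are hypothetical placeholders; the real default factors are hidden from the config doc and are not reproduced here.

```go
package main

import "fmt"

// dataType identifies which kind of bytes the store gateway touched.
type dataType int

const (
	postingsTouched dataType = iota
	seriesTouched
	chunksTouched
)

// tokenFactors holds placeholder weights for illustration only.
// They are NOT the defaults chosen in this PR.
var tokenFactors = map[dataType]float64{
	postingsTouched: 2,
	seriesTouched:   4,
	chunksTouched:   1,
}

// tokensFor converts a byte count into the number of tokens to request.
func tokensFor(t dataType, dataBytes uint64) int64 {
	return int64(float64(dataBytes) * tokenFactors[t])
}

func main() {
	fmt.Println(tokensFor(seriesTouched, 1024)) // 4096
	fmt.Println(tokensFor(chunksTouched, 1024)) // 1024
}
```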

The token bucket limiter is disabled by default. It can also be enabled in dry-run mode, which creates the token buckets and logs when there are not enough tokens, but never rejects requests.
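
The difference between the modes could look like the sketch below. The mode strings come from the config in this PR; the `mode` type, the `enforce` function, the error value, and the log message are hypothetical and only illustrate that dry-run logs where enabled rejects.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

type mode string

const (
	modeDisabled mode = "disabled"
	modeDryRun   mode = "dryrun"
	modeEnabled  mode = "enabled"
)

// errNotEnoughTokens is a hypothetical error used for this sketch.
var errNotEnoughTokens = errors.New("not enough tokens in token bucket")

// enforce turns "did the buckets have enough tokens" into an outcome per mode.
func enforce(m mode, enough bool) error {
	switch m {
	case modeDisabled:
		return nil // limiter is off, nothing is checked
	case modeDryRun:
		if !enough {
			log.Println("dry run: not enough tokens, request would have been rejected")
		}
		return nil // dry-run never rejects
	case modeEnabled:
		if !enough {
			return errNotEnoughTokens
		}
		return nil
	default:
		return fmt.Errorf("unsupported mode %q", m)
	}
}

func main() {
	fmt.Println(enforce(modeDryRun, false))  // <nil>, after logging
	fmt.Println(enforce(modeEnabled, false)) // not enough tokens in token bucket
}
```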

New configs (the token factors are hidden from the config doc):

bucket_store:
    token_bucket_bytes_limiter:
      # Token bucket bytes limiter mode. Supported values are: disabled, dryrun,
      # enabled
      # CLI flag: -blocks-storage.bucket-store.token-bucket-bytes-limiter.mode
      [mode: <string> | default = "disabled"]

      # Instance token bucket size
      # CLI flag: -blocks-storage.bucket-store.token-bucket-bytes-limiter.instance-token-bucket-size
      [instance_token_bucket_size: <int> | default = 859832320]

      # User token bucket size
      # CLI flag: -blocks-storage.bucket-store.token-bucket-bytes-limiter.user-token-bucket-size
      [user_token_bucket_size: <int> | default = 644874240]

      # Request token bucket size
      # CLI flag: -blocks-storage.bucket-store.token-bucket-bytes-limiter.request-token-bucket-size
      [request_token_bucket_size: <int> | default = 4194304]
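
For readability, the default sizes above can be converted to MiB. The snippet below is only arithmetic on the numbers in the config; the `toMiB` helper is a hypothetical name, and treating the values as byte counts is an assumption based on the limiter being described as a bytes limiter.

```go
package main

import "fmt"

const mib = 1 << 20 // bytes per MiB

// toMiB converts a size in bytes to whole MiB.
func toMiB(b int64) int64 {
	return b / mib
}

func main() {
	defaults := []struct {
		name string
		size int64
	}{
		{"instance_token_bucket_size", 859832320},
		{"user_token_bucket_size", 644874240},
		{"request_token_bucket_size", 4194304},
	}
	for _, d := range defaults {
		fmt.Printf("%s = %d (%d MiB)\n", d.name, d.size, toMiB(d.size))
	}
}
```

The defaults work out to 820 MiB, 615 MiB, and 4 MiB, so the user bucket is 75% of the instance bucket.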

How did I come up with the default values? I ran tests in a dev environment and observed how different resources use CPU.

  1. ~80% of CPU was used by 150MB/s of postings_touched:
     [Screenshot 2024-06-25 at 9 16 12 PM]
  2. ~80% of CPU was used by 32MB/s of series_touched:
     [Screenshot 2024-06-25 at 9 15 34 AM]
  3. ~80% of CPU was used by 400MB/s of chunks_touched, in addition to 16MB/s of series_touched:
     [Screenshot 2024-06-25 at 11 03 28 PM]

I wasn't able to use the cache to eliminate fetched bytes for postings and chunks, but judging from the series fetched, they likely have minimal impact on CPU or memory usage.
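
One possible way to turn the three measurements into relative weights is sketched below. It is a reading of the numbers, not the derivation used in this PR: it assumes CPU cost is linear in bytes touched and additive across data types, so that 16MB/s of series_touched (half of the 32MB/s saturation rate) takes half of the CPU budget in the third test. The function names are hypothetical.

```go
package main

import "fmt"

// costPerMB returns CPU budget consumed per MB, where cpuShare is the
// fraction of the ~80% CPU budget attributed to mbPerSec of throughput.
func costPerMB(cpuShare, mbPerSec float64) float64 {
	return cpuShare / mbPerSec
}

// relativeFactors returns postings and series costs relative to chunks.
func relativeFactors() (postings, series float64) {
	postingsCost := costPerMB(1, 150) // test 1: postings alone use the budget
	seriesCost := costPerMB(1, 32)    // test 2: series alone use the budget

	// Test 3: assume 16MB/s of series takes 16/32 of the budget,
	// leaving the rest for 400MB/s of chunks.
	seriesShare := 16.0 / 32.0
	chunksCost := costPerMB(1-seriesShare, 400)

	return postingsCost / chunksCost, seriesCost / chunksCost
}

func main() {
	p, s := relativeFactors()
	fmt.Printf("postings : series : chunks = %.1f : %.1f : 1\n", p, s)
}
```

Under these assumptions the result is roughly 5.3 : 25 : 1, which is close to the 5:25:1 ratio mentioned in the review, although the PR itself does not state how the factors were computed.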

Which issue(s) this PR fixes:
n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@justinjung04 justinjung04 changed the title Sg token bucket Store gateway token bucket limiter Jun 14, 2024
Signed-off-by: Justin Jung <[email protected]>
@justinjung04 justinjung04 changed the title Store gateway token bucket limiter [Store Sateway] Add token bucket limiter Jun 14, 2024
@justinjung04 justinjung04 changed the title [Store Sateway] Add token bucket limiter [Store Sateway] Token bucket limiter Jun 14, 2024
@justinjung04 justinjung04 marked this pull request as ready for review June 14, 2024 18:40
@justinjung04 justinjung04 changed the title [Store Sateway] Token bucket limiter [Store Gateway] Token bucket limiter Jun 14, 2024
pkg/storage/tsdb/inmemory_index_cache.go (outdated, resolved)
pkg/storage/tsdb/config.go (outdated, resolved)
pkg/storegateway/limiter.go (outdated, resolved)
pkg/util/token_bucket.go (outdated, resolved)
@justinjung04

@alanprot could you take a look as well when you have a chance?

@yeya24 yeya24 left a comment

It looks good overall. Just one nit about the doc.
Regarding the default factors, I am still not 100% sure about the 5:25:1 ratio, but I am sure we can learn more by running it and tune it in the future.

docs/configuration/config-file-reference.md (outdated, resolved)
pkg/util/token_bucket.go (resolved)
pkg/storegateway/limiter.go (outdated, resolved)
pkg/storegateway/limiter.go (outdated, resolved)
pkg/storegateway/limiter.go (outdated, resolved)
pkg/util/token_bucket.go (resolved)
pkg/util/token_bucket.go (resolved)
pkg/storegateway/limiter.go (resolved)
pkg/storegateway/limiter.go (outdated, resolved)
pkg/storegateway/limiter.go (outdated, resolved)
@@ -73,6 +73,11 @@ type BucketStores struct {
storesErrorsMu sync.RWMutex
storesErrors map[string]error

instanceTokenBucket *util.TokenBucket

nit: bucket_stores.go could have very minimal changes if all of this code were moved to limiter.go behind another layer of abstraction.

@harry671003 harry671003 left a comment

Thanks Justin. LGTM.

@alanprot alanprot left a comment

LGTM

@yeya24 yeya24 merged commit 5356796 into cortexproject:master Jul 24, 2024
16 checks passed
4 participants