-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Store Gateway] Token bucket limiter #6016
Conversation
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
b2d8e1b
to
3aa5375
Compare
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
3aa5375
to
bd8b9e7
Compare
Signed-off-by: Justin Jung <[email protected]>
e65bedb
to
f66aa50
Compare
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
@alanprot could you take a look as well when you have a chance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good overall. Just one nit about the doc.
Regarding the default factor, I am still not 100% sure about the ratio 5:25:1 but I am sure we can learn more by running it and tune in the future.
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
@@ -73,6 +73,11 @@ type BucketStores struct { | |||
storesErrorsMu sync.RWMutex | |||
storesErrors map[string]error | |||
|
|||
instanceTokenBucket *util.TokenBucket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The bucket_stores.go can have very minimal changes if all the code can be moved to the limiter.go with another layer of abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Justin. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Justin Jung <[email protected]>
What this PR does:
This PR introduces token bucket limiter to enhance store gateway throttling. The limiter contains three different token buckets:
The token bucket limiter is passed to Thanos store, and asks for tokens as the store gateway fetches or touches data bytes. The limit allows or rejects the ongoing requests based on the following conditions:
You can also specify weights for each data type (postings, series, chunks fetched/touched) to allow certain data types to ask for more tokens from the token bucket. Default weights have been set based on different tests we have ran to find their correlation with CPU usage. As a result, the amount of token retrieved by Thanos store operation is
data_bytes * data_type_token_factor
.The token bucket limiter is disabled by default, and it can also be enabled with dry-run mode. Dry-run mode creates the token buckets, and logs when there is not enough tokens, but never reject requests.
New configs (the token factors are hidden from the config doc):
How did I come up with the default values? I've ran test in dev environment and observed how different resources use CPU.
I wasn't able to use cache to eliminate fetched bytes for postings and chunks, but judging from the series fetched they may have very minimal impact to CPU or memory usage.
Which issue(s) this PR fixes:
n/a
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]