Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recommend DistSender concurrency limit bump #19096

Merged
merged 12 commits into from
Nov 20, 2024

Conversation

rmloveland
Copy link
Contributor

Fixes DOC-11652

Summary of changes:

  • Update 'Performance Recipes' page to note that if you encounter DistSender batch throttling (distsender.batches.async.throttled is > 0), consider increasing the value of the kv.dist_sender.concurrency_limit cluster setting.

Fixes DOC-11652

Summary of changes:

- Update 'Performance Recipes' page to note that if you encounter
  DistSender batch throttling (`distsender.batches.async.throttled` is >
  0), consider increasing the value of the
  `kv.dist_sender.concurrency_limit` cluster setting.
@rmloveland rmloveland marked this pull request as draft November 5, 2024 21:07
Copy link

github-actions bot commented Nov 5, 2024

Copy link

netlify bot commented Nov 5, 2024

Deploy Preview for cockroachdb-api-docs canceled.

Name Link
🔨 Latest commit 928cf2c
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-api-docs/deploys/673e41bd3fd4df00089bbe3d

Copy link

netlify bot commented Nov 5, 2024

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name Link
🔨 Latest commit 928cf2c
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-interactivetutorials-docs/deploys/673e41bd7e78e50008eeb1f7

Copy link

netlify bot commented Nov 5, 2024

Netlify Preview

Name Link
🔨 Latest commit 928cf2c
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-docs/deploys/673e41bd588390000823feff
😎 Deploy Preview https://deploy-preview-19096--cockroachdb-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@rmloveland
Copy link
Contributor Author

Hey @sean- @nhamlani101 @mwang1026

this is a quick rough draft to address https://cockroachlabs.atlassian.net/browse/DOC-11652 which was prompted by a message from @sean-

IMO to get this into docs, we need to answer a couple questions/make some decisions (below list of things are also marked XXX in the text of the PR, once we have answers/decisions I will update it and formally request reviews n stuff)

  1. How much to increase kv.dist_sender.concurrency_limit? 6x as in pkg/kv/kvclient: update per-vCPU concurrency limits cockroach#131226 ?
  2. What should we say about testing? What is a bad outcome of increasing this too much?
  3. If we're gonna document the above cluster setting, we should flip it to public, right now it's private AKA does not appear in docs at cluster settings (v24.2) (I can file a CRDB issue, at least in the past this was like a one-line code change)

@sean-
Copy link
Contributor

sean- commented Nov 5, 2024

  1. How much to increase kv.dist_sender.concurrency_limit? 6x as in pkg/kv/kvclient: update per-vCPU concurrency limits cockroach#131226 ?

Something like:

"If distsender.batches.async.throttled is non-zero and queries are experiencing unacceptable tail latency, try doubling this value to see if the number of throttled requests decreases. If it does, increase until see diminishing returns. If this value is non-zero after 10x'ing the value, please contact support for additional help to investigate the cause of the throttled messages."

  1. What should we say about testing? What is a bad outcome of increasing this too much?

"To validate a successful result, you can increase this value until you see no new throttled requests AND no increase in tail latency (e.g. p99.999)."

This does increase the amount of RAM consumption per node to handle the increased concurrency, but it's proportional to the load and an individual flow's memory consumption isn't "significant." Bad outcomes include increased tail latency or too much memory consumption with no decrease in the number of throttled requests.

  1. If we're gonna document the above cluster setting, we should flip it to public, right now it's private AKA does not appear in docs at cluster settings (v24.2) (I can file a CRDB issue, at least in the past this was like a one-line code change)

Heh, good to know, and yes agreed about making it public.

@nhamlani101
Copy link

@sean- Is the kv.dist_sender.concurrency_limit cluster setting now public in 24.3? If so, the metric being public now needs to be backported? re:

  1. If we're gonna document the above cluster setting, we should flip it to public, right now it's private AKA does not appear in docs at cluster settings (v24.2) (I can file a CRDB issue, at least in the past this was like a one-line code change)

@sean-
Copy link
Contributor

sean- commented Nov 8, 2024

@sean- Is the kv.dist_sender.concurrency_limit cluster setting now public in 24.3? If so, the metric being public now needs to be backported? re:

It's not public. Also, we decided to not backport the change per discussion in: cockroachdb/cockroach#131535 (comment)

@rmloveland
Copy link
Contributor Author

fyi gonna work on this later today

trying to hit v24.3 release with this but it's like the last prio behind core relicensing and roadmap stuff

however i am planning to get it in, just wanted to say why i haven't replied yet

should be on it this afternoon

@rmloveland
Copy link
Contributor Author

i lied, this is probably gonna be a monday thing (11/18), still working on other scheduled work

@rmloveland rmloveland marked this pull request as ready for review November 18, 2024 20:01
@rmloveland
Copy link
Contributor Author

Is the kv.dist_sender.concurrency_limit cluster setting now public in 24.3? If so, the metric being public now needs to be backported? re:

It's not public. Also, we decided to not backport the change per discussion in: cockroachdb/cockroach#131535 (comment)

@nhamlani101 and @sean- FYI i opened an issue to at least make it public: cockroachdb/cockroach#135615

@rmloveland
Copy link
Contributor Author

"To validate a successful result, you can increase this value until you see no new throttled requests AND no increase in tail latency (e.g. p99.999)."

This does increase the amount of RAM consumption per node to handle the increased concurrency, but it's proportional to the load and an individual flow's memory consumption isn't "significant." Bad outcomes include increased tail latency or too much memory consumption with no decrease in the number of throttled requests.

@sean- I made another update to try to incorporate this info as well as update with your previous inline comment

@rmloveland rmloveland requested a review from sean- November 18, 2024 20:06
@rmloveland
Copy link
Contributor Author

@sean- (and @nhamlani101) another question i had was:

what version(s) should we make this docs change to? based on convo it feels like these are the versions:

  • v24.1
  • v24.2
  • v24.3

please advise

@sean-
Copy link
Contributor

sean- commented Nov 18, 2024

"To validate a successful result, you can increase this value until you see no new throttled requests AND no increase in tail latency (e.g. p99.999)."
This does increase the amount of RAM consumption per node to handle the increased concurrency, but it's proportional to the load and an individual flow's memory consumption isn't "significant." Bad outcomes include increased tail latency or too much memory consumption with no decrease in the number of throttled requests.

@sean- I made another update to try to incorporate this info as well as update with your previous inline comment

I'm worried we're making a mountain out of a molehill with this change and the drag-along concerns. 😆

@sean-
Copy link
Contributor

sean- commented Nov 18, 2024

@sean- (and @nhamlani101) another question i had was:

what version(s) should we make this docs change to? based on convo it feels like these are the versions:

  • v24.1
  • v24.2
  • v24.3

please advise

Just do it for v24.3.

@rmloveland rmloveland requested a review from sean- November 18, 2024 22:10
@rmloveland rmloveland requested a review from taroface November 19, 2024 15:30
Copy link
Contributor

@taroface taroface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions for clarity

src/current/v24.3/performance-recipes.md Show resolved Hide resolved
src/current/v24.3/performance-recipes.md Show resolved Hide resolved
src/current/v24.3/performance-recipes.md Outdated Show resolved Hide resolved
src/current/v24.3/performance-recipes.md Outdated Show resolved Hide resolved
src/current/v24.3/performance-recipes.md Outdated Show resolved Hide resolved
src/current/v24.3/performance-recipes.md Outdated Show resolved Hide resolved
src/current/v24.3/performance-recipes.md Outdated Show resolved Hide resolved
Copy link
Contributor

@taroface taroface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have a couple follow-up comments.

src/current/v24.3/performance-recipes.md Outdated Show resolved Hide resolved
src/current/v24.3/performance-recipes.md Outdated Show resolved Hide resolved
Copy link
Contributor

@taroface taroface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the changes are reflected in the PR yet, but LGTM assuming they will be!

@rmloveland rmloveland enabled auto-merge (squash) November 20, 2024 20:09
@rmloveland rmloveland merged commit 99f63fe into main Nov 20, 2024
7 checks passed
@rmloveland rmloveland deleted the 20241105-DOC-11652-distsender-concurrency branch November 20, 2024 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants