New config options for dogstatsd generation #666
Conversation
Regression Detector Results

Run ID: 5e74c2f2-f31d-4bb7-bbcb-a8dca1051974

Explanation

A regression test is an integrated performance test. Because a target's optimization goal performance in each experiment will vary somewhat each time it is run, we can only estimate mean differences in optimization goal relative to the baseline target. We express these differences as a percentage change relative to the baseline target, denoted "Δ mean %". These estimates are made to a precision that balances accuracy and cost control. We represent this precision as a 90.00% confidence interval denoted "Δ mean % CI": there is a 90.00% chance that the true value of "Δ mean %" is in that interval.

We decide that a change in performance is a "regression" -- a change worth investigating further -- if both of the following two criteria are true: the estimated |Δ mean %| is at least 5.00%, and the 90.00% confidence interval "Δ mean % CI" does not contain zero.

The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values of "Δ mean %" mean that baseline is faster; positive values mean that comparison is faster. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table is omitted if no interesting change is observed.

No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
Regression Detector Results (Run ID: aed7af58-c764-4300-9647-d3fe3509d404): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
Regression Detector Results (Run ID: d94c2cfb-44f1-4ee7-8db2-7bf9a35d59f4): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
Regression Detector Results (Run ID: 8bb0e702-8112-4c2f-8136-dd966860ed02): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
After fixing the defaults to be sane values, 1_000_000 generations currently takes on the order of 1.6 seconds, which gives ~1.6 microseconds per generation. Much better than 30ms 😅
Regression Detector Results (Run ID: 21360fc5-55d6-4122-8a0c-83e7051ba5ab): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
This commit is inspired by the work done in #666. We would have a much easier time writing benchmarks for payload generation _if_ the payloads were isolated to their own crate. Note that in #666 I introduced a `lading_rev` binary to work around this somewhat, but the result is not totally satisfactory. REF SMP-664 Signed-off-by: Brian L. Troutwine <[email protected]>
* Introduce benchmarks for `lading_payload`

In #671 we moved the payload module into a new crate in order to make it easier to add benchmarks for payload generation. This is now done: we have benchmarks for dogstatsd. Of note, we benchmark setup time -- ~30ms -- and throughput; the latter shows the real influence of our longish setup time: the more bytes we generate, the better the throughput. It does strike me that the `Serialize` trait will not be suitable for streaming, REF #666. Some payloads like dogstatsd can stream per-message, but others require serialization into a container. How to deal with the container notion in a stream, I'm not sure. Something to think about. REF SMP-673

Signed-off-by: Brian L. Troutwine <[email protected]>

* clippy ding

Signed-off-by: Brian L. Troutwine <[email protected]>

---------

Signed-off-by: Brian L. Troutwine <[email protected]>
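For context, here is a minimal sketch of what a criterion throughput benchmark over the dogstatsd generator might look like. It assumes the `lading_payload::dogstatsd::DogStatsD::default` and `generate` API shown in the `lading_rev` snippet later in this thread; the group name, seed, and element counts are invented for illustration and this is not the repository's actual benchmark.

```rust
// benches/dogstatsd.rs -- illustrative sketch only.
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use lading_payload::dogstatsd;
use rand::{rngs::SmallRng, SeedableRng};

fn throughput(c: &mut Criterion) {
    let mut group = c.benchmark_group("dogstatsd_throughput");
    for members in [1_000u64, 10_000, 100_000] {
        group.throughput(Throughput::Elements(members));
        group.bench_with_input(BenchmarkId::from_parameter(members), &members, |b, &n| {
            b.iter(|| {
                // Setup cost (~30ms) is paid inside the iteration, so larger
                // `n` amortizes it -- the effect described above.
                let mut rng = SmallRng::seed_from_u64(19690616);
                let dg = dogstatsd::DogStatsD::default(&mut rng);
                for _ in 0..n {
                    criterion::black_box(dg.generate(&mut rng));
                }
            });
        });
    }
    group.finish();
}

criterion_group!(benches, throughput);
criterion_main!(benches);
```

Paying the setup cost inside `iter` is deliberate in this sketch: it makes the amortization of setup time visible as the element count grows.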
Force-pushed from c22d4bf to d1a2b27.
This commit satisfies clippy in a handful of areas but, most importantly, adds a `lading_rev` binary to the project. This binary is only compiled when the `dogstatsd_perf` feature flag is added to the build and is meant to be used for optimizing the member generator. Of note, we've started to discuss streaming built `Member` instances directly to a generator without going through a block cache. To this point we've assumed that member generation only needs to be "fast enough", hence all the cloning and small string allocations and the like. The generator is _slow_ as a result. Consider that if you compile the project now with

```
> cargo build --release --features dogstatsd_perf --bin lading_rev
...
> hyperfine ./target/release/lading_rev
Benchmark 1: ./target/release/lading_rev
  Time (mean ± σ):     32.688 s ±  3.813 s    [User: 28.507 s, System: 2.650 s]
  Range (min … max):   30.190 s … 40.988 s    10 runs
```

Call it roughly 30ms per `Member` instance, an eternity. I'm working on a Mac, so running

```
> cargo instruments --release --features dogstatsd_perf --bin lading_rev -t time
```

does appear to show that we spend _a lot_ of time making and cloning small strings, which is accurate to my understanding of the code as it exists today. `AsciiString` is a particular culprit: 67% of program runtime is in `impl payload::Generator<Tagsets>` as of this commit.

Signed-off-by: Brian L. Troutwine <[email protected]>
This commit removes the use of `choose_multiple` in the `AsciiString` generator. Profiling shows that in a run of `lading_rev` this generator is responsible for ~70% of program runtime, and within _that_, ~60% of runtime is `choose_multiple`. While convenient, the function implicitly allocates a little `Vec` that we do not need here, since we're immediately pushing into storage anyway.

```
➜  lading git:(sopell/dogstatsd-generator-revamp) ✗ hyperfine --warmup 3 ./target/release/lading_rev
Benchmark 1: ./target/release/lading_rev
  Time (mean ± σ):     20.902 s ±  0.167 s    [User: 18.438 s, System: 2.405 s]
  Range (min … max):   20.727 s … 21.279 s    10 runs
```

Shaves ~10 seconds off.

Signed-off-by: Brian L. Troutwine <[email protected]>
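A sketch of the shape of this change, with invented names (`ALPHANUM`, `push_ascii`): sample characters by index and push straight into the output, rather than letting `choose_multiple` collect its picks into an intermediate `Vec`. Note that this samples with replacement, unlike `choose_multiple`; for random string generation that distinction is typically harmless.

```rust
use rand::Rng;

const ALPHANUM: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";

/// Fill `buf` with `len` random ASCII characters.
///
/// Equivalent in spirit to `ALPHANUM.choose_multiple(rng, len)` followed by a
/// copy, but indexes the table directly and pushes straight into `buf`,
/// avoiding the intermediate `Vec` that `choose_multiple` allocates.
fn push_ascii<R: Rng + ?Sized>(rng: &mut R, len: usize, buf: &mut String) {
    for _ in 0..len {
        let idx = rng.gen_range(0..ALPHANUM.len());
        buf.push(ALPHANUM[idx] as char);
    }
}
```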
Force-pushed from d1a2b27 to 3178658.
Very cool.
Regression Detector Results (Run ID: 3cbed3d8-3790-4775-9690-75c5241ba129): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
Regression Detector Results (Run ID: 69fa6e2a-f1de-4578-8d96-6f2036013de3): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
Regression Detector Results (Run ID: b7dc9631-e616-4eab-82e8-97e34f8be45e): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
Regression Detector Results (Run ID: 430c0e5c-fcab-4e1e-8da5-b56504e68ef1): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
This commit is born of the work on #666. We realized in that PR that we spend most of our runtime producing small heap-allocated strings. This PR introduces `Pool`, which allows the user to request a small `&str` of a given size. It is hooked up to only a single payload -- Ascii -- but will be introduced elsewhere in follow-up work. REF SMP-664 Signed-off-by: Brian L. Troutwine <[email protected]>
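A minimal sketch of the `Pool` idea as the commit message describes it; the `of_size` name and internal layout are guesses, not the crate's actual code. Generate one large ASCII block up front, then hand out borrowed slices instead of allocating a fresh small `String` per call.

```rust
use rand::Rng;

/// A pool of pre-generated random ASCII bytes. Callers borrow `&str` slices
/// of a requested length rather than allocating small `String`s.
struct Pool {
    bytes: String,
}

impl Pool {
    /// Fill the pool once, up front, with `capacity` random ASCII characters.
    fn new<R: Rng + ?Sized>(rng: &mut R, capacity: usize) -> Self {
        const ALPHANUM: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
        let mut bytes = String::with_capacity(capacity);
        for _ in 0..capacity {
            let idx = rng.gen_range(0..ALPHANUM.len());
            bytes.push(ALPHANUM[idx] as char);
        }
        Self { bytes }
    }

    /// Return a random `len`-byte slice of the pool, or `None` if the pool is
    /// too small. No allocation happens here: the slice borrows the pool.
    /// Slicing by byte offset is safe because the pool is pure ASCII.
    fn of_size<R: Rng + ?Sized>(&self, rng: &mut R, len: usize) -> Option<&str> {
        if len > self.bytes.len() {
            return None;
        }
        let start = rng.gen_range(0..=self.bytes.len() - len);
        Some(&self.bytes[start..start + len])
    }
}
```

The payoff is that hot paths like tag generation trade a per-call heap allocation for a bounds check and a pointer offset, at the cost of some up-front setup time and reduced entropy across the returned strings.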
Regression Detector Results (Run ID: 317db8f4-a196-4111-a625-cc7e3abcba0e): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
Force-pushed from 63fc5c7 to 9d84337.
Regression Detector Results (Run ID: 9d70f52c-274c-4c8f-b8ec-e7ff12b52b20): no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
> For criterion benchmarks, you can run them via `cargo bench`. [`cargo-criterion`](https://github.com/bheisler/cargo-criterion) is a more advanced cargo extension that provides historical (ie baseline) tracking functionality.
Ah, this should just be `cargo criterion`. I'll address this in a follow-up.
```rust
tag_value_length_maximum,
tags_per_msg_minimum,
tags_per_msg_maximum,
// TODO -- how can I validate user input for multivalue_pack_probability
```
Flagging for follow-up.
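One possible shape for that validation, sketched with serde; the field name mirrors the diff above, but the `Config` struct and the `deserialize_with` wiring are invented for illustration. The idea is to reject out-of-range probabilities at deserialization time so a bad config fails fast.

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Config {
    /// Probability that a message carries multiple packed values.
    /// Must lie in [0.0, 1.0].
    #[serde(deserialize_with = "probability")]
    multivalue_pack_probability: f32,
}

/// Deserialize an `f32`, rejecting values outside [0.0, 1.0].
fn probability<'de, D>(deserializer: D) -> Result<f32, D::Error>
where
    D: serde::Deserializer<'de>,
{
    let p = f32::deserialize(deserializer)?;
    if (0.0..=1.0).contains(&p) {
        Ok(p)
    } else {
        Err(serde::de::Error::custom(format!(
            "multivalue_pack_probability must be in [0.0, 1.0], got {p}"
        )))
    }
}
```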
```rust
/// A program whose sole purpose is to generate DogStatsD load and then
/// exit.
///
/// I'm not sure how this will interact with the release process. We probably
/// should disable this binary at some point.
use lading_payload::dogstatsd;
use rand::{rngs::SmallRng, SeedableRng};

use std::io::{BufWriter, Write};

fn main() {
    let seed: u64 = 19690616;
    let mut rng = SmallRng::seed_from_u64(seed);
    let dg = dogstatsd::DogStatsD::default(&mut rng);

    // NOTE we lock stdout and wrap it in a buffered writer to avoid needing to
    // reacquire the stdio lock each write and, also, to elide as many writes as
    // possible.
    let stdout = std::io::stdout();
    let mut fp = BufWriter::with_capacity(1_000_000, stdout.lock());
    for _ in 0..1_000 {
        let member = dg.generate(&mut rng);
        writeln!(fp, "{member}").unwrap();
    }
    fp.flush().unwrap();
}
```
Considering that we have criterion now, I think we can drop this in a follow-up PR.
What does this PR do?
DogStatsD traffic can now be generated with direct references to "contexts", a key concept in a dogstatsd server. Each timeseries transmitted as part of the dogstatsd traffic needs a unique identifier so the data can be aggregated (which is the whole point of a dogstatsd server).
This identifier is referred to as a "context" and is uniquely determined by the metric name plus tags (plus metric type, somewhat implicitly). A sketch of this notion follows below.
In the previous generation config there was no way to control how many contexts would get created, which made it challenging to reason about the resulting behavior in the dogstatsd server.
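To make the context notion concrete, here is a small illustrative sketch (not Datadog's actual implementation) of how an aggregating server might key a timeseries on metric name plus an order-insensitive tag set, with metric type handled implicitly by aggregating each type separately:

```rust
use std::collections::HashMap;

/// The aggregation key for one timeseries: metric name + sorted tags.
#[derive(Clone, PartialEq, Eq, Hash)]
struct ContextKey {
    name: String,
    tags: Vec<String>, // kept sorted so tag order doesn't split contexts
}

impl ContextKey {
    fn new(name: &str, mut tags: Vec<String>) -> Self {
        tags.sort();
        Self { name: name.to_owned(), tags }
    }
}

fn main() {
    let mut counts: HashMap<ContextKey, f64> = HashMap::new();
    // "requests:1|c|#env:prod,region:us" and a tag-reordered duplicate
    // aggregate into the same context:
    for tags in [vec!["env:prod", "region:us"], vec!["region:us", "env:prod"]] {
        let key = ContextKey::new("requests", tags.into_iter().map(String::from).collect());
        *counts.entry(key).or_insert(0.0) += 1.0;
    }
    assert_eq!(counts.len(), 1); // one context, not two
}
```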
The way multi-value dogstatsd messages are generated has also been changed: it now works off of a probability that a given message will have multiple values packed into it. This more closely simulates what multi-value actually is, essentially a compression scheme (if N metric values all belong to the same context, there is no need to duplicate the metric name and tags).
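A sketch of that probability-driven packing, with invented parameter names (`pack_probability`, `max`): with probability p the next message packs several values for one context into the multi-value wire form `name:v1:v2:v3|type|#tags`, otherwise it carries a single value.

```rust
use rand::Rng;

/// Serialize one counter message, multi-value style:
/// `name:v1:v2:v3|c|#tag1,tag2`.
fn format_counter(name: &str, values: &[i64], tags: &[String]) -> String {
    let vals: Vec<String> = values.iter().map(|v| v.to_string()).collect();
    format!("{name}:{}|c|#{}", vals.join(":"), tags.join(","))
}

/// Decide how many values the next message carries: with probability
/// `pack_probability`, pack between 2 and `max` values; otherwise send one.
fn values_per_msg<R: Rng + ?Sized>(rng: &mut R, pack_probability: f32, max: usize) -> usize {
    if rng.gen_bool(f64::from(pack_probability)) {
        rng.gen_range(2..=max.max(2))
    } else {
        1
    }
}
```

Because packing only happens within a single context, higher `pack_probability` yields fewer, denser messages for the same number of metric values, which is exactly the compression effect described above.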
Motivation
More control over dogstatsd generation
Related issues
Additional Notes