
New config options for dogstatsd generation #666

Merged
merged 14 commits into main from sopell/dogstatsd-generator-revamp on Aug 15, 2023

Conversation

@scottopell (Contributor) commented Aug 9, 2023

What does this PR do?

DogStatsD traffic can now be generated with direct control over "contexts", a key concept in a dogstatsd server. Each timeseries transmitted as part of dogstatsd traffic needs a unique identifier so that the server can aggregate its data (aggregation being the whole point of a dogstatsd server). That identifier is the "context": the metric name plus its tag set (plus, implicitly, the metric type).

In the previous generation config there was no way to control how many contexts would be created, which made it hard to reason about the resulting behavior in the dogstatsd server.

The way multi-value dogstatsd messages are generated has also been changed to work off a probability that a given message has multiple values packed into it. This more closely simulates what multi-value actually is: essentially a compression scheme (if N points all belong to the same context, there is no need to repeat the metric name and tags).
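
To make the concept concrete, here is an illustrative Rust sketch (not code from this PR) of how a dogstatsd server derives a context from a message; the parsing here is a toy and the real server's is richer. For instance, the multi-value message `fuel:1:2:3|c|#planet:mars` packs three values for a single context, which is why packing behaves like compression.

```
use std::collections::HashSet;

/// Illustrative only: a "context" is the identity under which a dogstatsd
/// server aggregates points -- metric name plus tag set (plus, implicitly,
/// metric type). Messages with the same context land in one timeseries.
#[derive(Hash, PartialEq, Eq)]
struct Context {
    name: String,
    metric_type: String, // "c", "g", "d", etc.
    tags: Vec<String>,   // sorted so tag ordering is canonical
}

fn main() {
    let mut contexts: HashSet<Context> = HashSet::new();
    for raw in ["fuel:1|c|#planet:mars", "fuel:2|c|#planet:mars", "fuel:9|g|#planet:mars"] {
        // Toy parse of `name:value|type|#tags`.
        let (name, rest) = raw.split_once(':').unwrap();
        let mut parts = rest.split('|');
        let _value = parts.next().unwrap();
        let metric_type = parts.next().unwrap().to_string();
        let mut tags: Vec<String> = parts
            .next()
            .map(|t| t.trim_start_matches('#').split(',').map(String::from).collect())
            .unwrap_or_default();
        tags.sort();
        contexts.insert(Context { name: name.to_string(), metric_type, tags });
    }
    // Three messages, but only two distinct contexts: the gauge differs by type.
    assert_eq!(contexts.len(), 2);
}
```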

Motivation

More control over dogstatsd generation

Related issues

A list of issues that are fixed by, contain architectural discussions about, or are otherwise relevant to this Pull Request.

Additional Notes

@github-actions

Regression Detector Results

Run ID: 5e74c2f2-f31d-4bb7-bbcb-a8dca1051974
Baseline: 0c3707b
Comparison: 7f3a3bb
Total lading-target CPUs: 4

Explanation

A regression test is an integrated performance test for lading-target in a repeatable rig, with varying configuration for lading-target. What follows is a statistical summary of a brief lading-target run for each configuration across the SHAs given above. The goal of these tests is to determine quickly whether lading-target performance is changed, and to what degree, by a pull request.

Because a target's optimization goal performance in each experiment will vary somewhat each time it is run, we can only estimate mean differences in optimization goal relative to the baseline target. We express these differences as a percentage change relative to the baseline target, denoted "Δ mean %". These estimates are made to a precision that balances accuracy and cost control. We represent this precision as a 90.00% confidence interval denoted "Δ mean % CI": there is a 90.00% chance that the true value of "Δ mean %" is in that interval.

We decide whether a change in performance is a "regression" -- a change worth investigating further -- if both of the following two criteria are true:

  1. The estimated |Δ mean %| ≥ 5.00%. This criterion intends to answer the question "Does the estimated change in mean optimization goal performance have a meaningful impact on your customers?". We assume that when |Δ mean %| < 5.00%, the impact on your customers is not meaningful. We also assume that a performance change in optimization goal is worth investigating whether it is an increase or decrease, so long as the magnitude of the change is sufficiently large.

  2. Zero is not in the 90.00% confidence interval "Δ mean % CI" about "Δ mean %". This statement is equivalent to saying that there is at least a 90.00% chance that the mean difference in optimization goal is not zero. This criterion intends to answer the question, "Is there a statistically significant difference in mean optimization goal performance?". It also means there is no more than a 10.00% chance this criterion reports a statistically significant difference when the true difference in mean optimization goal is zero -- a "false positive". We assume you are willing to accept a 10.00% chance of inaccurately detecting a change in performance when no true difference exists.

The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values of "Δ mean %" mean that baseline is faster, whereas positive values of "Δ mean %" mean that comparison is faster. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed.
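
Concretely, the two criteria reduce to a simple decision rule. A minimal sketch (not the detector's actual code), with the thresholds described above hard-coded:

```
/// Sketch of the two regression criteria: |Δ mean %| ≥ 5.00% and a 90%
/// confidence interval on Δ mean % that excludes zero.
fn is_regression(delta_mean_pct: f64, ci_low: f64, ci_high: f64) -> bool {
    let large_enough = delta_mean_pct.abs() >= 5.0;
    let ci_excludes_zero = ci_low > 0.0 || ci_high < 0.0;
    large_enough && ci_excludes_zero
}

fn main() {
    // First row of the table below: Δ mean % = +0.04, CI [-0.01, +0.09].
    // The change is small and the CI straddles zero, so it is not flagged.
    assert!(!is_regression(0.04, -0.01, 0.09));
    // A hypothetical -6.2% change with CI [-7.0, -5.4] would be flagged.
    assert!(is_regression(-6.2, -7.0, -5.4));
}
```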

No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.04 | [-0.01, +0.09] | 69.12% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.19 | [-0.22, -0.15] | 100.00% |

@github-actions

Regression Detector Results

Run ID: aed7af58-c764-4300-9647-d3fe3509d404
Baseline: 0c3707b
Comparison: bea6c11
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +0.59 | [+0.56, +0.62] | 100.00% |
| blackhole_from_apache_common_http | ingress throughput | -0.39 | [-0.44, -0.34] | 100.00% |

@github-actions

Regression Detector Results

Run ID: d94c2cfb-44f1-4ee7-8db2-7bf9a35d59f4
Baseline: 0c3707b
Comparison: 46a60b3
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.05 | [-0.00, +0.10] | 77.49% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.38 | [-0.41, -0.36] | 100.00% |

@github-actions

Regression Detector Results

Run ID: 8bb0e702-8112-4c2f-8136-dd966860ed02
Baseline: 0c3707b
Comparison: 2005f26
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +1.05 | [+1.02, +1.09] | 100.00% |
| blackhole_from_apache_common_http | ingress throughput | +0.28 | [+0.23, +0.33] | 100.00% |

@scottopell (Contributor, Author)

After fixing the defaults to be sane values, for 1_000_000 generations, it currently takes:

hyperfine --warmup 10 ./target/release/lading_rev
Benchmark 1: ./target/release/lading_rev
  Time (mean ± σ):      1.617 s ±  0.031 s    [User: 1.455 s, System: 0.162 s]
  Range (min … max):    1.583 s …  1.653 s    10 runs

which gives ~1.6 microseconds per generation. Much better than 30ms 😅

@github-actions

Regression Detector Results

Run ID: 21360fc5-55d6-4122-8a0c-83e7051ba5ab
Baseline: 0c3707b
Comparison: c22d4bf
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.04 | [-0.01, +0.09] | 68.75% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.33 | [-0.36, -0.29] | 100.00% |

blt added a commit that referenced this pull request Aug 14, 2023
This commit is inspired by the work done in #666. We would have a much easier
time writing benchmarks for payload generation _if_ the payloads were isolated
to their own crate. Note that in #666 I introduced a `lading_rev` binary to work
around this somewhat but the result is not totally satisfactory.

REF SMP-664

Signed-off-by: Brian L. Troutwine <[email protected]>
blt added a commit that referenced this pull request Aug 14, 2023
blt added a commit that referenced this pull request Aug 14, 2023
In #671 we moved the payload module into a new crate in order to make it easier
to add benchmarks for payload generation. This is now done: we have benchmarks
for dogstatsd. Of note, we benchmark setup time -- ~30ms -- and throughput; the
latter shows the real influence of our longish setup time: the more bytes we
generate, the better the throughput.

It does strike me that the `Serialize` trait will not be suitable for
streaming, REF #666. Some payloads like dogstatsd can stream per-message but
others require serialization into a container. How to deal with the container
notion in a stream, I'm not sure. Something to think about.

REF SMP-673

Signed-off-by: Brian L. Troutwine <[email protected]>
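
That tension can be made concrete with a hypothetical sketch (not lading's actual API): a self-delimiting payload such as dogstatsd can hand a writer one member at a time, while a container format needs framing around the whole batch and so resists per-message streaming.

```
use std::io::Write;

/// Hypothetical streaming interface: emit one member per call; callers loop
/// until a byte budget is exhausted. A container format (say, a JSON array)
/// could not implement this without open/close framing around the batch.
trait StreamSerialize {
    fn write_member<W: Write>(&mut self, writer: &mut W) -> std::io::Result<usize>;
}

struct Dogstatsd;

impl StreamSerialize for Dogstatsd {
    fn write_member<W: Write>(&mut self, writer: &mut W) -> std::io::Result<usize> {
        let line = b"fuel:1|c|#planet:mars\n"; // one self-delimiting message
        writer.write_all(line)?;
        Ok(line.len())
    }
}

fn main() -> std::io::Result<()> {
    let mut out = Vec::new();
    let mut payload = Dogstatsd;
    let mut written = 0;
    while written < 64 {
        written += payload.write_member(&mut out)?;
    }
    assert!(out.ends_with(b"\n")); // every member is independently framed
    Ok(())
}
```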
blt added a commit that referenced this pull request Aug 14, 2023
* Introduce benchmarks for `lading_payload`


* clippy ding

Signed-off-by: Brian L. Troutwine <[email protected]>

---------

Signed-off-by: Brian L. Troutwine <[email protected]>
@blt force-pushed the sopell/dogstatsd-generator-revamp branch from c22d4bf to d1a2b27 on August 15, 2023 00:15
scottopell and others added 8 commits August 14, 2023 17:15
This commit satisfies clippy in a handful of areas but most importantly adds a
`lading_rev` binary to the project. This binary will only be compiled when the
`dogstatsd_perf` feature flag is added to the build and is meant to be used for
optimizing the member generator.

Of note, we've started to discuss streaming built `Member` instances directly to
a generator without going through a block cache. To this point we've assumed
that the member generation only needs to be "fast enough", hence all the cloning
and small string allocations and the like. The generator is _slow_ as a
result. Consider that if you compile the project now with

```
> cargo build --release --features dogstatsd_perf --bin lading_rev
...
> hyperfine ./target/release/lading_rev
Benchmark 1: ./target/release/lading_rev
  Time (mean ± σ):     32.688 s ±  3.813 s    [User: 28.507 s, System: 2.650 s]
  Range (min … max):   30.190 s … 40.988 s    10 runs
```

Call it roughly 30ms per `Member` instance, an eternity. I'm working on a Mac so
running

```
> cargo instruments --release --features dogstatsd_perf --bin lading_rev -t time
```

does appear to show that we spend _a lot_ of time making and cloning small
strings, which is accurate to my understanding of the code as it exists
today. `AsciiString` is a particular culprit. 67% of program runtime is in
`impl payload::Generator<Tagsets>` as of this commit.

Signed-off-by: Brian L. Troutwine <[email protected]>
This commit removes the use of `choose_multiple` in the `AsciiString`
generator. Profiling shows that in a run of `lading_rev` this generator is
responsible for ~70% of program runtime, and within _that_, ~60% of runtime is
`choose_multiple`. While convenient, the function implicitly allocates a small
`Vec` that we do not need here, since we immediately push into storage anyway.

```
➜  lading git:(sopell/dogstatsd-generator-revamp) ✗ hyperfine --warmup 3 ./target/release/lading_rev
Benchmark 1: ./target/release/lading_rev
  Time (mean ± σ):     20.902 s ±  0.167 s    [User: 18.438 s, System: 2.405 s]
  Range (min … max):   20.727 s … 21.279 s    10 runs
```

Shaves ~10 seconds off.

Signed-off-by: Brian L. Troutwine <[email protected]>
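
As a sketch of the idea (assumed charset and function names, not the PR's actual diff): sample characters straight into the output buffer rather than letting `choose_multiple` collect its picks into a temporary `Vec`. Note this samples with replacement, which is acceptable for random string generation.

```
use rand::{rngs::SmallRng, Rng, SeedableRng};

const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";

/// Push `len` random ASCII characters directly into `buf`. Unlike
/// `choose_multiple`, which collects its picks into a fresh `Vec`, this
/// allocates nothing beyond the caller's buffer.
fn push_ascii(rng: &mut SmallRng, buf: &mut String, len: usize) {
    for _ in 0..len {
        let idx = rng.gen_range(0..CHARSET.len());
        buf.push(CHARSET[idx] as char);
    }
}

fn main() {
    let mut rng = SmallRng::seed_from_u64(19690616);
    let mut s = String::with_capacity(16);
    push_ascii(&mut rng, &mut s, 16);
    assert_eq!(s.len(), 16);
}
```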
@blt force-pushed the sopell/dogstatsd-generator-revamp branch from d1a2b27 to 3178658 on August 15, 2023 01:13
@blt (Collaborator) commented Aug 15, 2023

dogstatsd_setup         time:   [82.884 ms 83.014 ms 83.160 ms]
                        change: [+0.4945% +0.7625% +1.0110%] (p = 0.00 < 0.05)
                        Change within noise threshold.

dogstatsd_all/1000000   time:   [84.771 ms 84.953 ms 85.146 ms]
                        thrpt:  [11.200 MiB/s 11.226 MiB/s 11.250 MiB/s]
                 change:
                        time:   [-0.1151% +0.1746% +0.4571%] (p = 0.24 > 0.05)
                        thrpt:  [-0.4551% -0.1743% +0.1152%]
                        No change in performance detected.
dogstatsd_all/10000000  time:   [104.31 ms 104.51 ms 104.72 ms]
                        thrpt:  [91.070 MiB/s 91.256 MiB/s 91.428 MiB/s]
                 change:
                        time:   [-0.2355% +0.0661% +0.3635%] (p = 0.67 > 0.05)
                        thrpt:  [-0.3622% -0.0660% +0.2360%]
                        No change in performance detected.
dogstatsd_all/100000000 time:   [306.33 ms 307.14 ms 308.06 ms]
                        thrpt:  [309.58 MiB/s 310.50 MiB/s 311.33 MiB/s]
                 change:
                        time:   [-1.3843% -0.9167% -0.4520%] (p = 0.00 < 0.05)
                        thrpt:  [+0.4541% +0.9251% +1.4038%]
                        Change within noise threshold.
Benchmarking dogstatsd_all/1000000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 90.0s. You may wish to increase target time to 237.9s, or reduce sample count to 30.
dogstatsd_all/1000000000
                        time:   [2.3695 s 2.3739 s 2.3785 s]
                        thrpt:  [400.96 MiB/s 401.74 MiB/s 402.48 MiB/s]
                 change:
                        time:   [-3.9508% -1.7750% -0.3280%] (p = 0.04 < 0.05)
                        thrpt:  [+0.3291% +1.8070% +4.1133%]
                        Change within noise threshold.

Very cool.
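
For reference, a benchmark shaped like `dogstatsd_all` can be written with criterion roughly as below. This is a sketch that assumes only the generator API shown in the `lading_rev` snippet later in this PR (`DogStatsD::default`, `generate`); the real harness in `lading_payload` may differ, e.g. it measures setup separately as `dogstatsd_setup`.

```
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use lading_payload::dogstatsd;
use rand::{rngs::SmallRng, SeedableRng};
use std::io::Write;

fn dogstatsd_all(c: &mut Criterion) {
    let mut group = c.benchmark_group("dogstatsd_all");
    for bytes in [1_000_000u64, 10_000_000, 100_000_000] {
        // Declaring the byte budget is what yields the MiB/s throughput figures.
        group.throughput(Throughput::Bytes(bytes));
        group.bench_with_input(BenchmarkId::from_parameter(bytes), &bytes, |b, &bytes| {
            b.iter(|| {
                // Setup sits inside the timed closure here for brevity; the
                // real suite benchmarks it separately as `dogstatsd_setup`.
                let mut rng = SmallRng::seed_from_u64(19690616);
                let dg = dogstatsd::DogStatsD::default(&mut rng);
                let mut sink: Vec<u8> = Vec::with_capacity(bytes as usize);
                while (sink.len() as u64) < bytes {
                    let member = dg.generate(&mut rng);
                    writeln!(sink, "{member}").unwrap();
                }
                sink
            });
        });
    }
    group.finish();
}

criterion_group!(benches, dogstatsd_all);
criterion_main!(benches);
```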

@github-actions

Regression Detector Results

Run ID: 3cbed3d8-3790-4775-9690-75c5241ba129
Baseline: 6249fd5
Comparison: 3178658
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.05 | [-0.00, +0.10] | 78.91% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.23 | [-0.26, -0.21] | 100.00% |

@github-actions

Regression Detector Results

Run ID: 69fa6e2a-f1de-4578-8d96-6f2036013de3
Baseline: 6249fd5
Comparison: 1fefe48
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.40 | [+0.35, +0.45] | 100.00% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +0.26 | [+0.23, +0.29] | 100.00% |

@scottopell marked this pull request as ready for review on August 15, 2023 19:21
@github-actions

Regression Detector Results

Run ID: b7dc9631-e616-4eab-82e8-97e34f8be45e
Baseline: 6249fd5
Comparison: 1bcfa8f
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.25 | [+0.20, +0.31] | 100.00% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.19 | [-0.22, -0.15] | 100.00% |

@github-actions

Regression Detector Results

Run ID: 430c0e5c-fcab-4e1e-8da5-b56504e68ef1
Baseline: 6249fd5
Comparison: a3a9748
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +0.40 | [+0.37, +0.42] | 100.00% |
| blackhole_from_apache_common_http | ingress throughput | +0.09 | [+0.04, +0.13] | 98.14% |

blt added a commit that referenced this pull request Aug 15, 2023
This commit is born of the work on #666. We realize in that PR that we spend
most of our runtime producing small heap-allocated strings. This PR introduces
`Pool` which allows the user to request a small `&str` of a given size. It is
hooked up to only a single payload -- Ascii -- but will be introduced elsewhere
in follow-up work.

REF SMP-664

Signed-off-by: Brian L. Troutwine <[email protected]>
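
A minimal sketch of that idea, assuming nothing about lading's `Pool` beyond the description above: generate one large random block up front, then serve `&str` slices of requested sizes from it, so handing out a string costs no allocation.

```
use rand::{rngs::SmallRng, Rng, SeedableRng};

/// Hypothetical `Pool` sketch: one pre-generated random ASCII block, from
/// which callers borrow `&str` slices of a requested size.
struct Pool {
    block: String,
}

impl Pool {
    fn new(rng: &mut SmallRng, capacity: usize) -> Self {
        const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
        let block = (0..capacity)
            .map(|_| CHARSET[rng.gen_range(0..CHARSET.len())] as char)
            .collect();
        Pool { block }
    }

    /// Return a random slice of `len` bytes, or `None` if the pool is too small.
    fn of_size<'a>(&'a self, rng: &mut SmallRng, len: usize) -> Option<&'a str> {
        if len > self.block.len() {
            return None;
        }
        let start = rng.gen_range(0..=self.block.len() - len);
        self.block.get(start..start + len)
    }
}

fn main() {
    let mut rng = SmallRng::seed_from_u64(19690616);
    let pool = Pool::new(&mut rng, 1_024);
    let tag_value = pool.of_size(&mut rng, 8).expect("pool large enough");
    assert_eq!(tag_value.len(), 8);
}
```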
@github-actions

Regression Detector Results

Run ID: 317db8f4-a196-4111-a625-cc7e3abcba0e
Baseline: 6249fd5
Comparison: 99ec179
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.03 | [-0.06, +0.01] | 71.74% |
| blackhole_from_apache_common_http | ingress throughput | -0.17 | [-0.22, -0.12] | 100.00% |

@scottopell force-pushed the sopell/dogstatsd-generator-revamp branch from 63fc5c7 to 9d84337 on August 15, 2023 20:59
@github-actions

Regression Detector Results

Run ID: 9d70f52c-274c-4c8f-b8ec-e7ff12b52b20
Baseline: 6249fd5
Comparison: 9d84337
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +0.14 | [+0.11, +0.16] | 100.00% |
| blackhole_from_apache_common_http | ingress throughput | +0.02 | [-0.03, +0.07] | 39.97% |

Comment on lines +15 to +18
For criterion benchmarks, you can run them via `cargo bench`.
[`cargo-criterion`](https://github.com/bheisler/cargo-criterion)
is a more advanced cargo extension that provides
historical (ie baseline) tracking functionality.
Collaborator

Ah, this should just be `cargo criterion`. I'll address this in a follow-up.

tag_value_length_maximum,
tags_per_msg_minimum,
tags_per_msg_maximum,
// TODO -- how can I validate user input for multivalue_pack_probability
Collaborator

Flagging for follow-up.
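
One possible answer to that TODO, as an illustrative sketch (not this PR's code): reject values outside `[0.0, 1.0]` when the config is constructed, so an invalid probability fails fast.

```
/// Hypothetical validation for `multivalue_pack_probability`; the field is a
/// probability, so any value outside [0.0, 1.0] is a configuration error.
#[derive(Debug)]
struct InvalidProbability(f32);

fn validate_pack_probability(p: f32) -> Result<f32, InvalidProbability> {
    if (0.0..=1.0).contains(&p) {
        Ok(p)
    } else {
        Err(InvalidProbability(p))
    }
}

fn main() {
    assert!(validate_pack_probability(0.08).is_ok());
    assert!(validate_pack_probability(1.5).is_err());
}
```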

Comment on lines +1 to +26
/// A program whose sole purpose is to generate DogStatsD load and then
/// exit.
///
/// I'm not sure how this will interact with the release process. We probably
/// should disable this binary at some point.
use lading_payload::dogstatsd;
use rand::{rngs::SmallRng, SeedableRng};

use std::io::{BufWriter, Write};

fn main() {
    let seed: u64 = 19690616;
    let mut rng = SmallRng::seed_from_u64(seed);
    let dg = dogstatsd::DogStatsD::default(&mut rng);

    // NOTE we lock stdout and wrap it in a buffered writer to avoid needing to
    // reacquire the stdio lock each write and, also, to elide as many writes as
    // possible.
    let stdout = std::io::stdout();
    let mut fp = BufWriter::with_capacity(1_000_000, stdout.lock());
    for _ in 0..1_000 {
        let member = dg.generate(&mut rng);
        writeln!(fp, "{member}").unwrap();
    }
    fp.flush().unwrap();
}
Collaborator

Considering that we have criterion now I think we can drop this in a follow-up PR.

@blt merged commit 3a68e2e into main on Aug 15, 2023
@blt deleted the sopell/dogstatsd-generator-revamp branch on August 15, 2023 23:36
blt added a commit that referenced this pull request Aug 15, 2023
blt added a commit that referenced this pull request Aug 16, 2023