
New config options for dogstatsd generation #666

Merged
merged 14 commits into main from sopell/dogstatsd-generator-revamp on Aug 15, 2023

Conversation

@scottopell (Contributor) commented Aug 9, 2023

What does this PR do?

DogStatsD traffic can now be generated with direct control over "contexts", a key concept in a dogstatsd server. Each timeseries transmitted as part of dogstatsd traffic needs a unique identifier so that the server can aggregate its data (aggregation being the whole point of a dogstatsd server). That identifier is the "context": the metric name plus its tag set (plus, implicitly, the metric type).

In the previous generation config there was no way to control how many contexts would be created, which made it hard to reason about the resulting behavior in the dogstatsd server.

The way multi-value dogstatsd messages are generated has also been changed to work off a probability that a given message has multiple values packed into it. This more closely simulates what multi-value actually is: essentially a compression scheme (if N points all belong to the same context, there is no need to repeat the metric name and tags).
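
To make the concept concrete, here is an illustrative Rust sketch (not code from this PR) of how a dogstatsd server derives a context from a message; the parsing here is a toy and the real server's is richer. For instance, the multi-value message `fuel:1:2:3|c|#planet:mars` packs three values for a single context, which is why packing behaves like compression.

```
use std::collections::HashSet;

/// Illustrative only: a "context" is the identity under which a dogstatsd
/// server aggregates points -- metric name plus tag set (plus, implicitly,
/// metric type). Messages with the same context land in one timeseries.
#[derive(Hash, PartialEq, Eq)]
struct Context {
    name: String,
    metric_type: String, // "c", "g", "d", etc.
    tags: Vec<String>,   // sorted so tag ordering is canonical
}

fn main() {
    let mut contexts: HashSet<Context> = HashSet::new();
    for raw in ["fuel:1|c|#planet:mars", "fuel:2|c|#planet:mars", "fuel:9|g|#planet:mars"] {
        // Toy parse of `name:value|type|#tags`.
        let (name, rest) = raw.split_once(':').unwrap();
        let mut parts = rest.split('|');
        let _value = parts.next().unwrap();
        let metric_type = parts.next().unwrap().to_string();
        let mut tags: Vec<String> = parts
            .next()
            .map(|t| t.trim_start_matches('#').split(',').map(String::from).collect())
            .unwrap_or_default();
        tags.sort();
        contexts.insert(Context { name: name.to_string(), metric_type, tags });
    }
    // Three messages, but only two distinct contexts: the gauge differs by type.
    assert_eq!(contexts.len(), 2);
}
```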

Motivation

More control over dogstatsd generation

Related issues

A list of issues that are fixed by, contain architectural discussions about, or are otherwise relevant to this Pull Request.

Additional Notes

@github-actions

Regression Detector Results

Run ID: 5e74c2f2-f31d-4bb7-bbcb-a8dca1051974
Baseline: 0c3707b
Comparison: 7f3a3bb
Total lading-target CPUs: 4

Explanation

A regression test is an integrated performance test for lading-target in a repeatable rig, with varying configuration for lading-target. What follows is a statistical summary of a brief lading-target run for each configuration across the SHAs given above. The goal of these tests is to determine quickly whether lading-target performance is changed, and to what degree, by a pull request.

Because a target's optimization goal performance in each experiment will vary somewhat each time it is run, we can only estimate mean differences in optimization goal relative to the baseline target. We express these differences as a percentage change relative to the baseline target, denoted "Δ mean %". These estimates are made to a precision that balances accuracy and cost control. We represent this precision as a 90.00% confidence interval denoted "Δ mean % CI": there is a 90.00% chance that the true value of "Δ mean %" is in that interval.

We decide whether a change in performance is a "regression" -- a change worth investigating further -- if both of the following two criteria are true:

  1. The estimated |Δ mean %| ≥ 5.00%. This criterion intends to answer the question "Does the estimated change in mean optimization goal performance have a meaningful impact on your customers?". We assume that when |Δ mean %| < 5.00%, the impact on your customers is not meaningful. We also assume that a performance change in optimization goal is worth investigating whether it is an increase or decrease, so long as the magnitude of the change is sufficiently large.

  2. Zero is not in the 90.00% confidence interval "Δ mean % CI" about "Δ mean %". This statement is equivalent to saying that there is at least a 90.00% chance that the mean difference in optimization goal is not zero. This criterion intends to answer the question, "Is there a statistically significant difference in mean optimization goal performance?". It also means there is no more than a 10.00% chance this criterion reports a statistically significant difference when the true difference in mean optimization goal is zero -- a "false positive". We assume you are willing to accept a 10.00% chance of inaccurately detecting a change in performance when no true difference exists.

The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values of "Δ mean %" mean that baseline is faster, whereas positive values of "Δ mean %" mean that comparison is faster. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed.
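
Concretely, the two criteria reduce to a simple decision rule. A minimal sketch (not the detector's actual code), with the thresholds described above hard-coded:

```
/// Sketch of the two regression criteria: |Δ mean %| ≥ 5.00% and a 90%
/// confidence interval on Δ mean % that excludes zero.
fn is_regression(delta_mean_pct: f64, ci_low: f64, ci_high: f64) -> bool {
    let large_enough = delta_mean_pct.abs() >= 5.0;
    let ci_excludes_zero = ci_low > 0.0 || ci_high < 0.0;
    large_enough && ci_excludes_zero
}

fn main() {
    // First row of the table below: Δ mean % = +0.04, CI [-0.01, +0.09].
    // The change is small and the CI straddles zero, so it is not flagged.
    assert!(!is_regression(0.04, -0.01, 0.09));
    // A hypothetical -6.2% change with CI [-7.0, -5.4] would be flagged.
    assert!(is_regression(-6.2, -7.0, -5.4));
}
```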

No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.04 | [-0.01, +0.09] | 69.12% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.19 | [-0.22, -0.15] | 100.00% |

@github-actions

Regression Detector Results

Run ID: aed7af58-c764-4300-9647-d3fe3509d404
Baseline: 0c3707b
Comparison: bea6c11
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +0.59 | [+0.56, +0.62] | 100.00% |
| blackhole_from_apache_common_http | ingress throughput | -0.39 | [-0.44, -0.34] | 100.00% |

@github-actions

Regression Detector Results

Run ID: d94c2cfb-44f1-4ee7-8db2-7bf9a35d59f4
Baseline: 0c3707b
Comparison: 46a60b3
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.05 | [-0.00, +0.10] | 77.49% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.38 | [-0.41, -0.36] | 100.00% |

@github-actions

Regression Detector Results

Run ID: 8bb0e702-8112-4c2f-8136-dd966860ed02
Baseline: 0c3707b
Comparison: 2005f26
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +1.05 | [+1.02, +1.09] | 100.00% |
| blackhole_from_apache_common_http | ingress throughput | +0.28 | [+0.23, +0.33] | 100.00% |

@scottopell (Contributor, Author)

After fixing the defaults to be sane values, for 1_000_000 generations, it currently takes:

hyperfine --warmup 10 ./target/release/lading_rev
Benchmark 1: ./target/release/lading_rev
  Time (mean ± σ):      1.617 s ±  0.031 s    [User: 1.455 s, System: 0.162 s]
  Range (min … max):    1.583 s …  1.653 s    10 runs

which gives ~1.6 microseconds per generation. Much better than 30ms 😅

@github-actions

Regression Detector Results

Run ID: 21360fc5-55d6-4122-8a0c-83e7051ba5ab
Baseline: 0c3707b
Comparison: c22d4bf
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.04 | [-0.01, +0.09] | 68.75% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.33 | [-0.36, -0.29] | 100.00% |

blt added a commit that referenced this pull request Aug 14, 2023
This commit is inspired by the work done in #666. We would have a much easier
time writing benchmarks for payload generation _if_ the payloads were isolated
to their own crate. Note that in #666 I introduced a `lading_rev` binary to work
around this somewhat but the result is not totally satisfactory.

REF SMP-664

Signed-off-by: Brian L. Troutwine <[email protected]>
blt added a commit that referenced this pull request Aug 14, 2023
blt added a commit that referenced this pull request Aug 14, 2023
In #671 we moved the payload module into a new crate in order to make it easier
to add benchmarks for payload generation. This is now done: we have benchmarks
for dogstatsd. Of note, we benchmark setup time -- ~30ms -- and throughput; the
latter shows the real influence of our longish setup time: the more bytes we
generate, the better the throughput.

It does strike me that the `Serialize` trait will not be suitable for
streaming, REF #666. Some payloads like dogstatsd can stream per-message but
others require serialization into a container. How to deal with the container
notion in a stream, I'm not sure. Something to think about.

REF SMP-673

Signed-off-by: Brian L. Troutwine <[email protected]>
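
That tension can be made concrete with a hypothetical sketch (not lading's actual API): a self-delimiting payload such as dogstatsd can hand a writer one member at a time, while a container format needs framing around the whole batch and so resists per-message streaming.

```
use std::io::Write;

/// Hypothetical streaming interface: emit one member per call; callers loop
/// until a byte budget is exhausted. A container format (say, a JSON array)
/// could not implement this without open/close framing around the batch.
trait StreamSerialize {
    fn write_member<W: Write>(&mut self, writer: &mut W) -> std::io::Result<usize>;
}

struct Dogstatsd;

impl StreamSerialize for Dogstatsd {
    fn write_member<W: Write>(&mut self, writer: &mut W) -> std::io::Result<usize> {
        let line = b"fuel:1|c|#planet:mars\n"; // one self-delimiting message
        writer.write_all(line)?;
        Ok(line.len())
    }
}

fn main() -> std::io::Result<()> {
    let mut out = Vec::new();
    let mut payload = Dogstatsd;
    let mut written = 0;
    while written < 64 {
        written += payload.write_member(&mut out)?;
    }
    assert!(out.ends_with(b"\n")); // every member is independently framed
    Ok(())
}
```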
blt added a commit that referenced this pull request Aug 14, 2023
* Introduce benchmarks for `lading_payload`


* clippy ding

Signed-off-by: Brian L. Troutwine <[email protected]>

---------

Signed-off-by: Brian L. Troutwine <[email protected]>
@blt force-pushed the sopell/dogstatsd-generator-revamp branch from c22d4bf to d1a2b27 on August 15, 2023 00:15
scottopell and others added 8 commits August 14, 2023 17:15
This commit satisfies clippy in a handful of areas but most importantly adds a
`lading_rev` binary to the project. This binary will only be compiled when the
`dogstatsd_perf` feature flag is added to the build and is meant to be used for
optimizing the member generator.

Of note, we've started to discuss streaming built `Member` instances directly to
a generator without going through a block cache. To this point we've assumed
that the member generation only needs to be "fast enough", hence all the cloning
and small string allocations and the like. The generator is _slow_ as a
result. Consider that if you compile the project now with

```
> cargo build --release --features dogstatsd_perf --bin lading_rev
...
> hyperfine ./target/release/lading_rev
Benchmark 1: ./target/release/lading_rev
  Time (mean ± σ):     32.688 s ±  3.813 s    [User: 28.507 s, System: 2.650 s]
  Range (min … max):   30.190 s … 40.988 s    10 runs
```

Call it roughly 30ms per `Member` instance, an eternity. I'm working on a Mac so
running

```
> cargo instruments --release --features dogstatsd_perf --bin lading_rev -t time
```

does appear to show that we spend _a lot_ of time making and cloning small
strings, which is accurate to my understanding of the code as it exists
today. `AsciiString` is a particular culprit. 67% of program runtime is in
`impl payload::Generator<Tagsets>` as of this commit.

Signed-off-by: Brian L. Troutwine <[email protected]>
This commit removes the use of `choose_multiple` in the `AsciiString`
generator. Profiling shows that in a run of `lading_rev` this generator is
responsible for ~70% of program runtime, and within _that_, ~60% of runtime is
`choose_multiple`. While convenient, the function implicitly allocates a small
`Vec` that we do not need here, since we immediately push into storage anyway.

```
➜  lading git:(sopell/dogstatsd-generator-revamp) ✗ hyperfine --warmup 3 ./target/release/lading_rev
Benchmark 1: ./target/release/lading_rev
  Time (mean ± σ):     20.902 s ±  0.167 s    [User: 18.438 s, System: 2.405 s]
  Range (min … max):   20.727 s … 21.279 s    10 runs
```

Shaves ~10 seconds off.

Signed-off-by: Brian L. Troutwine <[email protected]>
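
As a sketch of the idea (assumed charset and function names, not the PR's actual diff): sample characters straight into the output buffer rather than letting `choose_multiple` collect its picks into a temporary `Vec`. Note this samples with replacement, which is acceptable for random string generation.

```
use rand::{rngs::SmallRng, Rng, SeedableRng};

const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";

/// Push `len` random ASCII characters directly into `buf`. Unlike
/// `choose_multiple`, which collects its picks into a fresh `Vec`, this
/// allocates nothing beyond the caller's buffer.
fn push_ascii(rng: &mut SmallRng, buf: &mut String, len: usize) {
    for _ in 0..len {
        let idx = rng.gen_range(0..CHARSET.len());
        buf.push(CHARSET[idx] as char);
    }
}

fn main() {
    let mut rng = SmallRng::seed_from_u64(19690616);
    let mut s = String::with_capacity(16);
    push_ascii(&mut rng, &mut s, 16);
    assert_eq!(s.len(), 16);
}
```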
@blt force-pushed the sopell/dogstatsd-generator-revamp branch from d1a2b27 to 3178658 on August 15, 2023 01:13
@blt (Collaborator) commented Aug 15, 2023

dogstatsd_setup         time:   [82.884 ms 83.014 ms 83.160 ms]
                        change: [+0.4945% +0.7625% +1.0110%] (p = 0.00 < 0.05)
                        Change within noise threshold.

dogstatsd_all/1000000   time:   [84.771 ms 84.953 ms 85.146 ms]
                        thrpt:  [11.200 MiB/s 11.226 MiB/s 11.250 MiB/s]
                 change:
                        time:   [-0.1151% +0.1746% +0.4571%] (p = 0.24 > 0.05)
                        thrpt:  [-0.4551% -0.1743% +0.1152%]
                        No change in performance detected.
dogstatsd_all/10000000  time:   [104.31 ms 104.51 ms 104.72 ms]
                        thrpt:  [91.070 MiB/s 91.256 MiB/s 91.428 MiB/s]
                 change:
                        time:   [-0.2355% +0.0661% +0.3635%] (p = 0.67 > 0.05)
                        thrpt:  [-0.3622% -0.0660% +0.2360%]
                        No change in performance detected.
dogstatsd_all/100000000 time:   [306.33 ms 307.14 ms 308.06 ms]
                        thrpt:  [309.58 MiB/s 310.50 MiB/s 311.33 MiB/s]
                 change:
                        time:   [-1.3843% -0.9167% -0.4520%] (p = 0.00 < 0.05)
                        thrpt:  [+0.4541% +0.9251% +1.4038%]
                        Change within noise threshold.
Benchmarking dogstatsd_all/1000000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 90.0s. You may wish to increase target time to 237.9s, or reduce sample count to 30.
dogstatsd_all/1000000000
                        time:   [2.3695 s 2.3739 s 2.3785 s]
                        thrpt:  [400.96 MiB/s 401.74 MiB/s 402.48 MiB/s]
                 change:
                        time:   [-3.9508% -1.7750% -0.3280%] (p = 0.04 < 0.05)
                        thrpt:  [+0.3291% +1.8070% +4.1133%]
                        Change within noise threshold.

Very cool.
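
For reference, a benchmark shaped like `dogstatsd_all` can be written with criterion roughly as below. This is a sketch that assumes only the generator API shown in the `lading_rev` snippet later in this PR (`DogStatsD::default`, `generate`); the real harness in `lading_payload` may differ, e.g. it measures setup separately as `dogstatsd_setup`.

```
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use lading_payload::dogstatsd;
use rand::{rngs::SmallRng, SeedableRng};
use std::io::Write;

fn dogstatsd_all(c: &mut Criterion) {
    let mut group = c.benchmark_group("dogstatsd_all");
    for bytes in [1_000_000u64, 10_000_000, 100_000_000] {
        // Declaring the byte budget is what yields the MiB/s throughput figures.
        group.throughput(Throughput::Bytes(bytes));
        group.bench_with_input(BenchmarkId::from_parameter(bytes), &bytes, |b, &bytes| {
            b.iter(|| {
                // Setup sits inside the timed closure here for brevity; the
                // real suite benchmarks it separately as `dogstatsd_setup`.
                let mut rng = SmallRng::seed_from_u64(19690616);
                let dg = dogstatsd::DogStatsD::default(&mut rng);
                let mut sink: Vec<u8> = Vec::with_capacity(bytes as usize);
                while (sink.len() as u64) < bytes {
                    let member = dg.generate(&mut rng);
                    writeln!(sink, "{member}").unwrap();
                }
                sink
            });
        });
    }
    group.finish();
}

criterion_group!(benches, dogstatsd_all);
criterion_main!(benches);
```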

@github-actions

Regression Detector Results

Run ID: 3cbed3d8-3790-4775-9690-75c5241ba129
Baseline: 6249fd5
Comparison: 3178658
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.05 | [-0.00, +0.10] | 78.91% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.23 | [-0.26, -0.21] | 100.00% |

@github-actions

Regression Detector Results

Run ID: 69fa6e2a-f1de-4578-8d96-6f2036013de3
Baseline: 6249fd5
Comparison: 1fefe48
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.40 | [+0.35, +0.45] | 100.00% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +0.26 | [+0.23, +0.29] | 100.00% |

@scottopell marked this pull request as ready for review on August 15, 2023 19:21
@github-actions

Regression Detector Results

Run ID: b7dc9631-e616-4eab-82e8-97e34f8be45e
Baseline: 6249fd5
Comparison: 1bcfa8f
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| blackhole_from_apache_common_http | ingress throughput | +0.25 | [+0.20, +0.31] | 100.00% |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.19 | [-0.22, -0.15] | 100.00% |

@github-actions

Regression Detector Results

Run ID: 430c0e5c-fcab-4e1e-8da5-b56504e68ef1
Baseline: 6249fd5
Comparison: a3a9748
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +0.40 | [+0.37, +0.42] | 100.00% |
| blackhole_from_apache_common_http | ingress throughput | +0.09 | [+0.04, +0.13] | 98.14% |

blt added a commit that referenced this pull request Aug 15, 2023
This commit is born of the work on #666. We realize in that PR that we spend
most of our runtime producing small heap-allocated strings. This PR introduces
`Pool` which allows the user to request a small `&str` of a given size. It is
hooked up to only a single payload -- Ascii -- but will be introduced elsewhere
in follow-up work.

REF SMP-664

Signed-off-by: Brian L. Troutwine <[email protected]>
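
A minimal sketch of that idea, assuming nothing about lading's `Pool` beyond the description above: generate one large random block up front, then serve `&str` slices of requested sizes from it, so handing out a string costs no allocation.

```
use rand::{rngs::SmallRng, Rng, SeedableRng};

/// Hypothetical `Pool` sketch: one pre-generated random ASCII block, from
/// which callers borrow `&str` slices of a requested size.
struct Pool {
    block: String,
}

impl Pool {
    fn new(rng: &mut SmallRng, capacity: usize) -> Self {
        const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
        let block = (0..capacity)
            .map(|_| CHARSET[rng.gen_range(0..CHARSET.len())] as char)
            .collect();
        Pool { block }
    }

    /// Return a random slice of `len` bytes, or `None` if the pool is too small.
    fn of_size<'a>(&'a self, rng: &mut SmallRng, len: usize) -> Option<&'a str> {
        if len > self.block.len() {
            return None;
        }
        let start = rng.gen_range(0..=self.block.len() - len);
        self.block.get(start..start + len)
    }
}

fn main() {
    let mut rng = SmallRng::seed_from_u64(19690616);
    let pool = Pool::new(&mut rng, 1_024);
    let tag_value = pool.of_size(&mut rng, 8).expect("pool large enough");
    assert_eq!(tag_value.len(), 8);
}
```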
@github-actions

Regression Detector Results

Run ID: 317db8f4-a196-4111-a625-cc7e3abcba0e
Baseline: 6249fd5
Comparison: 99ec179
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | -0.03 | [-0.06, +0.01] | 71.74% |
| blackhole_from_apache_common_http | ingress throughput | -0.17 | [-0.22, -0.12] | 100.00% |

@scottopell force-pushed the sopell/dogstatsd-generator-revamp branch from 63fc5c7 to 9d84337 on August 15, 2023 20:59
@github-actions

Regression Detector Results

Run ID: 9d70f52c-274c-4c8f-b8ec-e7ff12b52b20
Baseline: 6249fd5
Comparison: 9d84337
Total lading-target CPUs: 4


No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
| --- | --- | --- | --- | --- |
| apache_common_http_both_directions_this_doesnt_make_sense | ingress throughput | +0.14 | [+0.11, +0.16] | 100.00% |
| blackhole_from_apache_common_http | ingress throughput | +0.02 | [-0.03, +0.07] | 39.97% |

Comment on lines +15 to +18
For criterion benchmarks, you can run them via `cargo bench`.
[`cargo-criterion`](https://github.com/bheisler/cargo-criterion)
is a more advanced cargo extension that provides
historical (ie baseline) tracking functionality.
Collaborator

Ah, this should just be `cargo criterion`. I'll address this in a follow-up.

tag_value_length_maximum,
tags_per_msg_minimum,
tags_per_msg_maximum,
// TODO -- how can I validate user input for multivalue_pack_probability
Collaborator

Flagging for follow-up.
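
One possible answer to that TODO, as an illustrative sketch (not this PR's code): reject values outside `[0.0, 1.0]` when the config is constructed, so an invalid probability fails fast.

```
/// Hypothetical validation for `multivalue_pack_probability`; the field is a
/// probability, so any value outside [0.0, 1.0] is a configuration error.
#[derive(Debug)]
struct InvalidProbability(f32);

fn validate_pack_probability(p: f32) -> Result<f32, InvalidProbability> {
    if (0.0..=1.0).contains(&p) {
        Ok(p)
    } else {
        Err(InvalidProbability(p))
    }
}

fn main() {
    assert!(validate_pack_probability(0.08).is_ok());
    assert!(validate_pack_probability(1.5).is_err());
}
```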

Comment on lines +1 to +26
/// A program whose sole purpose is to generate DogStatsD load and then
/// exit.
///
/// I'm not sure how this will interact with the release process. We probably
/// should disable this binary at some point.
use lading_payload::dogstatsd;
use rand::{rngs::SmallRng, SeedableRng};

use std::io::{BufWriter, Write};

fn main() {
    let seed: u64 = 19690616;
    let mut rng = SmallRng::seed_from_u64(seed);
    let dg = dogstatsd::DogStatsD::default(&mut rng);

    // NOTE we lock stdout and wrap it in a buffered writer to avoid needing to
    // reacquire the stdio lock each write and, also, to elide as many writes as
    // possible.
    let stdout = std::io::stdout();
    let mut fp = BufWriter::with_capacity(1_000_000, stdout.lock());
    for _ in 0..1_000 {
        let member = dg.generate(&mut rng);
        writeln!(fp, "{member}").unwrap();
    }
    fp.flush().unwrap();
}
Collaborator

Considering that we have criterion now I think we can drop this in a follow-up PR.

@blt merged commit 3a68e2e into main on Aug 15, 2023
@blt deleted the sopell/dogstatsd-generator-revamp branch on August 15, 2023 23:36
blt added a commit that referenced this pull request Aug 15, 2023
blt added a commit that referenced this pull request Aug 16, 2023