profiler: test random execution trace collection with a fixed seed #2642
Conversation
To de-flake TestExecutionTraceRandom, provide a fixed-seed random number generator so that the results are deterministic. This is done through a non-exported profiler option so it's easy to provide in specific test cases (only one so far). Developers should remove this option while working on anything that might rely on real randomness, verify that it works as intended, and then add the option back to get reliable tests in CI. Fixes #2529
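For illustration, here is a minimal sketch of what such a non-exported option could look like, using the functional-options pattern common in dd-trace-go. The names here (withRNG, config.rng) are assumptions for the sketch, not the actual implementation:

```go
package profiler

import "math/rand"

// config holds profiler settings; the rng field is an assumption for
// this sketch (nil would mean "use real randomness").
type config struct {
	rng *rand.Rand
}

// Option configures the profiler.
type Option func(*config)

// withRNG is unexported, so only code in this package (i.e. its own
// tests) can pin the random source to a fixed seed.
func withRNG(rng *rand.Rand) Option {
	return func(c *config) { c.rng = rng }
}
```

A test could then pass withRNG(rand.New(rand.NewSource(1))) to make the trace-selection decisions deterministic, while production code paths never see the option.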
Benchmarks: benchmark execution time 2024-04-01 15:14:16, comparing candidate commit 7dc4cf4 in PR branch. Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 0 unstable metrics.
Thanks for working on this! I'm okay with this approach.
But maybe there is a simpler fix. While looking at this PR, I was trying to figure out why we're seeing these flakes to begin with. According to the comment below, we'd be expecting to see 1 flake every 100k runs.
// We should be within 2 standard deviations ~95% of the time
// with a correct implementation. If we do this twice, then
// we have a ~99.999% chance of succeeding with a correct
// implementation.
However, according to your comment here, we're seeing something closer to one failure every 500 runs in reality.
I was initially worried we might be seeing a bad RNG or something, but thinking about it a little more, I think the assumption is wrong. If each doTrial() has a 5% chance of failure, then the chance that two attempts both fail should be 0.05*0.05, which is 1 in 400 (aka a 99.75% success chance).
That number lines up pretty well with what we seem to observe in the wild, so maybe another way to fix it is to simply go from 2 to 4 trials? That would give us 1:160k odds, which is what the test was initially expecting.
Let me know what you think 🙇
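To make the arithmetic above concrete, this small standalone Go program (not part of the test suite) computes the failure odds for 2 versus 4 independent trials:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const perTrialFailure = 0.05 // each trial misses the 2-stddev window ~5% of the time
	for _, trials := range []int{2, 4} {
		// The test only fails if every trial fails independently.
		pFail := math.Pow(perTrialFailure, float64(trials))
		fmt.Printf("%d trials: ~1 failure in %.0f runs (%.5f%% success)\n",
			trials, 1/pFail, 100*(1-pFail))
	}
}
```

This prints roughly 1 in 400 for 2 trials and roughly 1 in 160,000 for 4 trials, matching the figures in the comment.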
Due to a bit of math sloppiness, we were getting a ~1/500 failure rate for TestExecutionTraceRandom, which was often enough to be irritating to dd-trace-go developers. Each trial has a 95% success rate given a correct implementation. We were doing 2 trials. The comment in the test incorrectly states that 2 trials should have a 99.999% success rate. In fact, we should expect a ~99.75% success rate for 2 trials, or a 1/400 failure rate, roughly matching what we saw. Increase the number of trials to 4. This gives the desired 99.999% success rate: we should expect roughly 1 failure for every 160,000 runs. This is a tolerable failure rate, and it lets the test remain somewhat robust, rather than using a fixed seed as considered in #2642. I have manually tested this by breaking the implementation (multiplying by an extra rand.Float64() draw) and confirmed that the test still fails reliably. Fixes #2529
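The retry structure this implies is sketched below with assumed names; doTrial comes from the discussion above, but its real signature and body may differ. The key point is that the test passes as soon as any single trial lands in bounds:

```go
package profiler

import "testing"

const trials = 4 // 0.05^4 gives ~1/160,000 odds of a spurious failure

// doTrial is assumed here to report whether the observed count of
// randomly collected traces fell within 2 standard deviations of the
// expected mean (a ~95% event for a correct implementation).
func doTrial() bool {
	// ... run the profiler, count randomly kept traces, check bounds ...
	return true
}

func TestExecutionTraceRandom(t *testing.T) {
	for i := 0; i < trials; i++ {
		if doTrial() {
			return // one in-bounds trial is enough to pass
		}
	}
	t.Fatal("trace counts fell outside 2 standard deviations in all trials")
}
```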
🤦 Indeed, thank you very much for double-checking this! I had just opted to fix the seed to make the flake go away, but I should have revisited the numbers. I have no idea where the 99.999% came from, looking back. Your analysis is correct. I've also gone back and verified that the "be within 2 standard deviations" probabilities are ~95% for the rates we test:

```r
> rates <- c(1/15, 0.5, 1)
> means <- 100*rates
> stddevs <- sqrt(100*rates*(1-rates))
> lo <- means - 2*stddevs
> hi <- means + 2*stddevs
> 1 - (1 - pbinom(hi, size=100, prob=rates) + pbinom(lo, size=100, prob=rates))
[1] 0.9574683 0.9539559 0.0000000
```

I've sent #2651 implementing your suggestion and will close this one. I think keeping some randomness will make the test more robust. If we just kept a fixed seed, then I worry we might break the implementation but still have it coincidentally work for the fixed seed.
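As a pure-Go cross-check of those R numbers (a Monte Carlo sketch, not code from the repository), one can estimate the 2-standard-deviation coverage empirically for the non-degenerate rates:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

func main() {
	const n, runs = 100, 200000
	for _, rate := range []float64{1.0 / 15, 0.5} {
		mean := float64(n) * rate
		stddev := math.Sqrt(float64(n) * rate * (1 - rate))
		lo, hi := mean-2*stddev, mean+2*stddev
		hits := 0
		for r := 0; r < runs; r++ {
			k := 0 // count of "kept" traces out of n draws
			for i := 0; i < n; i++ {
				if rand.Float64() < rate {
					k++
				}
			}
			if float64(k) >= lo && float64(k) <= hi {
				hits++
			}
		}
		fmt.Printf("rate=%.4f: within 2 stddevs %.4f of the time\n",
			rate, float64(hits)/float64(runs))
	}
}
```

The estimates land near the 0.957 and 0.954 values that pbinom reports above; the rate-1 case is omitted since it has zero variance, so the 2-stddev interval collapses to a point there.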