telemetry worker: flush data after stops #515

Open: wants to merge 1 commit into main from glopes/flush-data-after-stop

cataphract
Contributor

@cataphract cataphract commented Jul 1, 2024

Telemetry workers are functionally dead after a Stop lifecycle action, provided there's no intervening Start. While AddPoint actions are still processed, their data is never flushed, since the Stop action handler unschedules FlushMetrics and FlushData actions.

PHP sends a Stop action at the end of every request via ddog_sidecar_telemetry_end(), but a Start action is only generated just after a telemetry worker is spawned. With no more Start actions generated, no metrics can effectively be sent after the first Stop.
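The failure mode can be illustrated with a deliberately simplified model. Everything in it is hypothetical: the Action enum, the Worker struct, its scheduled/points/pending/sent fields and the tick helper are illustrative names, not the ddtelemetry API. It assumes that FlushMetrics turns recorded points into a payload, that FlushData sends pending payloads, and that one tick runs every scheduled periodic action once.

```rust
// Hypothetical, simplified model of the telemetry worker BEFORE this PR.
// None of these types are the real ddtelemetry API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Start,
    Stop,
    AddPoint(u64),
    FlushMetrics,
    FlushData,
}

#[derive(Default)]
struct Worker {
    started: bool,
    /// Periodic actions currently scheduled (stand-in for the deadlines).
    scheduled: Vec<Action>,
    /// Points recorded by AddPoint and not yet aggregated.
    points: Vec<u64>,
    /// Payloads built by FlushMetrics, waiting for FlushData.
    pending: Vec<Vec<u64>>,
    /// Payloads "sent" to the backend.
    sent: Vec<Vec<u64>>,
}

impl Worker {
    fn dispatch(&mut self, action: Action) {
        match action {
            Action::Start => {
                self.started = true;
                // Pre-PR order: FlushData is scheduled before FlushMetrics.
                self.scheduled = vec![Action::FlushData, Action::FlushMetrics];
            }
            Action::Stop => {
                if !self.started {
                    return;
                }
                // Final flush, then unschedule every periodic action.
                self.flush_metrics();
                self.flush_data();
                self.started = false;
                self.scheduled.clear();
            }
            // AddPoint is still processed after a Stop.
            Action::AddPoint(v) => self.points.push(v),
            Action::FlushMetrics => self.flush_metrics(),
            Action::FlushData => {
                if !self.started {
                    return; // noop outside a Start/Stop pair
                }
                self.flush_data();
            }
        }
    }

    fn flush_metrics(&mut self) {
        if !self.points.is_empty() {
            self.pending.push(std::mem::take(&mut self.points));
        }
    }

    fn flush_data(&mut self) {
        self.sent.append(&mut self.pending);
    }

    /// One timer tick: run every scheduled periodic action in order.
    fn tick(&mut self) {
        for action in self.scheduled.clone() {
            self.dispatch(action);
        }
    }
}
```

Replaying PHP's pattern (one Start, then AddPoint and Stop per request), the point recorded during the second request stays in points indefinitely: the second Stop is a noop and nothing is scheduled any more.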

It is not clear to me whether the intention is to have a Start/Stop pair on every PHP request (where Stop flushes the metrics) or to have only such a pair in the first request, with the Stop event generated by ddog_sidecar_telemetry_end() effectively a noop. It would appear, judging by this comment (#391):

> Also allow the telemetry worker to have a mode where it's continuing execution after a start-stop cycle, otherwise it won't send any more metrics afterwards.

that the intention is to keep sending metrics after a Start/Stop pair. It also makes more sense, insofar as data is flushed only on the interval, rather than after every request via Stop. In that case:

  • The Stop action handler should not unschedule FlushData and FlushMetrics events and
  • FlushData, if called outside a Start-Stop pair, should not be a noop.
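Applied to the same kind of hypothetical model (illustrative names, not the real worker), the two changes amount to this: Stop keeps its final flush but no longer clears the scheduled actions, and FlushData drops its started check. The scheduling order is deliberately left unchanged here, with FlushData still scheduled first.

```rust
// Hypothetical, simplified model of the telemetry worker with the two
// proposed changes applied. Not the real ddtelemetry API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Start,
    Stop,
    AddPoint(u64),
    FlushMetrics,
    FlushData,
}

#[derive(Default)]
struct Worker {
    started: bool,
    scheduled: Vec<Action>,
    points: Vec<u64>,
    pending: Vec<Vec<u64>>,
    sent: Vec<Vec<u64>>,
}

impl Worker {
    fn dispatch(&mut self, action: Action) {
        match action {
            Action::Start => {
                self.started = true;
                // Order unchanged for now: FlushData is still scheduled first.
                self.scheduled = vec![Action::FlushData, Action::FlushMetrics];
            }
            Action::Stop => {
                if !self.started {
                    return;
                }
                self.flush_metrics();
                self.flush_data();
                self.started = false;
                // Change 1: periodic actions stay scheduled after Stop.
            }
            Action::AddPoint(v) => self.points.push(v),
            Action::FlushMetrics => self.flush_metrics(),
            // Change 2: no `started` check, so FlushData also works after a Stop.
            Action::FlushData => self.flush_data(),
        }
    }

    fn flush_metrics(&mut self) {
        if !self.points.is_empty() {
            self.pending.push(std::mem::take(&mut self.points));
        }
    }

    fn flush_data(&mut self) {
        self.sent.append(&mut self.pending);
    }

    /// One timer tick: run every scheduled periodic action in order.
    fn tick(&mut self) {
        for action in self.scheduled.clone() {
            self.dispatch(action);
        }
    }
}
```

Data recorded after the Stop is now sent, but only on the second tick: on the first tick FlushData runs before FlushMetrics has produced anything to send.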

Finally: swap the order in which FlushData and FlushMetrics are scheduled so that FlushMetrics runs first and therefore its generated data can be sent by the next FlushData.
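Why the order matters can be shown in isolation, assuming both periodic actions fall due at the same instant (a simplification; the real deadlines may differ). Pipeline and Periodic are hypothetical names used only for this sketch.

```rust
// Hypothetical two-stage pipeline: FlushMetrics aggregates recorded points
// into a payload, FlushData sends whatever payloads are pending.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Periodic {
    FlushMetrics,
    FlushData,
}

#[derive(Default)]
struct Pipeline {
    points: Vec<u64>,
    pending: Vec<Vec<u64>>,
    sent: Vec<Vec<u64>>,
}

impl Pipeline {
    fn run(&mut self, action: Periodic) {
        match action {
            Periodic::FlushMetrics => {
                if !self.points.is_empty() {
                    self.pending.push(std::mem::take(&mut self.points));
                }
            }
            Periodic::FlushData => self.sent.append(&mut self.pending),
        }
    }

    /// Run the periodic actions that fall due at the same instant, in `order`.
    fn tick(&mut self, order: &[Periodic]) {
        for &action in order {
            self.run(action);
        }
    }
}
```

With FlushData first, the payload built in a tick waits for the next FlushData; with FlushMetrics first, it goes out in the same tick.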

@cataphract cataphract requested a review from a team as a code owner July 1, 2024 13:55
@cataphract cataphract requested review from pawelchcki, bwoebi and bantonsson and removed request for a team and pawelchcki July 1, 2024 13:55
@pr-commenter

pr-commenter bot commented Jul 1, 2024

Benchmarks

Comparison

Benchmark execution time: 2024-11-11 11:48:30

Comparing candidate commit b018033 in PR branch glopes/flush-data-after-stop with baseline commit a22637c in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics.

Candidate

Candidate benchmark details

Group 1

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_trace/test_trace execution_time 436.035ns 449.145ns ± 13.455ns 440.011ns ± 3.446ns 462.221ns 472.987ns 475.717ns 476.712ns 8.34% 0.583 -1.264 2.99% 0.951ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_trace/test_trace execution_time [447.280ns; 451.010ns] or [-0.415%; +0.415%] None None None

Group 2

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
tags/replace_trace_tags execution_time 5.116µs 5.142µs ± 0.017µs 5.137µs ± 0.008µs 5.155µs 5.176µs 5.181µs 5.182µs 0.88% 0.827 -0.466 0.32% 0.001µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
tags/replace_trace_tags execution_time [5.140µs; 5.145µs] or [-0.045%; +0.045%] None None None

Group 3

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 104.049µs 104.297µs ± 0.305µs 104.265µs ± 0.060µs 104.335µs 104.435µs 104.632µs 108.384µs 3.95% 12.129 159.841 0.29% 0.022µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [104.255µs; 104.339µs] or [-0.041%; +0.041%] None None None

Group 4

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
redis/obfuscate_redis_string execution_time 66.453µs 66.606µs ± 0.200µs 66.582µs ± 0.037µs 66.626µs 66.711µs 66.989µs 69.226µs 3.97% 11.351 145.341 0.30% 0.014µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
redis/obfuscate_redis_string execution_time [66.578µs; 66.634µs] or [-0.042%; +0.042%] None None None

Group 5

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching deserializing traces from msgpack to their internal representation execution_time 2.200µs 2.243µs ± 0.019µs 2.246µs ± 0.012µs 2.258µs 2.267µs 2.269µs 2.271µs 1.12% -0.669 -0.575 0.84% 0.001µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching deserializing traces from msgpack to their internal representation execution_time [2.241µs; 2.246µs] or [-0.117%; +0.117%] None None None

Group 6

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
concentrator/add_spans_to_concentrator execution_time 13.497ms 13.544ms ± 0.027ms 13.541ms ± 0.015ms 13.558ms 13.579ms 13.628ms 13.764ms 1.65% 2.850 20.068 0.20% 0.002ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
concentrator/add_spans_to_concentrator execution_time [13.541ms; 13.548ms] or [-0.028%; +0.028%] None None None

Group 7

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time 272.410µs 273.817µs ± 1.911µs 272.976µs ± 0.353µs 274.065µs 277.660µs 280.488µs 287.416µs 5.29% 3.167 14.199 0.70% 0.135µs 1 200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput 3479276.958op/s 3652242.709op/s ± 24944.519op/s 3663331.400op/s ± 4745.715op/s 3665981.046op/s 3668814.842op/s 3670142.012op/s 3670933.562op/s 0.21% -3.045 12.953 0.68% 1763.844op/s 1 200
normalization/normalize_name/normalize_name/bad-name execution_time 33.802µs 33.887µs ± 0.101µs 33.872µs ± 0.046µs 33.912µs 34.071µs 34.239µs 34.603µs 2.16% 3.484 17.321 0.30% 0.007µs 1 200
normalization/normalize_name/normalize_name/bad-name throughput 28899153.600op/s 29510519.935op/s ± 86910.362op/s 29523159.091op/s ± 40498.967op/s 29564812.578op/s 29573583.357op/s 29575952.790op/s 29583930.715op/s 0.21% -3.421 16.724 0.29% 6145.491op/s 1 200
normalization/normalize_name/normalize_name/good execution_time 24.111µs 24.148µs ± 0.084µs 24.117µs ± 0.002µs 24.181µs 24.253µs 24.286µs 25.009µs 3.69% 6.927 61.569 0.35% 0.006µs 1 200
normalization/normalize_name/normalize_name/good throughput 39986393.362op/s 41411671.672op/s ± 141055.734op/s 41463703.783op/s ± 4092.676op/s 41466609.743op/s 41469838.257op/s 41474948.474op/s 41475596.577op/s 0.03% -6.766 59.246 0.34% 9974.147op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time [273.553µs; 274.082µs] or [-0.097%; +0.097%] None None None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput [3648785.638op/s; 3655699.779op/s] or [-0.095%; +0.095%] None None None
normalization/normalize_name/normalize_name/bad-name execution_time [33.873µs; 33.900µs] or [-0.041%; +0.041%] None None None
normalization/normalize_name/normalize_name/bad-name throughput [29498474.994op/s; 29522564.875op/s] or [-0.041%; +0.041%] None None None
normalization/normalize_name/normalize_name/good execution_time [24.136µs; 24.160µs] or [-0.048%; +0.048%] None None None
normalization/normalize_name/normalize_name/good throughput [41392122.704op/s; 41431220.641op/s] or [-0.047%; +0.047%] None None None

Group 8

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time 517.987µs 519.143µs ± 2.087µs 518.952µs ± 0.278µs 519.194µs 519.562µs 522.386µs 540.283µs 4.11% 9.058 84.979 0.40% 0.148µs 1 200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput 1850880.354op/s 1926282.991op/s ± 7474.501op/s 1926960.258op/s ± 1032.869op/s 1928156.178op/s 1929343.917op/s 1930474.595op/s 1930552.011op/s 0.19% -8.998 84.151 0.39% 528.527op/s 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time 473.857µs 477.455µs ± 1.466µs 477.442µs ± 1.002µs 478.525µs 479.690µs 479.999µs 482.379µs 1.03% -0.148 -0.103 0.31% 0.104µs 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput 2073060.506op/s 2094458.486op/s ± 6434.197op/s 2094494.861op/s ± 4406.223op/s 2098239.190op/s 2105844.004op/s 2108861.499op/s 2110340.258op/s 0.76% 0.165 -0.113 0.31% 454.966op/s 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time 192.409µs 193.141µs ± 0.233µs 193.175µs ± 0.188µs 193.313µs 193.481µs 193.548µs 193.620µs 0.23% -0.229 -0.721 0.12% 0.017µs 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput 5164750.857op/s 5177578.755op/s ± 6260.138op/s 5176652.076op/s ± 5026.929op/s 5183484.148op/s 5186437.497op/s 5188889.530op/s 5197262.901op/s 0.40% 0.234 -0.716 0.12% 442.659op/s 1 200
normalization/normalize_service/normalize_service/[empty string] execution_time 100.391µs 101.058µs ± 0.179µs 101.054µs ± 0.093µs 101.170µs 101.385µs 101.447µs 101.556µs 0.50% -0.305 1.275 0.18% 0.013µs 1 200
normalization/normalize_service/normalize_service/[empty string] throughput 9846740.253op/s 9895297.830op/s ± 17579.601op/s 9895658.266op/s ± 9096.988op/s 9902671.513op/s 9923544.376op/s 9949196.739op/s 9961018.678op/s 0.66% 0.322 1.300 0.18% 1243.066op/s 1 200
normalization/normalize_service/normalize_service/test_ASCII execution_time 60.200µs 60.494µs ± 0.212µs 60.451µs ± 0.107µs 60.562µs 60.972µs 61.167µs 61.310µs 1.42% 1.444 2.067 0.35% 0.015µs 1 200
normalization/normalize_service/normalize_service/test_ASCII throughput 16310646.274op/s 16530857.705op/s ± 57630.554op/s 16542414.752op/s ± 29142.859op/s 16569810.798op/s 16599017.192op/s 16608186.136op/s 16611198.862op/s 0.42% -1.423 1.997 0.35% 4075.096op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time [518.853µs; 519.432µs] or [-0.056%; +0.056%] None None None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput [1925247.097op/s; 1927318.884op/s] or [-0.054%; +0.054%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time [477.252µs; 477.658µs] or [-0.043%; +0.043%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput [2093566.768op/s; 2095350.204op/s] or [-0.043%; +0.043%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time [193.108µs; 193.173µs] or [-0.017%; +0.017%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput [5176711.160op/s; 5178446.350op/s] or [-0.017%; +0.017%] None None None
normalization/normalize_service/normalize_service/[empty string] execution_time [101.034µs; 101.083µs] or [-0.025%; +0.025%] None None None
normalization/normalize_service/normalize_service/[empty string] throughput [9892861.467op/s; 9897734.194op/s] or [-0.025%; +0.025%] None None None
normalization/normalize_service/normalize_service/test_ASCII execution_time [60.464µs; 60.523µs] or [-0.049%; +0.049%] None None None
normalization/normalize_service/normalize_service/test_ASCII throughput [16522870.665op/s; 16538844.746op/s] or [-0.048%; +0.048%] None None None

Group 9

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
write only interface execution_time 3.250µs 4.404µs ± 1.741µs 4.228µs ± 0.023µs 4.249µs 4.318µs 16.512µs 20.789µs 391.72% 8.033 64.677 39.43% 0.123µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
write only interface execution_time [4.163µs; 4.645µs] or [-5.478%; +5.478%] None None None

Group 10

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
credit_card/is_card_number/ execution_time 8.607µs 8.621µs ± 0.007µs 8.620µs ± 0.002µs 8.623µs 8.628µs 8.634µs 8.694µs 0.86% 6.759 73.371 0.08% 0.000µs 1 200
credit_card/is_card_number/ throughput 115027176.037op/s 116000810.655op/s ± 88295.119op/s 116011720.881op/s ± 32023.859op/s 116036678.784op/s 116076228.220op/s 116157608.397op/s 116180234.070op/s 0.15% -6.691 72.412 0.08% 6243.408op/s 1 200
credit_card/is_card_number/ 3782-8224-6310-005 execution_time 91.638µs 92.411µs ± 0.753µs 92.070µs ± 0.345µs 92.973µs 93.774µs 93.792µs 94.621µs 2.77% 0.785 -0.758 0.81% 0.053µs 1 200
credit_card/is_card_number/ 3782-8224-6310-005 throughput 10568501.208op/s 10821898.490op/s ± 87595.284op/s 10861338.343op/s ± 40898.474op/s 10896329.821op/s 10903111.764op/s 10904162.866op/s 10912480.392op/s 0.47% -0.770 -0.793 0.81% 6193.922op/s 1 200
credit_card/is_card_number/ 378282246310005 execution_time 83.722µs 84.957µs ± 0.739µs 84.892µs ± 0.693µs 85.537µs 86.159µs 86.246µs 86.352µs 1.72% 0.335 -1.256 0.87% 0.052µs 1 200
credit_card/is_card_number/ 378282246310005 throughput 11580464.502op/s 11771520.049op/s ± 102132.243op/s 11779718.327op/s ± 96021.748op/s 11876996.797op/s 11889445.248op/s 11891630.148op/s 11944316.004op/s 1.40% -0.319 -1.269 0.87% 7221.840op/s 1 200
credit_card/is_card_number/37828224631 execution_time 8.610µs 8.621µs ± 0.006µs 8.621µs ± 0.002µs 8.623µs 8.626µs 8.629µs 8.695µs 0.86% 8.789 104.685 0.07% 0.000µs 1 200
credit_card/is_card_number/37828224631 throughput 115010209.261op/s 115995177.225op/s ± 81757.330op/s 115997394.527op/s ± 28862.254op/s 116028384.380op/s 116051965.512op/s 116139780.430op/s 116145331.523op/s 0.13% -8.726 103.713 0.07% 5781.116op/s 1 200
credit_card/is_card_number/378282246310005 execution_time 81.291µs 81.851µs ± 0.831µs 81.373µs ± 0.062µs 82.124µs 83.899µs 83.948µs 83.969µs 3.19% 1.464 0.723 1.01% 0.059µs 1 200
credit_card/is_card_number/378282246310005 throughput 11909126.971op/s 12218502.596op/s ± 122200.418op/s 12289119.829op/s ± 9358.849op/s 12294665.959op/s 12299590.178op/s 12300867.912op/s 12301504.474op/s 0.10% -1.446 0.663 1.00% 8640.874op/s 1 200
credit_card/is_card_number/37828224631000521389798 execution_time 62.949µs 63.028µs ± 0.047µs 63.021µs ± 0.017µs 63.040µs 63.080µs 63.119µs 63.566µs 0.87% 7.537 81.543 0.08% 0.003µs 1 200
credit_card/is_card_number/37828224631000521389798 throughput 15731663.336op/s 15866036.557op/s ± 11865.870op/s 15867851.737op/s ± 4339.001op/s 15872028.724op/s 15874891.636op/s 15875970.022op/s 15885883.804op/s 0.11% -7.477 80.610 0.07% 839.044op/s 1 200
credit_card/is_card_number/x371413321323331 execution_time 10.826µs 10.833µs ± 0.005µs 10.833µs ± 0.003µs 10.835µs 10.839µs 10.845µs 10.866µs 0.31% 2.204 12.675 0.04% 0.000µs 1 200
credit_card/is_card_number/x371413321323331 throughput 92028261.042op/s 92311851.138op/s ± 39173.562op/s 92310311.302op/s ± 27846.065op/s 92342486.792op/s 92358642.262op/s 92365173.896op/s 92366563.304op/s 0.06% -2.191 12.562 0.04% 2769.989op/s 1 200
credit_card/is_card_number_no_luhn/ execution_time 8.611µs 8.622µs ± 0.003µs 8.621µs ± 0.002µs 8.624µs 8.628µs 8.630µs 8.631µs 0.12% 0.330 -0.064 0.04% 0.000µs 1 200
credit_card/is_card_number_no_luhn/ throughput 115865468.604op/s 115987689.843op/s ± 46169.415op/s 116000568.405op/s ± 31183.684op/s 116024423.141op/s 116037837.618op/s 116085751.966op/s 116126461.717op/s 0.11% -0.328 -0.063 0.04% 3264.671op/s 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time 74.450µs 74.620µs ± 0.222µs 74.548µs ± 0.030µs 74.575µs 75.300µs 75.355µs 75.488µs 1.26% 2.558 5.211 0.30% 0.016µs 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput 13247090.163op/s 13401416.142op/s ± 39558.710op/s 13414231.158op/s ± 5331.105op/s 13419819.060op/s 13422480.238op/s 13424140.162op/s 13431814.596op/s 0.13% -2.552 5.178 0.29% 2797.223op/s 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time 67.676µs 67.901µs ± 0.954µs 67.721µs ± 0.025µs 67.734µs 68.047µs 73.721µs 73.742µs 8.89% 5.583 29.806 1.40% 0.067µs 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 throughput 13560855.162op/s 14729936.548op/s ± 192001.683op/s 14766425.578op/s ± 5530.049op/s 14772811.379op/s 14774955.269op/s 14775834.625op/s 14776248.658op/s 0.07% -5.557 29.513 1.30% 13576.569op/s 1 200
credit_card/is_card_number_no_luhn/37828224631 execution_time 8.610µs 8.621µs ± 0.004µs 8.621µs ± 0.002µs 8.623µs 8.628µs 8.630µs 8.641µs 0.24% 0.872 3.226 0.04% 0.000µs 1 200
credit_card/is_card_number_no_luhn/37828224631 throughput 115726619.634op/s 115993759.954op/s ± 50955.598op/s 116000315.298op/s ± 32449.517op/s 116030346.984op/s 116049414.640op/s 116104484.627op/s 116139917.290op/s 0.12% -0.866 3.202 0.04% 3603.105op/s 1 200
credit_card/is_card_number_no_luhn/378282246310005 execution_time 64.458µs 64.562µs ± 0.171µs 64.514µs ± 0.025µs 64.538µs 64.893µs 65.325µs 65.440µs 1.43% 3.549 12.376 0.26% 0.012µs 1 200
credit_card/is_card_number_no_luhn/378282246310005 throughput 15281253.441op/s 15489151.009op/s ± 40678.364op/s 15500459.589op/s ± 5975.582op/s 15506439.824op/s 15508974.734op/s 15510025.401op/s 15513860.083op/s 0.09% -3.535 12.274 0.26% 2876.395op/s 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time 62.980µs 63.025µs ± 0.022µs 63.028µs ± 0.019µs 63.039µs 63.073µs 63.084µs 63.100µs 0.11% 0.639 0.190 0.04% 0.002µs 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput 15847979.009op/s 15866598.687op/s ± 5654.921op/s 15865983.332op/s ± 4658.576op/s 15871703.926op/s 15873350.232op/s 15875414.777op/s 15878162.763op/s 0.08% -0.637 0.185 0.04% 399.863op/s 1 200
credit_card/is_card_number_no_luhn/x371413321323331 execution_time 10.826µs 10.833µs ± 0.004µs 10.834µs ± 0.004µs 10.836µs 10.841µs 10.845µs 10.849µs 0.14% 0.614 0.261 0.04% 0.000µs 1 200
credit_card/is_card_number_no_luhn/x371413321323331 throughput 92176812.284op/s 92308936.314op/s ± 35881.811op/s 92303275.398op/s ± 30853.434op/s 92341441.675op/s 92352624.490op/s 92361618.206op/s 92373091.718op/s 0.08% -0.612 0.254 0.04% 2537.227op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
credit_card/is_card_number/ execution_time [8.620µs; 8.622µs] or [-0.011%; +0.011%] None None None
credit_card/is_card_number/ throughput [115988573.801op/s; 116013047.509op/s] or [-0.011%; +0.011%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 execution_time [92.307µs; 92.516µs] or [-0.113%; +0.113%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 throughput [10809758.626op/s; 10834038.354op/s] or [-0.112%; +0.112%] None None None
credit_card/is_card_number/ 378282246310005 execution_time [84.855µs; 85.060µs] or [-0.121%; +0.121%] None None None
credit_card/is_card_number/ 378282246310005 throughput [11757365.503op/s; 11785674.596op/s] or [-0.120%; +0.120%] None None None
credit_card/is_card_number/37828224631 execution_time [8.620µs; 8.622µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/37828224631 throughput [115983846.445op/s; 116006508.005op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/378282246310005 execution_time [81.736µs; 81.966µs] or [-0.141%; +0.141%] None None None
credit_card/is_card_number/378282246310005 throughput [12201566.794op/s; 12235438.399op/s] or [-0.139%; +0.139%] None None None
credit_card/is_card_number/37828224631000521389798 execution_time [63.021µs; 63.034µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/37828224631000521389798 throughput [15864392.061op/s; 15867681.052op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/x371413321323331 execution_time [10.832µs; 10.833µs] or [-0.006%; +0.006%] None None None
credit_card/is_card_number/x371413321323331 throughput [92306422.059op/s; 92317280.217op/s] or [-0.006%; +0.006%] None None None
credit_card/is_card_number_no_luhn/ execution_time [8.621µs; 8.622µs] or [-0.006%; +0.006%] None None None
credit_card/is_card_number_no_luhn/ throughput [115981291.206op/s; 115994088.480op/s] or [-0.006%; +0.006%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time [74.589µs; 74.650µs] or [-0.041%; +0.041%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput [13395933.686op/s; 13406898.599op/s] or [-0.041%; +0.041%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time [67.769µs; 68.034µs] or [-0.195%; +0.195%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 throughput [14703326.961op/s; 14756546.135op/s] or [-0.181%; +0.181%] None None None
credit_card/is_card_number_no_luhn/37828224631 execution_time [8.621µs; 8.622µs] or [-0.006%; +0.006%] None None None
credit_card/is_card_number_no_luhn/37828224631 throughput [115986697.998op/s; 116000821.910op/s] or [-0.006%; +0.006%] None None None
credit_card/is_card_number_no_luhn/378282246310005 execution_time [64.538µs; 64.585µs] or [-0.037%; +0.037%] None None None
credit_card/is_card_number_no_luhn/378282246310005 throughput [15483513.379op/s; 15494788.639op/s] or [-0.036%; +0.036%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time [63.022µs; 63.029µs] or [-0.005%; +0.005%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput [15865814.969op/s; 15867382.404op/s] or [-0.005%; +0.005%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 execution_time [10.833µs; 10.834µs] or [-0.005%; +0.005%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 throughput [92303963.440op/s; 92313909.188op/s] or [-0.005%; +0.005%] None None None

Group 11

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
two way interface execution_time 30.329µs 41.189µs ± 18.791µs 31.299µs ± 0.274µs 53.270µs 68.432µs 74.029µs 185.195µs 491.69% 4.286 27.797 45.51% 1.329µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
two way interface execution_time [38.584µs; 43.793µs] or [-6.323%; +6.323%] None None None

Group 12

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz b018033 1731324931 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching string interning on wordpress profile execution_time 193.683µs 195.245µs ± 0.403µs 195.264µs ± 0.176µs 195.434µs 195.773µs 196.233µs 197.380µs 1.08% 0.048 5.372 0.21% 0.028µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching string interning on wordpress profile execution_time [195.189µs; 195.301µs] or [-0.029%; +0.029%] None None None

Baseline

Omitted due to size.

Contributor

@bantonsson bantonsson left a comment

These booleans are woefully undocumented and I'm not completely sure about the expected life cycle, but from reading the code it looks like your interpretation is correct.

I think that it might be better to leave the worker as started if it's restartable and leave the checks in place.

@@ -296,7 +293,9 @@ impl TelemetryWorker {
self.log_err(&e);
}
self.data.started = false;
self.deadlines.clear_pending();
if !self.config.restartable {
Contributor

Wouldn't it be enough to only include self.data.started = false; inside the if statement as well, and leave the exit early checks?

Contributor Author

I guess it depends. Do we want Start, Stop, Stop to generate two stops? Because that's what PHP ends up generating. Currently the second Stop is a noop, but if I moved the assignment self.data.started = false under the condition if !self.config.restartable, then both Stops would be effective.

Contributor

If the exit early checks are still in the code, then how would the second Stop be effective?

Contributor Author

@cataphract cataphract Jul 4, 2024

From what I understand, what you're proposing is that I reset started = false only if !restartable. In that case started would stay true forever once there is a start. So the early check

 if !self.data.started {
   return BREAK;
 }

would never be hit.
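The two placements of the reset can be compared on PHP's Start, Stop, Stop sequence. This is a hypothetical sketch: the flag reset_only_if_not_restartable stands for the reviewer's suggestion, effective_stops counts how often the Stop handler gets past the early-exit check, and the real handler's BREAK return value and nested self.data/self.config fields are omitted.

```rust
// Hypothetical sketch comparing where `started = false` is reset.
struct Worker {
    restartable: bool,
    /// true = reviewer's suggestion (reset only if !restartable).
    reset_only_if_not_restartable: bool,
    started: bool,
    /// How many times the Stop handler got past the early-exit check.
    effective_stops: u32,
}

impl Worker {
    fn new(reset_only_if_not_restartable: bool) -> Self {
        Worker {
            restartable: true,
            reset_only_if_not_restartable,
            started: false,
            effective_stops: 0,
        }
    }

    fn start(&mut self) {
        self.started = true;
    }

    fn stop(&mut self) {
        // Early-exit check, modeled on `if !self.data.started { return BREAK; }`.
        if !self.started {
            return;
        }
        self.effective_stops += 1; // the final flush etc. would happen here
        if self.reset_only_if_not_restartable {
            if !self.restartable {
                self.started = false;
            }
        } else {
            self.started = false; // PR's version: always reset
        }
    }
}
```

With the unconditional reset the second Stop is a noop; with the suggested placement, started stays true after the first Start, so the early-exit check never fires and every Stop runs.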

Contributor

Yes, you're right. I was confusing the early checks.

There is something in the logic that feels a bit broken. The FlushData handler is also protected by this !self.data.started check. Should it work after a restartable stop?

The things that Stop does, should they happen when the first request ends, or when the worker stops?

Contributor Author

> Should [FlushData] work after a restartable stop?

My guess is yes, otherwise there is no way to send the data that's collected after the stop (build_observability_batch is called only from the handlers of Stop and FlushData), at least not without an intervening start+stop.

> The things that Stop does, should they happen when the first request ends, or when the worker stops?

That is a good point. The final flush of the metrics should happen when the worker stops, not when handling Stop. I guess at some point they were the same, but then the restart mode was introduced. Regardless, once we have a way to send metrics after a Stop, periodic flushes must keep running for that to happen. So FlushData shouldn't be skipped or unscheduled after a Stop.
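A minimal sketch of the separation argued for here, assuming that a request-level Stop only does lifecycle bookkeeping while the final flush moves to worker shutdown. All names (Worker, periodic_flush, shutdown, flushing_scheduled) are hypothetical, and this is not necessarily what the PR implements.

```rust
// Hypothetical sketch: Stop (end of a request) no longer cancels periodic
// flushing; the final flush happens only when the worker itself shuts down.
#[derive(Default)]
struct Worker {
    started: bool,
    flushing_scheduled: bool,
    points: Vec<u64>,
    sent: Vec<Vec<u64>>,
}

impl Worker {
    fn start(&mut self) {
        self.started = true;
        self.flushing_scheduled = true;
    }

    /// End of a request: lifecycle bookkeeping only, flushing stays scheduled.
    fn stop(&mut self) {
        self.started = false;
    }

    fn add_point(&mut self, v: u64) {
        self.points.push(v);
    }

    /// Periodic flush; runs whether or not a Start/Stop pair is open.
    fn periodic_flush(&mut self) {
        if self.flushing_scheduled {
            self.flush();
        }
    }

    /// Worker shutdown: final flush, then stop scheduling.
    fn shutdown(&mut self) {
        self.flush();
        self.flushing_scheduled = false;
    }

    fn flush(&mut self) {
        if !self.points.is_empty() {
            self.sent.push(std::mem::take(&mut self.points));
        }
    }
}
```

Points recorded in later requests (with no new Start) are still sent by the periodic flush; only shutdown ends the flushing.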

@@ -458,7 +454,9 @@ impl TelemetryWorker {
.await;

self.data.started = false;
self.deadlines.clear_pending();
if !self.config.restartable {
Contributor

Same here.

@cataphract cataphract force-pushed the glopes/flush-data-after-stop branch from 4f63a34 to 251bafa Compare July 4, 2024 15:39
@codecov-commenter

codecov-commenter commented Jul 4, 2024

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 71.56%. Comparing base (1fc6b11) to head (b018033).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #515      +/-   ##
==========================================
- Coverage   71.61%   71.56%   -0.06%     
==========================================
  Files         281      281              
  Lines       42500    42511      +11     
==========================================
- Hits        30437    30421      -16     
- Misses      12063    12090      +27     
Components Coverage Δ
crashtracker 44.95% <ø> (ø)
crashtracker-ffi 9.20% <ø> (ø)
datadog-alloc 98.73% <ø> (ø)
data-pipeline 91.63% <100.00%> (-0.58%) ⬇️
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 83.46% <ø> (ø)
ddcommon-ffi 69.12% <ø> (ø)
ddtelemetry 59.15% <83.33%> (+0.05%) ⬆️
ddtelemetry-ffi 22.13% <ø> (ø)
dogstatsd 89.45% <ø> (ø)
dogstatsd-client 79.77% <ø> (ø)
ipc 82.75% <ø> (-0.11%) ⬇️
profiling 84.30% <ø> (ø)
profiling-ffi 77.46% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 37.42% <0.00%> (ø)
sidecar-ffi 0.00% <ø> (ø)
spawn-worker 50.36% <ø> (ø)
tinybytes 94.77% <ø> (ø)
trace-mini-agent 72.18% <16.66%> (-0.27%) ⬇️
trace-normalization 98.25% <ø> (ø)
trace-obfuscation 95.77% <ø> (ø)
trace-protobuf 77.67% <ø> (ø)
trace-utils 93.53% <74.19%> (-0.07%) ⬇️

@cataphract cataphract force-pushed the glopes/flush-data-after-stop branch 4 times, most recently from 447c409 to a7d11b0 Compare July 12, 2024 16:03
@cataphract cataphract requested review from a team as code owners July 12, 2024 16:03
@cataphract cataphract force-pushed the glopes/flush-data-after-stop branch 2 times, most recently from 9c34541 to e458f75 Compare October 15, 2024 11:47
@cataphract
Contributor Author

@bwoebi Can you look at this? I need it for DataDog/dd-trace-php#2735

Telemetry workers are functionally dead after a Stop lifecycle action,
provided there's no intervening Start. While AddPoint actions are still
processed, their data is never flushed, since the Stop action handler
unschedules FlushMetrics and FlushData actions.

PHP sends a Stop action at the end of every request via
ddog_sidecar_telemetry_end(), but a Start action is only generated just
after a telemetry worker is spawned.

It is not clear to me whether the intention is to have a Start/Stop pair on
every PHP request (where Stop flushes the metrics) or if the intention
is to have only such a pair in the first request, with the Stop event
generated by ddog_sidecar_telemetry_end() effectively a noop. It would
appear, judging by [this
comment](#391):

> Also allow the telemetry worker to have a mode where it's continuing
execution after a start-stop cycle, otherwise it won't send any more
metrics afterwards.

that the intention is to keep sending metrics after a Start/Stop pair.
In that case:

* The Stop action handler should not unschedule FlushData and
  FlushMetrics events and
* FlushData, if called outside a Start-Stop pair, should not be a noop.

Finally: swap the order in which FlushData and FlushMetrics are
scheduled so that FlushMetrics runs first and therefore its generated
data can be sent by the next FlushData.