Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tail latency issue on reads and writes due to slow run_pending_tasks #498

Open
tatsuya6502 opened this issue Feb 15, 2025 · 15 comments
Open

Comments

@tatsuya6502
Copy link
Member

tatsuya6502 commented Feb 15, 2025

Originally posted by @texascloud in #79 (comment)

Hello! I have come to you after much debugging. I don't know why, but the logic gated by this code causes .run_pending_tasks() to occasionally take significantly longer, especially if the number of entries which are being evicted is in the range of millions, but it presents at values significantly lower than that too.

if tbc >= 25_000.0 || tbc / real_cap >= 0.1

If I change the 0.1 to 0.5, I get results like this:

0215 00:20:22.976972  INFO [serial] inserting 5000000 items
0215 00:20:32.243169  INFO done after 9266ms - size: 4999936
0215 00:20:32.243266  INFO [before sync] size: 5000000
0215 00:20:39.282989  INFO [polling till empty] num_during_iter: 5000000 - removed: cycle/60001 total/60001 - elapsed_ms: 38 - items/ms: 1578
0215 00:20:43.281050  INFO [polling till empty] num_during_iter: 4939999 - removed: cycle/60002 total/120003 - elapsed_ms: 36 - items/ms: 1666
0215 00:20:44.280029  INFO [polling till empty] num_during_iter: 4879997 - removed: cycle/60002 total/180005 - elapsed_ms: 35 - items/ms: 1714
0215 00:20:47.279363  INFO [polling till empty] num_during_iter: 4819995 - removed: cycle/60002 total/240007 - elapsed_ms: 34 - items/ms: 1764
0215 00:20:50.280655  INFO [polling till empty] num_during_iter: 4759993 - removed: cycle/60002 total/300009 - elapsed_ms: 36 - items/ms: 1666
0215 00:20:52.283514  INFO [polling till empty] num_during_iter: 4699991 - removed: cycle/60002 total/360011 - elapsed_ms: 39 - items/ms: 1538
0215 00:20:55.279434  INFO [polling till empty] num_during_iter: 4639989 - removed: cycle/60002 total/420013 - elapsed_ms: 35 - items/ms: 1714
0215 00:20:57.284198  INFO [polling till empty] num_during_iter: 4579987 - removed: cycle/60002 total/480015 - elapsed_ms: 38 - items/ms: 1579
0215 00:21:00.284056  INFO [polling till empty] num_during_iter: 4519985 - removed: cycle/60002 total/540017 - elapsed_ms: 39 - items/ms: 1538
0215 00:21:03.286988  INFO [polling till empty] num_during_iter: 4459983 - removed: cycle/60002 total/600019 - elapsed_ms: 42 - items/ms: 1428
0215 00:21:05.283812  INFO [polling till empty] num_during_iter: 4399981 - removed: cycle/60002 total/660021 - elapsed_ms: 38 - items/ms: 1579
0215 00:21:08.282565  INFO [polling till empty] num_during_iter: 4339979 - removed: cycle/60002 total/720023 - elapsed_ms: 38 - items/ms: 1579
0215 00:21:10.257832  INFO [polling till empty] num_during_iter: 4279977 - removed: cycle/18143 total/738166 - elapsed_ms: 13 - items/ms: 1395
0215 00:21:12.271668  INFO [polling till empty] num_during_iter: 4261834 - removed: cycle/41859 total/780025 - elapsed_ms: 27 - items/ms: 1550
0215 00:21:13.284663  INFO [polling till empty] num_during_iter: 4219975 - removed: cycle/60002 total/840027 - elapsed_ms: 39 - items/ms: 1538
0215 00:21:16.284791  INFO [polling till empty] num_during_iter: 4159973 - removed: cycle/60002 total/900029 - elapsed_ms: 40 - items/ms: 1500
0215 00:21:18.282902  INFO [polling till empty] num_during_iter: 4099971 - removed: cycle/60002 total/960031 - elapsed_ms: 38 - items/ms: 1579
0215 00:21:21.285095  INFO [polling till empty] num_during_iter: 4039969 - removed: cycle/60002 total/1020033 - elapsed_ms: 39 - items/ms: 1538
0215 00:21:23.270587  INFO [polling till empty] num_during_iter: 3979967 - removed: cycle/37073 total/1057106 - elapsed_ms: 26 - items/ms: 1425
0215 00:21:24.259405  INFO [polling till empty] num_during_iter: 3942894 - removed: cycle/22929 total/1080035 - elapsed_ms: 15 - items/ms: 1528
0215 00:21:26.285166  INFO [polling till empty] num_during_iter: 3919965 - removed: cycle/60002 total/1140037 - elapsed_ms: 40 - items/ms: 1500
0215 00:21:29.284835  INFO [polling till empty] num_during_iter: 3859963 - removed: cycle/60002 total/1200039 - elapsed_ms: 39 - items/ms: 1538
0215 00:21:31.284801  INFO [polling till empty] num_during_iter: 3799961 - removed: cycle/60002 total/1260041 - elapsed_ms: 40 - items/ms: 1500
0215 00:21:34.284702  INFO [polling till empty] num_during_iter: 3739959 - removed: cycle/60002 total/1320043 - elapsed_ms: 40 - items/ms: 1500
0215 00:21:36.284188  INFO [polling till empty] num_during_iter: 3679957 - removed: cycle/60002 total/1380045 - elapsed_ms: 40 - items/ms: 1500
0215 00:21:39.285984  INFO [polling till empty] num_during_iter: 3619955 - removed: cycle/60002 total/1440047 - elapsed_ms: 41 - items/ms: 1463
0215 00:21:42.284987  INFO [polling till empty] num_during_iter: 3559953 - removed: cycle/60002 total/1500049 - elapsed_ms: 40 - items/ms: 1500
0215 00:21:45.286834  INFO [polling till empty] num_during_iter: 3499951 - removed: cycle/60002 total/1560051 - elapsed_ms: 41 - items/ms: 1463
inner loop longer now 55ms
inner loop longer now 53ms
inner loop longer now 53ms
inner loop longer now 53ms
inner loop longer now 58ms
inner loop longer now 57ms
inner loop longer now 60ms
inner loop longer now 65ms
inner loop longer now 59ms
inner loop longer now 61ms
inner loop longer now 61ms
inner loop longer now 57ms
inner loop longer now 60ms
inner loop longer now 58ms
inner loop longer now 53ms
inner loop longer now 52ms
inner loop longer now 51ms
inner loop longer now 50ms
inner loop longer now 55ms
inner loop longer now 59ms
inner loop longer now 59ms
inner loop longer now 62ms
inner loop longer now 61ms
inner loop longer now 60ms
inner loop longer now 57ms
inner loop longer now 55ms
inner loop longer now 58ms
inner loop longer now 59ms
inner loop longer now 55ms
inner loop longer now 53ms
inner loop longer now 47ms
inner loop longer now 45ms
inner loop longer now 47ms
inner loop longer now 57ms
inner loop longer now 63ms
inner loop longer now 58ms
inner loop longer now 52ms
inner loop longer now 54ms
inner loop longer now 61ms
inner loop longer now 58ms
inner loop longer now 56ms
inner loop longer now 57ms
inner loop longer now 63ms
inner loop longer now 60ms
inner loop longer now 64ms
inner loop longer now 64ms
inner loop longer now 62ms
inner loop longer now 55ms
inner loop longer now 59ms
inner loop longer now 60ms
inner loop longer now 57ms
inner loop longer now 57ms
inner loop longer now 58ms
inner loop longer now 49ms
inner loop longer now 55ms
inner loop longer now 51ms
inner loop longer now 49ms
inner loop longer now 52ms
inner loop longer now 47ms
inner loop longer now 60ms
inner loop longer now 57ms
inner loop longer now 60ms
other work in expiring entries took 3648
0215 00:21:50.899043  INFO [polling till empty] num_during_iter: 3439949 - removed: cycle/60002 total/1620053 - elapsed_ms: 3654 - items/ms: 16
inner loop longer now 58ms
inner loop longer now 60ms
0215 00:21:51.149464  INFO [polling till empty] num_during_iter: 3379947 - removed: cycle/60002 total/1680055 - elapsed_ms: 250 - items/ms: 240
0215 00:21:52.343265  INFO [polling till empty] num_during_iter: 3319945 - removed: cycle/46274 total/1726329 - elapsed_ms: 98 - items/ms: 472
0215 00:21:53.277026  INFO [polling till empty] num_during_iter: 3273671 - removed: cycle/13728 total/1740057 - elapsed_ms: 32 - items/ms: 429
0215 00:21:55.373140  INFO [polling till empty] num_during_iter: 3259943 - removed: cycle/60002 total/1800059 - elapsed_ms: 128 - items/ms: 468
0215 00:21:58.378099  INFO [polling till empty] num_during_iter: 3199941 - removed: cycle/60002 total/1860061 - elapsed_ms: 133 - items/ms: 451
0215 00:22:00.374099  INFO [polling till empty] num_during_iter: 3139939 - removed: cycle/60002 total/1920063 - elapsed_ms: 129 - items/ms: 465
0215 00:22:03.373482  INFO [polling till empty] num_during_iter: 3079937 - removed: cycle/60002 total/1980065 - elapsed_ms: 129 - items/ms: 465
0215 00:22:05.311049  INFO [polling till empty] num_during_iter: 3019935 - removed: cycle/60002 total/2040067 - elapsed_ms: 66 - items/ms: 909
0215 00:22:08.307935  INFO [polling till empty] num_during_iter: 2959933 - removed: cycle/60002 total/2100069 - elapsed_ms: 63 - items/ms: 952
0215 00:22:11.283043  INFO [polling till empty] num_during_iter: 2899931 - removed: cycle/60002 total/2160071 - elapsed_ms: 38 - items/ms: 1579
0215 00:22:13.283127  INFO [polling till empty] num_during_iter: 2839929 - removed: cycle/60002 total/2220073 - elapsed_ms: 38 - items/ms: 1579
0215 00:22:16.283067  INFO [polling till empty] num_during_iter: 2779927 - removed: cycle/60002 total/2280075 - elapsed_ms: 38 - items/ms: 1579
0215 00:22:18.282396  INFO [polling till empty] num_during_iter: 2719925 - removed: cycle/60002 total/2340077 - elapsed_ms: 37 - items/ms: 1621
0215 00:22:21.282380  INFO [polling till empty] num_during_iter: 2659923 - removed: cycle/60002 total/2400079 - elapsed_ms: 38 - items/ms: 1579
0215 00:22:23.270550  INFO [polling till empty] num_during_iter: 2599921 - removed: cycle/39493 total/2439572 - elapsed_ms: 26 - items/ms: 1518
0215 00:22:25.257015  INFO [polling till empty] num_during_iter: 2560428 - removed: cycle/20509 total/2460081 - elapsed_ms: 12 - items/ms: 1709
0215 00:22:26.281062  INFO [polling till empty] num_during_iter: 2539919 - removed: cycle/60002 total/2520083 - elapsed_ms: 36 - items/ms: 1666
0215 00:22:29.286712  INFO [polling till empty] num_during_iter: 2479917 - removed: cycle/60002 total/2580085 - elapsed_ms: 42 - items/ms: 1428
0215 00:22:31.284630  INFO [polling till empty] num_during_iter: 2419915 - removed: cycle/60002 total/2640087 - elapsed_ms: 39 - items/ms: 1538
0215 00:22:34.284504  INFO [polling till empty] num_during_iter: 2359913 - removed: cycle/60002 total/2700089 - elapsed_ms: 40 - items/ms: 1500
0215 00:22:36.284147  INFO [polling till empty] num_during_iter: 2299911 - removed: cycle/60002 total/2760091 - elapsed_ms: 39 - items/ms: 1538
0215 00:22:39.285287  INFO [polling till empty] num_during_iter: 2239909 - removed: cycle/60002 total/2820093 - elapsed_ms: 40 - items/ms: 1500
0215 00:22:42.284914  INFO [polling till empty] num_during_iter: 2179907 - removed: cycle/60002 total/2880095 - elapsed_ms: 40 - items/ms: 1500
0215 00:22:44.275285  INFO [polling till empty] num_during_iter: 2119905 - removed: cycle/44856 total/2924951 - elapsed_ms: 31 - items/ms: 1446
0215 00:22:45.255715  INFO [polling till empty] num_during_iter: 2075049 - removed: cycle/15146 total/2940097 - elapsed_ms: 11 - items/ms: 1376
0215 00:22:48.293616  INFO [polling till empty] num_during_iter: 2059903 - removed: cycle/60002 total/3000099 - elapsed_ms: 49 - items/ms: 1224
0215 00:22:51.292952  INFO [polling till empty] num_during_iter: 1999901 - removed: cycle/60002 total/3060101 - elapsed_ms: 48 - items/ms: 1250
0215 00:22:54.287375  INFO [polling till empty] num_during_iter: 1939899 - removed: cycle/60002 total/3120103 - elapsed_ms: 43 - items/ms: 1395
inner loop longer now 24ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 24ms
inner loop longer now 24ms
0215 00:22:56.424655  INFO [polling till empty] num_during_iter: 1879897 - removed: cycle/60002 total/3180105 - elapsed_ms: 179 - items/ms: 335
inner loop longer now 24ms
inner loop longer now 25ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 22ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 24ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 22ms
inner loop longer now 22ms
inner loop longer now 22ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 21ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 24ms
inner loop longer now 22ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 24ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 22ms
inner loop longer now 23ms
inner loop longer now 24ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 22ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 22ms
inner loop longer now 23ms
inner loop longer now 23ms
inner loop longer now 22ms
other work in expiring entries took 1461
0215 00:22:59.711709  INFO [polling till empty] num_during_iter: 1819895 - removed: cycle/60002 total/3240107 - elapsed_ms: 1467 - items/ms: 40
inner loop longer now 25ms
inner loop longer now 23ms
0215 00:23:01.420114  INFO [polling till empty] num_during_iter: 1759893 - removed: cycle/60002 total/3300109 - elapsed_ms: 175 - items/ms: 342
0215 00:23:03.280751  INFO [polling till empty] num_during_iter: 1699891 - removed: cycle/16633 total/3316742 - elapsed_ms: 36 - items/ms: 462
0215 00:23:04.334370  INFO [polling till empty] num_during_iter: 1683258 - removed: cycle/43369 total/3360111 - elapsed_ms: 89 - items/ms: 487
0215 00:23:06.374591  INFO [polling till empty] num_during_iter: 1639889 - removed: cycle/60002 total/3420113 - elapsed_ms: 130 - items/ms: 461
0215 00:23:09.370134  INFO [polling till empty] num_during_iter: 1579887 - removed: cycle/60002 total/3480115 - elapsed_ms: 125 - items/ms: 480
0215 00:23:11.369464  INFO [polling till empty] num_during_iter: 1519885 - removed: cycle/60002 total/3540117 - elapsed_ms: 125 - items/ms: 480
0215 00:23:14.351265  INFO [polling till empty] num_during_iter: 1459883 - removed: cycle/60002 total/3600119 - elapsed_ms: 106 - items/ms: 566
0215 00:23:16.311857  INFO [polling till empty] num_during_iter: 1399881 - removed: cycle/60002 total/3660121 - elapsed_ms: 67 - items/ms: 895
0215 00:23:19.302834  INFO [polling till empty] num_during_iter: 1339879 - removed: cycle/60002 total/3720123 - elapsed_ms: 58 - items/ms: 1034
0215 00:23:23.287930  INFO [polling till empty] num_during_iter: 1279877 - removed: cycle/60002 total/3780125 - elapsed_ms: 43 - items/ms: 1395
0215 00:23:24.286809  INFO [polling till empty] num_during_iter: 1219875 - removed: cycle/60002 total/3840127 - elapsed_ms: 42 - items/ms: 1428
0215 00:23:27.289109  INFO [polling till empty] num_during_iter: 1159873 - removed: cycle/60002 total/3900129 - elapsed_ms: 45 - items/ms: 1333
0215 00:23:29.289941  INFO [polling till empty] num_during_iter: 1099871 - removed: cycle/60002 total/3960131 - elapsed_ms: 44 - items/ms: 1363
0215 00:23:32.289917  INFO [polling till empty] num_during_iter: 1039869 - removed: cycle/60002 total/4020133 - elapsed_ms: 45 - items/ms: 1333
0215 00:23:35.287222  INFO [polling till empty] num_during_iter: 979867 - removed: cycle/60002 total/4080135 - elapsed_ms: 42 - items/ms: 1428
0215 00:23:37.285451  INFO [polling till empty] num_during_iter: 919865 - removed: cycle/60002 total/4140137 - elapsed_ms: 40 - items/ms: 1500
0215 00:23:40.289295  INFO [polling till empty] num_during_iter: 859863 - removed: cycle/60002 total/4200139 - elapsed_ms: 45 - items/ms: 1333
0215 00:23:42.288687  INFO [polling till empty] num_during_iter: 799861 - removed: cycle/60002 total/4260141 - elapsed_ms: 44 - items/ms: 1363
0215 00:23:45.288637  INFO [polling till empty] num_during_iter: 739859 - removed: cycle/60002 total/4320143 - elapsed_ms: 44 - items/ms: 1363
0215 00:23:47.272691  INFO [polling till empty] num_during_iter: 679857 - removed: cycle/38326 total/4358469 - elapsed_ms: 28 - items/ms: 1368
0215 00:23:48.261891  INFO [polling till empty] num_during_iter: 641531 - removed: cycle/21676 total/4380145 - elapsed_ms: 17 - items/ms: 1275
0215 00:23:50.288112  INFO [polling till empty] num_during_iter: 619855 - removed: cycle/60002 total/4440147 - elapsed_ms: 43 - items/ms: 1395
0215 00:23:53.290423  INFO [polling till empty] num_during_iter: 559853 - removed: cycle/60002 total/4500149 - elapsed_ms: 44 - items/ms: 1363
0215 00:23:55.289229  INFO [polling till empty] num_during_iter: 499851 - removed: cycle/60002 total/4560151 - elapsed_ms: 44 - items/ms: 1363
0215 00:23:58.289512  INFO [polling till empty] num_during_iter: 439849 - removed: cycle/60002 total/4620153 - elapsed_ms: 45 - items/ms: 1333
0215 00:24:00.308987  INFO [polling till empty] num_during_iter: 379847 - removed: cycle/60002 total/4680155 - elapsed_ms: 64 - items/ms: 937
0215 00:24:03.289557  INFO [polling till empty] num_during_iter: 319845 - removed: cycle/60002 total/4740157 - elapsed_ms: 44 - items/ms: 1363
inner loop longer now 12ms
inner loop longer now 11ms
inner loop longer now 12ms
inner loop longer now 12ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 12ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 12ms
inner loop longer now 12ms
inner loop longer now 12ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 12ms
inner loop longer now 11ms
inner loop longer now 12ms
inner loop longer now 12ms
inner loop longer now 12ms
other work in expiring entries took 525
0215 00:24:07.776642  INFO [polling till empty] num_during_iter: 259843 - removed: cycle/60002 total/4800159 - elapsed_ms: 532 - items/ms: 112
inner loop longer now 13ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 12ms
inner loop longer now 12ms
inner loop longer now 11ms
inner loop longer now 12ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
inner loop longer now 11ms
other work in expiring entries took 433
0215 00:24:08.683650  INFO [polling till empty] num_during_iter: 199841 - removed: cycle/60002 total/4860161 - elapsed_ms: 439 - items/ms: 136
0215 00:24:11.370386  INFO [polling till empty] num_during_iter: 139839 - removed: cycle/60002 total/4920163 - elapsed_ms: 125 - items/ms: 480
0215 00:24:13.372136  INFO [polling till empty] num_during_iter: 79837 - removed: cycle/60002 total/4980165 - elapsed_ms: 127 - items/ms: 472
0215 00:24:16.286089  INFO [polling till empty] num_during_iter: 19835 - removed: cycle/19835 total/5000000 - elapsed_ms: 41 - items/ms: 483
0215 00:24:17.245283  INFO total time syncing cache: 10957ms
0215 00:24:17.245334  INFO longest sync time seen: 3654ms

Results before changing it showed this:

0215 00:28:26.697036  INFO [serial] inserting 5000000 items
0215 00:28:35.977126  INFO done after 9280ms - size: 4999936
0215 00:28:35.977219  INFO [before sync] size: 5000000
0215 00:28:43.015708  INFO [polling till empty] num_during_iter: 5000000 - removed: cycle/60001 total/60001 - elapsed_ms: 36 - items/ms: 1666
0215 00:28:47.014463  INFO [polling till empty] num_during_iter: 4939999 - removed: cycle/60002 total/120003 - elapsed_ms: 35 - items/ms: 1714
0215 00:28:48.013772  INFO [polling till empty] num_during_iter: 4879997 - removed: cycle/60002 total/180005 - elapsed_ms: 35 - items/ms: 1714
0215 00:28:51.020654  INFO [polling till empty] num_during_iter: 4819995 - removed: cycle/60002 total/240007 - elapsed_ms: 42 - items/ms: 1428
0215 00:28:52.983376  INFO [polling till empty] num_during_iter: 4759993 - removed: cycle/3224 total/243231 - elapsed_ms: 4 - items/ms: 806
0215 00:28:54.016071  INFO [polling till empty] num_during_iter: 4756769 - removed: cycle/56778 total/300009 - elapsed_ms: 37 - items/ms: 1534
0215 00:28:56.019840  INFO [polling till empty] num_during_iter: 4699991 - removed: cycle/60002 total/360011 - elapsed_ms: 41 - items/ms: 1463
0215 00:28:59.022875  INFO [polling till empty] num_during_iter: 4639989 - removed: cycle/60002 total/420013 - elapsed_ms: 44 - items/ms: 1363
0215 00:29:01.020195  INFO [polling till empty] num_during_iter: 4579987 - removed: cycle/60002 total/480015 - elapsed_ms: 41 - items/ms: 1463
0215 00:29:04.019431  INFO [polling till empty] num_during_iter: 4519985 - removed: cycle/60002 total/540017 - elapsed_ms: 40 - items/ms: 1500
0215 00:29:07.020925  INFO [polling till empty] num_during_iter: 4459983 - removed: cycle/60002 total/600019 - elapsed_ms: 42 - items/ms: 1428
0215 00:29:09.021071  INFO [polling till empty] num_during_iter: 4399981 - removed: cycle/60002 total/660021 - elapsed_ms: 42 - items/ms: 1428
0215 00:29:12.021859  INFO [polling till empty] num_during_iter: 4339979 - removed: cycle/60002 total/720023 - elapsed_ms: 43 - items/ms: 1395
0215 00:29:13.992883  INFO [polling till empty] num_during_iter: 4279977 - removed: cycle/15786 total/735809 - elapsed_ms: 14 - items/ms: 1127
0215 00:29:16.010206  INFO [polling till empty] num_during_iter: 4264191 - removed: cycle/44216 total/780025 - elapsed_ms: 31 - items/ms: 1426
inner loop longer now 51ms
inner loop longer now 48ms
inner loop longer now 47ms
inner loop longer now 49ms
inner loop longer now 47ms
inner loop longer now 45ms
inner loop longer now 46ms
inner loop longer now 45ms
inner loop longer now 45ms
inner loop longer now 44ms
inner loop longer now 45ms
inner loop longer now 45ms
inner loop longer now 46ms
inner loop longer now 45ms
inner loop longer now 45ms
inner loop longer now 45ms
inner loop longer now 46ms
inner loop longer now 45ms
inner loop longer now 45ms
inner loop longer now 49ms
inner loop longer now 46ms
inner loop longer now 48ms
inner loop longer now 47ms
inner loop longer now 45ms
inner loop longer now 49ms
inner loop longer now 47ms
inner loop longer now 46ms
inner loop longer now 47ms
inner loop longer now 47ms
inner loop longer now 46ms
inner loop longer now 46ms
inner loop longer now 46ms
inner loop longer now 47ms
inner loop longer now 48ms
inner loop longer now 48ms
inner loop longer now 46ms
inner loop longer now 46ms
other work in expiring entries took 1808
0215 00:29:18.792774  INFO [polling till empty] num_during_iter: 4219975 - removed: cycle/60002 total/840027 - elapsed_ms: 1814 - items/ms: 33
inner loop longer now 56ms
inner loop longer now 49ms
inner loop longer now 48ms
inner loop longer now 48ms
inner loop longer now 46ms
inner loop longer now 51ms
inner loop longer now 50ms
inner loop longer now 47ms
inner loop longer now 49ms
inner loop longer now 47ms
inner loop longer now 47ms
inner loop longer now 49ms
inner loop longer now 48ms
inner loop longer now 49ms
inner loop longer now 49ms
inner loop longer now 49ms
inner loop longer now 49ms
inner loop longer now 49ms
inner loop longer now 49ms
inner loop longer now 50ms
inner loop longer now 52ms
inner loop longer now 48ms
inner loop longer now 47ms
inner loop longer now 50ms
inner loop longer now 54ms
inner loop longer now 57ms
inner loop longer now 56ms
other work in expiring entries took 1512
0215 00:29:21.497398  INFO [polling till empty] num_during_iter: 4159973 - removed: cycle/60002 total/900029 - elapsed_ms: 1518 - items/ms: 39
0215 00:29:23.139800  INFO [polling till empty] num_during_iter: 4099971 - removed: cycle/60002 total/960031 - elapsed_ms: 161 - items/ms: 372
0215 00:29:25.135986  INFO [polling till empty] num_during_iter: 4039969 - removed: cycle/60002 total/1020033 - elapsed_ms: 157 - items/ms: 382
0215 00:29:26.994142  INFO [polling till empty] num_during_iter: 3979967 - removed: cycle/4632 total/1024665 - elapsed_ms: 15 - items/ms: 308
0215 00:29:28.057601  INFO [polling till empty] num_during_iter: 3975335 - removed: cycle/55370 total/1080035 - elapsed_ms: 79 - items/ms: 700
0215 00:29:31.026661  INFO [polling till empty] num_during_iter: 3919965 - removed: cycle/60002 total/1140037 - elapsed_ms: 47 - items/ms: 1276
0215 00:29:33.017612  INFO [polling till empty] num_during_iter: 3859963 - removed: cycle/60002 total/1200039 - elapsed_ms: 39 - items/ms: 1538
0215 00:29:35.018555  INFO [polling till empty] num_during_iter: 3799961 - removed: cycle/60002 total/1260041 - elapsed_ms: 39 - items/ms: 1538
0215 00:29:38.020598  INFO [polling till empty] num_during_iter: 3739959 - removed: cycle/60002 total/1320043 - elapsed_ms: 42 - items/ms: 1428
0215 00:29:40.014173  INFO [polling till empty] num_during_iter: 3679957 - removed: cycle/52531 total/1372574 - elapsed_ms: 35 - items/ms: 1500
0215 00:29:40.982811  INFO [polling till empty] num_during_iter: 3627426 - removed: cycle/7471 total/1380045 - elapsed_ms: 4 - items/ms: 1867
0215 00:29:43.017916  INFO [polling till empty] num_during_iter: 3619955 - removed: cycle/60002 total/1440047 - elapsed_ms: 39 - items/ms: 1538
0215 00:29:46.019039  INFO [polling till empty] num_during_iter: 3559953 - removed: cycle/60002 total/1500049 - elapsed_ms: 40 - items/ms: 1500
0215 00:29:49.020344  INFO [polling till empty] num_during_iter: 3499951 - removed: cycle/60002 total/1560051 - elapsed_ms: 41 - items/ms: 1463
0215 00:29:51.019268  INFO [polling till empty] num_during_iter: 3439949 - removed: cycle/60002 total/1620053 - elapsed_ms: 40 - items/ms: 1500
inner loop longer now 45ms
inner loop longer now 45ms
inner loop longer now 40ms
inner loop longer now 41ms
inner loop longer now 39ms
inner loop longer now 37ms
inner loop longer now 41ms
inner loop longer now 38ms
inner loop longer now 44ms
inner loop longer now 41ms
inner loop longer now 40ms
inner loop longer now 37ms
inner loop longer now 40ms
inner loop longer now 37ms
inner loop longer now 36ms
inner loop longer now 38ms
inner loop longer now 37ms
inner loop longer now 38ms
inner loop longer now 39ms
inner loop longer now 38ms
inner loop longer now 37ms
inner loop longer now 37ms
inner loop longer now 40ms
inner loop longer now 37ms
inner loop longer now 40ms
inner loop longer now 37ms
inner loop longer now 39ms
inner loop longer now 39ms
inner loop longer now 38ms
inner loop longer now 38ms
inner loop longer now 38ms
inner loop longer now 39ms
inner loop longer now 38ms
inner loop longer now 40ms
inner loop longer now 37ms
inner loop longer now 38ms
inner loop longer now 38ms
inner loop longer now 38ms
inner loop longer now 37ms
inner loop longer now 37ms
inner loop longer now 39ms
other work in expiring entries took 1695
0215 00:29:55.679426  INFO [polling till empty] num_during_iter: 3379947 - removed: cycle/60002 total/1680055 - elapsed_ms: 1700 - items/ms: 35
inner loop longer now 44ms
inner loop longer now 41ms
inner loop longer now 43ms
inner loop longer now 42ms
inner loop longer now 42ms
inner loop longer now 45ms
inner loop longer now 44ms
inner loop longer now 38ms
inner loop longer now 40ms
inner loop longer now 40ms
inner loop longer now 38ms
inner loop longer now 38ms
inner loop longer now 40ms
inner loop longer now 41ms
inner loop longer now 40ms
inner loop longer now 39ms
inner loop longer now 39ms
inner loop longer now 40ms
inner loop longer now 38ms
inner loop longer now 39ms
inner loop longer now 38ms
inner loop longer now 37ms
inner loop longer now 38ms
other work in expiring entries took 987
0215 00:29:56.969572  INFO [polling till empty] num_during_iter: 3319945 - removed: cycle/18277 total/1698332 - elapsed_ms: 991 - items/ms: 18
0215 00:29:57.081888  INFO [polling till empty] num_during_iter: 3301668 - removed: cycle/41725 total/1740057 - elapsed_ms: 103 - items/ms: 405
0215 00:30:00.124419  INFO [polling till empty] num_during_iter: 3259943 - removed: cycle/60002 total/1800059 - elapsed_ms: 145 - items/ms: 413
0215 00:30:02.123934  INFO [polling till empty] num_during_iter: 3199941 - removed: cycle/60002 total/1860061 - elapsed_ms: 145 - items/ms: 413
0215 00:30:04.064641  INFO [polling till empty] num_during_iter: 3139939 - removed: cycle/60002 total/1920063 - elapsed_ms: 85 - items/ms: 705
0215 00:30:07.022588  INFO [polling till empty] num_during_iter: 3079937 - removed: cycle/60002 total/1980065 - elapsed_ms: 44 - items/ms: 1363
...<etc>
0215 00:32:20.978474  INFO total time syncing cache: 20188ms
0215 00:32:20.978507  INFO longest sync time seen: 1814ms

You'll notice the batch size is locked at 60,002, that's cuz I impl'd the Expiry trait to put a max cap on number of items to be evicted on any given run of .run_pending_tasks() when invoked manually cuz I was trying to restrain these spikes in latency that affect my service.

I'm happy to share the code I'm using for running this benchmark.

Would it be possible to make these settings tunable? What do they even mean? Can you think of anything that would solve my problem? I care more about the blocking nature of evictions than I do memory usage (depending on just how much more and the exact trade-off). I am doing I think ~6k inserts per second on an async Moka cache that can hold however many elements cuz I am bounding it by size in bytes of up to 8GB.

Thank you for any help

@tatsuya6502
Copy link
Member Author

tatsuya6502 commented Feb 15, 2025

Hi @texascloud

Thank you for reporting the issue. I moved your comment to this new issue from #79 (comment).

I'm happy to share the code I'm using for running this benchmark.

Yes, please share the code. I cannot tell what is going on without knowing more details. For example, is your cache has the eviction listener? run_pending_tasks behaves differently if the eviction listener is set or not (#417).

Another question; what is the tbc?

if tbc >= 25_000.0 || tbc / real_cap >= 0.1

Thanks!

@tatsuya6502
Copy link
Member Author

Another question; what is the tbc?

if tbc >= 25_000.0 || tbc / real_cap >= 0.1

Never mind. I realized that it was taken from moka's code:

https://github.com/moka-rs/moka/blob/v0.12.10/src/cht/map/bucket.rs#L733C1-L733C54

@tatsuya6502
Copy link
Member Author

tatsuya6502 commented Feb 15, 2025

Would it be possible to make these settings tunable? What do they even mean?

You are talking about this code, right?

if tbc >= 25_000.0 || tbc / real_cap >= 0.1

tbc here is the number of removals performed on the concurrent hash table since last reallocation of memory. This particular hash table needs to perform another reallocation to clean the garbage up.

Can you think of anything that would solve my problem?

I am not sure because I do not know what your benchmark program is doing. I will check when you share the code of the benchmark.

@texascloud
Copy link

Thanks for turning that comment into an issue! I have created a gist with my minimal repro as a CLI such that you can tweak the different values and observe the longest time an eviction takes based on those values.

https://gist.github.com/texascloud/8d16f92c9cbae3a220340a7abc472ea3

@texascloud
Copy link

FWIW it doesn't matter whether Expiry is set or not. I had added that logic cuz I originally thought the number of items being evicted in a batch was the deciding factor for controlling the upper-bound for how long a run of .run_pending_tasks() took. It turns out that the logic can run very fast so long as the internal data structure does not need to be re-balanced. So evicting hundreds of thousands of items looks to be pretty fast, maybe around 50ms, but evicting even 50,000 items will take at least 1,500ms if re-balancing of the internal data structure is triggered.

The way the re-balancing is manifesting feels like a "pause-the-world" GC, which gives unpredictably large tail latency. The fact .run_pending_tasks() takes up to 45,000ms (observed from actual service traffic) isn't itself an issue, but cuz the write/read channels are so small, they are entirely blocked until that logic finishes. That is what makes me really feel the pain of this.

If eviction had big duration spikes but didn't affect read/write perf of the cache, I wouldn't really care.

It's important to note too that I also tried setting both sync and async eviction listeners as a way to force the batch eviction logic + benefit from the 100ms duration timeout. That did not work since the re-balancing, once triggered, will eat up 1,500+ ms synchronously. There is no way for the timeout to cut short the re-balancing. The eviction listener I added was a no-op too.

@tatsuya6502
Copy link
Member Author

tatsuya6502 commented Feb 19, 2025

Hi. Thanks a lot for providing the minimal repro. I can reproduce the issue. I see some spikes in the elapsed time for the run_pending_tasks operation.

2025-02-19T12:00:23.950905Z  INFO moka_test: longest sync time seen: 976ms
moka::future::Cache result
$ cargo run --release -- --entries 5000000 --time-till-inserts-done-ms 16000 --poll-till-empty-ms 1000

2025-02-19T11:59:18.637474Z  INFO moka_test: [serial] inserting 5000000 items
2025-02-19T11:59:22.949546Z  INFO moka_test: done after 4312ms - size: 4999936
2025-02-19T11:59:22.949588Z  INFO moka_test: [before sync] size: 5000000
2025-02-19T11:59:35.065602Z  INFO moka_test: [polling till empty] num_during_iter: 5000000 - removed: cycle/168998 total/168998 - elapsed_ms: 114 - items/ms: 1482
2025-02-19T11:59:37.035270Z  INFO moka_test: [polling till empty] num_during_iter: 4831002 - removed: cycle/169000 total/337998 - elapsed_ms: 84 - items/ms: 2011
2025-02-19T11:59:39.010473Z  INFO moka_test: [polling till empty] num_during_iter: 4662002 - removed: cycle/169000 total/506998 - elapsed_ms: 58 - items/ms: 2913
2025-02-19T11:59:40.029596Z  INFO moka_test: [polling till empty] num_during_iter: 4493002 - removed: cycle/169000 total/675998 - elapsed_ms: 78 - items/ms: 2166
2025-02-19T11:59:42.927379Z  INFO moka_test: [polling till empty] num_during_iter: 4324002 - removed: cycle/169000 total/844998 - elapsed_ms: 976 - items/ms: 173
2025-02-19T11:59:43.315642Z  INFO moka_test: [polling till empty] num_during_iter: 4155002 - removed: cycle/169000 total/1013998 - elapsed_ms: 364 - items/ms: 464
2025-02-19T11:59:45.152806Z  INFO moka_test: [polling till empty] num_during_iter: 3986002 - removed: cycle/169000 total/1182998 - elapsed_ms: 201 - items/ms: 840
2025-02-19T11:59:45.995010Z  INFO moka_test: [polling till empty] num_during_iter: 3817002 - removed: cycle/61311 total/1244309 - elapsed_ms: 44 - items/ms: 1393
2025-02-19T11:59:47.004119Z  INFO moka_test: [polling till empty] num_during_iter: 3755691 - removed: cycle/107689 total/1351998 - elapsed_ms: 52 - items/ms: 2070
2025-02-19T11:59:48.034743Z  INFO moka_test: [polling till empty] num_during_iter: 3648002 - removed: cycle/169000 total/1520998 - elapsed_ms: 84 - items/ms: 2011
2025-02-19T11:59:50.861371Z  INFO moka_test: [polling till empty] num_during_iter: 3479002 - removed: cycle/169000 total/1689998 - elapsed_ms: 910 - items/ms: 185
2025-02-19T11:59:51.088165Z  INFO moka_test: [polling till empty] num_during_iter: 3310002 - removed: cycle/41331 total/1731329 - elapsed_ms: 136 - items/ms: 303
2025-02-19T11:59:52.082248Z  INFO moka_test: [polling till empty] num_during_iter: 3268671 - removed: cycle/127669 total/1858998 - elapsed_ms: 130 - items/ms: 982
2025-02-19T11:59:54.046885Z  INFO moka_test: [polling till empty] num_during_iter: 3141002 - removed: cycle/169000 total/2027998 - elapsed_ms: 95 - items/ms: 1778
2025-02-19T11:59:55.022306Z  INFO moka_test: [polling till empty] num_during_iter: 2972002 - removed: cycle/169000 total/2196998 - elapsed_ms: 71 - items/ms: 2380
2025-02-19T11:59:55.982467Z  INFO moka_test: [polling till empty] num_during_iter: 2803002 - removed: cycle/27992 total/2224990 - elapsed_ms: 31 - items/ms: 902
2025-02-19T11:59:57.007788Z  INFO moka_test: [polling till empty] num_during_iter: 2775010 - removed: cycle/141008 total/2365998 - elapsed_ms: 56 - items/ms: 2518
2025-02-19T11:59:58.594269Z  INFO moka_test: [polling till empty] num_during_iter: 2634002 - removed: cycle/169000 total/2534998 - elapsed_ms: 643 - items/ms: 262
2025-02-19T12:00:00.201893Z  INFO moka_test: [polling till empty] num_during_iter: 2465002 - removed: cycle/169000 total/2703998 - elapsed_ms: 250 - items/ms: 676
2025-02-19T12:00:01.055792Z  INFO moka_test: [polling till empty] num_during_iter: 2296002 - removed: cycle/169000 total/2872998 - elapsed_ms: 104 - items/ms: 1625
2025-02-19T12:00:03.278619Z  INFO moka_test: [polling till empty] num_during_iter: 2127002 - removed: cycle/68834 total/2941832 - elapsed_ms: 326 - items/ms: 211
2025-02-19T12:00:04.181376Z  INFO moka_test: [polling till empty] num_during_iter: 2058168 - removed: cycle/100166 total/3041998 - elapsed_ms: 231 - items/ms: 433
2025-02-19T12:00:05.017919Z  INFO moka_test: [polling till empty] num_during_iter: 1958002 - removed: cycle/169000 total/3210998 - elapsed_ms: 66 - items/ms: 2560
2025-02-19T12:00:08.414076Z  INFO moka_test: [polling till empty] num_during_iter: 1789002 - removed: cycle/218346 total/3429344 - elapsed_ms: 463 - items/ms: 471
2025-02-19T12:00:09.016506Z  INFO moka_test: [polling till empty] num_during_iter: 1570656 - removed: cycle/119654 total/3548998 - elapsed_ms: 65 - items/ms: 1840
2025-02-19T12:00:10.026342Z  INFO moka_test: [polling till empty] num_during_iter: 1451002 - removed: cycle/169000 total/3717998 - elapsed_ms: 74 - items/ms: 2283
2025-02-19T12:00:12.390498Z  INFO moka_test: [polling till empty] num_during_iter: 1282002 - removed: cycle/169000 total/3886998 - elapsed_ms: 440 - items/ms: 384
2025-02-19T12:00:13.250008Z  INFO moka_test: [polling till empty] num_during_iter: 1113002 - removed: cycle/169000 total/4055998 - elapsed_ms: 298 - items/ms: 567
2025-02-19T12:00:15.188425Z  INFO moka_test: [polling till empty] num_during_iter: 944002 - removed: cycle/169000 total/4224998 - elapsed_ms: 237 - items/ms: 713
2025-02-19T12:00:15.994421Z  INFO moka_test: [polling till empty] num_during_iter: 775002 - removed: cycle/45464 total/4270462 - elapsed_ms: 43 - items/ms: 1057
2025-02-19T12:00:17.059822Z  INFO moka_test: [polling till empty] num_during_iter: 729538 - removed: cycle/123536 total/4393998 - elapsed_ms: 108 - items/ms: 1143
2025-02-19T12:00:18.229627Z  INFO moka_test: [polling till empty] num_during_iter: 606002 - removed: cycle/169000 total/4562998 - elapsed_ms: 278 - items/ms: 607
2025-02-19T12:00:20.150004Z  INFO moka_test: [polling till empty] num_during_iter: 437002 - removed: cycle/169000 total/4731998 - elapsed_ms: 199 - items/ms: 849
2025-02-19T12:00:21.132597Z  INFO moka_test: [polling till empty] num_during_iter: 268002 - removed: cycle/169000 total/4900998 - elapsed_ms: 181 - items/ms: 933
2025-02-19T12:00:23.030711Z  INFO moka_test: [polling till empty] num_during_iter: 99002 - removed: cycle/99002 total/5000000 - elapsed_ms: 80 - items/ms: 1237
2025-02-19T12:00:23.950851Z  INFO moka_test: total time syncing cache: 7570ms
2025-02-19T12:00:23.950905Z  INFO moka_test: longest sync time seen: 976ms
2025-02-19T12:00:23.950912Z  INFO moka_test: [after sync] size: 0 - elapsed_ms: 61001
2025-02-19T12:00:23.950918Z  INFO moka_test: total duration: 65314ms

The way the re-balancing is manifesting feels like a "pause-the-world" GC, which gives unpredictably large tail latency.

That is right. moka's internal concurrent hash table (called cht) does blocking rehashing, so all write operations (e.g. insert, remove) to the cht are blocked while rehashing. The original author of cht tried to make it non-blocking (incremental), but it was buggy. So after forking it, I had to make it blocking to fix the bug (#34 (comment)). Even worse, I had to add the expression if tbc >= 25_000.0 || tbc / real_cap >= 0.1 to do copying garbage collection to prevent memory utilization issue (#72 (comment)), and it made the rehashing more frequent.

I am currently replacing cht with a new concurrent hash table implementation called papaya (#385 (comment)). I hope this will solve your issue. papaya does incremental rehashing by default to provide predictable and consistent latency. It also does not have to do the copying garbage collection.

https://github.com/ibraheemdev/papaya?tab=readme-ov-file#performance

I have only done some work with moka::sync::Cache so far, but it seems promising. I ran your test with moka::sync::Cache with papaya, and here is a result:

2025-02-19T13:24:18.530186Z  INFO moka_test: longest sync time seen: 227ms

As you can see, the elapse times are more stable. Also, papaya does incremental rehashing, so it does not block other operations though your repro does not tell the difference because it does not do inserts while rehashing.

moka::sync::Cache with papaya
025-02-19T13:23:10.391588Z  INFO moka_test: [serial] inserting 5000000 items
2025-02-19T13:23:17.528165Z  INFO moka_test: done after 7136ms - size: 4999936
2025-02-19T13:23:17.528186Z  INFO moka_test: [before sync] size: 5000000
2025-02-19T13:23:26.583867Z  INFO moka_test: [polling till empty] num_during_iter: 5000000 - removed: cycle/168998 total/168998 - elapsed_ms: 53 - items/ms: 3188
2025-02-19T13:23:28.647459Z  INFO moka_test: [polling till empty] num_during_iter: 4831002 - removed: cycle/169000 total/337998 - elapsed_ms: 117 - items/ms: 1444
2025-02-19T13:23:29.562388Z  INFO moka_test: [polling till empty] num_during_iter: 4662002 - removed: cycle/15680 total/353678 - elapsed_ms: 29 - items/ms: 540
2025-02-19T13:23:30.614620Z  INFO moka_test: [polling till empty] num_during_iter: 4646322 - removed: cycle/153320 total/506998 - elapsed_ms: 84 - items/ms: 1825
2025-02-19T13:23:31.617589Z  INFO moka_test: [polling till empty] num_during_iter: 4493002 - removed: cycle/169000 total/675998 - elapsed_ms: 87 - items/ms: 1942
2025-02-19T13:23:33.637911Z  INFO moka_test: [polling till empty] num_during_iter: 4324002 - removed: cycle/169000 total/844998 - elapsed_ms: 108 - items/ms: 1564
2025-02-19T13:23:34.595553Z  INFO moka_test: [polling till empty] num_during_iter: 4155002 - removed: cycle/65085 total/910083 - elapsed_ms: 65 - items/ms: 1001
2025-02-19T13:23:35.608686Z  INFO moka_test: [polling till empty] num_during_iter: 4089917 - removed: cycle/103915 total/1013998 - elapsed_ms: 78 - items/ms: 1332
2025-02-19T13:23:36.653665Z  INFO moka_test: [polling till empty] num_during_iter: 3986002 - removed: cycle/169000 total/1182998 - elapsed_ms: 123 - items/ms: 1373
2025-02-19T13:23:38.650928Z  INFO moka_test: [polling till empty] num_during_iter: 3817002 - removed: cycle/169000 total/1351998 - elapsed_ms: 120 - items/ms: 1408
2025-02-19T13:23:39.555550Z  INFO moka_test: [polling till empty] num_during_iter: 3648002 - removed: cycle/1042 total/1353040 - elapsed_ms: 25 - items/ms: 41
2025-02-19T13:23:40.660179Z  INFO moka_test: [polling till empty] num_during_iter: 3646960 - removed: cycle/167958 total/1520998 - elapsed_ms: 130 - items/ms: 1291
2025-02-19T13:23:41.646587Z  INFO moka_test: [polling till empty] num_during_iter: 3479002 - removed: cycle/169000 total/1689998 - elapsed_ms: 116 - items/ms: 1456
2025-02-19T13:23:43.671963Z  INFO moka_test: [polling till empty] num_during_iter: 3310002 - removed: cycle/169000 total/1858998 - elapsed_ms: 141 - items/ms: 1198
2025-02-19T13:23:45.682960Z  INFO moka_test: [polling till empty] num_during_iter: 3141002 - removed: cycle/169000 total/2027998 - elapsed_ms: 154 - items/ms: 1097
2025-02-19T13:23:46.622202Z  INFO moka_test: [polling till empty] num_during_iter: 2972002 - removed: cycle/69134 total/2097132 - elapsed_ms: 92 - items/ms: 751
2025-02-19T13:23:47.634510Z  INFO moka_test: [polling till empty] num_during_iter: 2902868 - removed: cycle/99866 total/2196998 - elapsed_ms: 104 - items/ms: 960
2025-02-19T13:23:48.682305Z  INFO moka_test: [polling till empty] num_during_iter: 2803002 - removed: cycle/169000 total/2365998 - elapsed_ms: 152 - items/ms: 1111
2025-02-19T13:23:50.674933Z  INFO moka_test: [polling till empty] num_during_iter: 2634002 - removed: cycle/169000 total/2534998 - elapsed_ms: 144 - items/ms: 1173
2025-02-19T13:23:52.672605Z  INFO moka_test: [polling till empty] num_during_iter: 2465002 - removed: cycle/169000 total/2703998 - elapsed_ms: 142 - items/ms: 1190
2025-02-19T13:23:53.670211Z  INFO moka_test: [polling till empty] num_during_iter: 2296002 - removed: cycle/88497 total/2792495 - elapsed_ms: 141 - items/ms: 627
2025-02-19T13:23:54.622296Z  INFO moka_test: [polling till empty] num_during_iter: 2207505 - removed: cycle/80503 total/2872998 - elapsed_ms: 92 - items/ms: 875
2025-02-19T13:23:55.674065Z  INFO moka_test: [polling till empty] num_during_iter: 2127002 - removed: cycle/169000 total/3041998 - elapsed_ms: 144 - items/ms: 1173
2025-02-19T13:23:57.682480Z  INFO moka_test: [polling till empty] num_during_iter: 1958002 - removed: cycle/169000 total/3210998 - elapsed_ms: 152 - items/ms: 1111
2025-02-19T13:23:59.685746Z  INFO moka_test: [polling till empty] num_during_iter: 1789002 - removed: cycle/169000 total/3379998 - elapsed_ms: 156 - items/ms: 1083
2025-02-19T13:24:00.611376Z  INFO moka_test: [polling till empty] num_during_iter: 1620002 - removed: cycle/76780 total/3456778 - elapsed_ms: 81 - items/ms: 947
2025-02-19T13:24:01.619402Z  INFO moka_test: [polling till empty] num_during_iter: 1543222 - removed: cycle/92220 total/3548998 - elapsed_ms: 90 - items/ms: 1024
2025-02-19T13:24:02.696418Z  INFO moka_test: [polling till empty] num_during_iter: 1451002 - removed: cycle/169000 total/3717998 - elapsed_ms: 167 - items/ms: 1011
2025-02-19T13:24:04.701066Z  INFO moka_test: [polling till empty] num_during_iter: 1282002 - removed: cycle/169000 total/3886998 - elapsed_ms: 170 - items/ms: 994
2025-02-19T13:24:06.714963Z  INFO moka_test: [polling till empty] num_during_iter: 1113002 - removed: cycle/169000 total/4055998 - elapsed_ms: 185 - items/ms: 913
2025-02-19T13:24:08.706211Z  INFO moka_test: [polling till empty] num_during_iter: 944002 - removed: cycle/169000 total/4224998 - elapsed_ms: 177 - items/ms: 954
2025-02-19T13:24:09.562664Z  INFO moka_test: [polling till empty] num_during_iter: 775002 - removed: cycle/2523 total/4227521 - elapsed_ms: 33 - items/ms: 76
2025-02-19T13:24:10.757244Z  INFO moka_test: [polling till empty] num_during_iter: 772479 - removed: cycle/166477 total/4393998 - elapsed_ms: 227 - items/ms: 733
2025-02-19T13:24:12.715065Z  INFO moka_test: [polling till empty] num_during_iter: 606002 - removed: cycle/169000 total/4562998 - elapsed_ms: 185 - items/ms: 913
2025-02-19T13:24:13.709420Z  INFO moka_test: [polling till empty] num_during_iter: 437002 - removed: cycle/169000 total/4731998 - elapsed_ms: 179 - items/ms: 944
2025-02-19T13:24:15.720519Z  INFO moka_test: [polling till empty] num_during_iter: 268002 - removed: cycle/169000 total/4900998 - elapsed_ms: 191 - items/ms: 884
2025-02-19T13:24:17.674445Z  INFO moka_test: [polling till empty] num_during_iter: 99002 - removed: cycle/99002 total/5000000 - elapsed_ms: 145 - items/ms: 682
2025-02-19T13:24:18.530160Z  INFO moka_test: total time syncing cache: 4579ms
2025-02-19T13:24:18.530186Z  INFO moka_test: longest sync time seen: 227ms
2025-02-19T13:24:18.530205Z  INFO moka_test: [after sync] size: 0 - elapsed_ms: 61002
2025-02-19T13:24:18.530209Z  INFO moka_test: total duration: 68140ms

You can try it by yourself by the following steps:

  1. Replace the moka dependency in your Cargo.toml:
    moka = { git = "https://gitlab.com/moka-labs/moka-gh385-seize/moka.git", branch = "papaya", features = ["sync"] }
  2. Replace the moka::future::Cache with moka::sync::Cache.
  3. Remove all .await from the insert and run_pending_tasks calls.

@texascloud
Copy link

texascloud commented Feb 19, 2025

Thanks for taking a look!

I realized I needed to perform reads/writes on the cache while the expirations were happening. I only focused on reads for one benchmark as I primarily read from the cache. This is the code I added:

Spawned thread to read every 1ms and print measurements

                let cache = moka::future::Cache::<String, u64>::builder()
                    .weigher(|_key: &String, value: &u64| -> u32 { size_of_val(value) as u32 })
                    .initial_capacity(131_072)
                    .max_capacity(10_000_000_000)
                    .eviction_policy(moka::policy::EvictionPolicy::lru())
                    .expire_after(ExpirationPolicy::new(
                        Duration::from_secs(16),
                        cmd.eviction_batch_size - 1,
                    ))
                    .build();

                let read_cache = cache.clone();
                let (read_tx, mut rx) = tokio::sync::oneshot::channel::<bool>();
                // A task which constantly reads/writes to the cache and measures
                // the max time.
                let max_read_latency = tokio::spawn(async move {
                    let mut max_latency = 0;
                    let mut reads = 0;
                    let mut slow_reads = 0;
                    let mut total_elapsed_ms = 0;
                    let mut derp = 0;
                    loop {
                        if rx.try_recv().is_ok() {
                            break;
                        }
                        let start = std::time::Instant::now();
                        let _ = read_cache.get("100").await;
                        let elapsed = start.elapsed();
                        let elapsed_ms = elapsed.as_millis();
                        reads += 1;
                        total_elapsed_ms += elapsed_ms;
                        derp += elapsed.as_micros();
                        if total_elapsed_ms > 1000 {
                            eprintln!("reads: {reads}");
                            total_elapsed_ms -= 1000;
                        }
                        max_latency = max_latency.max(elapsed_ms as u64);
                        if elapsed_ms > 300 {
                            slow_reads += 1;
                            eprintln!("read took {:?}ms", elapsed_ms);
                        }
                        tokio::time::sleep(tokio::time::Duration::from_millis(1)).await;
                    }
                    if reads > 0 && slow_reads > 0 {
                        eprintln!(
                            "slow reads: {slow_reads} - total reads: {reads} - accum read time: {derp}us - avg read time: {:.4}us - slow reads %: {:.2}",
                            derp / reads,
                            (slow_reads as f64 / reads as f64) * 100.0
                        );
                    }
                    max_latency
                });
    // ... other code that polls till cache is empty .....
                read_tx.send(true).unwrap();
                let max_read_latency = max_read_latency.await?;
                tracing::info!("max_read_latency: {:?}", max_read_latency);

The code above shows the real impact of the GC-style rehashing of the cht as used in futures::Cache. I would see output like this:

// NOTE: slow read == > 300ms
slow reads: 14 - total reads: 26,308 - accum read time: 17,380,000us - avg read time: 0ms - slow reads %: 0.05
0218 16:48:59.648532  INFO max_read_latency: Ok(3547)

I did an experiment where I used moka::sync::SegmentedCache since based on the code it looked like it would make it less likely for me to have a read/write affected by the rehashing cuz there are N cht internal to this cache. That showed significantly better results, though I did not randomize the key I was using, so the numbers only represent reading from one of the 1024 internal cht. Still, it seems like I can reduce the likelihood of a read/write being run while a rehashing is occurring by a factor of N where N equals the number of segments I have. I am not sure what the baseline overhead is for having 1024 segments is though.

For comparison, the sync test I did with SegmentedCache using 1024 segments gave these results:

// NOTE: slow read == > 1ms
slow reads: 1 - total reads: 74,208 - accum read time: 148,222us - avg read time: 1us - slow reads %: 0.00
0218 17:10:26.282540  INFO max_read_latency: Ok(2)

I am excited to see you are exploring an alternative to the existing cht solution wherein incremental rehashing is done.

For context, here's what kind of workload I am using Moka for:
On a machine with 64GB memory and 26 cores, we have a Rust server that receives ~70,000 QPS and has little tolerance for increases in p99 latencies for a majority of the traffic.

I am interested in trying Moka with papaya once it's ready. In order to use it, I'd need you to publish a release so I can vendor it internally :)

@tatsuya6502
Copy link
Member Author

I realized I needed to perform reads/writes on the cache while the expirations were happening. I only focused on reads for one benchmark as I primarily read from the cache. This is the code I added:

I have not had a chance to try the new code, but I will do so this weekend. (It is Friday night here in the China mainland.)

If moka with papaya works better than with the original cht, I will try to release a new version of moka with papaya as soon as I can. Otherwise, I will try to improve the copying-GC of cht to make it more efficient. (I will probably make it to run less frequently)

@ibraheemdev
Copy link

2025-02-19T13:24:18.530186Z INFO moka_test: longest sync time seen: 227ms

This is interesting, 227ms is still longer than I expect for papaya with the default seize batch size. I wonder if it's related to an issue I've noticed where the mbarrier instruction used by seize internally can introduce large latency spikes. This can be disabled by disabling the fast-barrier feature flag on seize, it might lower that number. Note that this will increase the latency of all operations by around ~20% depending on the machine, but it may be worth it for some users.

@tatsuya6502
Copy link
Member Author

tatsuya6502 commented Feb 22, 2025

My current random thoughts.

What is happening?

  1. run_pending_tasks takes too long when:
    • a. the cache is very large.
    • b. the cache has many entries to evict.
  2. cht's copying GC makes 1-a and 1-b worse.
  3. This change should have made 1-b worse:
  4. Slow reads (get) might be mostly caused by implicit execution of run_pending_tasks.
    • In [email protected], run_pending_tasks can be called in get and insert operations. (doc)
    • Until [email protected], run_pending_tasks was called by dedicated threads in the background, so get and insert never called run_pending_tasks.
  5. cht's blocking rehashing will cause latency spikes in general.
    • But if I remember correctly, cht's rehashing does not block reads. (It causes reads to retry though).
    • So this might be unrelated to the slow reads 4?

The slow reads 4 is the most concerning issue.

What can be done to fix or mitigate the issues?

  • a. I think the most effective way to fix issue 4 (slow reads) is to provide an option to do run_pending_tasks in background. (like [email protected] and earlier)
    • This will also make the issue 1 (slow run_pending_tasks) less severe.
  • b. To mitigate the issue caused by 3:
      1. Add yield in the loop in run_pending_tasks so that an async executor can run other async tasks.
      1. moka can provide an option to have max number of entries to evict in one run_pending_tasks call.
  • c. For future cache, doing above a and b means that moka has to depends on an async runtime, such as tokio or async-std.
    • This is because it will need sleep and yield (and spawn?) from an async runtime.
    • We can make it abstract over the runtime by using async_executors crate, then provide an implementation for tokio (for now).
    • For now, to do a, moka can just provide an option not to do run_pending_tasks implicitly. Then let the user to implement an async task that periodically calls run_pending_tasks.
      • When the write channel is full, cache writes is blocked and the cache has to run run_pending_tasks immediately. This is not possible if the user controls when to run run_pending_tasks
  • d. To fix the issue caused by cht (2 and 5), we will replace cht with papaya.

@tatsuya6502
Copy link
Member Author

I copied the repro to here and will start to modify:

@tatsuya6502
Copy link
Member Author

tatsuya6502 commented Feb 22, 2025

  1. Slow reads (get) might be mostly caused by implicit execution of run_pending_tasks.
    • In [email protected], run_pending_tasks can be called in get and insert operations. (doc)
    • Until [email protected], run_pending_tasks was called by dedicated threads in the background, so get and insert never called run_pending_tasks.

I confirmed this is true.

With implicit run_pending_tasks calls

I ran a modified version of the repro. This run had a --no-run-pending-tasks flag to disable the explicit run_pending_tasks calls. So run_pending_tasks was called only implicitly by get method.

  • There were 7 slow reads.
    • Slow read is a get operation that took longer than 300,000 μs (0.3 s).
  • The max read latency was 1,019,595 μs (1.02 s).
$ cargo run --bin main3 --release -- --entries 5000000 --poll-till-empty-ms 1000 --no-run-pending-tasks
...
slow reads: 7 - total reads: 24610 - accum read time: 7783562us - avg read time: 316us - slow reads %: 0.0003

read latency histogram (μs):

[      0,       1):   4736
[      1,       2):   8514
[      2,       4):   8015
[      4,       8):   2545
[      8,      16):    254
[     16,      32):    105
[     32,      64):    235
[     64,     128):     43
[    128,     256):     24
[    256,     512):     25
[    512,    1024):     49
[   1024,    2048):     14
[   2048,    4096):      2
[   4096,    8192):      2
[   8192,   16384):      6
[  16384,   32768):      8
[  32768,   65536):      6
[  65536,  131072):     13
[ 131072,  262144):      5
[ 262144,  524288):      6
[ 524288, 1048576):      3
[1048576, 2097152):      0
[2097152, inf):      0
2025-02-22T10:52:01.772943Z  INFO main3: max_read_latency: 1019595μs
2025-02-22T10:52:01.772950Z  INFO main3: total duration: 63626ms

Without implicit run_pending_tasks calls

I ran the same repro without the --no-run-pending-tasks flag, and used a modified version of [email protected], which disables the implicit run_pending_tasks calls in get method.

diff --git a/src/future/base_cache.rs b/src/future/base_cache.rs
index f52ceeb..30244aa 100644
--- a/src/future/base_cache.rs
+++ b/src/future/base_cache.rs
@@ -463,7 +463,7 @@ where
         op: ReadOp<K, V>,
         now: Instant,
     ) -> Result<(), TrySendError<ReadOp<K, V>>> {
-        self.apply_reads_if_needed(&self.inner, now).await;
+        // self.apply_reads_if_needed(&self.inner, now).await;
         let ch = &self.read_op_ch;
         match ch.try_send(op) {
             // Discard the ReadOp when the channel is full.
  • There was no slow read.
  • The max read latency was 2,865 μs (0.003 s).
$ cargo run --bin main3 --release -- --entries 5000000 --poll-till-empty-ms 1000
...

slow reads: 0 - total reads: 28487 - accum read time: 87728us - avg read time: 3us - slow reads %: 0.0000

read latency histogram (μs):

[      0,       1):   8476
[      1,       2):  14150
[      2,       4):   4614
[      4,       8):    995
[      8,      16):     78
[     16,      32):     47
[     32,      64):     36
[     64,     128):     25
[    128,     256):     17
[    256,     512):     11
[    512,    1024):     20
[   1024,    2048):     13
[   2048,    4096):      5
[   4096,    8192):      0
[   8192,   16384):      0
[  16384,   32768):      0
[  32768,   65536):      0
[  65536,  131072):      0
[ 131072,  262144):      0
[ 262144,  524288):      0
[ 524288, 1048576):      0
[1048576, 2097152):      0
[2097152, inf):      0
2025-02-22T10:55:47.332187Z  INFO main3: max_read_latency: 2865μs
2025-02-22T10:55:47.332202Z  INFO main3: total duration: 64635ms

@tatsuya6502 tatsuya6502 changed the title The future::Cache::run_pending_task method occasionally takes significantly longer under heavy writes when Expiry is set (?) Tail latency issue on reads and writes due to slow run_pending_tasks Feb 24, 2025
@tatsuya6502
Copy link
Member Author

I am currently replacing cht with a new concurrent hash table implementation called papaya.

You can try it by yourself by the following steps:

  1. Replace the moka dependency in your Cargo.toml:
    moka = { git = "https://gitlab.com/moka-labs/moka-gh385-seize/moka.git", branch = "papaya", features = ["sync"] }
  2. Replace the moka::future::Cache with moka::sync::Cache.
  3. Remove all .await from the insert and run_pending_tasks calls.

Yesterday, I pushed a commit to the above repository to replace cht in future::Cache with papaya, so now I can run the repro using future::Cache.

$ cargo run --bin main3-papaya --release -- --entries 5000000 --poll-till-empty-ms 1000 \
      --no-run-pending-tasks

However, I got similar result to the one with cht. I think I need to run more threads to do concurrent reads (and maybe writes) to see the difference between cht's blocking rehashing and papaya's incremental rehashing.

Anyway, I started to work on the following today.

  • a. I think the most effective way to fix issue 4 (slow reads) is to provide an option to do run_pending_tasks in background. (like [email protected] and earlier)
    • This will also make the issue 1 (slow run_pending_tasks) less severe.

But instead of providing such an option to run_pending_tasks in background, I am going to provide some tuning parameters for run_pending_tasks, so that run_pending_tasks can skip time-consuming tasks when called implicitly by a cache read.

It might look like the following:

let cfg = MaintenanceConfig::builder()
    // Configures the `run_pending_tasks` implicitly called within a read
    // operation such as `get` and `get_with`.
    .auto_run_on_reads(
        AutoRunConfig::builder()
            // Default is to repeat the following 4 times:
            //
            // 1. Process all pending read logs.
            // 2. Do other stuff (e.g. evictions).
            // 3. Go to 1.
            .process_pending_read_logs(ProcessingConfig::MaxCount(
                NonZeroUsize::new(1_000).unwrap(),
            ))
            // Default is the same as `process_pending_read_logs`.
            .process_pending_write_logs(ProcessingConfig::Disabled)
            // Default is `Unlimited`.
            .process_expired_entries(ProcessingConfig::Disabled)
            // Default is `Unlimited`.
            .process_invalidated_entries(ProcessingConfig::Disabled)
            .build(),
    )
    // Configures the `run_pending_tasks` implicitly called within a write
    // operation such as `insert`, `remove` or `get_with` resulting an insertion.
    .auto_run_on_writes(
        AutoRunConfig::builder()
            .process_pending_read_logs(ProcessingConfig::MaxCount(
                NonZeroUsize::new(10_000).unwrap(),
            ))
            .process_pending_write_logs(ProcessingConfig::MaxCount(
                NonZeroUsize::new(1_000).unwrap(),
            ))
            .process_expired_entries(ProcessingConfig::MaxCount(
                NonZeroUsize::new(2_000).unwrap(),
            ))
            .process_invalidated_entries(ProcessingConfig::Disabled)
            .build(),
    )
    // Configures the operation logs that need to be processed.
    .operation_logs(
        OperationLogsConfig::builder()
            // The maximum number of pending read logs that can be stored.
            // If the number of pending read logs exceeds this value, new
            // read logs will be discarded. Default is `384`
            .max_pending_read_logs(NonZeroUsize::new(10_000).unwrap())
            // The maximum number of pending write logs that can be stored.
            // If the number of pending write logs exceeds this value, write
            // operations will be blocked. Default is `384`.
            .max_pending_write_logs(NonZeroUsize::new(1_000).unwrap())
            // Configures the triggers that will cause the `run_pending_tasks`
            // method to be implicitly called within a read or write operation.
            .trigger_auto_run_when(
                AutoRunTriggers::builder()
                    // This trigger fires when the number of pending read logs
                    // exceeds the specified value. Default is `64`.
                    .pending_read_logs_exceed(NonZeroUsize::new(1_000).unwrap())
                    // This trigger fires when the number of pending write logs
                    // exceeds the specified value. Default is `64`.
                    .pending_write_logs_exceed(NonZeroUsize::new(250).unwrap())
                    // This trigger fires when the time past since the last
                    // `run_pending_tasks` call exceeds the specified duration.
                    // Set to `None` to disable this trigger. Default is `300ms`.
                    .time_past(Some(Duration::from_millis(300)))
                    .build(),
            )
            .build(),
    )
    .build();

let cache = Cache::<String, u64>::builder()
    .weigher(|_key: &String, value: &u64| -> u32 { size_of_val(value) as u32 })
    .initial_capacity(131_072)
    .max_capacity(10_000_000_000)
    .eviction_policy(EvictionPolicy::lru())
    .expire_after(...))
    .maintenance_config(cfg)  // Added.
    .build();

Doing this should be much easier than providing an option to run_pending_tasks in background. I hope I can finish this in a few days, and release a new patch version of moka.

@texascloud
Copy link

@tatsuya6502 Amazing investigation! I really like the histogram view of the read latencies. It provides a clear picture of the improvements.

But instead of providing such an option to run_pending_tasks in background, I am going to provide some tuning parameters for run_pending_tasks, so that run_pending_tasks can skip time-consuming tasks when called implicitly by a cache read.

This sounds like a good solution to me. Curious to see if it fully addresses the perf issues I was having since I will still be calling .run_pending_tasks() every ~5s to get accurate cache info for metrics reporting.

b. To mitigate the issue caused by 3:
Add yield in the loop in run_pending_tasks so that an async executor can run other async tasks.
moka can provide an option to have max number of entries to evict in one run_pending_tasks call.

FWIW, if I am reading the code correctly, creating an async eviction listener which simply runs tokio::task::yield_now() achieves the first mitigation, and the Expiry trait impl in my repro kind of satisfies the second, though it isn't perfect.

I would fully welcome being able to set a max number of entries to evict per .run_pending_tasks() call, however, is the number of things that need to be evicted really the culprit? I saw with different batch sizes that the sync time would still be fast, up until a cht re-hash was triggered.

Does limiting the number of entries to evict per run make the re-hash less impactful when it does run?

@texascloud
Copy link

texascloud commented Feb 25, 2025

It might look like the following:
/* example config /*

Is this example configuration one I would use to disable most of the expensive things in .run_pending_tasks() when triggered via a read/write?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants