
Move epoch garbage collection to flize #95

Closed
wants to merge 14 commits

Conversation

@domenicquirl (Collaborator) commented Oct 22, 2020

After chatting with Acrimon on Discord, we added the missing features (mostly unprotected shields) to flize, making it possible to replace crossbeam_epoch with it in flurry. The main motivation behind this change is to overcome some of our performance issues that seem to be related to garbage collection (see #50, jonhoo/bustle#2). There's a fair amount going on here, but I tried to keep the code as familiar as possible. Preliminary evaluation:

Disadvantages

  • Type/Trait Complexity & Lifetimes: there are (obviously) some differences in the interfaces the two libraries provide. flize's Shield is a trait with an associated lifetime bound, which now permeates a lot of our types and functions, making them harder to understand and maintain (in particular for new contributors). For example, NodeIter<'g, K, V> is now NodeIter<'m, 'g, K, V, SH>, where SH: flize::Shield<'m> (see the sketch after this list). Or, for a function example, my new favourite function (guard lifetimes elided):
  pub(crate) fn guarded_eq<'m1, 'm2, SH, SH2>(
      &'m1 self,
      other: &'m2 Self,
      our_guard: &Guard<'m1, SH>,
      their_guard: &Guard<'m2, SH2>,
  ) -> bool
  where
      V: PartialEq,
      SH: flize::Shield<'m1>,
      SH2: flize::Shield<'m2>
  • Shared Collectors: it is no longer possible to share a collector with other data structures, since flize doesn't have the concept of a global default collector.
    • Because of this, MapRef and SetRef cannot be constructed with_guard anymore
  • We now have our own small ebr module to extend flize to our needs, which is some additional code to maintain
  • I've had to write a lot of manual Debug impls (though that can be fixed very easily by implementing Debug for flize types, mainly atomics)
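
To make that propagation concrete, here is a minimal, self-contained sketch; the `Shield` trait below is a stand-in modelled on flize's, not the real definition:

```rust
use std::marker::PhantomData;

// Stand-in for flize::Shield<'m>; the real trait also exposes retirement
// and related operations, elided here.
pub trait Shield<'m> {}

// Before: NodeIter<'g, K, V> carried only the guard lifetime.
// After: the collector lifetime 'm and the shield type SH come along,
// and so does the `SH: Shield<'m>` bound, in every impl block.
pub struct NodeIter<'m, 'g, K, V, SH>
where
    SH: Shield<'m>,
{
    shield: &'g SH,
    _marker: PhantomData<(&'m (), K, V)>,
}

impl<'m, 'g, K, V, SH> NodeIter<'m, 'g, K, V, SH>
where
    SH: Shield<'m>,
{
    pub fn new(shield: &'g SH) -> Self {
        NodeIter { shield, _marker: PhantomData }
    }
}
```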

Advantages

  • No Owned Type: flize only uses Atomic and Shared. This removes a lot of Owned::new(...).into_shared(&guard) pairs, which almost always occur together in flurry, and doing so has proven quite ergonomic (see the sketch after this list).
  • Collector safety: the PR keeps the collector/guard check on map access to ensure guards from the correct collector are used (disallow_evil).
  • Performance: flize is fast, which means flurry is now also fast! The map now has usable performance comparable to other concurrent maps (see bench results below).
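
For comparison, the pattern this removes, written against crossbeam-epoch's actual API; the flize-style lines in the comment are a hedged sketch of the shape, not flize's exact names:

```rust
use crossbeam_epoch::{self as epoch, Atomic, Owned};
use std::sync::atomic::Ordering;

fn insert_node(slot: &Atomic<u64>) {
    let guard = epoch::pin();
    // crossbeam: allocate as Owned, then immediately turn it into a
    // Shared -- in flurry these two steps almost always appear together.
    let node = Owned::new(42u64).into_shared(&guard);
    slot.store(node, Ordering::SeqCst);

    // flize-style shape (hypothetical names, not the exact flize API):
    // let node = Shared::from_ptr(Box::into_raw(Box::new(42u64)));
    // slot.store(node, Ordering::SeqCst);
}
```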

Overall, even if the flize implementation is slightly less ergonomic than the current crossbeam one, this change is a massive win in my eyes. I'd much rather have a library that is a valid alternative for real use than a "useless" one that is easier to maintain. Big air-quotes there: I am very well aware that this started as a learning project, given that it did for me as well. But even so, it always made work done on flurry feel unrewarding in some respects, just because "it's not like anyone will ever use this anyways".

Benchmarks

As mentioned above, running (the first version of) https://github.com/xacrimon/conc-map-bench indicates major performance improvements for most tasks. Our main slow points still seem to be inserts in general and single-thread overhead; for other tasks and higher thread counts, see for yourself:

Bench results on my machine:

INSERT

-- RwLockStd
25165824 operations across 1 thread(s) in 4.1842974s; time/op = 165ns
25165824 operations across 2 thread(s) in 4.5482347s; time/op = 179ns
25165824 operations across 3 thread(s) in 4.8963708s; time/op = 193ns
25165824 operations across 4 thread(s) in 4.9127281s; time/op = 194ns
25165824 operations across 5 thread(s) in 5.0914631s; time/op = 201ns
25165824 operations across 6 thread(s) in 5.1736532s; time/op = 204ns
25165824 operations across 7 thread(s) in 5.3755057s; time/op = 212ns
25165824 operations across 8 thread(s) in 5.6824829s; time/op = 225ns

-- CHashMap
25165824 operations across 1 thread(s) in 5.5475931s; time/op = 219ns
25165824 operations across 2 thread(s) in 4.1545495s; time/op = 164ns
25165824 operations across 3 thread(s) in 3.6631931s; time/op = 145ns
25165824 operations across 4 thread(s) in 3.5636953s; time/op = 141ns
25165824 operations across 5 thread(s) in 3.8907204s; time/op = 154ns
25165824 operations across 6 thread(s) in 3.7126855s; time/op = 147ns
25165824 operations across 7 thread(s) in 3.7939694s; time/op = 150ns
25165824 operations across 8 thread(s) in 3.6153663s; time/op = 143ns

-- Contrie
25165824 operations across 1 thread(s) in 19.3256565s; time/op = 766ns
25165824 operations across 2 thread(s) in 9.8763793s; time/op = 391ns
25165824 operations across 3 thread(s) in 6.5838239s; time/op = 261ns
25165824 operations across 4 thread(s) in 5.4415494s; time/op = 215ns
25165824 operations across 5 thread(s) in 4.9086136s; time/op = 194ns
25165824 operations across 6 thread(s) in 5.0110217s; time/op = 198ns
25165824 operations across 7 thread(s) in 4.4787998s; time/op = 177ns
25165824 operations across 8 thread(s) in 5.0837114s; time/op = 201ns

-- Flurry
25165824 operations across 1 thread(s) in 11.0327772s; time/op = 438ns
25165824 operations across 2 thread(s) in 4.9960757s; time/op = 197ns
25165824 operations across 3 thread(s) in 3.5890423s; time/op = 142ns
25165824 operations across 4 thread(s) in 2.6178409s; time/op = 103ns
25165824 operations across 5 thread(s) in 2.4521642s; time/op = 96ns
25165824 operations across 6 thread(s) in 2.350806s; time/op = 92ns
25165824 operations across 7 thread(s) in 2.1080351s; time/op = 83ns
25165824 operations across 8 thread(s) in 2.1402698s; time/op = 84ns

-- DashMapV3
25165824 operations across 1 thread(s) in 4.9713333s; time/op = 196ns
25165824 operations across 2 thread(s) in 3.0867169s; time/op = 122ns
25165824 operations across 3 thread(s) in 2.4237726s; time/op = 95ns
25165824 operations across 4 thread(s) in 2.1705871s; time/op = 85ns
25165824 operations across 5 thread(s) in 1.8898701s; time/op = 74ns
25165824 operations across 6 thread(s) in 1.8302189s; time/op = 71ns
25165824 operations across 7 thread(s) in 1.9578536s; time/op = 77ns
25165824 operations across 8 thread(s) in 2.5197997s; time/op = 99ns

READ

-- RwLockStd
25165824 operations across 1 thread(s) in 2.8450142s; time/op = 112ns
25165824 operations across 2 thread(s) in 2.6005495s; time/op = 102ns
25165824 operations across 3 thread(s) in 2.9530743s; time/op = 116ns
25165824 operations across 4 thread(s) in 3.7836839s; time/op = 150ns
25165824 operations across 5 thread(s) in 3.1795386s; time/op = 126ns
25165824 operations across 6 thread(s) in 3.3245617s; time/op = 131ns
25165824 operations across 7 thread(s) in 3.9840663s; time/op = 158ns
25165824 operations across 8 thread(s) in 3.6619738s; time/op = 145ns

-- CHashMap
25165824 operations across 1 thread(s) in 6.1441085s; time/op = 243ns
25165824 operations across 2 thread(s) in 4.1612992s; time/op = 164ns
25165824 operations across 3 thread(s) in 3.2587518s; time/op = 129ns
25165824 operations across 4 thread(s) in 3.0584123s; time/op = 121ns
25165824 operations across 5 thread(s) in 3.4522208s; time/op = 136ns
25165824 operations across 6 thread(s) in 3.4502188s; time/op = 136ns
25165824 operations across 7 thread(s) in 3.5243859s; time/op = 139ns
25165824 operations across 8 thread(s) in 3.2542338s; time/op = 129ns

-- Contrie
25165824 operations across 1 thread(s) in 4.4747205s; time/op = 176ns
25165824 operations across 2 thread(s) in 2.4922609s; time/op = 98ns
25165824 operations across 3 thread(s) in 1.6747643s; time/op = 65ns
25165824 operations across 4 thread(s) in 1.5041451s; time/op = 59ns
25165824 operations across 5 thread(s) in 1.0541919s; time/op = 41ns
25165824 operations across 6 thread(s) in 925.9906ms; time/op = 36ns
25165824 operations across 7 thread(s) in 933.8123ms; time/op = 37ns
25165824 operations across 8 thread(s) in 1.0068524s; time/op = 39ns

-- Flurry
25165824 operations across 1 thread(s) in 2.5889212s; time/op = 102ns
25165824 operations across 2 thread(s) in 1.7753731s; time/op = 69ns
25165824 operations across 3 thread(s) in 1.6216983s; time/op = 63ns
25165824 operations across 4 thread(s) in 1.4349656s; time/op = 56ns
25165824 operations across 5 thread(s) in 1.4070166s; time/op = 55ns
25165824 operations across 6 thread(s) in 1.0889074s; time/op = 42ns
25165824 operations across 7 thread(s) in 1.068765s; time/op = 41ns
25165824 operations across 8 thread(s) in 1.065837s; time/op = 41ns

-- DashMapV3
25165824 operations across 1 thread(s) in 2.6373341s; time/op = 104ns
25165824 operations across 2 thread(s) in 1.6152775s; time/op = 63ns
25165824 operations across 3 thread(s) in 1.2954522s; time/op = 50ns
25165824 operations across 4 thread(s) in 1.2461302s; time/op = 48ns
25165824 operations across 5 thread(s) in 1.1239509s; time/op = 43ns
25165824 operations across 6 thread(s) in 1.1168707s; time/op = 43ns
25165824 operations across 7 thread(s) in 1.2209516s; time/op = 47ns
25165824 operations across 8 thread(s) in 1.1155557s; time/op = 43ns

UPDATE

-- RwLockStd
25165824 operations across 1 thread(s) in 3.1979387s; time/op = 126ns
25165824 operations across 2 thread(s) in 3.6861083s; time/op = 146ns
25165824 operations across 3 thread(s) in 3.3732953s; time/op = 133ns
25165824 operations across 4 thread(s) in 3.3740196s; time/op = 133ns
25165824 operations across 5 thread(s) in 3.5900936s; time/op = 142ns
25165824 operations across 6 thread(s) in 3.7601384s; time/op = 149ns
25165824 operations across 7 thread(s) in 3.8402625s; time/op = 152ns
25165824 operations across 8 thread(s) in 3.9234121s; time/op = 155ns

-- CHashMap
25165824 operations across 1 thread(s) in 5.5699635s; time/op = 220ns
25165824 operations across 2 thread(s) in 3.7534172s; time/op = 148ns
25165824 operations across 3 thread(s) in 3.340075s; time/op = 132ns
25165824 operations across 4 thread(s) in 3.5080474s; time/op = 139ns
25165824 operations across 5 thread(s) in 3.3624801s; time/op = 133ns
25165824 operations across 6 thread(s) in 3.5680631s; time/op = 141ns
25165824 operations across 7 thread(s) in 3.6638752s; time/op = 145ns
25165824 operations across 8 thread(s) in 3.5436977s; time/op = 140ns

-- Contrie
25165824 operations across 1 thread(s) in 8.0046352s; time/op = 317ns
25165824 operations across 2 thread(s) in 4.365312s; time/op = 172ns
25165824 operations across 3 thread(s) in 2.9479402s; time/op = 116ns
25165824 operations across 4 thread(s) in 2.2043506s; time/op = 87ns
25165824 operations across 5 thread(s) in 2.1326051s; time/op = 84ns
25165824 operations across 6 thread(s) in 1.5834109s; time/op = 62ns
25165824 operations across 7 thread(s) in 1.439043s; time/op = 56ns
25165824 operations across 8 thread(s) in 1.5088232s; time/op = 59ns

-- Flurry
25165824 operations across 1 thread(s) in 3.8062255s; time/op = 151ns
25165824 operations across 2 thread(s) in 2.4032358s; time/op = 95ns
25165824 operations across 3 thread(s) in 1.8500285s; time/op = 72ns
25165824 operations across 4 thread(s) in 1.5584881s; time/op = 61ns
25165824 operations across 5 thread(s) in 1.4674148s; time/op = 57ns
25165824 operations across 6 thread(s) in 1.3686862s; time/op = 53ns
25165824 operations across 7 thread(s) in 1.4423715s; time/op = 56ns
25165824 operations across 8 thread(s) in 1.4779185s; time/op = 57ns

-- DashMapV3
25165824 operations across 1 thread(s) in 3.1669407s; time/op = 125ns
25165824 operations across 2 thread(s) in 2.2521689s; time/op = 89ns
25165824 operations across 3 thread(s) in 1.5633163s; time/op = 61ns
25165824 operations across 4 thread(s) in 1.4805222s; time/op = 58ns
25165824 operations across 5 thread(s) in 1.4239066s; time/op = 55ns
25165824 operations across 6 thread(s) in 1.3356814s; time/op = 52ns
25165824 operations across 7 thread(s) in 1.331327s; time/op = 52ns
25165824 operations across 8 thread(s) in 1.816247s; time/op = 71ns

Roadmap for remaining work/discussion:

  • Consistency checks for
    • comments mentioning guards
    • guard vs. shield nomenclature
    • names of lifetime variables
  • Find out what triggers the concurrent_associate test to fail (was on the flize side)
  • Update trait bounds/impls for non-default features (serde, rayon)
  • Decide whether SharedExt::into_box should be unsafe (a sketch follows after this list): originally I thought I'd create a wrapper struct around flize::Shared which then always heap-allocates new shared objects. However, the shared pointers are passed as parameters a lot, making Deref not play out as nicely as I hoped from a code perspective. So instead, the current code uses an extension trait, which makes it technically possible to construct Shareds to other-than-heap data and then call into_box. This is not part of the public interface, but may be a consideration for other developers as well.
  • Maybe add Debug impls to flize first and remove the manual impls in this PR
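
On the into_box point above, a sketch of the extension-trait approach with stand-in types (not flize's actual definitions), illustrating why a safe into_box is questionable:

```rust
// Stand-in Shared: any raw pointer can be wrapped, including pointers
// to data that was never heap-allocated.
pub struct Shared<T> {
    ptr: *mut T,
}

impl<T> Shared<T> {
    pub fn from_ptr(ptr: *mut T) -> Self {
        Shared { ptr }
    }
}

pub trait SharedExt<T> {
    /// Reclaims the allocation behind this pointer as a Box.
    ///
    /// # Safety
    /// Only sound if the pointer originally came from Box::into_raw and
    /// is not aliased -- which the type system cannot enforce here, hence
    /// the question of whether this must be `unsafe`.
    unsafe fn into_box(self) -> Box<T>;
}

impl<T> SharedExt<T> for Shared<T> {
    unsafe fn into_box(self) -> Box<T> {
        Box::from_raw(self.ptr)
    }
}
```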


@codecov bot commented Oct 22, 2020

Codecov Report

Merging #95 (b073872) into master (a69a423) will decrease coverage by 1.49%.
The diff coverage is 80.32%.

Impacted Files Coverage Δ
benches/flurry_hashbrown.rs 0.00% <ø> (ø)
src/rayon_impls.rs 0.00% <ø> (ø)
src/serde_impls.rs 0.00% <0.00%> (ø)
src/set_ref.rs 52.27% <68.75%> (+7.03%) ⬆️
src/node.rs 70.86% <70.18%> (-3.15%) ⬇️
src/raw/mod.rs 78.94% <70.83%> (-9.41%) ⬇️
src/ebr.rs 71.42% <71.42%> (ø)
src/map_ref.rs 86.53% <77.77%> (-0.50%) ⬇️
src/map.rs 84.10% <89.62%> (+0.01%) ⬆️
src/set.rs 95.77% <96.66%> (+1.83%) ⬆️
... and 5 more

@domenicquirl (Collaborator, Author) commented:

Some more on the issue with the concurrent_associate test:
The test itself mostly just performs a bunch of inserts across multiple threads. This intermittently triggers map resizes, as more keys are added and the backing allocation expands while the inserts go on. There seems to be an issue of some thread somehow accessing a table after it has already been dropped, resulting in either an access violation or, if the memory is still valid, an attempt to access an empty list of buckets.

I've seen this panic from put, get_node and, very curiously, Table::find when processing a Moved entry. The latter in particular confuses me, as the error there occurs with the next_table of the table find was called on. Several tools I've run also report the symptomatic issue (invalid pointer/memory race), but they are, as expected, unhelpful in determining the root cause.

As I didn't (or at least didn't intend to) touch the underlying logic of the map implementation, in particular the soundness of defer_destroying tables after a finished resize (which is still the only place this happens), I am somewhat suspicious of this problem. I've also only been able to trigger it somewhat reliably with this one specific test in debug mode. The weird thing to me is that in all of the places where the invalid access happens, the pointer to the table is obtained through a guard which is held for the entire method. And since the table is the old one, it is obtained before the resize finishes, thus also before the defer_destroy.
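
For reference, the invariant relied on above, as a minimal runnable sketch against crossbeam-epoch (flize's shields are expected to uphold the same guarantee):

```rust
use crossbeam_epoch::{self as epoch, Atomic, Owned};
use std::sync::atomic::Ordering;

fn main() {
    let table: Atomic<Vec<u8>> = Atomic::new(vec![1, 2, 3]);
    let guard = epoch::pin();
    // Load "the old table" through a guard held for the whole scope,
    // mirroring how get_node/put/Table::find hold theirs.
    let old_ref = table.load(Ordering::SeqCst, &guard);
    // A concurrent resize would swap in a new table and defer destruction
    // of the old one (the new table is leaked here; irrelevant to the sketch).
    let old = table.swap(Owned::new(Vec::new()), Ordering::SeqCst, &guard);
    unsafe {
        guard.defer_destroy(old);
        // Still sound: the deferred destructor cannot run before `guard`
        // is dropped, so the earlier load remains valid here.
        assert_eq!(old_ref.deref().len(), 3);
    }
}
```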

@domenicquirl (Collaborator, Author) commented:

cc @xacrimon for both concurrent maps and flize

@domenicquirl (Collaborator, Author) commented:

There also seems to be some work left updating trait bounds for non-default features, so I'll add that to the list.

@xacrimon commented:

I've recently made a change which may fix a bug related to invalid advancing of the epoch. You may want to retry the tests with master if you think flize may be the issue. You may also want to try turning off default features for flize.

@domenicquirl (Collaborator, Author) commented:

Gave that a quick try, master + no features still triggers the error.
I'm honestly not sure if this is a flize or a flurry issue. The flurry logic 100% works with the crossbeam implementation, and so far I've not been able to find a place where I messed something up in the flize re-implementation, hence my suspicion. However, it's also entirely possible that I overlooked something in this volume of changes. I already tried setting up a test that manually creates a race condition with defer_destroy and table accesses using flize primitives, but couldn't get that to reproduce the error.

@xacrimon commented:

I think I may have found a bug in flize now that I am looking through the code. Seems to be very rare but plausible in some conditions. I will be opening an issue and hopefully I can fix it today or tomorrow.

@xacrimon commented:

Created an issue at flize: https://github.com/xacrimon/flize/issues/69. I will reply here once progress is made.

@jonhoo (Owner) commented Oct 24, 2020

This is really cool — thank you for putting in the effort! ❤️ I'm very swamped with finishing up my thesis at the moment, but will get back to this once my schedule frees up a little.

@xacrimon commented Oct 25, 2020

@domenicquirl I've now published v4.2.0 "Blaze It" which contains fixes for a few bugs previously identified.

@domenicquirl (Collaborator, Author) commented:

After a quick check with the new version (nice release name 😉), the panics still occur with the concurrent_associate test. I haven't had time yet to look into this further myself, but something's still off here :/

@domenicquirl (Collaborator, Author) commented:

While going through the comments again I noticed that, since flize's lack of a global collector means we have one collector per map right now, we might be able to lift some 'static bounds from keys and values 🤔 (see the sketch below)
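
A sketch of why that could work, assuming (not verified against flize's API) that a map-owned collector can drain all its deferred garbage when the map is dropped:

```rust
// Hypothetical stand-ins to illustrate the lifetime argument; flize's
// real Collector API may differ.
struct Collector;

impl Collector {
    /// Run all deferred destructors now (hypothetical operation).
    fn drain(&mut self) {}
}

struct FlurryMap<K, V> {
    collector: Collector,
    entries: Vec<(K, V)>, // stand-in for the real table
}

impl<K, V> Drop for FlurryMap<K, V> {
    fn drop(&mut self) {
        // With a global collector, deferred drops of K/V could run at an
        // arbitrarily later epoch, forcing K: 'static + V: 'static. A
        // map-owned collector can destroy all remaining garbage here,
        // before any borrow inside K or V can expire.
        self.collector.drain();
    }
}
```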

@xacrimon commented Oct 31, 2020

I've released 4.2.1, which fixes yet another bug. Encouraging the use of structure-specific collectors instead of the global one, precisely so that those 'static bounds can be lifted, is one of the major points. flize should be easy to use without a global collector.

@xacrimon commented Nov 7, 2020

Working on yet another patch now

@domenicquirl (Collaborator, Author) commented:

Hm, I don't get these clippy warnings locally. But I've made a note of them and will go through them another time.

@xacrimon I updated to current master to roll back the Debug impls and your most recent set of changes seems to have fixed the issue with the concurrent_associate test (at least it no longer reproduces locally and didn't occur in any CI run) 🎉
However, the ASAN/LSAN failures contain a lot of flize. Care to take a look at that?

The miri failure is also due to using flize, though only because it uses a syscall that miri cannot handle.

Finally, the current bench results, since who doesn't like more numbers:

INSERT

-- RwLockStd
25165824 operations across 1 thread(s) in 3.1112291s; time/op = 123ns
25165824 operations across 2 thread(s) in 3.2211314s; time/op = 127ns
25165824 operations across 3 thread(s) in 3.1949594s; time/op = 126ns
25165824 operations across 4 thread(s) in 4.3218119s; time/op = 170ns
25165824 operations across 5 thread(s) in 4.4376931s; time/op = 175ns
25165824 operations across 6 thread(s) in 4.40785s; time/op = 174ns
25165824 operations across 7 thread(s) in 4.5828952s; time/op = 181ns
25165824 operations across 8 thread(s) in 5.0664303s; time/op = 200ns

-- Contrie
25165824 operations across 1 thread(s) in 14.7252863s; time/op = 584ns
25165824 operations across 2 thread(s) in 7.7370728s; time/op = 307ns
25165824 operations across 3 thread(s) in 5.0794579s; time/op = 201ns
25165824 operations across 4 thread(s) in 4.2489909s; time/op = 167ns
25165824 operations across 5 thread(s) in 4.2065207s; time/op = 166ns
25165824 operations across 6 thread(s) in 3.6223751s; time/op = 143ns
25165824 operations across 7 thread(s) in 3.5228882s; time/op = 139ns
25165824 operations across 8 thread(s) in 3.9129787s; time/op = 155ns

-- Flurry
25165824 operations across 1 thread(s) in 8.360963s; time/op = 331ns
25165824 operations across 2 thread(s) in 4.0612237s; time/op = 160ns
25165824 operations across 3 thread(s) in 3.4739486s; time/op = 137ns
25165824 operations across 4 thread(s) in 3.0557539s; time/op = 121ns
25165824 operations across 5 thread(s) in 2.7574359s; time/op = 109ns
25165824 operations across 6 thread(s) in 3.3052425s; time/op = 131ns
25165824 operations across 7 thread(s) in 3.1301315s; time/op = 124ns
25165824 operations across 8 thread(s) in 3.6126234s; time/op = 143ns

-- DashMapV3
25165824 operations across 1 thread(s) in 3.9942649s; time/op = 158ns
25165824 operations across 2 thread(s) in 2.3649468s; time/op = 93ns
25165824 operations across 3 thread(s) in 1.7927193s; time/op = 70ns
25165824 operations across 4 thread(s) in 1.4755842s; time/op = 57ns
25165824 operations across 5 thread(s) in 1.3392656s; time/op = 52ns
25165824 operations across 6 thread(s) in 1.2562882s; time/op = 49ns
25165824 operations across 7 thread(s) in 1.2352752s; time/op = 48ns
25165824 operations across 8 thread(s) in 1.6374721s; time/op = 64ns

READ

-- RwLockStd
25165824 operations across 1 thread(s) in 2.0873606s; time/op = 82ns
25165824 operations across 2 thread(s) in 2.2354854s; time/op = 88ns
25165824 operations across 3 thread(s) in 2.3609682s; time/op = 93ns
25165824 operations across 4 thread(s) in 2.4741011s; time/op = 97ns
25165824 operations across 5 thread(s) in 2.8927381s; time/op = 114ns
25165824 operations across 6 thread(s) in 3.0899503s; time/op = 122ns
25165824 operations across 7 thread(s) in 3.2720078s; time/op = 129ns
25165824 operations across 8 thread(s) in 3.4052621s; time/op = 135ns

-- Contrie
25165824 operations across 1 thread(s) in 3.3630103s; time/op = 133ns
25165824 operations across 2 thread(s) in 1.7374071s; time/op = 68ns
25165824 operations across 3 thread(s) in 1.3412467s; time/op = 52ns
25165824 operations across 4 thread(s) in 979.8144ms; time/op = 38ns
25165824 operations across 5 thread(s) in 807.5347ms; time/op = 32ns
25165824 operations across 6 thread(s) in 710.2867ms; time/op = 28ns
25165824 operations across 7 thread(s) in 691.9107ms; time/op = 27ns
25165824 operations across 8 thread(s) in 620.1679ms; time/op = 24ns

-- Flurry
25165824 operations across 1 thread(s) in 1.9622737s; time/op = 77ns
25165824 operations across 2 thread(s) in 1.3287911s; time/op = 52ns
25165824 operations across 3 thread(s) in 1.0343351s; time/op = 40ns
25165824 operations across 4 thread(s) in 1.3497852s; time/op = 52ns
25165824 operations across 5 thread(s) in 948.9328ms; time/op = 37ns
25165824 operations across 6 thread(s) in 883.4476ms; time/op = 35ns
25165824 operations across 7 thread(s) in 836.4695ms; time/op = 33ns
25165824 operations across 8 thread(s) in 860.8753ms; time/op = 34ns

-- DashMapV3
25165824 operations across 1 thread(s) in 2.0134959s; time/op = 79ns
25165824 operations across 2 thread(s) in 1.2995284s; time/op = 50ns
25165824 operations across 3 thread(s) in 1.0334194s; time/op = 40ns
25165824 operations across 4 thread(s) in 923.3008ms; time/op = 36ns
25165824 operations across 5 thread(s) in 1.1188129s; time/op = 43ns
25165824 operations across 6 thread(s) in 1.197148s; time/op = 46ns
25165824 operations across 7 thread(s) in 875.0489ms; time/op = 34ns
25165824 operations across 8 thread(s) in 847.4865ms; time/op = 33ns

UPDATE

-- RwLockStd
25165824 operations across 1 thread(s) in 2.4583979s; time/op = 97ns
25165824 operations across 2 thread(s) in 2.4853831s; time/op = 98ns
25165824 operations across 3 thread(s) in 2.6654809s; time/op = 105ns
25165824 operations across 4 thread(s) in 3.0389016s; time/op = 120ns
25165824 operations across 5 thread(s) in 3.1299052s; time/op = 124ns
25165824 operations across 6 thread(s) in 3.3228441s; time/op = 131ns
25165824 operations across 7 thread(s) in 3.6030417s; time/op = 142ns
25165824 operations across 8 thread(s) in 3.5921058s; time/op = 142ns

-- Contrie
25165824 operations across 1 thread(s) in 6.1480751s; time/op = 243ns
25165824 operations across 2 thread(s) in 3.1098613s; time/op = 123ns
25165824 operations across 3 thread(s) in 2.2017492s; time/op = 87ns
25165824 operations across 4 thread(s) in 1.580198s; time/op = 62ns
25165824 operations across 5 thread(s) in 1.3325822s; time/op = 52ns
25165824 operations across 6 thread(s) in 1.1378914s; time/op = 44ns
25165824 operations across 7 thread(s) in 1.1632136s; time/op = 45ns
25165824 operations across 8 thread(s) in 1.0009695s; time/op = 39ns

-- Flurry
25165824 operations across 1 thread(s) in 3.0274261s; time/op = 120ns
25165824 operations across 2 thread(s) in 1.8291541s; time/op = 71ns
25165824 operations across 3 thread(s) in 1.5162128s; time/op = 59ns
25165824 operations across 4 thread(s) in 1.180083s; time/op = 46ns
25165824 operations across 5 thread(s) in 1.1958379s; time/op = 46ns
25165824 operations across 6 thread(s) in 1.1429559s; time/op = 44ns
25165824 operations across 7 thread(s) in 1.0205986s; time/op = 39ns
25165824 operations across 8 thread(s) in 1.1821357s; time/op = 46ns

-- DashMapV3
25165824 operations across 1 thread(s) in 2.3511851s; time/op = 92ns
25165824 operations across 2 thread(s) in 1.4883932s; time/op = 58ns
25165824 operations across 3 thread(s) in 1.2101926s; time/op = 47ns
25165824 operations across 4 thread(s) in 1.1327995s; time/op = 44ns
25165824 operations across 5 thread(s) in 903.808ms; time/op = 35ns
25165824 operations across 6 thread(s) in 907.7498ms; time/op = 36ns
25165824 operations across 7 thread(s) in 862.2404ms; time/op = 34ns
25165824 operations across 8 thread(s) in 945.879ms; time/op = 37ns

@xacrimon commented:

The miri failure can be fixed in testing by disabling the fast-barrier feature so it doesn't attempt to use the accelerated OS barrier strategy. I'll take a look at ASAN/LSAN, but are you recompiling std when getting those warnings?

@xacrimon commented Nov 23, 2020

I've also added accelerated barriers for macOS now which may be of interest.

@domenicquirl (Collaborator, Author) commented:

Hm, this would be a separate -Z flag, right? I don't see that in the CI commands, but I don't actually know. @jonhoo do you?

@xacrimon commented:

It's done by passing -Zbuild-std to cargo.

@domenicquirl (Collaborator, Author) commented:

Ok, so I don't know why (a) I'm not getting half the clippy warnings even with nightly, and (b) clippy shows them one after another: I fix the one from_iter -> collect it shows me, and then exactly the same warning pops up about 10 lines above. BUT that should be all now. Only the sanitizers remaining 🤔

@xacrimon commented Dec 7, 2020

Eek. I've created the acrimon/dgl branch, which I'd like you to try if possible. It does some tricks to avoid pushing garbage to the global queue, and also adds multithreaded cleanup of the global queue. Not sure if you've been having problems with garbage buildup, but this should solve it.

@domenicquirl (Collaborator, Author) commented Dec 8, 2020

I tried it out locally and it mostly works fine. Will run benchmarks again when I can do so on the same machine while it's idle. In one of the test runs I got a spurious thread 'concurrent_associate::test_concurrent_insert' panicked at 'assertion failed: self.next_table.load(Ordering::SeqCst, shield).is_null()', flurry\src\map.rs:3037:9 from HashMap::drop, which may or may not be related (will need to investigate further, but it also doesn't seem like the end of the world). Pushing to see what the sanitizers have to say about this.

@domenicquirl (Collaborator, Author) commented:

Still a lot of complaints it seems :/

@ibraheemdev mentioned this pull request Feb 9, 2022
@ibraheemdev (Collaborator) commented:

Now that #102 is merged I think this can be closed, unless you're still planning on pursuing this @domenicquirl?

@domenicquirl (Collaborator, Author) commented:

> Now that #102 is merged I think this can be closed, unless you're still planning on pursuing this @domenicquirl?

Yes, I'll close it. One of the primary motivations of this PR was to make flurry at least viable in terms of performance, so that even if it won't be literally the fastest impl, people coming over from the streams and working on flurry can contribute to something that is more than just a learning exercise. You have more than addressed that with your PR (really, kudos again!), so switching collection schemes again now seems obviously redundant.

It's also been quite some time since I last worked on this, and part of why this PR stalled was that there were sanitizer errors with the flize collectors, of which I don't know whether they were due to usage or flize itself (I recall Acri mentioning it was undergoing further rewrites).

Thank you for coming up with an even better way (and for continuing to iterate and improve on it in #80)!
