Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify delta decoding #21

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Simplify delta decoding #21

wants to merge 1 commit into from

Conversation

nyurik
Copy link
Contributor

@nyurik nyurik commented Jun 30, 2022

I just wrote a delta encoding/decoding crate that takes care of some edge cases and has a lot of testing and perf benchmarking, and this PR makes use of that crate.

I'm still hacking on the PBF encoding, so the delta encoding from that lib will be used as well.

Note that I use copied() iterator which simplifies value handling without adding any perf costs (dealing with values vs refs is simpler too).

The WayRefIter is technically not needed other than to implement the ExactSizeIterator on it. If the exact size hint is not required, that whole struct can be replaced with this for backward compatibility.

pub type WayRefIter<'a> = DeltaDecoderIter<Copied<SliceIter<'a, i64>>>;

P.S. Updates MSRV to 1.56 -- minimal version that supports 2021 edition

@nyurik
Copy link
Contributor Author

nyurik commented Jun 30, 2022

P.S. Please let me know about this PR because it conflicts a little bit with the upgrade to protobuf 3 which i'm also trying to get done (sadly, there are a LOT of changes in protobuf 3)

@nyurik
Copy link
Contributor Author

nyurik commented Jul 1, 2022

@b-r-u it seems Windows CI still requires a very old version of Rust - the one without 2021 edition support. Can we drop that?

@nyurik nyurik force-pushed the deltas branch 2 times, most recently from a07aea7 to 7ca6dcd Compare July 2, 2022 00:36
@nyurik
Copy link
Contributor Author

nyurik commented Jul 2, 2022

This should pass CI after #31 is merged

@nyurik nyurik force-pushed the deltas branch 3 times, most recently from 60009af to 0e9239c Compare July 2, 2022 03:19
@b-r-u
Copy link
Owner

b-r-u commented Jul 3, 2022

Thanks for these ideas. They definitely make the code more readable. Although I'm not sure if I want to introduce another dependency for that. Would you be okay with adding a (maybe non-generic, i64) version of that to a new util module?

And it might be possible to implement ExactSizeIterator for the encoders/decoders if and only if the underlying iterator implements it with something like this:

impl<I> ExactSizeIterator for DeltaEncoderIter<I> where I: ExactSizeIterator

@nyurik
Copy link
Contributor Author

nyurik commented Jul 3, 2022

Thanks for the exact size idea. WRT dependency count -- the current dependencies count is 54, and with deltas it goes up to 56 -- so the increase is not that huge esp considering the crate size. Additionally, it turns out there are a lot of gotchas (like the wrapping aspect) - so it might make sense to keep it as a separate crate just because of all the testing it should go through (note that it already went through several revisions). I could of course make a copy of that code, but as we use more of it (i.e. for writing), it seems keeping this as a separate crate might make more sense.

P.S. I actually had to do a lot of benchmarking on it too to ensure proper performance.

I just wrote a delta encoding/decoding crate that takes care of some edge cases and has a lot of testing and perf benchmarking, and this PR makes use of that crate.

I'm still hacking on the PBF encoding, so the delta encoding from that lib will be used as well.

Note that I use `copied()` iterator which simplifies value handling without adding any perf costs (dealing with values vs refs is simpler too).
@nyurik
Copy link
Contributor Author

nyurik commented Jul 3, 2022

I published an updated version of the delta-encoding crate, and updated this PR to remove the WayRefIter. Thx for the suggestion! A colleague of mine has made some code for the highly optimized delta-decoding (allowing CPU vectorization), but I'm still researching if it can be applied to my crate.

@nyurik
Copy link
Contributor Author

nyurik commented Jul 3, 2022

Joel has published his vectorized delta code under MIT and apache license! I might try to add that to the delta lib too. https://gist.github.com/athre0z/579c64f2961e5603d9fe1ab2c6d4380f

@b-r-u
Copy link
Owner

b-r-u commented Jul 3, 2022

I like the idea of having a crate that handles everything surrounding delta encoding. But for this crate (osmpbf) right now, there is only a very narrow use-case: Adding/subtracting two i64s.
It would be interesting to see if improving delta encoding beyond the naive implementation actually makes a measurable impact, because there is so much else going on (especially everything zlib-related) that is way more expensive.

With vectorization, I think the code would have to be restructured for it to make sense. You would have to bulk-convert the delta values in one go instead of for example taking only one delta value for each component of a dense node per iteration.
One (quick and dirty) way to do this would be to mutate the delta encoded Vecs once when elements are decoded and assume afterwards that they contain the actual values. Then the iteration over dense nodes or way refs wouldn't have to deal with delta values. But my guess would be that this one addition per component has almost no impact on runtime.

Another interesting thing would be to look into prefix sum implementations. Decoding a delta coded array is very similar to computing the prefix sum and for that there are SIMD implementations. Raph Levien looked into this in the context of 2d rasterization (See https://github.com/raphlinus/font-rs/blob/master/src/accumulate.rs).

@nyurik
Copy link
Contributor Author

nyurik commented Jul 4, 2022

Just a minor correction - DenseNodeInfoIter uses both i64 and i32 delta values. Which means that I would either have to depend on the num-traits (for the wrapping_add function), or would have to do this in a macro. Either is ok i guess. Which would you prefer?

Thanks for the link! Interesting indeed, some day i do hope to have this level of optimization :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants