Improve encoding/decoding consistency in tests #669

Open
drinckes opened this issue Dec 27, 2024 · 9 comments

@drinckes
Contributor

We have known for a long time that the same lat/lng can encode to different plus codes on different platforms, operating systems and languages (see for example #665). This comes down to differences in floating-point precision and implementations, as well as different compiler optimisations.

This last is kind of scary because it implies that encodings could change between language/compiler versions.

An alternative is that we stop trying to fight/solve this issue ourselves, and instead make use of the GNU MPFR floating point library.

> The main goal of MPFR is to provide a library for multiple-precision floating-point computation which is both efficient and has a well-defined semantics. It copies the good ideas from the ANSI/IEEE-754 standard for double-precision floating-point arithmetic (53-bit significand).

Apart from anything else, this could improve the consistency of plus code encoding/decoding between languages and platforms.
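To make the idea concrete, here is a minimal sketch of converting a latitude to a fixed-point integer with MPFR in C. The names and the 25,000,000 multiplier (the inverse of the finest latitude step of a maximum-length code) are my illustrative assumptions, not code from this repo:

```c
#include <stdint.h>  /* include before mpfr.h so the intmax_t conversions are declared */
#include <stdio.h>
#include <mpfr.h>    /* build with: cc example.c -lmpfr -lgmp */

int main(void) {
  /* Illustrative multiplier: inverse of the finest latitude step (assumption). */
  const long kLatMultiplier = 25000000;

  mpfr_t lat, scaled;
  mpfr_inits2(128, lat, scaled, (mpfr_ptr) 0);  /* 128-bit working precision */

  mpfr_set_d(lat, 47.3655625, MPFR_RNDN);              /* the double the caller passed in */
  mpfr_mul_si(scaled, lat, kLatMultiplier, MPFR_RNDN); /* exact enough at 128 bits */

  /* Round toward -inf so every platform produces the same integer. */
  intmax_t lat_fixed = mpfr_get_sj(scaled, MPFR_RNDD);
  printf("fixed-point latitude: %jd\n", lat_fixed);

  mpfr_clears(lat, scaled, (mpfr_ptr) 0);
  return 0;
}
```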

The plus code languages in our repo and the MPFR status are:

| Implementation | Availability note |
| --- | --- |
| c | MPFR is natively in C |
| cpp | There are multiple C++ interfaces for MPFR available. Care will need to be taken to choose one compatible with the other languages. |
| dart | |
| go | The float package references MPFR; it might be good enough? |
| java | |
| js, js/closure | |
| plpgsql | |
| python | |
| ruby | |
| rust | |
| visualbasic | |

This is a not insignificant amount of work. I'd be interested in thoughts below, and suggest we come to a consensus before embarking on an implementation.

@bocops
Contributor

bocops commented Dec 27, 2024

Isn't the actual issue at play here that individual implementations use floating-point types for an algorithm that is canonically defined to be performed on fixed-point data in the OLC Specification?

If implementations need to be changed anyway, they could be changed to fixed-point arithmetic to better align with the specification. If floating-point arithmetic needs to be preserved, then the specification should probably be amended to state that implementations may use it but need to ensure that encoding/decoding is "precise enough" - however that gets defined.

@drinckes drinckes changed the title Improve encoding/decoding reliability with high precision library Improve encoding/decoding consistency with high precision library Dec 27, 2024
@drinckes drinckes changed the title Improve encoding/decoding consistency with high precision library Improve encoding/decoding consistency in tests Dec 27, 2024
@drinckes
Contributor Author

drinckes commented Dec 27, 2024

Hmm. Good comment.

The problem is that our tests pass floating-point numbers even though the functions are using fixed point for the calculations. (This is true of the C implementation and most of the others.)

I was thinking that we could use MPFR types and operations to provide functions like encodeHighPrecision(). The regular encode() function would just convert the arguments to MPFR types and call the high-precision version. But your comment has made me think that we could instead split the existing functions into an integer-input encodeFixed(), and have encode() convert the double to long long and call it.

Either way, we get functions we can call from tests that should isolate us from floating point differences, and give us better testability.
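As a rough sketch of that second option (the function names and multipliers below are placeholders I'm assuming for illustration, not the actual C API):

```c
#include <math.h>
#include <stddef.h>

/* Assumed fixed-point multipliers: inverse of the finest lat/lng steps
   of a maximum-length code. */
#define LAT_INTEGER_MULTIPLIER 25000000LL
#define LNG_INTEGER_MULTIPLIER 8192000LL

/* Integer-input core: all clipping, normalising and digit generation lives here.
   (Declaration only; the body would be the existing fixed-point algorithm.) */
int encodeFixed(long long lat_i, long long lng_i, size_t length,
                char *code, size_t max_size);

/* Floating-point wrapper: the only place a double is touched. */
int encode(double lat, double lng, size_t length, char *code, size_t max_size) {
  long long lat_i = (long long)floor(lat * LAT_INTEGER_MULTIPLIER);
  long long lng_i = (long long)floor(lng * LNG_INTEGER_MULTIPLIER);
  return encodeFixed(lat_i, lng_i, length, code, max_size);
}
```

Tests could then feed encodeFixed() integer inputs directly and sidestep floating-point differences entirely.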

(I've changed the title to reflect the issue is in our tests.)

@drinckes
Contributor Author

(One argument in favour of using the MPFR library is that the coordinates in the test files can still be readable lat/lngs, rather than unreadable long integers.)

@bocops
Contributor

bocops commented Dec 28, 2024

Ah, then I misunderstood the original suggestion, thanks for the explanation. Does it make sense to add external library dependencies to OLC implementations across the board, though, if we're (mostly) just enhancing tests?

The alternative of having a function exposed that could be called with precomputed integer values directly would mean that we don't need to deal with additional dependencies in OLC itself, but would let the user beware.

I assume that this would be unimportant for most users, who just generate 8- or 10-digit codes from device location data with dubious accuracy, and would mostly be used by those who need to input precise locations for codes of 12+ digits? In the latter case, these users might already have their location data in some high-precision format that they can use to calculate the necessary integer values themselves.

In that case, the MPFR library could still be used in test code to convert readable test files to integer data during testing.

@drinckes
Contributor Author

> Does it make sense to add external library dependencies to OLC implementations across the board, though, if we're (mostly) just enhancing tests?

I know, it makes me squirm too and it's a good point.

> The alternative of having a function exposed that could be called with precomputed integer values directly would mean that we don't need to deal with additional dependencies in OLC itself, but would let the user beware.

Could you take a look at #670? I've exposed the integer-input function in C (OLC_EncodeFixed, probably a bad name), updated the encoding tests (and commented out all the other tests), and added a macos-latest test runner that we know from #665 generated different codes when using doubles. Using the integer values, it passes, so we can be fairly confident that we don't have any issues on that platform.

I'm interested in what others think - using a high-precision float library would mean significant additional dependencies for the languages listed above, whereas just splitting the implementations into an integer-input function and a float->integer conversion gives us improved testability and consistency.

@drinckes drinckes linked a pull request Dec 30, 2024 that will close this issue
@bocops
Contributor

bocops commented Jan 2, 2025

> Could you take a look at #670?

The function split looks good. One minor thing I notice is that conversion/sanitization is now split between the two functions. We're multiplying and adding MAX in the floating-point-input function for an integer value that should be in range [0..2*MAX] - but might not be, because adjusting and normalizing lat/lng happens later, in the integer-input function.

Keeping that there makes sense, because we likely don't want to fail for a recoverable input oddity - but while we're massaging the inputs anyway, we could also move the +MAX there, so that the expected inputs of the integer-input function are the more understandable lat*LAT_CONSTANT, lng*LNG_CONSTANT. Either way is ultimately fine, though.
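If I read the suggestion right, the shape would be roughly this (a sketch with assumed names and constants, not #670's actual code):

```c
/* Assumed constants; LAT_MAX_I is the fixed-point equivalent of 90 degrees. */
#define LAT_INTEGER_MULTIPLIER 25000000LL
#define LAT_MAX_I (90LL * LAT_INTEGER_MULTIPLIER)

/* The wrapper only scales: the integer function's expected input is simply
   lat * LAT_INTEGER_MULTIPLIER (and likewise for longitude). */
long long scale_lat(double lat) {
  return (long long)(lat * (double)LAT_INTEGER_MULTIPLIER);
}

/* Inside the integer-input function: clip first, then apply the +MAX shift
   so the rest of the algorithm works on a non-negative [0 .. 2*MAX] range. */
long long normalize_lat(long long lat_i) {
  if (lat_i > LAT_MAX_I) lat_i = LAT_MAX_I;
  if (lat_i < -LAT_MAX_I) lat_i = -LAT_MAX_I;
  return lat_i + LAT_MAX_I;  /* the +MAX moved here from the wrapper */
}
```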

Regarding function naming, I think I'd prefer some variant of the "precision" suffix you brought up earlier: encodeHighPrecision(), encodePrecise(), ...?

@drinckes
Contributor Author

drinckes commented Jan 3, 2025

I totally agree with moving the +MAX into the integer function, good idea.

I'm not sure about using precision in the encoding names - to me "high precision" is associated with longer codes, so how do you feel about encodeIntegers()?

(We are just bikeshedding at this point. 😁 )

@bocops
Contributor

bocops commented Jan 6, 2025

Right, better not get lost in the details - encodeIntegers() works just as well. :)

@bocops
Contributor

bocops commented Jan 9, 2025

Another sub-issue should be created for updating the OLC specification with the new integer encoding function.
