Improve encoding/decoding consistency in tests #669

Open
drinckes opened this issue Dec 27, 2024 · 9 comments

@drinckes
Contributor

We have known for a long time that the same lat/lng can encode to different plus codes on different platforms, operating systems and languages (see for example #665). This comes down to differences in floating-point precision and implementations, as well as different compiler optimisations.

This last is kind of scary because it implies that encodings could change between language/compiler versions.

An alternative is that we stop trying to fight/solve this issue ourselves, and instead make use of the GNU MPFR floating point library.

> The main goal of MPFR is to provide a library for multiple-precision floating-point computation which is both efficient and has a well-defined semantics. It copies the good ideas from the ANSI/IEEE-754 standard for double-precision floating-point arithmetic (53-bit significand).

Apart from anything else, this could improve the consistency of plus code encoding/decoding between languages and platforms.
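To make the idea concrete, here is a minimal sketch of converting a latitude to a fixed-point integer with MPFR in C. The names and the 25,000,000 multiplier (the inverse of the finest latitude step of a maximum-length code) are my illustrative assumptions, not code from this repo:

```c
#include <stdint.h>  /* include before mpfr.h so the intmax_t conversions are declared */
#include <stdio.h>
#include <mpfr.h>    /* build with: cc example.c -lmpfr -lgmp */

int main(void) {
  /* Illustrative multiplier: inverse of the finest latitude step (assumption). */
  const long kLatMultiplier = 25000000;

  mpfr_t lat, scaled;
  mpfr_inits2(128, lat, scaled, (mpfr_ptr) 0);  /* 128-bit working precision */

  mpfr_set_d(lat, 47.3655625, MPFR_RNDN);              /* the double the caller passed in */
  mpfr_mul_si(scaled, lat, kLatMultiplier, MPFR_RNDN); /* exact enough at 128 bits */

  /* Round toward -inf so every platform produces the same integer. */
  intmax_t lat_fixed = mpfr_get_sj(scaled, MPFR_RNDD);
  printf("fixed-point latitude: %jd\n", lat_fixed);

  mpfr_clears(lat, scaled, (mpfr_ptr) 0);
  return 0;
}
```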

The plus code languages in our repo and the MPFR status are:

| Implementation | Availability note |
| --- | --- |
| c | MPFR is natively in C |
| cpp | There are multiple C++ interfaces for MPFR available. Care will need to be taken to choose one compatible with the other languages. |
| dart | |
| go | The float package references MPFR; it might be good enough? |
| java | |
| js, js/closure | |
| plpgsql | |
| python | |
| ruby | |
| rust | |
| visualbasic | |

This is a not insignificant amount of work. I'd be interested in thoughts below, and suggest we come to a consensus before embarking on an implementation.

@bocops
Contributor

bocops commented Dec 27, 2024

Isn't the actual issue at play here that individual implementations use floating-point types for an algorithm that is canonically defined to be performed on fixed-point data in the OLC Specification?

If implementations need to be changed anyway, they could be changed to fixed-point arithmetic to better align with the specification. If floating-point arithmetic needs to be preserved, then the specification should probably be amended to state that implementations may use it but need to ensure that encoding/decoding is "precise enough" - however that gets defined.

@drinckes drinckes changed the title Improve encoding/decoding reliability with high precision library Improve encoding/decoding consistency with high precision library Dec 27, 2024
@drinckes drinckes changed the title Improve encoding/decoding consistency with high precision library Improve encoding/decoding consistency in tests Dec 27, 2024
@drinckes
Contributor Author

drinckes commented Dec 27, 2024

Hmm. Good comment.

The problem is that our tests pass floating-point numbers even though the functions are using fixed point for the calculations. (This is true of the C implementation and most of the others.)

I was thinking that we could use MPFR types and operations to provide functions like encodeHighPrecision(). The regular encode() function would just convert the arguments to MPFR types and call the high-precision version. But your comment has made me think that we could instead split the existing functions into an integer-input encodeFixed(), and have encode() convert the double to long long and call it.

Either way, we get functions we can call from tests that should isolate us from floating point differences, and give us better testability.
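As a rough sketch of that second option (the function names and multipliers below are placeholders I'm assuming for illustration, not the actual C API):

```c
#include <math.h>
#include <stddef.h>

/* Assumed fixed-point multipliers: inverse of the finest lat/lng steps
   of a maximum-length code. */
#define LAT_INTEGER_MULTIPLIER 25000000LL
#define LNG_INTEGER_MULTIPLIER 8192000LL

/* Integer-input core: all clipping, normalising and digit generation lives here.
   (Declaration only; the body would be the existing fixed-point algorithm.) */
int encodeFixed(long long lat_i, long long lng_i, size_t length,
                char *code, size_t max_size);

/* Floating-point wrapper: the only place a double is touched. */
int encode(double lat, double lng, size_t length, char *code, size_t max_size) {
  long long lat_i = (long long)floor(lat * LAT_INTEGER_MULTIPLIER);
  long long lng_i = (long long)floor(lng * LNG_INTEGER_MULTIPLIER);
  return encodeFixed(lat_i, lng_i, length, code, max_size);
}
```

Tests could then feed encodeFixed() integer inputs directly and sidestep floating-point differences entirely.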

(I've changed the title to reflect the issue is in our tests.)

@drinckes
Contributor Author

(One argument in favour of using the MPFR library is that the coordinates in the test files can still be readable lat/lngs, rather than unreadable long integers.)

@bocops
Contributor

bocops commented Dec 28, 2024

Ah, then I misunderstood the original suggestion, thanks for the explanation. Does it make sense to add external library dependencies to OLC implementations across the board, though, if we're (mostly) just enhancing tests?

The alternative of having a function exposed that could be called with precomputed integer values directly would mean that we don't need to deal with additional dependencies in OLC itself, but would let the user beware.

I assume that this would be unimportant for most users, who just generate 8- or 10-digit codes from device location data with dubious accuracy, and would mostly be used by those who need to input precise locations for codes of 12+ digits? In the latter case, these users might already have their location data in some high-precision format that they can use to calculate the necessary integer values themselves.

In that case, the MPFR library could still be used in test code to convert readable test files to integer data during testing.

@drinckes
Contributor Author

> Does it make sense to add external library dependencies to OLC implementations across the board, though, if we're (mostly) just enhancing tests?

I know, it makes me squirm too and it's a good point.

> The alternative of having a function exposed that could be called with precomputed integer values directly would mean that we don't need to deal with additional dependencies in OLC itself, but would let the user beware.

Could you take a look at #670? I've exposed the integer-input function in C (OLC_EncodeFixed, probably a bad name), updated the encoding tests (and commented out all the other tests), and added a macos-latest test runner that we know from #665 generated different codes when using doubles. Using the integer values, it passes, so we can be fairly confident that we don't have any issues on that platform.

I'm interested in what others think - using a high-precision float library would mean significant additional dependencies for the languages listed above, whereas just splitting the implementations into an integer-input function and a float->integer conversion gives us improved testability and consistency.

@drinckes drinckes linked a pull request Dec 30, 2024 that will close this issue
@bocops
Contributor

bocops commented Jan 2, 2025

> Could you take a look at #670?

The function split looks good. One minor thing I notice is that conversion/sanitization is now split between the two functions. We're multiplying and adding MAX in the floating-point-input function for an integer value that should be in range [0..2*MAX] - but might not be, because adjusting and normalizing lat/lng happens later, in the integer-input function.

Keeping that there makes sense, because we likely don't want to fail for a recoverable input oddity - but while we're massaging the inputs anyway, we could also move the +MAX there, so that the expected inputs of the integer-input function are the more understandable lat*LAT_CONSTANT, lng*LNG_CONSTANT. Either way is ultimately fine, though.
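If I read the suggestion right, the shape would be roughly this (a sketch with assumed names and constants, not #670's actual code):

```c
/* Assumed constants; LAT_MAX_I is the fixed-point equivalent of 90 degrees. */
#define LAT_INTEGER_MULTIPLIER 25000000LL
#define LAT_MAX_I (90LL * LAT_INTEGER_MULTIPLIER)

/* The wrapper only scales: the integer function's expected input is simply
   lat * LAT_INTEGER_MULTIPLIER (and likewise for longitude). */
long long scale_lat(double lat) {
  return (long long)(lat * (double)LAT_INTEGER_MULTIPLIER);
}

/* Inside the integer-input function: clip first, then apply the +MAX shift
   so the rest of the algorithm works on a non-negative [0 .. 2*MAX] range. */
long long normalize_lat(long long lat_i) {
  if (lat_i > LAT_MAX_I) lat_i = LAT_MAX_I;
  if (lat_i < -LAT_MAX_I) lat_i = -LAT_MAX_I;
  return lat_i + LAT_MAX_I;  /* the +MAX moved here from the wrapper */
}
```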

Regarding function naming, I think I'd prefer some variant of the "precision" suffix you brought up earlier: encodeHighPrecision(), encodePrecise(), ...?

@drinckes
Contributor Author

drinckes commented Jan 3, 2025

I totally agree with moving the +MAX into the integer function, good idea.

I'm not sure about using precision in the encoding names - to me "high precision" is associated with longer codes, so how do you feel about encodeIntegers()?

(We are just bikeshedding at this point. 😁 )

@bocops
Contributor

bocops commented Jan 6, 2025

Right, better not get lost in the details - encodeIntegers() works just as well. :)

@bocops
Contributor

bocops commented Jan 9, 2025

Another sub-issue should be created for updating the OLC specification with the new integer encoding function.
