Floating-point numbers are uglified, a.k.a. write shortest floating-point representation with round-trip guarantee #1289
Comments
+1 I am running into this issue too. Emitting a @jbeder Can you take a look? |
This sounds reasonable, open to PRs. But you'll have to be careful about the API here, you don't want unintended consequences. |
The default should be "use shortest representation with round-trip guarantee" (like fmt's There should also probably be member functions for reverting to shortest, like |
Ah, yes, #649 uglifies a lot. From a quick look, fmt internally uses an adjusted copy* of dragonbox. I see the following options:
"adjusted copy": a copy of the original code, but wrapped into an additional namespace, so that users of yaml-cpp can still depend on their own version of dragonbox. Implementing a custom solution is beyond my time resources and abilities.
|
Thanks for taking a look, @SGSSGene. Looking at the relevant part of the fmt changelog, it looks like @jk-jeon, the author of jk-jeon/dragonbox, implemented the Dragonbox algorithm in fmt themselves (fmtlib/fmt#1882, fmtlib/fmt#1887, fmtlib/fmt#1894). I think this is how fmt avoids the license mix issue. This might be a long shot, but @jk-jeon, is there any chance you could contribute a similar change to yaml-cpp? If not, I think the second or third option may be best. I would like to note that whichever option we take, we should ensure that it does not break compatibility with the Bazel build system that is set up in this repository in parallel with the CMake build scripts, since yaml-cpp is used both within Google and by other Bazel projects. |
I will make a minimal draft PR as a discussion base, in the next few days. |
@davidzchen @SGSSGene @Anton3 First of all, thank you for your interest in my work! Any of the three options is fine with me. Unfortunately I have no time to actually make a PR, but I can help maintain the copy if you get one into your codebase. I can make PRs if I come up with upgrades, and you can also poke me if you find issues in it. |
@jk-jeon: I am a big fan of your work 😅 I just read your post about fixed precision floats: https://www.reddit.com/r/cpp/comments/1dv2nu1/the_complete_solution_for_floornxy_computation/ I have created a PR #1293 that utilizes dragonbox. It might not do it in the most performant way. Open to any suggestion. |
@SGSSGene Thanks 😊 I do not mind publishing code under MIT. I think maybe a short note about my agreement in a comment would be nice. I briefly looked at the PR, but I think it can cause some subtle rounding error. The problem here is that the result of the nearest-rounding of I don't think there is a simple way to work around it. You probably should just give up Also, adding some more precision when the exponent is positive is questionable as well. I didn't investigate it closely, but I wouldn't be surprised if that breaks roundtrippability. Maybe not? I don't know honestly. What |
@jk-jeon: yes, I am aware that there might be rounding errors if full precision is not used. So for everyone, I think there are a few discussion points:
I believe that writing a complete custom formatter is out of scope, time-wise on my side. |
@SGSSGene That is not correct, because this shortest roundtrip business is a bit different from rounding. What it does is not trying to find the minimum number of digits such that the result of rounding at that digit will roundtrip. What it does is to find whatever number, which in principle can have nothing to do with rounding, which will roundtrip when fed to a correct parser. So, let's look at the example I brought: However, rounding the original floating-point number Yet, if you feed this into a correct parser, then it will interpret this number into not This sounds tricky, but it is not something super esoteric if you think about it: The roundtrip business is not supposed to find the closest decimal to |
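To make the "roundtrip when fed to a correct parser" criterion concrete, here is a minimal sketch. It assumes `std::strtod` in the default "C" locale as the correct parser, and `parses_to` is a hypothetical helper name, not part of yaml-cpp or Dragonbox. It only checks whether a given string parses back to a given `double`; it does not find the shortest such string.

```cpp
#include <cassert>
#include <cstdlib>

// Returns true if `text` is consumed completely by the parser and the
// parsed value is bit-for-bit the same double as `expected`
// (NaN and signed zero are ignored in this sketch).
bool parses_to(const char* text, double expected) {
    assert(text != nullptr);
    char* end = nullptr;
    const double parsed = std::strtod(text, &end);
    return end != text && *end == '\0' && parsed == expected;
}
```

With this helper, both `"34.34"` and `"34.340000000000003"` parse to the same `double` as the literal `34.34`, while `"34.3"` does not. Many decimal strings round-trip to one floating-point value, and the shortest-representation algorithms pick a short one among them rather than rounding the exact binary value.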
@jk-jeon: Ah! yes, you are right. Ok. In this case I will write a simple custom formatter, that also supports scientific notation. How annoying 😅. |
I believe not having roundtrip guarantee by default after #649 is a bug.
This is a non-obvious API, but could work. In principle, some code could rely on there being exactly N digits after
It is important to switch to scientific notation for very small / very large numbers. What is not critical:
|
Welcome to floating-point hell! This monstrosity never lets me down 😂 So my understanding is that you want to allow users to provide the precision argument, or to ignore it and let it be the default. My recommendation is to actually make it an overload set, so when the precision argument is absent, redirect the input to Dragonbox with a custom formatter (which simply adds trailing zeros if fixed-point format is chosen but there are not enough digits to fill out the integer part), and if the precision argument is present, redirect the input to As @SGSSGene pointed out, floff is supposed to do this precision-based printing, but (1) its implementation has never really been polished, (2) if you use it along with Dragonbox you are pointlessly duplicating the same (but slightly different) table twice, and more importantly (3) it only does scientific printing and modifying it for other formats is very nontrivial. Boost.CharConv already did this modification (and also table deduplication IIRC), but if you don't need absolute performance and are okay with its locale-dependence, then And just to be sure: you can't rely on |
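A rough sketch of the overload-set idea, under loudly stated assumptions: `format_number` is a hypothetical name, C++17 `std::to_chars` stands in for Dragonbox plus a custom formatter, and `std::snprintf` with `%.*g` stands in for whatever precision-based routine the comment refers to (the name was lost from the thread). It is not the API proposed in #1293.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdio>
#include <string>
#include <system_error>

// No precision argument: emit the shortest string that round-trips.
// std::to_chars is a stand-in here; the thread proposes Dragonbox.
std::string format_number(double value) {
    char buffer[64];
    const auto result = std::to_chars(buffer, buffer + sizeof(buffer), value);
    assert(result.ec == std::errc());  // 64 bytes is ample for any double
    return std::string(buffer, result.ptr);
}

// Precision argument present: honor it, printing at most `precision`
// significant digits. Note that snprintf depends on the current C locale.
std::string format_number(double value, int precision) {
    assert(precision > 0);
    char buffer[512];
    const int written =
        std::snprintf(buffer, sizeof(buffer), "%.*g", precision, value);
    assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
    return std::string(buffer, static_cast<std::size_t>(written));
}
```

Calling `format_number(34.34)` gives `34.34`, while `format_number(34.34, 17)` gives `34.340000000000003`. The design point is that callers who never ask for a precision get the short form, and callers who do ask keep full control.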
Thank you for all your great input.
#649 fixes the roundtrip guarantee, but does not produce the shortest possible string representation, which leads to very ugly output.
I updated #1293.
Scientific notation will always use exactly 4 characters of the form To stay as close as possible to @jk-jeon: currently, I am assuming the exponent is always a value between -99 and 99 (only two digits), is this a valid assumption for double and float? (Will this still be valid for long double?)
|
@SGSSGene For binary32 ( https://en.wikipedia.org/wiki/IEEE_754 (see the table) Encoding of https://en.wikipedia.org/wiki/Extended_precision For both this format and for |
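The two-digit exponent question can also be checked mechanically. The sketch below assumes IEEE 754 binary32 and binary64 for `float` and `double`; the helper names `min_decimal_exponent` and `exponent_digits` are hypothetical. It derives the decimal exponent range from `std::numeric_limits`, including subnormals.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>

// Decimal exponent of the smallest positive value of T, subnormals included.
template <typename T>
int min_decimal_exponent() {
    const T smallest = std::numeric_limits<T>::denorm_min();
    return static_cast<int>(std::floor(std::log10(smallest)));
}

// Number of decimal digits needed to print any exponent of T (sign excluded).
template <typename T>
int exponent_digits() {
    assert(std::numeric_limits<T>::is_specialized);
    const int largest = std::max(std::abs(min_decimal_exponent<T>()),
                                 std::numeric_limits<T>::max_exponent10);
    int digits = 1;
    for (int rest = largest; rest >= 10; rest /= 10) {
        ++digits;
    }
    return digits;
}
```

On an IEEE 754 platform this reports exponents down to -45 for `float` (two digits suffice) and down to -324 for `double` (three digits are needed), so the -99 to 99 assumption holds for `float` only. The result for `long double` depends on the platform, so it is best queried the same way rather than hard-coded.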
Add dragonbox to compute the required precision to print floating point numbers. This avoids uglification of floating point numbers that happen by default via std::stringstream. Numbers like 34.34 used to be converted to '34.340000000000003' as strings. With this version they will be converted to the string '34.34'. This fixes issue jbeder#1289
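The two strings quoted in the commit message can be reproduced with the standard library alone. This is a minimal sketch, not the code from the PR: it assumes the old behaviour corresponds to streaming with `max_digits10` precision (17 for `double`), and it uses C++17 `std::to_chars` as a stand-in for Dragonbox. The function names are hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>
#include <system_error>

// Old-style output: always print enough digits to round-trip any double.
std::string to_string_max_digits(double value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10)
        << value;
    return out.str();
}

// Shortest round-trip output. std::to_chars without a precision argument
// is specified to produce the shortest string that parses back exactly.
std::string to_string_shortest(double value) {
    char buffer[64];
    const auto result = std::to_chars(buffer, buffer + sizeof(buffer), value);
    assert(result.ec == std::errc());  // 64 bytes is ample for any double
    return std::string(buffer, result.ptr);
}
```

For `34.34`, the first function yields `34.340000000000003` and the second yields `34.34`. Floating-point `std::to_chars` needs a reasonably recent standard library, which may be one reason to vendor an implementation instead.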
@SGSSGene I might get around to checking the PR against my code later this week. Initial comments:
|
I'm not a repo maintainer, but a common practice is to put a patch/diff of your modifications of |
@Anton3: thank you for your feedback. Looking forward to your results. As a reaction to your feedback I did the following:
|
@SGSSGene Finally got around to looking at the PR. Everything is OK now. Running it through my codebase, everything works, but I found some broken tests like: EXPECT_EQ(ToString(yaml), "... 0.00200000009 ..."); The error message here says that the value on the left is actually So I'll have to fix those tests manually, but this is exactly the improvement we want. |
In the last few years, libraries like fmt have mastered printing of floating-point numbers. They use the shortest representation with a round-trip guarantee. Meanwhile, yaml-cpp, after #649, started to uglify numbers in my configs (I use yaml-cpp to patch them).
For example, before:
After: