Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: quad-double precision. #184

Open
cosurgi opened this issue Jan 17, 2020 · 70 comments
Open

Feature request: quad-double precision. #184

cosurgi opened this issue Jan 17, 2020 · 70 comments

Comments

@cosurgi
Copy link

cosurgi commented Jan 17, 2020

Hi,

as suggested in math #303 I repost this here:

How feasible it is to integrate quad-double precision, It has 212 bits in significand and is faster than MPFR:

I have just recently integrated boost multiprecision with a fairly large numerical computation software YADE.

Float128 isn't enough for my needs, yade would greatly benefit from quad-double (as it is much faster than MPFR). Also yade can serve as a really large testing ground. I have implemented CGAL numerical traits and EIGEN numerical traits to use multiprecision. And all of them are tested daily in our pipeline, see an example run here. Or a more specific example for float128: test (need to scroll up) and check

@cosurgi
Copy link
Author

cosurgi commented Jan 17, 2020

@ckormanyos

How feasible it is to integrate quad-double precision...

This is a good question. I would combine that with also software support for float128_t for those platforms lacking quadmath. In fact, I would see full cross-plattform availability of a float128_t (be it hardware backend or software pure) as a prerequisite before going for a float256_t.

This definitely would be great. Especially if float128 will use compiler/assembler version when available.

Then there is the topic of design questions.

Exrtract float128_t/256_t from the same template?
Support other digit splits, etc.
Lots of questions.

Yeah, it definitely depends on the approach. The simplest one is of course to use qd-2.3.22.tar.gz verbatim and only add the necessary multiprecision wrapper. A much more interesting approach is to extract the design used to "quadruple" the precision in there. In such a way that quad-float128 would be simply another template of the same design. 512bits is exactly what I will need in my calculations.

@ckormanyos
Copy link
Member

ckormanyos commented Jan 17, 2020

A much more interesting approach is to extract the design used to "quadruple" the precision in there. In such a way that quad-float128 would be simply another template of the same design

I was thinking more of the binary128 and binary256 formats that are mentiod in IEEE754:2008. But these would be (at least in their portable and generic form) software-intensive and thus lose the hardware acceleration that is the very thing you seek. So that is not exactly what the original request is about.

The simplest one is of course to use qd-2.3.22.tar.gz verbatim and only add the necessary multiprecision wrapper...

This might unfortunately have liensing inconsistencies, in the sense that using the code directly with BSL might be problematic.

I think that binary128/256 is an idea we could talk about. But I mentioned previously, I think there would be several important design considerations. Another idea would be to create a set of popular backends intended to wrap a given type for the "number" template in Boost.Multiprecision. That might be a way to navigate through the licensing differences. I mean, you could even wrap the old FORTRAN77 REAL*16 that way.

I think I'd like to continue this discussion and see what possibilities there are.

Kind regards, Chris

@cosurgi
Copy link
Author

cosurgi commented Jan 17, 2020

Since binary128/256 would be software intensive I would prefer other approach. When hardware support appears in future CPUs adding such binary support should be fairly easy. As you said, it is not a matter of this thread.

a set of popular backends intended to wrap a given type for the "number" template in Boost.Multiprecision

This solution is much more interesting for me. I think that you have already done that with __float128 and mpfr:: (the file /usr/include/mpreal.h from the debian package libmpfrc++-dev), am I correct?

Also the idea to use the "quadrupling" method described in that paper is a worthwhile direction for me. Even if it means completely reimplementing this idea in boost from scratch. This will take a lot of time though.

So I could summarise following:

  1. backend intended to wrap quad-double. The user has to install and configure qd-2.3.22.tar.gz by himself, but then boost would immediately "understand" and wrap it.
  2. "quadrupling" precision approach, a pure boost-only solution.

@ckormanyos
Copy link
Member

ckormanyos commented Jan 18, 2020

a set of popular backends intended to wrap a given type for the "number" template in Boost.Multiprecision

  1. backend intended to wrap quad-double. The user has to install and configure qd-2.3.22.tar.gz by himself, but then boost would immediately "understand" and wrap it.

Correct. There are already Boost.Multiprecision "float" backend wrappers for __float128/quadmath, GMP/MPIR, MPFR and an "int" wrapper for tommath, and these require that the user environment is able to find and link to the required target-specific lib.

I'm not sure what it means for an existing UDT number library to be important/cool/special enough to be wrapped with a supported backend in Boost? I'm also a bit hesitant to directly use DD/QD prototypes when looking at the licensing of the original work. Nonetheless, DD and QD have been around for a while. In fact, I used them heavily about 10-12 years ago. Back then there was a different DD package with 106 binary digits. Maybe it was the predecessor of the work we are mentioning in this post.

Even though I'm not particularly fond of the "FP hack" required for x86 architecture in these packages, and I'm also not sure exactly how portable DD and QD are, these libs are quite useful.

This idea seems feasible to me.

Kind regards, Chris

@pabristow
Copy link
Contributor

The downside of double-double (and I presume of quad-double) has been shown to be some uncertainty estimating about epsilon and consequence difficulties in estimating precision and uncertainties.

The Apple/Darwin platforms all show 'different' behaviour in many Boost.Math tests.

So the 106 or 212 significand bits may be over-optimistic. If you value speed more then ...

@ckormanyos
Copy link
Member

The downside of double-double (and I presume of quad-double) has been shown to be some uncertainty estimating about epsilon and consequence difficulties in estimating precision and uncertainties.

Indeed. Interestingly enough, two of the greater challenges in potentially wrapping DD/QD and higher orders would be getting the types to behave like proper C++ types and figuring out how to mix the function name calls within BSL headers.

These are the two challenges I'm facing in another (unpublished) project that wraps the high precision type of a popular computer algebra system for Boost.Multiprecision.

Still, I think the DD/QD topic might be interesting to pursue in a feasibility study. Perhaps some of the epsilon inconsistencies could be addressed if the concepts of unity, comparison, frexp, ldexp are established more clearly in small add-on functionalities beyond the raw original implementation.

Kind regards, Chris

@jzmaddock
Copy link
Collaborator

This is a good idea, and in principle easy enough.

As mentioned above already, epsilon is likely to the the main sticking point, there are 2 possible definitions:

  • A strict interpretation of the std, is to pick the smallest number which can be added to 1 and yield a different result - this is numeric_limits<double>::min() or even denorm_min(). This is the definition that gcc on apple used, and it's pretty useless for any real work.
  • A loose definition, would be 2^(B-1) if we have B bits of precision (53*4 in this case). This is more useful generally, but can still lead to nasty surprises.

BTW some time back I tried pretty hard to get Boost.Math working with apples "double double" and just couldn't get everything working - so much of the reasoning about what happens to a number under operation X breaks for these types that it's a bit of a loosing battle. That said, we have since added cpp_dec_float and mpf_float both of which have hidden guard digits and seem to be working OK. In principle, these types aren't that different I hope.

@ckormanyos
Copy link
Member

How feasible it is to integrate quad-double precision, It has 212 bits in significand and is faster than MPFR:

This is a good idea, and in principle easy enough.

Thanks John, for your experienced advice. That's what I was seeking in terms of a kind of nod.

If there are no other blocking points, then I will write this up ASAP as a feasibility study for Boost GSoC 2020. The timing of this user-raised issue is, I believe, perfect for that potential venue.

Kind regards, Chris

@NAThompson
Copy link
Contributor

I'm curious about @cosurgi's use case. I feel like the current Boost.Multiprecision covers the intended use well as it stands, which only maybe a factor of 2 available for performance increase. But if you're using quad double, well, performance can't be a huge issue anyway.

@cosurgi
Copy link
Author

cosurgi commented Jan 18, 2020

@NAThompson even a two-times performance increase means waiting for result one month or two months :)

My use case will be quantum mechanics and quantum chromodynamics. Currently I am preparing the yade source code for all of this. The experimental measurements in some cases are more precise than 20 or even 30 significant digits. Performing high precision calculations is a big deal here. Since I plan to calculate volumes significantly larger than Planck's length I have estimated that my target required precision is about 150 digits, which is 512bits, And it has to be as fast as possible. I know it will take years to achieve, but that's the goal.

My current strategy to achieve this is to make sure that yade is 100% compatibile with boost multiprecision. So that when new faster and more precise types appear I will be able to instantly benefit from them. And hope that someday CPUs with native floating point 512 bit type will appear :)

@cosurgi
Copy link
Author

cosurgi commented Jan 18, 2020

BTW: MPFR is about 10-times slower. The difference in waiting one month and 10 months for a result becomes really meaningful.

@ckormanyos
Copy link
Member

MPFR is about 10-times slower. The difference in waiting one month and 10 months for a result...

So let's prefer the 1 month.

@ckormanyos
Copy link
Member

ckormanyos commented Jan 19, 2020

This is a good idea, and in principle easy enough.
As mentioned above already, epsilon is likely to the the main sticking point, ...

Yes.

BTW some time back I tried pretty hard to get Boost.Math working with apples "double double" and just couldn't get everything working

This comment rang a bell in my mind. So I mounted a drive from 2007. As it turns out, I developed my own version of q_float in March 2005. I had completely forgotten about this. It appears as though my original work used add/sub/mul/div routines originally from a developer known as K. Briggs in doubledouble. But these had subsequently been modified in a file called quad_float, that I had discovered in something at that time called WinNTL-5_3_2, which appears to live on to the present day in newer versions.

I got remarkably far with this code, and am somewhat surprised that I had forgotten about it. If we pursue this topic, I will be sharing my work in a smaller e-mail circle.

In that work, it looks like I simply empirically selected 31 for digits10 and calculated base-2 digits from that. Using this, I selected epsilon to simply be 10^-30. But I had curiously failed to implement any number for max_digits10.

Bset regards, Chris

@jzmaddock
Copy link
Collaborator

Interesting comments and use cases - thanks for this.

There's one other potential blocker that springs to mind: the special functions are kind of tricky to implement for types with large bit counts, but small exponent ranges. quad_float is perhaps the archetype here :(

There's an open bug for tgamma on this: #150, but it wouldn't surprise me if there aren't others lurking once you go hunting.

Question: can double_double be doubled? And doubled again etc? In other words could these be implemented via nested templates?

@ckormanyos
Copy link
Member

ckormanyos commented Jan 19, 2020

The simplest one is of course to use qd-2.3.22.tar.gz verbatim and only add the necessary multiprecision wrapper.

One thing I mentioned about this package is licensing of the code.

I believe that I am getting this interpretation. Not yet, however, fully sure... But I also notice that the x86-FPU-fix encloses on the top-level call mechanism all fucntions using the package. This means, as is confirmed in the DD/QD documentation, that the code will have a hard time being used in multithreading for x86. I would prefer to use a package that sets and resets the x86-FPU-fix within the individual subroutines that need it, thereby allowing for a synchronization mechanism if trying for multithreading.

So at the present, I am not aware of an existing kind of DD or QD that would fully satisfy my personal requirements. That being said, one could focus on a popular one. But there seem to be several popular ones dating all the way back to doubledouble that I had mentioned above.

Best regards, Chris

@ckormanyos
Copy link
Member

ckormanyos commented Jan 19, 2020

Question: can double_double be doubled? And doubled again etc? In other words could these be implemented via nested templates?

You read my mind. I think the answers are yes and yes. But I am also wary of such nested templates having once gotten poor results with an integer type using something like this. But that experience was more than 20 ago, I was less experienced at programming C++ back then and compilers got much (massively much) better at parsing nested templates since then.

@jzmaddock
Copy link
Collaborator

But I also notice that the x86-FPU-fix encloses on the top-level call mechanism all fucntions using the package

Wouldn't that only be required for code executing on the old x87 co-processor, and not x64 code using the SSE registers?

@ckormanyos
Copy link
Member

But I also notice that the x86-FPU-fix encloses on the top-level call mechanism all fucntions using the package

Wouldn't that only be required for code executing on the old x87 co-processor, and not x64 code using the SSE registers?

Yes. I have also just sent you some work in another thread. In that work, the x86-FPU-fix is only needed for that and only for that legacy FPU.

@ckormanyos
Copy link
Member

write this up ASAP as a feasibility study for Boost GSoC 2020. The timing of this user-raised issue is, I believe, perfect for that potential venue.

This has been done near the bottom of this page
Boost.Multiprecision: Double-Double and Quad-Double Types.

@jzmaddock
Copy link
Collaborator

@cosurgi : There's a very tentative version in draft PR referenced above. I would welcome feedback if you'd like to give it a try and see how far you can get.

There are some issues, and some possibly controversial design choices:

  • I've fixed up add/subtract/divide to handle infinities correctly. However, for the ultimate performance, this is perhaps not such a good idea?
  • The one and only very basic test case, has a number of failures. string IO is horribly broken, and printing out the value 64 does not result in "64" for example. This is a direct consequence of some of the arithmetic operators - for example 2^10 does not lead to exactly 1024 :(
  • I've done no testing of special functions or Boost.Math support generally.
  • Many more tests are required, although there are likely to be so many failures from our existing boilerplated tests it's hard to know quite how to proceed. Iostream round tripping isn't even worth trying - it will never work correctly IMO.

Anyhow, if you have any real world code you can plug this into, it would be interesting to hear how you get on. It's all a bit too "fast and loose" for me!

@cosurgi
Copy link
Author

cosurgi commented Feb 3, 2020

@jzmaddock I'm sorry for late reply. This looks very interesting. Yes I have a real-world code YADE with a full blown testing suite for math functions I will try to use it this or the next week.

Just to be sure - I should take this commit ?

The issues:

  • YADE heavily depends on correct handling of NaN and Inf. Without them CGAL does not work at all.
  • Ouch: string IO roundtripping is very important for me.
  • I have tests of all math functions, they are tested against python mpmath. I picked the tolerances by experimenting with the results. That's why string IO is important - strings are the only way to get high precision numbers into python.
  • Ouch again. Maybe we could craft something here? Even using cpp_bin_float as a helper type is acceptable for me. String IO was never supposed to be fast. It just should be accurate.

See some example math tests results for MPFR 150 - scroll up to see the math functions being tested by the earlier mentioned script.

@cosurgi
Copy link
Author

cosurgi commented Feb 3, 2020

Oh, btw if 2^10 is 1023.99999999(9)..... then it's acceptable for me, because the error will be smaller than the tolerance level. We only need the working round tripping.

@jzmaddock
Copy link
Collaborator

Oh, btw if 2^10 is 1023.99999999(9)..... then it's acceptable for me, because the error will be smaller than the tolerance level. We only need the working round tripping.

This is actually the reason that round tripping doesn't work - I mean it's approximately correct (to within an epsilon or so), but without a correctly functioning pow() we basically can't implement correct string output. Actually even the basic arithmetic operations aren't correctly rounded either... as long as string IO is approximately correct (to within an epsilon or so - ie drop one digit and it would round to the "correct" result), wouldn't your tests still function OK?

@cosurgi
Copy link
Author

cosurgi commented Feb 3, 2020

wouldn't your tests still function OK?

heh. I suppose that all math tests would work. But the round tripping test would fail ;) I suppose I could implement an exception for this one.

@jzmaddock
Copy link
Collaborator

Meanwhile.... there are a few issues with the current version - mostly order of initialization issues in numeric_limits<qd_real>, give me a day or so and I'll push a fix as they prevent tgamma and others from working.

@jzmaddock
Copy link
Collaborator

Nevermind, fix pushed. The C99 functions including tgamma etc are now working, note that they are not quite std conforming in that they don't adhere to the non-normative appendix F which deals with handling of infinities and NaNs.

@ckormanyos
Copy link
Member

See also this post

@ckormanyos
Copy link
Member

ckormanyos commented Aug 17, 2021

Work on quad has begun...

If you have been following this older thread, there is a lot of new progress on double/quad-float within the context of the GSoC21 (thanks @sinandredemption and @cosurgi). We now have a really cool draft of quad. We decided to implement a flexible template called

template<typename FloatingPointType>
class cpp_quad_float;

which can be instantiated for float, double, long double or boost::multiprecision::float128. This class quadruples these floating-point types basically in terms of precision.

If you get the fork here, you can use <boost/multiprecision/cpp_quad_float.hpp> and some of its convenience types such as boost::multiprecision::cpp_quad_dbl. A sample is shown below.

If you like, try quad-float on your favorite function or part of Boost.Math. We would like to hear field reports.

I like using boost::math::tgamma(). The sample below shows a sample of this.

#include <iomanip>
#include <iostream>

#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/gamma.hpp>
#include <boost/multiprecision/cpp_quad_float.hpp>

int main()
{
   using big_float_type = boost::multiprecision::cpp_quad_dbl;

   // N[Sqrt[Pi], 201]
   // 1.77245385090551602729816748334114518279754945612238712821380778985291128459103218137495065673854466541622682362428257066623615286572442260252509370960278706846203769865310512284992517302895082622893210

   std::cout << std::endl << "represent_tgamma_half" << std::endl;

   const big_float_type g    = boost::math::tgamma(big_float_type(0.5F));
   const big_float_type ctrl = sqrt(boost::math::constants::pi<big_float_type>());

   const big_float_type delta = fabs(1 - (g / ctrl));

   const std::streamsize original_streamsize = std::cout.precision();
   std::cout << std::setprecision(std::numeric_limits<big_float_type>::digits10) << g    << std::endl;
   std::cout << std::setprecision(std::numeric_limits<big_float_type>::digits10) << ctrl << std::endl;
   std::cout << std::scientific << std::setprecision(4) << delta << std::endl;
   std::cout.precision(original_streamsize);
   std::cout.unsetf(std::ios::scientific);

   const bool result_is_ok = (delta < std::numeric_limits<big_float_type>::epsilon() * 40U);

   std::cout << "result_is_ok: " << std::boolalpha << result_is_ok << std::endl;

   return (result_is_ok ? 0 : -1);
}

Cc: @jzmaddock, @mborland, @NAThompson, @pabristow

@pabristow
Copy link
Contributor

pabristow commented Aug 24, 2021 via email

@ckormanyos
Copy link
Member

Would the normal math tests – but including type boost::multiprecision::cpp_quad_dbl be a good starting test?

Hi @pabristow thank you for following up on this. Actually, there are quite a few dedicated Multiprecision tests in the standard Multiprecision CI which exercise both the tape's mathematical functions and operations as well as certain special functions from Math.

We are in the process of using/adapting these for the reduced range(s) of double/quad-float. I thik we have about 1/3 or more of the tests running... So we are moving forward on that. We made a dedicated double/quad-float CI for this projec only in the GSoC timeframe so we have been running tests on pushes, PR and also a nightly CI.

@mborland
Copy link
Member

@ckormanyos Seeing as though we did not participate in GSOC 2022 what still needs to happen on the GSOC 2021 fork to bring quad-double to MP?

@ckormanyos
Copy link
Member

ckormanyos commented Aug 15, 2022

what still needs to happen on the GSOC 2021 fork to bring quad-double to MP

Hi Matt (@mborland) we got a good start, but a lot still remains to do for getting Multiprecision-Ready.

Together with Janek (@cosurgi) and our excellent GSoC '21 colleague Fahad (@sinandredemption) we got a good start on the basic algorithms and functionality.

We implemented this thing with double-float and quad-float whereby instantiation with float, double, long double or even __float128 is foreseen.

Quad-float is tricky since it has something like around 30 decimal digits but an exponent range of 39. Otherwise, the design was strong.

From the top of my mind, we still need to...

  • Verify the design or choices made in numeric limits.
  • Ensure that algorithms add, sub, mul, div, sqrt are properly and efficiently implemented.
  • Implement more eval_ functions. In this design we dicided to first go for standalone implementations, then subsequently hook them into number via eval_ functions. If we (or anyone) started again, one might go directly to the eval_ functions, as standalone is not really needed in the design.
  • Optimize elementary transcendentals for these low digit counts.
  • Integrate the backends into a lot of the existing standard tests. We did some testing hook ups to standard tests of sine, cosine, exp and the like but the list of these and their content are somewhat far from complete.

This was (and is) a really cool project and it would be good to make some progress. I remember toward the end we did some really preliminary benchmarking and were a bit disappointed with the performance of quad double.

Perhaps Fahad has some clear recollections of the state of open points.

We also have some open issues and a few branches. They might handle the merge from main: I'll see if a good branch can be rebased or merged from main into branch tonight.

Cc: @jzmaddock

@ckormanyos
Copy link
Member

ckormanyos commented Dec 28, 2022

I'll see if a good branch can be rebased or merged from main into branch

Hi Matt (@mborland) I was able to successfully merge the develop branch from BoostOrg into the GSoC double-double/quad-double GSoC21 repository.

At the moment, only a few elementary function tests and (most of) the arithmetic tests run in a special CI process that I wrote for that project. The branches were about 500 commits behind BoostOrg, so I took a bit of time to merge.

At the moment, the three branches develop, gsoc2021_double_float_chris and gsoc2021_double_float have been synchronized with both each other as well as with develop from BoostOrg. Any of these branches can be used to get an overview of what's going on.

If I get a chance in the upcoming year, I'd like to get back to this project and maybe see if double-double or quad-double can be finished (enough) for productive use. We still have to work out the edge cases and get all the elementary function tests running.

If anyone has interest, time, ideas, thoughts, please feel free to add/join-in

Cc: @cosurgi and @mborland and @jzmaddock and @sinandredemption

@ckormanyos
Copy link
Member

The fork in its updated state can be found here.

Cc: @mborland and @jzmaddock

@jzmaddock
Copy link
Collaborator

Thanks Chris! As you say, it was really just the corner cases and adequate testing that was holding this back, would be nice to see it finished off...

@ckormanyos
Copy link
Member

ckormanyos commented Jan 4, 2023

Hi @cosurgi and @sinandredemption

  • I spent a lot of time standardizing double-float and the new work behaves quite well with respect to the requirements of a Boost.Multiprecision type. Ultimately I decided to first handle cpp_double_fp_backend and when this is production-ready, then and only then move forward to cpp_quad_fp_backend
  • I invested a lot of time recently in NaNs, inf, zeros and implementing the needed eval_-whatever-functions. I also added a lot of (fork-local) tests including elementary functions, arithmetic tests, float-I/O, complex adaption and more.
  • I went for implementation of rudimentary constexpr-ness and targeted C++14.
  • Some mysterious behaviors remain such as with 10-byte long double, round tripping and spurios tolerance mysteries, but these are limited and the behavior is so far quite good.
  • Below you will find a sample of the cool things you can do with number<cpp_double_double> for instance.
  1. At this time, I could actually use some help implementing the full testing including spec-fun.
  2. I would also like to optimize some more eval_-functions such as trig functions (I did optimize exp() and log()).

Can anyone actually help me with 1?
Is anyone interested in contributing to 2?

Cc: @mborland and @jzmaddock

#include <iomanip>
#include <iostream>

#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/gamma.hpp>
#include <boost/multiprecision/cpp_double_fp.hpp>

int main()
{
  using double_double_type = boost::multiprecision::cpp_double_double;

  constexpr double_double_type a = 1.5;
  constexpr double_double_type b = 3.5;
  constexpr double_double_type c = a * b;
  constexpr double_double_type d = b / a;

  // Test rudimentary compile time constexpr.
  static_assert(c == 5.25);
  static_assert(d < 3);

  // Test tgamma(), sqrt() and pi<>().

  double_double_type tg = boost::math::tgamma(double_double_type(0.5));

  const auto flg = std::cout.flags();

  std::cout << std::setprecision(static_cast<std::streamsize>(std::numeric_limits<double_double_type>::digits10))
            << std::fixed
            << tg
            << std::endl;

  std::cout << std::setprecision(static_cast<std::streamsize>(std::numeric_limits<double_double_type>::digits10))
            << std::fixed
            << sqrt(boost::math::constants::pi<double_double_type>())
            << std::endl;

  const auto relative = fabs(1 - (tg / sqrt(boost::math::constants::pi<double_double_type>())));

  const auto result_tg_is_ok = (relative < (std::numeric_limits<double_double_type>::epsilon() * 10));

  const auto result_is_ok = result_tg_is_ok;

  std::cout << "result_is_ok: " << std::boolalpha << result_is_ok << std::endl;

  std::cout.flags(flg);

  return (result_is_ok ? 0 : -1);
}

@sinandredemption
Copy link

Hello @ckormanyos

I would definitely want to contribute to #1. Please allow me some time to re-familiarize myself with the codebase, and setup my development environment.

@ckormanyos
Copy link
Member

I would definitely want to contribute to #1

Hi @sinandredemption (Fahad) I'm happy to see you on this thread.

At the present time, I am doing quite a few small, but significant changes reagrding use of NaNs, inf, limits, zeros and the 128-bit floating-point type.

I actually found that Multiprecision standalone is really clean regarding some of these items. I also just finished up using Boost.Multiprecision's near-native float128_type instead of the wrapped number type that we had used during the GSoC. This is a big round of changes. So in a short time, you can use cpp_double_float128 with a lot less overhead. In fact, I even removed the dependency for non-ANSI so you can use this type with eithe std=c++14,17,2a,... or std=gnu++14,17,2a.

One of the first significant testing problems I found has been in rounding near the split-point of the division between fisrt/second. So if you'd like to look at the rounding tests that would be a great place to start.

Cc: @cosurgi and @jzmaddock and @mborland

@ckormanyos
Copy link
Member

ckormanyos commented Jan 5, 2023

Hi @sinandredemption there is also another point regarding even more adaption to constexpr-ness. There are still some constexpr artifacts (like in numeric_limits) that are compile-time-evaluated yet non-constexpr. Often times this is because of the use of something like sqrtf(), sqrt(), sqrtl(), sqrtq() or similar for frexp(), ldexp().

This results in a handful of functions that could use some of Matt's (@mborland) constexpr-cmath faculties to go all the way constexpr on numeric_limits<>. This will possibly also apply to functions like eval_sqrt(), probably even eval_log() and eval_exp() too.

This could also be an interesting place to jump on and contribute. I can explain more in offline chat(s) if/when needed.

@ckormanyos
Copy link
Member

Introduction of the cpp_double_fp_backend<> template has begun in #515, but this will take a few revisions until complete.

If/when this finishes in a whle, I'd then suggest we move on with quad-float

Cc: @sinandredemption and @cosurgi and @jzmaddock and @mborland

@AuroraDysis
Copy link

If possible, could you support a greater variety of combinations?

For example, MultiFloats.jl is a very fast library that represents extended-precision numbers with 2x, 3x, ..., up to 8x the precision of double, which might be helpful.

@ckormanyos
Copy link
Member

could you support a greater variety of combinations?

Hi @AuroraDysis that looks interesting and many thanks for your interest in double/quad/multi-float(s).

During the GSoC and after that, I remember We/I were having trouble figuring out the exact algorithms for add/sub/mul/div and struggling with overflow/underfolw/NanNs. For the moment, this project is on the back burner.

Perhaps the Julia code you reference could give inshight into this. If I/we ever pick up this issue, we will be sure to have a look, but i fear this might have rather low priority at the moment.

@AuroraDysis
Copy link

@ckormanyos Thanks for your reply. I found multi-floats to be very useful in my research area and believe it will play an even more important role in the future. If you have the chance to pick up this project again and need some help, I'd like to contribute or collaborate with you.

As for double floats, DoubleFloats.jl is the most mature library I have ever used, although it is not as fast as MultiFloats.jl.

@ckormanyos
Copy link
Member

ckormanyos commented Jan 13, 2024

found multi-floats to be very useful in my research area and believe it will play an even more important role in the future. If you have the chance to pick up this project again and need some help, I'd like to contribute or collaborate with you.

The plan is to do this and look into multi-floats and ultimately get something adhering to Boost.Multiprecision/Math. If (which actually means when) I/we watm this up again, I'll be sure to look into your referencs and contact you regarding forward motion.

Thanks @AuroraDysis for your interest and willingness to help. When this gets moving again, we will be in touch.

@LegalizeAdulthood
Copy link

I'm not an expert in the details of floating-point math, but I'm pretty good at beating build systems into submission :), and I poked around on my fork of the existing QD library to make it build under CMake. If I can be of some assistance, please ping me.

@ckormanyos
Copy link
Member

ckormanyos commented Dec 13, 2024

If I can be of some assistance, please ping me

Well when we left off, there were some edge cases in the full test suite of Boost.Math's special functions when using the double-double and quad-double types. Some of these were probably resolved since there actually were problems at edge exponents of elliptic integrals and similar like Carlson funcs.

Aside from that, we had two competing implementations of mul/div and these seemed to slightly differ.

I would like to:

  • Find the right primitives for add/sub/mul/div/sqrt
  • Pass all of our local repo tests.
  • Pass the tests when our types are used in Boost.Math's specfun tests.

It might take a while to, let's say, warm this up, but we really weren't too far off the mark in the day.

I might have a slot to look into this in upcoming, but I really have a lot of time pressuer in life.

Kind regards @LegalizeAdulthood

Cc: @mborland and @jzmaddock and @cosurgi

@LegalizeAdulthood
Copy link

Is there a repo with a github wirkflow for the tests I can clone?

@ckormanyos
Copy link
Member

Is there a repo with a github wirkflow for the tests I can clone?

Yes certainly. Again i think we got pretty close to stability.

  • I'm torn between two worlds, the old double-double and newer implementations.
  • I also forgot to mention, we were getting stuck and tripped up on infs, NaNs and such edge cases.

So if we can just get stable operations and edge case handling this thing will be done. The formal testing and plug in to Multiprecisin was/is about done. but I need to get the branch and see if it still plays with Boost-Develop branch.

@ckormanyos
Copy link
Member

Is there a repo with a github wirkflow for the tests I can clone?

Yes Richard (@LegalizeAdulthood) we did this work in the 2021 Boost-GSoC.

The repo is here.

I'll take a few minutes ASAP to see if it shakes out well with merge develop into our branch and then we can start to boogy.

Cc: @mborland and @jzmaddock

@ckormanyos
Copy link
Member

OK @LegalizeAdulthood in this sync attempt I'm attempting to merge Boost.Develop Multiprecision into our branch at GSoC2021 to get this thing ready for action.

@ckormanyos
Copy link
Member

ckormanyos commented Dec 15, 2024

attempting to merge Boost.Develop Multiprecision into our branch at GSoC2021 to get this thing ready for action.

It worked rather well. But ONLY the cpp_double_fp_backend is exercised in CI at the moment. Quad is still pending.

The blaze-down on double-something is as follows:

  • Running only cpp_double_fp_backend (no quad backend in CI yet).
  • Most tests pass for double-float, double-double, double-long double and (if available) double-__float128.
  • Algebra, limits and all normal stuff such as elementary transcendental functions all pass.
  • There are five (5) files failing on specfun.
  • Of these specfun failures, some fail simple tolerances. Other failures are due to limited exponent range. These don't bother me.
  • Some very few selected specfun failures are due to confusion between zero and NaN or similar. These are pretty much the last remaining true problems in double-whatever.

I did, in fact, merge this to the develop of the GSoC branch even with certain failures of specfun.

So what does this mean?

  • It is difficult to reach the quality needed for Multiprecision backend.
  • But at least cpp_double_fp_backend is close to getting there.
  • Algebra, most elementary transcendental functions and limits work fine.
  • I will be pursuing the final edge cases that fail.
  • We need to review add/sub/mul/div/sqrt to ensure that the right, optimal algorithms are being used.
  • Edge cases of zeros/NaNs seem to need work.
  • Some specfun tests having huge exponential results need to be excluded for double-double testing.
  • Need to restore test_exp.cpp which I removed temporarily from the Jamfile.

Cc: @LegalizeAdulthood and @jzmaddock and @mborland

@ckormanyos
Copy link
Member

ckormanyos commented Dec 15, 2024

I found multi-floats to be very useful in my research area and believe it will play an even more important role in the future. If you have the chance to pick up this project again and need some help, I'd like to contribute or collaborate with you.

Hello @AuroraDysis based on interest from @LegalizeAdulthood as well as my own, I am, in fact, picking up this project again.

I think at the moment the project is struggling right at the double-double level to actually get the right algorithms for add/sub/mul/div/sqrt and a few more. I looked at the links to the Julia-based code you gave and the double-double primitives look quite good. Have you ever converted any of these DD/QD primitives to the C/C++ domain? I can start that if need be.

@AuroraDysis
Copy link

Hi @ckormanyos, I'm excited to hear about the renewed interest in this project.

I think at the moment the project is struggling right at the double-double level to actually get the right algorithms for add/sub/mul/div/sqrt and a few more. I looked at the links to the Julia-based code you gave and the double-double primitives look quite good. Have you ever converted any of these DD/QD primitives to the C/C++ domain? I can start that if need be.

As for your question, I have not yet had the opportunity to convert it to C/C++.

@ckormanyos
Copy link
Member

ckormanyos commented Dec 23, 2024

Is there a repo with a github wirkflow for the tests I can clone?

Hi @LegalizeAdulthood just for info, we are finishing the extreme edge-cases of the cpp_double_fp_backend<T> template at the moment. We have a few tiny issues with subnormals NaNs, infinities and zeros.

The type is, however, robust enough to use for, let's say, regular calculations such as those that do not over/underflow or reach NaNs.

Just for fun, I plugged the cpp_double_double backend having precision of 32 decimal digits into a full Mandelbrot calculation. I compared this with cpp_dec_float<32>. The double-double type is about twice as fast (see table below).

Some may consider this to be slow, as I do, compared to the expectations for this type. But I definitely think this type is worth pursuing for several reasons. The safety (infinities, NaNs, divide-by-zero, etc.) checks that make this type well-behaved also slow it down considerably. I could imagine a version that throws away checks and is specifically called unsafe. Furthermore, I feel that the double-double backend is/could-be particulalry well-suited for GPU programming if we ever get around to that. So the initial timing report indicates quite some potential down the road.

VERY Initial Timing Report

  • Full classic Mandelbrot image $2048{\times}2048$ pixels at 32 decimal digit precision.
  • cpp_double_double versus cpp_dec_float<32>
  • $2000$ iterations, half-width $1.35$, centered around the point $(0.0, -0.75i)$.
  • Using 15 cores multithreading on Intel Core-i9.
number type time (s)
cpp_double_double 27
cpp_dec_float<32> 52

Cc: @mborland and @sinandredemption and @cosurgi and @jzmaddock

@ckormanyos
Copy link
Member

ckormanyos commented Jan 5, 2025

So, we are just about done cleaning up and optimizing cpp_double_fp_backend. We have working cpp_double_float, cpp_double_double , cpp_double_long_double and (with the right compilers) cpp_double_float128.

I squeezed down the perf a bit on:

  • Full classic Mandelbrot image $2048{\times}2048$ pixels at 32 decimal digit precision.
  • cpp_double_double versus cpp_dec_float<32>
  • $2000$ iterations, half-width $1.35$, centered around the point $(0.0, -0.75i)$.
  • Using 15 cores multithreading on Intel Core-i9.
number type time (s)
cpp_double_double 19
cpp_dec_float<32> 52
cpp_bin_float<32, digit_base_10> 89

See the image below. And also note that low digits is one of the few and rare parameter ranges in which dec_float might have the upper-hand on bin_float perf. So cpp_couble_fp_backend<double> is really fast here.

There could/would be a lot more to push down if we enabled something like a template parameter:

const bool SkipEdgeChecks = false

Image

Cc: @sinandredemption and @cosurgi and @jzmaddock and @mborland

@ckormanyos
Copy link
Member

ckormanyos commented Jan 5, 2025

I have a good mind to finish the docs and try for a review among our authors and to get cpp_double_fp_backend into 1.88 in the early spring.

Cc: @sinandredemption and @cosurgi and @jzmaddock and @mborland

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests