Make possible to not use fast math #29

Closed
illwieckz opened this issue Jul 28, 2022 · 7 comments · Fixed by #51
Comments

@illwieckz
Member

illwieckz commented Jul 28, 2022

See:

He said:

Without fast math we get the same output for the same input on all the platforms and architectures.

Otherwise the shasum of output was different on different machines that compressed the same input.

So we may want to disable fast math, and maybe even pass -fno-fast-math to be 100% sure that no environment variable reintroduces it. The related patch only affects the legacy Makefile, so someone has to implement it in CMake as well.
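To illustrate why a one-bit floating-point difference is enough to change the checksum of an output file, here is a minimal sketch. It is not code from crunch: FNV-1a stands in for shasum, and `float_bits`, `fnv1a`, `checksum_of` and `flip_lowest_bit` are hypothetical helper names chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern (memcpy avoids aliasing issues).
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// 64-bit FNV-1a over a byte buffer; a simple stand-in for shasum.
std::uint64_t fnv1a(const unsigned char* data, std::size_t size) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= data[i];
        hash *= 1099511628211ULL;  // FNV prime
    }
    return hash;
}

// Hash the raw bytes of one float, the way a file checksum sees stored data.
std::uint64_t checksum_of(float value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    return fnv1a(bytes, sizeof bytes);
}

// Flip the lowest mantissa bit, producing an adjacent float value.
float flip_lowest_bit(float value) {
    std::uint32_t bits = float_bits(value) ^ 1u;
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

Calling `checksum_of(x)` and `checksum_of(flip_lowest_bit(x))` yields two different hashes even though the two values are visually indistinguishable, which is the situation described above for compressed outputs.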

@illwieckz
Member Author

Some quotes from:

illwieckz dixit:

If I'm right, some “fast math” options may not only disable some checks, but also make it possible to use dedicated hardware that may even compute more precisely. While that may not be wrong, it may produce images with slightly different colors and therefore a different checksum.

-ffast-math is more about selectively bypassing IEEE/ISO standard restrictions than it is to not care about precision.
-- https://discourse.llvm.org/t/rfc-deprecate-ofast/78687/26

IEEE talks about digital precision. For example, mul + add may not have the same binary answer as mla, so IEEE assumes precision is lost. But it’s often gained.
-- https://discourse.llvm.org/t/rfc-deprecate-ofast/78687/30
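The mul + add versus mla remark can be reproduced with a small sketch. The function names `mul_then_add` and `fused_mul_add` are hypothetical; `std::fma` is the standard fused multiply-add, and the `volatile` temporary is only there so the compiler cannot fuse the two steps on its own.

```cpp
#include <cmath>

// Multiply, round the product to double, then add: two roundings.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;  // volatile forces the intermediate rounding
    return product + c;
}

// Fused multiply-add: a * b + c is rounded only once, at the end.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 as a double. The separate version therefore returns 0, while the fused version returns -2^-54. The fused answer is the more accurate one, yet the two outputs differ bit for bit.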

So, maybe the checksum being different is not the symptom of a bug. But it is probably expected that using fast math breaks reproducibility because then the math functions or even hardware don't have to be conformant to some IEEE standard and even things like level of precision may differ across software/hardware implementations.

For example, with the Dæmon game engine we had to update some of our tests when we added an option to disable SSE, because the x87 computation then produced a slightly different result. It was not wrong, only the precision differed: SSE actually had higher precision than x87, so the result was 0.4261826 with SSE but 0.426183 with x87:

Since the tool is meant to produce distributable files, it looks to be a good idea to have build options guaranteeing the reproducibility of the result.

If someone implements a game engine that embeds libcrn to automatically convert PNG and JPG images to DDS/CRN and to store the generated DDS/CRN in a cache, it's probably fine to not care about reproducibility.

But someone implementing a toolchain like Urcheon to produce a distributable game with pre-computed DDS/CRN may want a knob to enable reproducibility, even at the expense of spending more time producing the released game.

I think I'll add a CMake option to Dæmon's crunch as a knob to favor reproducibility (and then disable fast math). This option will likely be enabled by default (-ffast-math disabled by default).
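On the source side, a build knob like this can be double-checked at compile time. The following is only a sketch under assumptions, not what crunch does: `built_with_fast_math`, `math_mode_name` and the `REQUIRE_REPRODUCIBLE_MATH` macro are hypothetical names. It relies on GCC and Clang defining `__FAST_MATH__` under -ffast-math and on MSVC defining `_M_FP_FAST` under /fp:fast.

```cpp
// True when the compiler reports that fast math is active for this file.
// GCC/Clang define __FAST_MATH__ with -ffast-math; MSVC defines _M_FP_FAST with /fp:fast.
constexpr bool built_with_fast_math() {
#if defined(__FAST_MATH__) || defined(_M_FP_FAST)
    return true;
#else
    return false;
#endif
}

// Hypothetical opt-in guard: a build system could define REQUIRE_REPRODUCIBLE_MATH
// so that a fast-math flag leaking in from the environment fails the build.
#if defined(REQUIRE_REPRODUCIBLE_MATH)
static_assert(!built_with_fast_math(),
              "fast math is enabled but reproducible output was requested");
#endif

// Short label suitable for a version or diagnostics string.
const char* math_mode_name() {
    return built_with_fast_math() ? "fast" : "strict";
}
```

A tool could print `math_mode_name()` in its version output so that two builds producing different files can be told apart quickly.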

blaztinn dixit:

Yes, that is also the conclusion I got to (with regards to fast math optimizations).

We are using this lib to produce the artifacts at build time and we're caching them by the checksum on some server. For this use-case the fast math being disabled is an appropriate setting.

But I see how it can be beneficial to turn the fast math on if used in an app/game at runtime. I like your approach to using a build flag for it so the user of the lib can decide what to use.

illwieckz added a commit that referenced this issue Jun 26, 2024
- Add USE_LTO to enable LTO,
  enabled by default.
- Add USE_EXTRA_OPTIMIZATION to also enable -O3 when it is not used by default
  enabled by default.
- Add USE_FAST_MATH, to produce reproducible CRN files this should be disabled,
  disabled by default.
- Increase warning verbosity level.
- Generate maximum amount of debug information, including macro definitions.
- Always disable strict aliasing, the code requires it to always be disabled.

Fixes #29
@slipher
Member

slipher commented Jun 26, 2024

What kind of "reproducibility" are we talking about here?

  • If you mean that a crunch built with the exact same compiler for the same target platform always produces the same results, I expect that should happen with any floating point options.
  • If you mean that a crunch built by any compiler for any platform produces the same results, that's probably a bad goal. You'd have to do very slow things like 100% IEEE conformance and -ffloat-store.

It sounds like blaztinn had a very specific use case like "we want to hit the cache most of the time when building with any of the 3 versions of the compiler devs in our shop have installed right now", which doesn't generalize to most users.

@illwieckz
Member Author

I would prefer that building the release packages from the same source produces the same packages, be it on Linux on amd64 or on macOS on arm64.

@illwieckz
Member Author

Another good example of how fast math may produce different things while not being wrong:

https://stackoverflow.com/questions/6430448/why-doesnt-gcc-optimize-aaaaaa-to-aaaaaa

Q:

What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and options "-O3 -lm -funroll-loops -msse4", it uses 5 mulsd instructions:

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13

while if I write (a*a*a)*(a*a*a), it will produce

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm13, %xmm13

which reduces the number of multiply instructions to 3.

A:

Because Floating Point Math is not Associative. The way you group the operands in floating point multiplication has an effect on the numerical accuracy of the answer.
As a result, most compilers are very conservative about reordering floating point calculations unless they can be sure that the answer will stay the same, or unless you tell them you don't care about numerical accuracy. For example: the -fassociative-math option of gcc which allows gcc to reassociate floating point operations, or even the -ffast-math option which allows even more aggressive tradeoffs of accuracy against speed.
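The non-associativity mentioned in that answer is easy to observe. A minimal sketch follows, with hypothetical function names, assuming a build without -ffast-math or -fassociative-math so the compiler keeps the grouping as written:

```cpp
// Sum grouped left to right, as the source is written.
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

// The same sum regrouped, one of the rewrites that reassociation permits.
double sum_right(double a, double b, double c) {
    return a + (b + c);
}

// a^6 as a chain of five multiplications.
double pow6_chain(double a) {
    return a * a * a * a * a * a;
}

// a^6 as (a^3) * (a^3): three multiplications, with different intermediate roundings.
double pow6_grouped(double a) {
    double cube = a * a * a;
    return cube * cube;
}
```

With a = 1e30, b = -1e30 and c = 1.0, `sum_left` returns 1.0 while `sum_right` returns 0.0, because 1.0 is lost when it is first added to -1e30. The two `pow6` variants agree for values such as 2.0 where every intermediate product is exact, but they round at different points and so are not guaranteed to agree in general. That is why the compiler only switches from one to the other when told it may.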

It is probably not a big problem to disable fast math to produce a release build of packages, while enabling fast math to produce nightly builds of the same packages.

@illwieckz
Member Author

In #51 I make the usage of fast math optional (and also make sure it is used with MSVC when enabled, unlike before):

Now another question is: should this option be enabled by default or not?

Maybe enabling it by default is not bad; people with a specific need for reproducibility would look at the available options anyway.

illwieckz added a commit that referenced this issue Jun 26, 2024
- Add USE_LTO to enable LTO,
  enabled by default.
- Add USE_EXTRA_OPTIMIZATION to also enable -O3 when it is not used by default
  enabled by default.
- Add USE_FAST_MATH, to produce reproducible CRN files this should be disabled,
  enabled by default.
- Increase warning verbosity level.
- Generate maximum amount of debug information, including macro definitions.
- Always disable strict aliasing, the code requires it to always be disabled.

Fixes #29
@slipher
Member

slipher commented Jun 26, 2024

Getting the same results on all systems with SSE instructions might be doable. The GCC default float handling, Visual Studio 2022+ /fp:precise, or Visual Studio (any version) /fp:strict should result in more or less direct translations of the source to SSE. It's the x87 extended precision stuff that makes things really bad.
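Whether a given build evaluates intermediates in extended precision can be queried through the standard `FLT_EVAL_METHOD` macro from `<cfloat>`. The sketch below uses hypothetical helper names. As a hedged expectation rather than a guarantee, x86-64 builds using SSE typically report 0, while 32-bit x87 builds with GCC typically report 2.

```cpp
#include <cfloat>

// Evaluation method the compiler reports for this translation unit.
int float_eval_method() {
    return FLT_EVAL_METHOD;
}

// Human-readable meaning of the standard FLT_EVAL_METHOD values.
const char* describe_float_eval_method(int method) {
    switch (method) {
        case 0:  return "each operation is evaluated in its own type";
        case 1:  return "float and double are evaluated as double";
        case 2:  return "all operations are evaluated as long double";
        default: return "indeterminate or implementation-defined";
    }
}
```

Logging `describe_float_eval_method(float_eval_method())` from two builds is a quick way to see whether they even attempt to evaluate expressions the same way before comparing output checksums.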

"It is probably not a big problem to disable fast math to produce a release build of packages, while enabling fast math to produce nightly builds of the same packages."

Doesn't sound like a great idea to me. Tests with a testing build become less relevant the more differences there are from a release build.

@illwieckz illwieckz changed the title Don't use fast math Make possible to not use fast math Jun 26, 2024
@illwieckz
Member Author

I made the USE_FAST_MATH option introduced in #51 enabled by default.

Having the option makes it easy to disable fast math for specific needs, while most people will just be happy to benefit from the fastest tool.

illwieckz added a commit that referenced this issue Jun 27, 2024
- Add USE_LTO to enable LTO,
  enabled by default.
- Add USE_EXTRA_OPTIMIZATION to also enable -O3 when it is not used by default
  enabled by default.
- Add USE_FAST_MATH, to produce reproducible CRN files this should be disabled,
  enabled by default.
- Increase warning verbosity level.
- Generate maximum amount of debug information, including macro definitions.
- Always disable strict aliasing, the code requires it to always be disabled.

Fixes #29
illwieckz added a commit that referenced this issue Jun 28, 2024
- Add USE_LTO to enable LTO,
  enabled by default.
- Add USE_EXTRA_OPTIMIZATION to also enable -O3 when it is not used by default
  enabled by default.
- Add USE_FAST_MATH, to produce reproducible CRN files this should be disabled,
  enabled by default.
- Increase warning verbosity level.
- Generate maximum amount of debug information, including macro definitions.
- Always disable strict aliasing, the code requires it to always be disabled.

Fixes #29