#Demystifying Floating Point

John Farrier, CPPCon 2015

Slides and Source Code from "Demystifying Floating Point Numbers" given at CPPCon 2015.

##Notes

  • IEEE Single Format - 1 sign bit, 8 exponent bits, 23 fraction bits
  • IEEE Double Format - 1 sign bit, 11 exponent bits, 52 fraction bits
  • IEEE 754 defines several rounding modes (Java only permits "Round to Nearest")
  • Do not test floating-point values for exact equality
  • Order of operations matters
  • The relative magnitudes of the numbers involved in a computation matter
  • Prefer to multiply, add, and subtract - divide only when necessary
  • Avoid elementary functions, e.g. logarithmic, exponential, trigonometric, hyperbolic

##General Online References

##General Offline References

  • Hacker's Delight, (Warren)
  • Handbook of Floating-Point Arithmetic (Muller, Brisebarre)
  • Numerical Computing with IEEE Floating Point Arithmetic (Overton)

##Compiler Specific Information

###Visual Studio

[Microsoft Visual C++ Floating-Point Optimization](https://msdn.microsoft.com/library/aa289157.aspx)

####Compiler Flags

  • /fp:[precise | except[-] | fast | strict ]

####Pragmas

  • float_control( value,setting [push] | push | pop )
  • fenv_access [ON | OFF]
  • fp_contract [ON | OFF]

###GCC

  • [Semantics of Floating Point Math in GCC](https://gcc.gnu.org/wiki/FloatingPointMath)
  • [Math_Optimization_Flags](https://gcc.gnu.org/wiki/Math_Optimization_Flags)

####Compiler Flags

  • -ffinite-math-only
  • -fno-rounding-math
  • -fno-signaling-nans
  • -fno-signed-zeros
  • -fno-trapping-math
  • -frounding-math
  • -fsignaling-nans
  • -funsafe-math-optimizations