This library is an experiment in generalizing the famed "fast inverse square root" hack to other exponents. For example:
- Fast cube root
- Fast inverse 4th root
- More/less precise variations
- Fast roots optimized for mean-square error instead of worst-case error
Fast roots are useful in performance-intensive applications where a small amount of relative error is tolerable. Modern CPUs offer intrinsic support for `sqrt(y)`, making a fast approximation of the square root unnecessary. The other approximations in this library, however, often outperform their standard equivalents, including `1/sqrt(y)` and especially calls to `pow`.
This library defines root approximations in terms of four parameters:

- Root Index `N` (integer)
- Magic Constant `K` (integer)
- Number of Newtonian refinement steps `R` (unsigned integer)
- Pseudo-Newtonian Constant `M` (floating-point, close to `1/N`)
The Nth root of a floating-point value `y` is approximated using these steps:

1. Reinterpret the bits of `y` as an integer `i`.
2. Calculate `K + i / N` and reinterpret it into a floating-point `x`, our initial estimate.
3. Apply `R` pseudo-Newtonian refinements, improving our estimate's accuracy: `x *= (1-M) + M * y / x^N`
4. Return `x`.
For example:

```c
#include <stdint.h>

float fastinvsqrt(const float y)
{
	union {float x; int32_t i;} u;
	/* 1 */ u.x = y;
	/* 2 */ u.i = 0x5f32a121 - (u.i >> 1);
	/* 3 */ u.x *= 1.535102f - 0.535102f * y * (u.x * u.x);
	/* 4 */ return u.x;
}
```
For more information about how this works, see the References section below.
Fast roots are conventionally optimized to minimize worst-case relative error. Applications exist, however, where it may be better to minimize mean error, mean-squared error, or error when `y` is very close to 1.
In this library, designs are evaluated using these criteria:

- Relative computation time
- Root-mean-square relative error: `sqrt(mean((approx_f(y) - f(y))^2 / f(y)^2))`
- Mean relative error: `mean((approx_f(y) - f(y)) / f(y))`
- Worst-case relative error: `max(|approx_f(y) - f(y)| / f(y))`
This table lists the best constants I've found for Nth roots with zero, one, and two pseudo-Newtonian refinements. For example, `N=2` corresponds to the square root, while `N=-2` corresponds to the inverse square root.
All error values in this table are based on the relative error `(f(y) - y^(1/N)) / y^(1/N)`. The error for which the function was optimized is marked in bold.
Error values were calculated exhaustively for these constants; I believe these are the most accurate designs for this parameterization.
| N | 0 Refinements | 1 Refinement | 2 Refinements |
|---|---|---|---|
| +2 | k: `0x1fbb4f2e`, m: n/a<br>**max: .0347475**<br>RMS: .0190506<br>mean: -.005133 | k: `0x1fbed49a`, m: +.510929<br>**max: .000239058**<br>RMS: .000148007<br>mean: -.0000454788 | k: `0x1fbb75ad`, m: +.500122<br>**max: 1.68567e-7**<br>RMS: 4.7967e-8<br>mean: -1.41369e-8 |
| -2 | k: `0x5f37642f`, m: n/a<br>**max: .0342129**<br>RMS: .0244769<br>mean: +.01338 | k: `0x5f32a121`, m: -.535102<br>**max: .000773445**<br>RMS: .000494072<br>mean: -.000025526 | k: `0x5f3634f9`, m: -.501326<br>**max: 1.40452e-6**<br>RMS: 8.95917e-7<br>mean: +6.21959e-7 |
| +3 | k: `0x2a510680`, m: n/a<br>**max: .0315547**<br>RMS: .0180422<br>mean: +.00321401 | k: `0x2a543aa3`, m: +.347252<br>**max: .000430098**<br>RMS: .000237859<br>mean: -.0000977778 | k: `0x2a4fcd03`, m: +.333818<br>**max: 6.45394e-7**<br>RMS: 2.90881e-7<br>mean: -2.40255e-7 |
| -3 | k: `0x54a232a3`, m: n/a<br>**max: .0342405**<br>RMS: .0195931<br>mean: +.0068072 | k: `0x549da7bf`, m: -.364707<br>**max: .00102717**<br>RMS: .000742809<br>mean: +.000277928 | k: `0x54a1b99d`, m: -.334677<br>**max: 2.18458e-6**<br>RMS: 1.04454e-6<br>mean: +6.17437e-7 |
| +4 | k: `0x2f9b374e`, m: n/a<br>**max: .0342323**<br>RMS: .015625<br>mean: +.00425431 | k: `0x2f9ed7c0`, m: +.266598<br>**max: .000714053**<br>RMS: .000444122<br>mean: -.000195135 | k: `0x2f9b8068`, m: +.250534<br>**max: 9.49041e-7**<br>RMS: 5.28477e-7<br>mean: -4.76837e-7 |
| -4 | k: `0x4f58605b`, m: n/a<br>**max: .0312108**<br>RMS: .020528<br>mean: +.00879536 | k: `0x4f542107`, m: -.277446<br>**max: .00110848**<br>RMS: .000733642<br>mean: +.000175304 | k: `0x4f58020d`, m: -.251282<br>**max: 2.76944e-6**<br>RMS: 1.3487e-6<br>mean: +5.97779e-7 |
See `root_cellar_generated.h` for reference implementations of these functions.
These double-precision designs were found using a fast approximation of the maximum error. There is room to improve these constants (with extremely marginal benefit).
| N | 0 Refinements | 1 Refinement | 2 Refinements |
|---|---|---|---|
| +2 | k: `0x1ff769e5b00cb024`, m: n/a<br>max: .0347474 | k: `0x1ff7da9258189b10`, m: +.51093<br>max: .000238945 | k: `0x1ff76e33f8e94831`, m: +.500124<br>max: 3.08405e-8 |
| -2 | k: `0x5fe6ec85e7de30da`, m: n/a<br>max: .0342128 | k: `0x5fe65423e81eece9`, m: -.535103<br>max: .00077328 | k: `0x5fe6bbf0c11e182d`, m: -.501434<br>max: 1.36764e-6 |
| +3 | k: `0x2a9f76253119d328`, m: n/a<br>max: .0315546 | k: `0x2a9fdca8d39b1833`, m: +.347251<br>max: .000429969 | k: `0x2a9f5317d3f76c27`, m: +.333791<br>max: 4.7027e-7 |
| -3 | k: `0x553ef0ff289dd794`, m: n/a<br>max: .0342405 | k: `0x553e5fa2bf4bb94e`, m: -.364707<br>max: .001027 | k: `0x553eb1a359e5ec49`, m: -.335169<br>max: 3.77555e-6 |
| +4 | k: `0x2ff366e9846f3cf9`, m: n/a<br>max: .0342321 | k: `0x2ff3daf850a16998`, m: +.266598<br>max: .00071393 | k: `0x2ff3578de1c1dc42`, m: +.250729<br>max: 1.41358e-6 |
| -4 | k: `0x4feb0c0b7fa996ad`, m: n/a<br>max: .0312107 | k: `0x4fea8420dfe0c1b2`, m: -.277446<br>max: .0011083 | k: `0x4feaff5406bb3437`, m: -.251281<br>max: 2.61417e-6 |
I decided to research fast roots for applications in signal processing and graphics rendering — and as a fun distraction from more intensive research work. I got in way over my head.
It is possible to use methods like the ones presented here to estimate any fixed exponent, such as `y^(3/2)` or `y^8`. It is also possible to create an approximate `pow` function with a similar mechanism, and one can be found in the fastapprox library.
When using more than one Newtonian refinement, more accurate designs may be possible by varying the pseudo-Newtonian constant used for each refinement. I have not explored this possibility as 3-dimensional configurations or higher would require more sophisticated solvers or a lot of computation time.
- *0x5f3759df* (Christian Plesner Hansen): the theory behind the hack, derivation of the "magic constant", and generalization of the approximation to exponents between -1 and 1.
- *Understanding Quake's Fast Inverse Square Root* (BetterExplained): includes an explanation of the algebra behind the Newtonian refinement used in the hack.
- *Inverse Square Root* (University of Waterloo): describes how we can reduce worst-case error by using a modified Newtonian refinement with a baked-in multiplier.
- *Improving the Accuracy of the Fast Inverse Square Root Algorithm* (Walczyk et al.): discusses additional techniques for improving accuracy, beyond the ones presented here.
Copyright 2019 Evan Balster
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.