diff --git a/README.md b/README.md index d693ac8..1cb37e0 100644 --- a/README.md +++ b/README.md @@ -93,6 +93,7 @@ Currently DEFY collection includes: + [goto is fastest](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest): + [goto if select comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1); + [goto if select comparison 2](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_2); + + [goto if select comparison 3](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_3); + [goto if block comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_block_comparison_1); + [powers naive definitions have overhead](https://github.com/szaghi/DEFY/tree/master/src/powers_naive_definitions_have_overhead): + [powers 1](https://github.com/szaghi/DEFY/tree/master/src/powers_naive_definitions_have_overhead/powers_1): diff --git a/src/goto_is_fastest/README.md b/src/goto_is_fastest/README.md index 5a5ee45..a4b980b 100644 --- a/src/goto_is_fastest/README.md +++ b/src/goto_is_fastest/README.md @@ -4,7 +4,7 @@ #### Myth example -The myth states that +The myth states that, for a generic (possibly random) selector value, the `goto`-based branching-flow ```fortran goto (10, 20, 30), selector @@ -18,7 +18,7 @@ goto 40 40 continue ``` -is compiled into a **faster** branching-flow than +is compiled into a **faster** selector than ```fortran select case(selector) @@ -41,7 +41,14 @@ elseif (selector==3) end if ``` -The myth originates from the old-good days when other branching-flow models (e.g.
`if elseif` and `select case`) were added to the language (the early Fortran 90 implementations) alongside `goto`: *probably* the early compilers implementations supporting the *new* (for those days) branching models were not able to optimized the compiled selection based on the models as well as they did for the very-well supported (computed) `goto` model. +The myth originates from the good old days when other branching-flow models (e.g. `if elseif`, introduced by FORTRAN 77, and `select case`, introduced by Fortran 90) were added to the language alongside `goto`: *probably* the early compiler implementations supporting the *new* (for those days) branching models were not able to optimize the compiled selection based on those models as well as they did for the very well-supported (computed) `goto` model. + +#### Variants + +The simple branching-flow described above is also analyzed in some variants: + ++ *flushed* branching-flow: the selector is used only to find the first worker to call, after which all subsequent workers are called as well; this is intended to favor `goto`, which can express this pattern without the need for *nested checks*; ++ *probability-ordered* branching-flow: the selector values are (pre-)ordered into a list from the most probable (to be called) selector value to the least probable one; this is intended to help the optimizer guess (e.g. by pre-fetching) the next most probable branch. ### Demystified @@ -63,8 +70,9 @@ The presupposed `goto` higher performance is a **myth** nowadays. Moreover, `got ### DEFY Tests DEFY provides the following tests for this myth demystification: -+ [goto if select comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1); -+ [goto if select comparison 2](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_2); -+ [goto if block comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_block_comparison_1).
++ [goto if select comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1): the baseline test; ++ [goto if select comparison 2](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_2): a variant of the baseline test proposed by FortranFan; ++ [goto if select comparison 3](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_3): the baseline variant using a list of selector values pre-ordered from the most to the least probable; ++ [goto if block comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_block_comparison_1): the baseline variant with a *flushed* branching-flow. See their README.md to see the results obtained. diff --git a/src/goto_is_fastest/goto_if_block_comparison_1/README.md b/src/goto_is_fastest/goto_if_block_comparison_1/README.md index 0f64a74..c736213 100644 --- a/src/goto_is_fastest/goto_if_block_comparison_1/README.md +++ b/src/goto_is_fastest/goto_if_block_comparison_1/README.md @@ -1,8 +1,16 @@ ### Goto-if elseif-select case performance comparison, test 1 -This test compare (computed) `goto` with `if` branching-flow construct. The selector for the branching-jump is computed pseudo-randomically and the *work* done inside the *workers* called by each branch is not uniform. +This test compares (computed) `goto` with the `if` and `block (if)` branching-flow constructs. -This is a modification of [goto-if elseif-select case](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1) test proposed by Ron Shepard (select case is not considered into this test, rather the `block` construct). Essentially, the branching-flow is now *flushed*: the selector selects *from which keyword* start to call the workers and call not only the worker corresponding to that keyword, but also all subsequent workers, e.g. +> The selector for the branching-jump is computed pseudo-randomly.
+ +> The *work* done inside the *workers* called by each branch is not uniform: it depends on the keyword value. + +This is a modification, proposed by Ron Shepard and further improved by FortranFan, of the [goto-if elseif-select case](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1) test. + +> Select case is not considered in this test (because it would generate a highly-nested branching-flow, less clear than the others); the `block` construct is considered instead. + +Essentially, the branching-flow is now *flushed*: the selector selects *from which keyword* to start calling the workers, so that not only the worker corresponding to that keyword is called, but also all subsequent workers, e.g. ```fortran goto (1, 2, 3), keyword @@ -26,7 +34,7 @@ selector: block end block selector ``` -In this case the `goto` should actually be advantaged, although the tests performed confirm again that the performance are almost identical. +In this case `goto` should actually have an advantage, although the tests performed confirm (again) that the performance is almost identical.
### Run test @@ -39,9 +47,11 @@ Four bash scripts are provided to run the test: ### Results obtained -|Compiler|Optimizations|Architecture | goto | if |block | -|--------|-------------|-----------------------------------------------------|-----------|-----------|-----------| -| GNU | yes |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.5480^10-4|0.5480^10-4|0.5480^10-4| -| GNU | no |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.7578^10-3|0.7578^10-3|0.7578^10-3| -| Intel | yes |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.5228^10-4|0.5237^10-4|0.5237^10-4| -| Intel | no |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.9449^10-3|0.9550^10-3|0.9550^10-3| +|Compiler |Optimizations|Architecture | goto | if |block | +|----------------------|-------------|-----------------------------------------------------|-----------|-----------|-----------| +| GNU (6.2.0, 64bit) | -O3 |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.5480^10-4|0.5480^10-4|0.5480^10-4| +| GNU (6.2.0, 64bit) | -Og |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.7578^10-3|0.7578^10-3|0.7578^10-3| +| Intel (16.0.3, 64bit)| -O3 |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.5228^10-4|0.5237^10-4|0.5237^10-4| +| Intel (16.0.3, 64bit)| -O0 |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.9449^10-3|0.9550^10-3|0.9550^10-3| +| GNU (7.0.0, 32bit) | -?? |Intel Core i5-2400@3.10GHz, 4GB RAM, Windows 64-bit |0.1357^10-3|0.1356^10-3|0.1356^10-3| +| Intel (17.0.0, 64bit)| -?? 
|Intel Core i5-2400@3.10GHz, 4GB RAM, Windows 64-bit |0.4650^10-4|0.4400^10-4|0.4400^10-4| diff --git a/src/goto_is_fastest/goto_if_select_comparison_1/README.md b/src/goto_is_fastest/goto_if_select_comparison_1/README.md index 93b799d..af0fe2b 100644 --- a/src/goto_is_fastest/goto_if_select_comparison_1/README.md +++ b/src/goto_is_fastest/goto_if_select_comparison_1/README.md @@ -1,6 +1,10 @@ ### Goto-if elseif-select case performance comparison, test 1 -This test compare (computed) `goto` with `if elseif` and `select case` branching-flow constructs. The selector for the branching-jump is computed pseudo-randomically and the *work* done inside the *workers* called by each branch is not uniform. +This test compares (computed) `goto` with the `if elseif` and `select case` branching-flow constructs. + +> The selector for the branching-jump is computed pseudo-randomly. + +> The *work* done inside the *workers* called by each branch is not uniform: it depends on the keyword value. ### Run test @@ -13,9 +17,9 @@ Four bash scripts are provided to run the test: ### Results obtained -|Compiler|Optimizations|Architecture | goto | if elseif | select case | -|--------|-------------|-----------------------------------------------------|-----------|-----------|-------------| -| GNU | yes |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.3852^10-4|0.3856^10-4| 0.3857^10-4 | -| GNU | no |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.5788^10-3|0.5778^10-3| 0.5783^10-3 | -| Intel | yes |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.3896^10-4|0.3913^10-4| 0.3905^10-4 | -| Intel | no |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.5796^10-3|0.5785^10-3| 0.5810^10-3 | +|Compiler |Optimizations|Architecture | goto | if elseif | select case | +|---------------|-------------|-----------------------------------------------------|-----------|-----------|-------------| +| GNU (6.2.0) | -O3 |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch
Linux|0.3852^10-4|0.3856^10-4| 0.3857^10-4 | +| GNU (6.2.0) | -Og |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.5788^10-3|0.5778^10-3| 0.5783^10-3 | +| Intel (16.0.3)| -O3 |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.3896^10-4|0.3913^10-4| 0.3905^10-4 | +| Intel (16.0.3)| -O0 |Intel Xeon X5650@2.67GHz, 24GB RAM, x86_64 Arch Linux|0.5796^10-3|0.5785^10-3| 0.5810^10-3 | diff --git a/src/goto_is_fastest/goto_if_select_comparison_2/README.md b/src/goto_is_fastest/goto_if_select_comparison_2/README.md index 76d507d..0218d3f 100644 --- a/src/goto_is_fastest/goto_if_select_comparison_2/README.md +++ b/src/goto_is_fastest/goto_if_select_comparison_2/README.md @@ -1,6 +1,6 @@ ### Goto-if elseif-select case performance comparison, test 1 -This test compare (computed) `goto` with `if elseif` and `select case` branching-flow constructs. +This test compares (computed) `goto` with the `select case` branching-flow construct. To be completed. @@ -15,4 +15,7 @@ Four bash scripts are provided to run the test: ### Results obtained -To be written. +|Compiler |Optimizations|Architecture | goto |select case | +|---------------|-------------|--------------------------------------------------|-----------|------------| +| Intel (16.0.3)| -O3 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|2.0460^10-3|2.0394^10-3 | +| Intel (16.0.3)| -O0 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|3.4972^10-3|4.0245^10-3 | diff --git a/src/goto_is_fastest/goto_if_select_comparison_3/README.md b/src/goto_is_fastest/goto_if_select_comparison_3/README.md new file mode 100644 index 0000000..58dde12 --- /dev/null +++ b/src/goto_is_fastest/goto_if_select_comparison_3/README.md @@ -0,0 +1,36 @@ +### Goto-if elseif-select case performance comparison, test 3 + +This test compares (computed) `goto` with the `if elseif` and `select case` branching-flow constructs.
+ +The keywords are ordered as follows: + ++ keys value: + + key(1) = 3 + + key(2) = 4 + + key(3) = 1 + + key(4) = 2 ++ keys probability: + + key(1) ~ 36% (10 matches out of 28) + + key(2) ~ 29% (8 matches out of 28) + + key(3) ~ 21% (6 matches out of 28) + + key(4) ~ 14% (4 matches out of 28) + +> The *work* done inside the *workers* called by each branch is not uniform: it depends on the keyword value. + +### Run test + +Four bash scripts are provided to run the test: + +1. `run_gnu.sh`, run the test with GNU gfortran compiler without optimizations; +2. `run_gnu_optimized.sh`, run the test with GNU gfortran compiler with optimizations; +3. `run_intel.sh`, run the test with Intel Fortran Compiler without optimizations; +4. `run_intel_optimized.sh`, run the test with Intel Fortran Compiler with optimizations. + +### Results obtained + +|Compiler |Optimizations|Architecture | goto | if elseif | select case | +|---------------|-------------|--------------------------------------------------|-----------|-----------|-------------| +| GNU (6.2.0) | -O3 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.1111^10-3|0.1111^10-3|0.1111^10-3  | +| GNU (6.2.0) | -Og |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.2136^10-2|0.2135^10-2|0.2137^10-2  | +| Intel (16.0.3)| -O3 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.1143^10-3|0.1143^10-3|0.1154^10-3  | +| Intel (16.0.3)| -O0 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.2691^10-2|0.2691^10-2|0.2691^10-2  | diff --git a/src/goto_is_fastest/goto_if_select_comparison_3/defy.f90 b/src/goto_is_fastest/goto_if_select_comparison_3/defy.f90 new file mode 100644 index 0000000..ea1d7a3 --- /dev/null +++ b/src/goto_is_fastest/goto_if_select_comparison_3/defy.f90 @@ -0,0 +1,135 @@ +! A DEFY (DEmystify Fortran mYths) test. +! Author: Stefano Zaghi +! Date: 2016-10-22 +! +! License: this file is licensed under the Creative Commons Attribution 4.0 license, +! see http://creativecommons.org/licenses/by/4.0/ .
+ +program defy + use iso_fortran_env + implicit none + integer(int32), parameter :: tests_number = 3000 + integer(int32) :: keyword + integer(int32) :: keywords(1:4,1:2) + real(real64), allocatable :: key_work(:) + integer(int64) :: profiling(1:2) + integer(int64) :: count_rate + real(real64) :: system_clocks(1:3) + integer(int32) :: i + integer(int32) :: k + integer(int32) :: p + + keywords = 0 + ! keys value + keywords(1,1) = 3 + keywords(2,1) = 4 + keywords(3,1) = 1 + keywords(4,1) = 2 + ! keys probability + keywords(1,2) = 10 + keywords(2,2) = 8 + keywords(3,2) = 6 + keywords(4,2) = 4 + + system_clocks = 0._real64 + do i=1, tests_number + + do k=1, size(keywords, dim=1) + + keyword = keywords(k, 1) + + do p=1, keywords(k, 2) + + call system_clock(profiling(1), count_rate) + select case(keyword) + case(1) + call worker1(key=keyword, array=key_work) + case(2) + call worker2(key=keyword, array=key_work) + case(3) + call worker3(key=keyword, array=key_work) + case(4) + call worker4(key=keyword, array=key_work) + endselect + call system_clock(profiling(2), count_rate) + system_clocks(1) = system_clocks(1) + real(profiling(2) - profiling(1), kind=real64)/count_rate + + call system_clock(profiling(1), count_rate) + if (keyword==1) then + call worker1(key=keyword, array=key_work) + elseif (keyword==2) then + call worker2(key=keyword, array=key_work) + elseif (keyword==3) then + call worker3(key=keyword, array=key_work) + elseif (keyword==4) then + call worker4(key=keyword, array=key_work) + endif + call system_clock(profiling(2), count_rate) + system_clocks(2) = system_clocks(2) + real(profiling(2) - profiling(1), kind=real64)/count_rate + + call system_clock(profiling(1), count_rate) + goto (10, 20, 30, 40), keyword + goto 50 + 10 call worker1(key=keyword, array=key_work) ; goto 50 + 20 call worker2(key=keyword, array=key_work) ; goto 50 + 30 call worker3(key=keyword, array=key_work) ; goto 50 + 40 call worker4(key=keyword, array=key_work) ; goto 50 + 50 continue + 
call system_clock(profiling(2), count_rate) + system_clocks(3) = system_clocks(3) + real(profiling(2) - profiling(1), kind=real64)/count_rate + enddo + enddo + enddo + print '(A,E23.15)', ' select case average performance: ', system_clocks(1)/tests_number + print '(A,E23.15)', ' if elseif average performance: ', system_clocks(2)/tests_number + print '(A,E23.15)', ' goto average performance: ', system_clocks(3)/tests_number + + contains + pure subroutine worker1(key, array) + integer(int32), intent(in) :: key + real(real64), allocatable, intent(out) :: array(:) + integer(int32) :: j + + allocate(array(1:key*tests_number)) + array = 0._real64 + do j=1, key*tests_number + array(j) = key**2._real64 * tests_number * j + enddo + endsubroutine worker1 + + pure subroutine worker2(key, array) + integer(int32), intent(in) :: key + real(real64), allocatable, intent(out) :: array(:) + integer(int32) :: j + + allocate(array(1:key*tests_number)) + array = 0._real64 + do j=1, key*tests_number + array(j) = key**2._real64 * tests_number * j + enddo + endsubroutine worker2 + + pure subroutine worker3(key, array) + integer(int32), intent(in) :: key + real(real64), allocatable, intent(out) :: array(:) + integer(int32) :: j + + allocate(array(1:key*tests_number)) + array = 0._real64 + do j=1, key*tests_number + array(j) = key**2._real64 * tests_number * j + enddo + endsubroutine worker3 + + pure subroutine worker4(key, array) + integer(int32), intent(in) :: key + real(real64), allocatable, intent(out) :: array(:) + integer(int32) :: j + + allocate(array(1:key*tests_number)) + array = 0._real64 + do j=1, key*tests_number + array(j) = key**2._real64 * tests_number * j + enddo + endsubroutine worker4 +endprogram defy diff --git a/src/goto_is_fastest/goto_if_select_comparison_3/run_gnu.sh b/src/goto_is_fastest/goto_if_select_comparison_3/run_gnu.sh new file mode 100755 index 0000000..2d1e299 --- /dev/null +++ b/src/goto_is_fastest/goto_if_select_comparison_3/run_gnu.sh @@ -0,0 +1,11 @@ 
+#!/bin/bash +# script to build and run DEFY tests. +# +# License: this file is licensed under the Creative Commons Attribution 4.0 license, +# see http://creativecommons.org/licenses/by/4.0/ . + +test=$(basename $(pwd))/defy.f90 +echo "Build and run $test by means of 'gfortran -Og'" +gfortran -Og defy.f90 -o defy +./defy +rm -f defy diff --git a/src/goto_is_fastest/goto_if_select_comparison_3/run_gnu_optimized.sh b/src/goto_is_fastest/goto_if_select_comparison_3/run_gnu_optimized.sh new file mode 100755 index 0000000..15674a4 --- /dev/null +++ b/src/goto_is_fastest/goto_if_select_comparison_3/run_gnu_optimized.sh @@ -0,0 +1,11 @@ +#!/bin/bash +# script to build and run DEFY tests. +# +# License: this file is licensed under the Creative Commons Attribution 4.0 license, +# see http://creativecommons.org/licenses/by/4.0/ . + +test=$(basename $(pwd))/defy.f90 +echo "Build and run $test by means of 'gfortran -O3'" +gfortran -O3 defy.f90 -o defy +./defy +rm -f defy diff --git a/src/goto_is_fastest/goto_if_select_comparison_3/run_intel.sh b/src/goto_is_fastest/goto_if_select_comparison_3/run_intel.sh new file mode 100755 index 0000000..472d310 --- /dev/null +++ b/src/goto_is_fastest/goto_if_select_comparison_3/run_intel.sh @@ -0,0 +1,11 @@ +#!/bin/bash +# script to build and run DEFY tests. +# +# License: this file is licensed under the Creative Commons Attribution 4.0 license, +# see http://creativecommons.org/licenses/by/4.0/ . + +test=$(basename $(pwd))/defy.f90 +echo "Build and run $test by means of 'ifort -O0'" +ifort -O0 defy.f90 -o defy +./defy +rm -f defy diff --git a/src/goto_is_fastest/goto_if_select_comparison_3/run_intel_optimized.sh b/src/goto_is_fastest/goto_if_select_comparison_3/run_intel_optimized.sh new file mode 100755 index 0000000..43f647c --- /dev/null +++ b/src/goto_is_fastest/goto_if_select_comparison_3/run_intel_optimized.sh @@ -0,0 +1,11 @@ +#!/bin/bash +# script to build and run DEFY tests. 
+# +# License: this file is licensed under the Creative Commons Attribution 4.0 license, +# see http://creativecommons.org/licenses/by/4.0/ . + +test=$(basename $(pwd))/defy.f90 +echo "Build and run $test by means of 'ifort -O3'" +ifort -O3 defy.f90 -o defy +./defy +rm -f defy diff --git a/src/powers_naive_definitions_have_overhead/README.md b/src/powers_naive_definitions_have_overhead/README.md index c0b8c9c..548f861 100644 --- a/src/powers_naive_definitions_have_overhead/README.md +++ b/src/powers_naive_definitions_have_overhead/README.md @@ -1,10 +1,25 @@ ### (Naive) definitions of powers (elevation) could have relevant overhead -To be written. +> A lazy (naive) definition of powers can generate significant overhead, degrading the computational speed. + +Powers can be written in different forms. Let us consider the square computation. It can be written as: + ++ `a*a`, by means of the multiplication operator; ++ `a**2`, by means of the power operator using the integer constant `2`; ++ `a**2.0`, by means of the power operator using the real constant `2.0` with the default kind; ++ `a**2.0_real64`, by means of the power operator using the real constant `2.0` with the 64-bit kind (`real64`); + +> These definitions are not equivalent in terms of computational speed: they are expected to be ordered, as listed above, from the fastest to the slowest. + +Similarly, the square root can be written as: + ++ `sqrt(a)`, by means of the intrinsic `sqrt` function; ++ `a**0.5`, by means of the power operator using the real constant `0.5` with the default kind; ++ `a**0.5_real64`, by means of the power operator using the real constant `0.5` with the 64-bit kind (`real64`). ### Not demystified -To be written. +> The *myth* is confirmed (not demystified), but the overheads are somewhat smaller than expected.
### DEFY Tests diff --git a/src/powers_naive_definitions_have_overhead/powers_1/README.md b/src/powers_naive_definitions_have_overhead/powers_1/README.md index a6b1faa..2c94be1 100644 --- a/src/powers_naive_definitions_have_overhead/powers_1/README.md +++ b/src/powers_naive_definitions_have_overhead/powers_1/README.md @@ -1,6 +1,21 @@ -### Compare peformances of different powers definitions +### Compare the performance of different power definitions -To be written. +Powers can be written in different forms. Let us consider the square computation. It can be written as: + ++ `a*a`, by means of the multiplication operator; ++ `a**2`, by means of the power operator using the integer constant `2`; ++ `a**2.0`, by means of the power operator using the real constant `2.0` with the default kind; ++ `a**2.0_real64`, by means of the power operator using the real constant `2.0` with the 64-bit kind (`real64`); + +> These definitions are not equivalent in terms of computational speed: they are expected to be ordered, as listed above, from the fastest to the slowest. + +Similarly, the square root can be written as: + ++ `sqrt(a)`, by means of the intrinsic `sqrt` function; ++ `a**0.5`, by means of the power operator using the real constant `0.5` with the default kind; ++ `a**0.5_real64`, by means of the power operator using the real constant `0.5` with the 64-bit kind (`real64`). + +> The trend is confirmed, but the overheads are somewhat smaller than expected.
### Run test @@ -13,16 +28,28 @@ Four bash scripts are provided to run the test: ### Results obtained -|Compiler|Optimizations|Architecture | a*a | a**2 | a**2.0 |a**2.0_real64| -|--------|-------------|------------------------------------------------|--------|--------|--------|-------------| -| GNU | yes |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian|0.057749|0.061242|0.059180| 0.059963 | -| GNU | no |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian|0.075604|0.076985|0.188435| 0.176926 | -| Intel | yes |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian|0.074263|0.077053|0.066916| 0.066274 | -| Intel | no |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian|0.129806|0.148482|0.120632| 0.117112 | - -|Compiler|Optimizations|Architecture | sqrt(a)| a**0.5 | a**0.5_real64| -|--------|-------------|------------------------------------------------|--------|--------|--------| -| GNU | yes |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian|0.112347|1.094111|1.094577| -| GNU | no |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian|0.113749|1.097284|1.100870| -| Intel | yes |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian|0.077185|0.076015|0.074920| -| Intel | no |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian|0.170241|0.169863|0.169951| +#### Square + +|Compiler |Optimizations|Architecture | `a*a` | `a**2` |`a**2.0`|`a**2.0_real64`| +|---------------|-------------|--------------------------------------------------|--------|--------|--------|---------------| +| GNU (x.y.z) | -O3 |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian |0.057749|0.061242|0.059180|0.059963 | +| GNU (x.y.z) | -Og |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian |0.075604|0.076985|0.188435|0.176926 | +| Intel (x.y.z) | -O3 |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian |0.074263|0.077053|0.066916|0.066274 | +| Intel (x.y.z) | -O0 |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian |0.129806|0.148482|0.120632|0.117112 | +| GNU (6.2.0) | -O3 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 
Ubuntu|0.014765|0.016203|0.016375|0.019350 | +| GNU (6.2.0) | -Og |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.044153|0.087174|0.123389|0.125654 | +| Intel (16.0.3)| -O3 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.019858|0.020637|0.012919|0.015040 | +| Intel (16.0.3)| -O0 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.068609|0.095346|0.078253|0.076377 | + +#### Square root + +|Compiler |Optimizations|Architecture |`sqrt(a)`|`a**0.5`|`a**0.5_real64`| +|---------------|-------------|--------------------------------------------------|---------|--------|---------------| +| GNU (x.y.z) | -O3 |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian |0.112347 |1.094111|1.094577 | +| GNU (x.y.z) | -Og |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian |0.113749 |1.097284|1.100870 | +| Intel (x.y.z) | -O3 |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian |0.077185 |0.076015|0.074920 | +| Intel (x.y.z) | -O0 |Intel Xeon E5440@2.83GHz, 8GB RAM, x86_64 Debian |0.170241 |0.169863|0.169951 | +| GNU (6.2.0) | -O3 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.098565 |0.954313|0.984998 | +| GNU (6.2.0) | -Og |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.105412 |1.000769|1.011373 | +| Intel (16.0.3)| -O3 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.015731 |0.021499|0.053096 | +| Intel (16.0.3)| -O0 |Intel Core m5-6Y54@1.10GHz, 4GB RAM, x86_64 Ubuntu|0.130754 |0.133923|0.196722 |