add goto flushed-flow test

szaghi · Oct 19, 2016 · commit ca11612
1 parent bc63530

Showing 8 changed files with 285 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -93,6 +93,7 @@ Currently DEFY collection includes:
 + [goto is fastest](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest):
   + [goto if select comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1);
   + [goto if select comparison 2](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_2);
+  + [goto if block comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_block_comparison_1);
 + [powers naive definitions have overhead](https://github.com/szaghi/DEFY/tree/master/src/powers_naive_definitions_have_overhead):
   + [powers 1](https://github.com/szaghi/DEFY/tree/master/src/powers_naive_definitions_have_overhead/powers_1):
 + any new myth is more than welcome, feel free to open a [new issue](https://github.com/szaghi/DEFY/issues) or create a [pull request](https://github.com/szaghi/DEFY/pulls).

diff --git a/src/goto_is_fastest/README.md b/src/goto_is_fastest/README.md
@@ -64,6 +64,7 @@ The presupposed `goto` higher performance is a **myth** nowadays. Moreover, `got
 
 DEFY provides the following tests for this myth demystification:
 + [goto if select comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1);
-+ [goto if select comparison 2](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_2).
++ [goto if select comparison 2](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_2);
++ [goto if block comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_block_comparison_1).
 
 See their README.md to see the results obtained.
diff --git a/src/goto_is_fastest/goto_if_block_comparison_1/README.md b/src/goto_is_fastest/goto_if_block_comparison_1/README.md
@@ -0,0 +1,47 @@
+### Goto-if-block performance comparison, test 1
+
+This test compares the (computed) `goto` with the `if` and `block` branching-flow constructs. The selector for the branching jump is computed pseudo-randomly, and the *work* done inside the *workers* called by each branch is not uniform.
+
+This is a modification of the [goto-if elseif-select case](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1) test proposed by Ron Shepard (`select case` is not considered in this test; the `block` construct is used instead). Essentially, the branching flow is now *flushed*: the selector chooses *from which keyword* to start calling the workers, so that not only the worker corresponding to that keyword is called, but also all subsequent workers, e.g.
+
+```fortran
+goto (1, 2, 3), keyword
+1 call worker1(keyword)
+2 call worker2(keyword)
+3 call worker3(keyword)
+```
+If `keyword==1` all workers are called, if `keyword==2` only workers 2 and 3 are called, and finally if `keyword==3` only worker
+3 is called. This is compared with
+
+```fortran
+! if-based selector flow
+if (keyword<2) call worker1(keyword)
+if (keyword<3) call worker2(keyword)
+if (keyword<4) call worker3(keyword)
+! block-based selector flow (valid only if the order of execution does not matter)
+selector: block
+  call worker3(keyword) ; if ((keyword==3)) exit selector
+  call worker2(keyword) ; if ((keyword>=2)) exit selector
+  call worker1(keyword) ;                   exit selector
+end block selector
+```
+
+In this case `goto` should actually have an advantage, yet the tests performed again show almost identical performance.
+
+### Run test
+
+Four bash scripts are provided to run the test:
+
+1. `run_gnu.sh`, run the test with GNU gfortran compiler with only debug-friendly optimizations (`-Og`);
+2. `run_gnu_optimized.sh`, run the test with GNU gfortran compiler with optimizations (`-O3`);
+3. `run_intel.sh`, run the test with Intel Fortran Compiler without optimizations (`-O0`);
+4. `run_intel_optimized.sh`, run the test with Intel Fortran Compiler with optimizations (`-O3`).
+
+### Results obtained
+
+|Compiler|Optimizations|Architecture                                         | goto [s]  | if [s]    | block [s] |
+|--------|-------------|-----------------------------------------------------|-----------|-----------|-----------|
+| GNU    |   yes       |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5480e-4  |0.5480e-4  |0.5480e-4  |
+| GNU    |   no        |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.7578e-3  |0.7578e-3  |0.7578e-3  |
+| Intel  |   yes       |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5228e-4  |0.5237e-4  |0.5237e-4  |
+| Intel  |   no        |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.9449e-3  |0.9550e-3  |0.9550e-3  |
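
The README above states that the flushed `goto`, `if`, and `block` flows call the same workers. A minimal sketch to check this without any timing, assuming each worker is replaced by a flag in a logical mask. The module `flushed_flow_sketch` and its functions `goto_flow`, `if_flow`, and `block_flow` are hypothetical names, not part of DEFY, and the `if` and `block` flows are written as loops rather than unrolled as in `defy.f90`:

```fortran
! Hypothetical sketch (not part of DEFY): each flushed-flow construct of
! defy.f90 is reproduced, but worker w only sets called(w) = .true.,
! so the set of workers reached for a given keyword can be compared.
module flushed_flow_sketch
  implicit none
  private
  public :: goto_flow, if_flow, block_flow
  integer, parameter :: nw = 9 ! number of workers, as in defy.f90

contains

  ! Computed goto: jump to the label selected by keyword, then fall through
  ! every later statement; an out-of-range keyword continues at label 10.
  function goto_flow(keyword) result(called)
    integer, intent(in) :: keyword
    logical             :: called(nw)

    called = .false.
    goto (10, 20, 30, 40, 50, 60, 70, 80, 90), keyword
    10 called(1) = .true.
    20 called(2) = .true.
    30 called(3) = .true.
    40 called(4) = .true.
    50 called(5) = .true.
    60 called(6) = .true.
    70 called(7) = .true.
    80 called(8) = .true.
    90 called(9) = .true.
  end function goto_flow

  ! Loop form of the if chain "if (keyword<w+1) call workerw".
  function if_flow(keyword) result(called)
    integer, intent(in) :: keyword
    logical             :: called(nw)
    integer             :: w

    do w = 1, nw
      called(w) = keyword < w + 1
    end do
  end function if_flow

  ! Loop form of the block construct: workers run from the last one down,
  ! leaving the block as soon as the worker matching keyword has run.
  function block_flow(keyword) result(called)
    integer, intent(in) :: keyword
    logical             :: called(nw)
    integer             :: w

    called = .false.
    selector: block
      do w = nw, 2, -1
        called(w) = .true.
        if (keyword >= w) exit selector
      end do
      called(1) = .true.
    end block selector
  end function block_flow
end module flushed_flow_sketch
```

For keywords 0 to 9 (the values `nint(random*9, int32)` can produce) the three masks should coincide, with keyword 0 reaching every worker. Above 9 they diverge: the computed `goto` falls through to every worker, while the `if` chain calls none.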
diff --git a/src/goto_is_fastest/goto_if_block_comparison_1/defy.f90 b/src/goto_is_fastest/goto_if_block_comparison_1/defy.f90
@@ -0,0 +1,191 @@
+! A DEFY (DEmystify Fortran mYths) test.
+! Author: Stefano Zaghi & Ron Shepard & FortranFan
+! Date: 2016-10-19
+!
+! License: this file is licensed under the Creative Commons Attribution 4.0 license,
+! see http://creativecommons.org/licenses/by/4.0/ .
+
+program defy
+  use iso_fortran_env
+  implicit none
+  integer(int32), parameter :: tests_number = 4000
+  integer(int32)            :: keyword
+  real(real64), allocatable :: key_work(:)
+  real(real64)              :: random
+  integer(int64)            :: profiling(1:2)
+  integer(int64)            :: count_rate
+  real(real64)              :: system_clocks(1:3)
+  integer(int32)            :: key_registers(1:9)
+  integer(int32)            :: i
+
+  key_registers = 0
+  system_clocks = 0._real64
+  do i=1, tests_number
+    call random_number(random)
+    keyword = nint(random*9, int32) ! in 0..9; keyword 0 is not registered and runs all workers
+    if (keyword==1) key_registers(1) = key_registers(1) + 1
+    if (keyword==2) key_registers(2) = key_registers(2) + 1
+    if (keyword==3) key_registers(3) = key_registers(3) + 1
+    if (keyword==4) key_registers(4) = key_registers(4) + 1
+    if (keyword==5) key_registers(5) = key_registers(5) + 1
+    if (keyword==6) key_registers(6) = key_registers(6) + 1
+    if (keyword==7) key_registers(7) = key_registers(7) + 1
+    if (keyword==8) key_registers(8) = key_registers(8) + 1
+    if (keyword==9) key_registers(9) = key_registers(9) + 1
+
+    call system_clock(profiling(1), count_rate)
+    selector: block
+      call worker9(key=keyword, array=key_work) ; if ((keyword==9)) exit selector
+      call worker8(key=keyword, array=key_work) ; if ((keyword>=8)) exit selector
+      call worker7(key=keyword, array=key_work) ; if ((keyword>=7)) exit selector
+      call worker6(key=keyword, array=key_work) ; if ((keyword>=6)) exit selector
+      call worker5(key=keyword, array=key_work) ; if ((keyword>=5)) exit selector
+      call worker4(key=keyword, array=key_work) ; if ((keyword>=4)) exit selector
+      call worker3(key=keyword, array=key_work) ; if ((keyword>=3)) exit selector
+      call worker2(key=keyword, array=key_work) ; if ((keyword>=2)) exit selector
+      call worker1(key=keyword, array=key_work) ;                   exit selector
+    end block selector
+    call system_clock(profiling(2), count_rate)
+    system_clocks(1) = system_clocks(1) + real(profiling(2) - profiling(1), kind=real64)/count_rate
+
+    call system_clock(profiling(1), count_rate)
+    if (keyword<2)  call worker1(key=keyword, array=key_work)
+    if (keyword<3)  call worker2(key=keyword, array=key_work)
+    if (keyword<4)  call worker3(key=keyword, array=key_work)
+    if (keyword<5)  call worker4(key=keyword, array=key_work)
+    if (keyword<6)  call worker5(key=keyword, array=key_work)
+    if (keyword<7)  call worker6(key=keyword, array=key_work)
+    if (keyword<8)  call worker7(key=keyword, array=key_work)
+    if (keyword<9)  call worker8(key=keyword, array=key_work)
+    if (keyword<10) call worker9(key=keyword, array=key_work)
+    call system_clock(profiling(2), count_rate)
+    system_clocks(2) = system_clocks(2) + real(profiling(2) - profiling(1), kind=real64)/count_rate
+
+    call system_clock(profiling(1), count_rate)
+    goto (10, 20, 30, 40, 50, 60, 70, 80, 90), keyword ! out-of-range keyword (0) falls through to label 10
+    10 call worker1(key=keyword, array=key_work)
+    20 call worker2(key=keyword, array=key_work)
+    30 call worker3(key=keyword, array=key_work)
+    40 call worker4(key=keyword, array=key_work)
+    50 call worker5(key=keyword, array=key_work)
+    60 call worker6(key=keyword, array=key_work)
+    70 call worker7(key=keyword, array=key_work)
+    80 call worker8(key=keyword, array=key_work)
+    90 call worker9(key=keyword, array=key_work)
+    call system_clock(profiling(2), count_rate)
+    system_clocks(3) = system_clocks(3) + real(profiling(2) - profiling(1), kind=real64)/count_rate
+  enddo
+  print '(A,9F12.5)', ' keywords distribution (1-9):   ', key_registers*1._real32/tests_number
+  print '(A,E23.15)', ' block average performance:     ', system_clocks(1)/tests_number
+  print '(A,E23.15)', ' if    average performance:     ', system_clocks(2)/tests_number
+  print '(A,E23.15)', ' goto  average performance:     ', system_clocks(3)/tests_number
+
+  contains
+    pure subroutine worker1(key, array)
+      integer(int32),            intent(in)  :: key
+      real(real64), allocatable, intent(out) :: array(:)
+      integer(int32)                         :: j
+
+      allocate(array(1:key*tests_number))
+      array = 0._real64
+      do j=1, key*tests_number
+        array(j) = key**2._real64 * tests_number * j
+      enddo
+    endsubroutine worker1
+
+    pure subroutine worker2(key, array)
+      integer(int32),            intent(in)  :: key
+      real(real64), allocatable, intent(out) :: array(:)
+      integer(int32)                         :: j
+
+      allocate(array(1:key*tests_number))
+      array = 0._real64
+      do j=1, key*tests_number
+        array(j) = key**2._real64 * tests_number * j
+      enddo
+    endsubroutine worker2
+
+    pure subroutine worker3(key, array)
+      integer(int32),            intent(in)  :: key
+      real(real64), allocatable, intent(out) :: array(:)
+      integer(int32)                         :: j
+
+      allocate(array(1:key*tests_number))
+      array = 0._real64
+      do j=1, key*tests_number
+        array(j) = key**2._real64 * tests_number * j
+      enddo
+    endsubroutine worker3
+
+    pure subroutine worker4(key, array)
+      integer(int32),            intent(in)  :: key
+      real(real64), allocatable, intent(out) :: array(:)
+      integer(int32)                         :: j
+
+      allocate(array(1:key*tests_number))
+      array = 0._real64
+      do j=1, key*tests_number
+        array(j) = key**2._real64 * tests_number * j
+      enddo
+    endsubroutine worker4
+
+    pure subroutine worker5(key, array)
+      integer(int32),            intent(in)  :: key
+      real(real64), allocatable, intent(out) :: array(:)
+      integer(int32)                         :: j
+
+      allocate(array(1:key*tests_number))
+      array = 0._real64
+      do j=1, key*tests_number
+        array(j) = key**2._real64 * tests_number * j
+      enddo
+    endsubroutine worker5
+
+    pure subroutine worker6(key, array)
+      integer(int32),            intent(in)  :: key
+      real(real64), allocatable, intent(out) :: array(:)
+      integer(int32)                         :: j
+
+      allocate(array(1:key*tests_number))
+      array = 0._real64
+      do j=1, key*tests_number
+        array(j) = key**2._real64 * tests_number * j
+      enddo
+    endsubroutine worker6
+
+    pure subroutine worker7(key, array)
+      integer(int32),            intent(in)  :: key
+      real(real64), allocatable, intent(out) :: array(:)
+      integer(int32)                         :: j
+
+      allocate(array(1:key*tests_number))
+      array = 0._real64
+      do j=1, key*tests_number
+        array(j) = key**2._real64 * tests_number * j
+      enddo
+    endsubroutine worker7
+
+    pure subroutine worker8(key, array)
+      integer(int32),            intent(in)  :: key
+      real(real64), allocatable, intent(out) :: array(:)
+      integer(int32)                         :: j
+
+      allocate(array(1:key*tests_number))
+      array = 0._real64
+      do j=1, key*tests_number
+        array(j) = key**2._real64 * tests_number * j
+      enddo
+    endsubroutine worker8
+
+    pure subroutine worker9(key, array)
+      integer(int32),            intent(in)  :: key
+      real(real64), allocatable, intent(out) :: array(:)
+      integer(int32)                         :: j
+
+      allocate(array(1:key*tests_number))
+      array = 0._real64
+      do j=1, key*tests_number
+        array(j) = key**2._real64 * tests_number * j
+      enddo
+    endsubroutine worker9
+endprogram defy
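
In `defy.f90` the keyword is drawn as `nint(random*9, int32)` with `random` in [0,1), so it ranges over 0 to 9 and the two end values get only half the probability of the others; keyword 0 is also not counted in `key_registers`. A minimal sketch of this mapping, assuming an evenly spaced grid of values stands in for `random_number` so that the counts are exact. The module `keyword_mapping_sketch`, the function `histogram`, and the `1 + int(r*9)` mapping are hypothetical and not what DEFY uses:

```fortran
! Hypothetical sketch (not part of DEFY): count how a grid of values in
! (0,1) is mapped onto keywords by two different selector formulas.
module keyword_mapping_sketch
  use iso_fortran_env, only : int32, real64
  implicit none
  private
  public :: histogram

contains

  ! use_nint = .true.  -> keyword = nint(r*9)    (the formula of defy.f90)
  ! use_nint = .false. -> keyword = 1 + int(r*9) (hypothetical uniform 1..9)
  function histogram(n, use_nint) result(counts)
    integer(int32), intent(in) :: n
    logical,        intent(in) :: use_nint
    integer(int32)             :: counts(0:9)
    integer(int32)             :: k, keyword
    real(real64)               :: r

    counts = 0
    do k = 1, n
      r = (k - 0.5_real64) / n ! midpoints of n equal sub-intervals of [0,1)
      if (use_nint) then
        keyword = nint(r*9, int32)
      else
        keyword = 1 + int(r*9, int32)
      end if
      counts(keyword) = counts(keyword) + 1
    end do
  end function histogram
end module keyword_mapping_sketch
```

With `n = 18000`, the `nint` mapping gives 1000 hits each for keywords 0 and 9 and 2000 for each of 1 to 8, while `1 + int(r*9)` gives 2000 for each of 1 to 9 and none for 0.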
diff --git a/src/goto_is_fastest/goto_if_block_comparison_1/run_gnu.sh b/src/goto_is_fastest/goto_if_block_comparison_1/run_gnu.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+# script to build and run DEFY tests.
+#
+# License: this file is licensed under the Creative Commons Attribution 4.0 license,
+# see http://creativecommons.org/licenses/by/4.0/ .
+
+test="$(basename "$(pwd)")/defy.f90"
+echo "Build and run $test by means of 'gfortran -Og'"
+gfortran -Og defy.f90 -o defy
+./defy
+rm -f defy
diff --git a/src/goto_is_fastest/goto_if_block_comparison_1/run_gnu_optimized.sh b/src/goto_is_fastest/goto_if_block_comparison_1/run_gnu_optimized.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+# script to build and run DEFY tests.
+#
+# License: this file is licensed under the Creative Commons Attribution 4.0 license,
+# see http://creativecommons.org/licenses/by/4.0/ .
+
+test="$(basename "$(pwd)")/defy.f90"
+echo "Build and run $test by means of 'gfortran -O3'"
+gfortran -O3 defy.f90 -o defy
+./defy
+rm -f defy
diff --git a/src/goto_is_fastest/goto_if_block_comparison_1/run_intel.sh b/src/goto_is_fastest/goto_if_block_comparison_1/run_intel.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+# script to build and run DEFY tests.
+#
+# License: this file is licensed under the Creative Commons Attribution 4.0 license,
+# see http://creativecommons.org/licenses/by/4.0/ .
+
+test="$(basename "$(pwd)")/defy.f90"
+echo "Build and run $test by means of 'ifort -O0'"
+ifort -O0 defy.f90 -o defy
+./defy
+rm -f defy
diff --git a/src/goto_is_fastest/goto_if_block_comparison_1/run_intel_optimized.sh b/src/goto_is_fastest/goto_if_block_comparison_1/run_intel_optimized.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+# script to build and run DEFY tests.
+#
+# License: this file is licensed under the Creative Commons Attribution 4.0 license,
+# see http://creativecommons.org/licenses/by/4.0/ .
+
+test="$(basename "$(pwd)")/defy.f90"
+echo "Build and run $test by means of 'ifort -O3'"
+ifort -O3 defy.f90 -o defy
+./defy
+rm -f defy