add goto flushed-flow test
szaghi committed Oct 19, 2016
1 parent bc63530 commit ca11612
Showing 8 changed files with 285 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -93,6 +93,7 @@ Currently DEFY collection includes:
+ [goto is fastest](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest):
+ [goto if select comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1);
+ [goto if select comparison 2](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_2);
+ [goto if block comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_block_comparison_1);
+ [powers naive definitions have overhead](https://github.com/szaghi/DEFY/tree/master/src/powers_naive_definitions_have_overhead):
+ [powers 1](https://github.com/szaghi/DEFY/tree/master/src/powers_naive_definitions_have_overhead/powers_1):
+ any new myth is more than welcome, feel free to open a [new issue](https://github.com/szaghi/DEFY/issues) or create a [pull request](https://github.com/szaghi/DEFY/pulls).
3 changes: 2 additions & 1 deletion src/goto_is_fastest/README.md
@@ -64,6 +64,7 @@ The presupposed `goto` higher performance is a **myth** nowadays. Moreover, `got

DEFY provides the following tests for this myth demystification:
+ [goto if select comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1);
+ [goto if select comparison 2](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_2).
+ [goto if select comparison 2](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_2);
+ [goto if block comparison 1](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_block_comparison_1).

See their README.md to see the results obtained.
47 changes: 47 additions & 0 deletions src/goto_is_fastest/goto_if_block_comparison_1/README.md
@@ -0,0 +1,47 @@
### Goto-if-block performance comparison, test 1

This test compares the (computed) `goto` with the `if` and `block` branching-flow constructs. The selector for the branching jump is computed pseudo-randomly, and the *work* done inside the *workers* called by each branch is not uniform.
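The way the selector is drawn affects how often each branch is taken. The sketch below is an illustration, not part of the test: it compares the `nint(random*9)` expression used in `defy.f90` with a truncation-based alternative. Rounding maps `[0, 9)` onto the ten values 0..9, with the two end values receiving half a bin each, while `int(random*9) + 1` yields 1..9 with equal bins. The program name, the sample size `draws` and the histogram arrays are assumptions made for this example.

```fortran
program selector_distribution
use iso_fortran_env, only: int32, real64
implicit none
integer(int32), parameter :: draws = 900000 ! hypothetical sample size
integer(int32)            :: hist_nint(0:9) ! counts for nint(random*9)
integer(int32)            :: hist_int(1:9)  ! counts for int(random*9) + 1
real(real64)              :: random
integer(int32)            :: i, k

hist_nint = 0
hist_int = 0
do i=1, draws
  call random_number(random) ! uniform in [0, 1)
  ! rounding: values 0..9, the end values 0 and 9 get half a bin each
  k = nint(random*9, int32)
  hist_nint(k) = hist_nint(k) + 1
  ! truncation: values 1..9 with equal bins (min guards the upper end)
  k = min(int(random*9, int32) + 1, 9_int32)
  hist_int(k) = hist_int(k) + 1
enddo
print '(A,10F8.4)', ' nint(random*9)    (0..9): ', real(hist_nint, real64)/draws
print '(A,9F8.4)',  ' int(random*9) + 1 (1..9): ', real(hist_int, real64)/draws
endprogram selector_distribution
```

With the rounding form a selector of 0 can occur; a computed `goto` whose selector is out of range simply continues with the next statement, so in the flushed flow that case behaves like `keyword==1`.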

This is a modification of the [goto-if elseif-select case](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1) test, proposed by Ron Shepard (`select case` is not considered in this test; the `block` construct is used instead). Essentially, the branching flow is now *flushed*: the selector picks the keyword *from which* to start calling the workers, and the flow calls not only the worker corresponding to that keyword but also all subsequent workers, e.g.

```fortran
goto (1, 2, 3), keyword
1 call worker1(keyword)
2 call worker2(keyword)
3 call worker3(keyword)
```
If `keyword==1` all workers are called; if `keyword==2` only workers 2 and 3 are called; finally, if `keyword==3` only worker 3 is called. This is compared with

```fortran
! if-based selector flow
if (keyword<2) call worker1(keyword)
if (keyword<3) call worker2(keyword)
if (keyword<4) call worker3(keyword)
! block-based selector flow (implies that the order of execution does not matter)
selector: block
call worker3(keyword) ; if ((keyword==3)) exit selector
call worker2(keyword) ; if ((keyword>=2)) exit selector
call worker1(keyword) ; exit selector
end block selector
```

In this case `goto` should actually have an advantage, yet the tests performed confirm again that the performance of the three constructs is almost identical.
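That the three flows above really call the same set of workers can be checked with a small sketch. It is only an illustration under stated assumptions: the single recording routine `worker(n)`, the flag array `called` and the program name are hypothetical stand-ins for the real workers, which do numerical work instead of setting a flag.

```fortran
program flushed_flow_check
implicit none
logical :: called(3)                       ! called(n) is set when worker n runs
logical :: by_goto(3), by_if(3), by_block(3)
integer :: keyword

do keyword=1, 3
  ! computed goto: jump to the label picked by keyword, then fall through
  called = .false.
  goto (1, 2, 3), keyword
  1 call worker(1)
  2 call worker(2)
  3 call worker(3)
  by_goto = called

  ! if-based flow: each worker is guarded by its own test
  called = .false.
  if (keyword<2) call worker(1)
  if (keyword<3) call worker(2)
  if (keyword<4) call worker(3)
  by_if = called

  ! block-based flow: workers run in reverse order, leaving the block early
  called = .false.
  selector: block
    call worker(3) ; if (keyword==3) exit selector
    call worker(2) ; if (keyword>=2) exit selector
    call worker(1)
  end block selector
  by_block = called

  print '(A,I0,3(A,3L2))', ' keyword=', keyword, '  goto:', by_goto, '  if:', by_if, '  block:', by_block
  if (any(by_goto .neqv. by_if) .or. any(by_goto .neqv. by_block)) error stop 'flows disagree'
enddo

contains
  subroutine worker(n)
  integer, intent(in) :: n
  called(n) = .true. ! record the call instead of doing real work
  endsubroutine worker
endprogram flushed_flow_check
```

The sketch compares only which workers run, not the order: the `block` variant calls them in reverse, which is acceptable here only because the workers are independent of each other.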

### Run test

Four bash scripts are provided to run the test:

1. `run_gnu.sh`, run the test with GNU gfortran compiler without optimizations;
2. `run_gnu_optimized.sh`, run the test with GNU gfortran compiler with optimizations;
3. `run_intel.sh`, run the test with Intel Fortran Compiler without optimizations;
4. `run_intel_optimized.sh`, run the test with Intel Fortran Compiler with optimizations.

### Results obtained

|Compiler|Optimizations|Architecture                                         | goto [s]  | if [s]    | block [s] |
|--------|-------------|-----------------------------------------------------|-----------|-----------|-----------|
| GNU    | yes         |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5480e-4  |0.5480e-4  |0.5480e-4  |
| GNU    | no          |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.7578e-3  |0.7578e-3  |0.7578e-3  |
| Intel  | yes         |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5228e-4  |0.5237e-4  |0.5237e-4  |
| Intel  | no          |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.9449e-3  |0.9550e-3  |0.9550e-3  |
191 changes: 191 additions & 0 deletions src/goto_is_fastest/goto_if_block_comparison_1/defy.f90
@@ -0,0 +1,191 @@
! A DEFY (DEmystify Fortran mYths) test.
! Author: Stefano Zaghi & Ron Shepard & FortranFan
! Date: 2016-10-19
!
! License: this file is licensed under the Creative Commons Attribution 4.0 license,
! see http://creativecommons.org/licenses/by/4.0/ .

program defy
use iso_fortran_env
implicit none
integer(int32), parameter :: tests_number = 4000
integer(int32) :: keyword
real(real64), allocatable :: key_work(:)
real(real64) :: random
integer(int64) :: profiling(1:2)
integer(int64) :: count_rate
real(real64) :: system_clocks(1:3)
integer(int32) :: key_registers(1:9)
integer(int32) :: i

key_registers = 0
system_clocks = 0._real64
do i=1, tests_number
call random_number(random)
keyword = nint(random*9, int32)
if (keyword==1) key_registers(1) = key_registers(1) + 1
if (keyword==2) key_registers(2) = key_registers(2) + 1
if (keyword==3) key_registers(3) = key_registers(3) + 1
if (keyword==4) key_registers(4) = key_registers(4) + 1
if (keyword==5) key_registers(5) = key_registers(5) + 1
if (keyword==6) key_registers(6) = key_registers(6) + 1
if (keyword==7) key_registers(7) = key_registers(7) + 1
if (keyword==8) key_registers(8) = key_registers(8) + 1
if (keyword==9) key_registers(9) = key_registers(9) + 1

call system_clock(profiling(1), count_rate)
selector: block
call worker9(key=keyword, array=key_work) ; if ((keyword==9)) exit selector
call worker8(key=keyword, array=key_work) ; if ((keyword>=8)) exit selector
call worker7(key=keyword, array=key_work) ; if ((keyword>=7)) exit selector
call worker6(key=keyword, array=key_work) ; if ((keyword>=6)) exit selector
call worker5(key=keyword, array=key_work) ; if ((keyword>=5)) exit selector
call worker4(key=keyword, array=key_work) ; if ((keyword>=4)) exit selector
call worker3(key=keyword, array=key_work) ; if ((keyword>=3)) exit selector
call worker2(key=keyword, array=key_work) ; if ((keyword>=2)) exit selector
call worker1(key=keyword, array=key_work) ; exit selector
end block selector
call system_clock(profiling(2), count_rate)
system_clocks(1) = system_clocks(1) + real(profiling(2) - profiling(1), kind=real64)/count_rate

call system_clock(profiling(1), count_rate)
if (keyword<2) call worker1(key=keyword, array=key_work)
if (keyword<3) call worker2(key=keyword, array=key_work)
if (keyword<4) call worker3(key=keyword, array=key_work)
if (keyword<5) call worker4(key=keyword, array=key_work)
if (keyword<6) call worker5(key=keyword, array=key_work)
if (keyword<7) call worker6(key=keyword, array=key_work)
if (keyword<8) call worker7(key=keyword, array=key_work)
if (keyword<9) call worker8(key=keyword, array=key_work)
if (keyword<10) call worker9(key=keyword, array=key_work)
call system_clock(profiling(2), count_rate)
system_clocks(2) = system_clocks(2) + real(profiling(2) - profiling(1), kind=real64)/count_rate

call system_clock(profiling(1), count_rate)
goto (10, 20, 30, 40, 50, 60, 70, 80, 90), keyword
10 call worker1(key=keyword, array=key_work)
20 call worker2(key=keyword, array=key_work)
30 call worker3(key=keyword, array=key_work)
40 call worker4(key=keyword, array=key_work)
50 call worker5(key=keyword, array=key_work)
60 call worker6(key=keyword, array=key_work)
70 call worker7(key=keyword, array=key_work)
80 call worker8(key=keyword, array=key_work)
90 call worker9(key=keyword, array=key_work)
call system_clock(profiling(2), count_rate)
system_clocks(3) = system_clocks(3) + real(profiling(2) - profiling(1), kind=real64)/count_rate
enddo
print '(A,9F12.5)', ' keywords distribution (1..9): ', key_registers*1._real32/tests_number
print '(A,E23.15)', ' block average performance: ', system_clocks(1)/tests_number
print '(A,E23.15)', ' if average performance: ', system_clocks(2)/tests_number
print '(A,E23.15)', ' goto average performance: ', system_clocks(3)/tests_number

contains
pure subroutine worker1(key, array)
integer(int32), intent(in) :: key
real(real64), allocatable, intent(out) :: array(:)
integer(int32) :: j

allocate(array(1:key*tests_number))
array = 0._real64
do j=1, key*tests_number
array(j) = key**2._real64 * tests_number * j
enddo
endsubroutine worker1

pure subroutine worker2(key, array)
integer(int32), intent(in) :: key
real(real64), allocatable, intent(out) :: array(:)
integer(int32) :: j

allocate(array(1:key*tests_number))
array = 0._real64
do j=1, key*tests_number
array(j) = key**2._real64 * tests_number * j
enddo
endsubroutine worker2

pure subroutine worker3(key, array)
integer(int32), intent(in) :: key
real(real64), allocatable, intent(out) :: array(:)
integer(int32) :: j

allocate(array(1:key*tests_number))
array = 0._real64
do j=1, key*tests_number
array(j) = key**2._real64 * tests_number * j
enddo
endsubroutine worker3

pure subroutine worker4(key, array)
integer(int32), intent(in) :: key
real(real64), allocatable, intent(out) :: array(:)
integer(int32) :: j

allocate(array(1:key*tests_number))
array = 0._real64
do j=1, key*tests_number
array(j) = key**2._real64 * tests_number * j
enddo
endsubroutine worker4

pure subroutine worker5(key, array)
integer(int32), intent(in) :: key
real(real64), allocatable, intent(out) :: array(:)
integer(int32) :: j

allocate(array(1:key*tests_number))
array = 0._real64
do j=1, key*tests_number
array(j) = key**2._real64 * tests_number * j
enddo
endsubroutine worker5

pure subroutine worker6(key, array)
integer(int32), intent(in) :: key
real(real64), allocatable, intent(out) :: array(:)
integer(int32) :: j

allocate(array(1:key*tests_number))
array = 0._real64
do j=1, key*tests_number
array(j) = key**2._real64 * tests_number * j
enddo
endsubroutine worker6

pure subroutine worker7(key, array)
integer(int32), intent(in) :: key
real(real64), allocatable, intent(out) :: array(:)
integer(int32) :: j

allocate(array(1:key*tests_number))
array = 0._real64
do j=1, key*tests_number
array(j) = key**2._real64 * tests_number * j
enddo
endsubroutine worker7

pure subroutine worker8(key, array)
integer(int32), intent(in) :: key
real(real64), allocatable, intent(out) :: array(:)
integer(int32) :: j

allocate(array(1:key*tests_number))
array = 0._real64
do j=1, key*tests_number
array(j) = key**2._real64 * tests_number * j
enddo
endsubroutine worker8

pure subroutine worker9(key, array)
integer(int32), intent(in) :: key
real(real64), allocatable, intent(out) :: array(:)
integer(int32) :: j

allocate(array(1:key*tests_number))
array = 0._real64
do j=1, key*tests_number
array(j) = key**2._real64 * tests_number * j
enddo
endsubroutine worker9
endprogram defy
11 changes: 11 additions & 0 deletions src/goto_is_fastest/goto_if_block_comparison_1/run_gnu.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# script to build and run DEFY tests.
#
# License: this file is licensed under the Creative Commons Attribution 4.0 license,
# see http://creativecommons.org/licenses/by/4.0/ .

test=$(basename $(pwd))/defy.f90
echo "Build and run $test by means of 'gfortran -Og'"
gfortran -Og defy.f90 -o defy
./defy
rm -f defy
@@ -0,0 +1,11 @@
#!/bin/bash
# script to build and run DEFY tests.
#
# License: this file is licensed under the Creative Commons Attribution 4.0 license,
# see http://creativecommons.org/licenses/by/4.0/ .

test=$(basename $(pwd))/defy.f90
echo "Build and run $test by means of 'gfortran -O3'"
gfortran -O3 defy.f90 -o defy
./defy
rm -f defy
11 changes: 11 additions & 0 deletions src/goto_is_fastest/goto_if_block_comparison_1/run_intel.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# script to build and run DEFY tests.
#
# License: this file is licensed under the Creative Commons Attribution 4.0 license,
# see http://creativecommons.org/licenses/by/4.0/ .

test=$(basename $(pwd))/defy.f90
echo "Build and run $test by means of 'ifort -O0'"
ifort -O0 defy.f90 -o defy
./defy
rm -f defy
@@ -0,0 +1,11 @@
#!/bin/bash
# script to build and run DEFY tests.
#
# License: this file is licensed under the Creative Commons Attribution 4.0 license,
# see http://creativecommons.org/licenses/by/4.0/ .

test=$(basename $(pwd))/defy.f90
echo "Build and run $test by means of 'ifort -O3'"
ifort -O3 defy.f90 -o defy
./defy
rm -f defy
