Commit
add test of probability-ordered keys branching and update doc of others test with result for Core M5 skylake cpu
Showing 13 changed files with 324 additions and 41 deletions.
@@ -1,8 +1,16 @@
### Goto-if elseif-select case performance comparison, test 1

This test compares (computed) `goto` with the `if` branching-flow construct. The selector for the branching-jump is computed pseudo-randomly and the *work* done inside the *workers* called by each branch is not uniform.
This test compares (computed) `goto` with `if` and `block (if)` branching-flow constructs.

This is a modification of the [goto-if elseif-select case](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1) test proposed by Ron Shepard (select case is not considered in this test; the `block` construct is used instead). Essentially, the branching-flow is now *flushed*: the selector selects *from which keyword* to start calling the workers, and it calls not only the worker corresponding to that keyword but also all subsequent workers, e.g.
> The selector for the branching-jump is computed pseudo-randomly.
> The *work* done inside the *workers* called by each branch is not uniform; rather, it depends on the keyword value.
This is a modification of the [goto-if elseif-select case](https://github.com/szaghi/DEFY/tree/master/src/goto_is_fastest/goto_if_select_comparison_1) test proposed by Ron Shepard and further improved by FortranFan.

> Select case is not considered in this test (because it generates a highly nested branching-flow that is less clear than the others); the `block` construct is used instead.
Essentially, the branching-flow is now *flushed*: the selector selects *from which keyword* to start calling the workers, and it calls not only the worker corresponding to that keyword but also all subsequent workers, e.g.

```fortran
goto (1, 2, 3), keyword

@@ -26,7 +34,7 @@ selector: block
end block selector
```
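
The *flushed* flow described above can be sketched as a small self-contained program. This is only an illustration under stated assumptions: the `worker` subroutine, the loop over `keyword` and the `if`-based variant are hypothetical and are not taken from the test sources.

```fortran
program flushed_branching_sketch
  ! Illustrative sketch only: worker and the if-based variant are
  ! hypothetical, they are not part of the DEFY test sources.
  implicit none
  integer :: keyword

  do keyword = 1, 3
    print '(A,I0)', 'computed goto, keyword = ', keyword
    ! jump to the label selected by keyword, then fall through the rest
    goto (1, 2, 3), keyword
    1 call worker(1)
    2 call worker(2)
    3 call worker(3)

    print '(A,I0)', 'if, keyword = ', keyword
    ! same flow expressed with plain if statements
    if (keyword <= 1) call worker(1)
    if (keyword <= 2) call worker(2)
    if (keyword <= 3) call worker(3)
  enddo

contains
  subroutine worker(id)
    integer, intent(in) :: id
    print '(A,I0)', '  worker ', id
  endsubroutine worker
endprogram flushed_branching_sketch
```

With `keyword = 2`, both variants call `worker(2)` and then `worker(3)`, which is the fall-through behaviour the test relies on.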

In this case the `goto` should actually have an advantage, although the tests performed confirm again that the performance is almost identical.
In this case the `goto` should actually have an advantage, although the tests performed confirm (again) that the performance is almost identical.

### Run test

@@ -39,9 +47,11 @@ Four bash scripts are provided to run the test:

### Results obtained

|Compiler|Optimizations|Architecture | goto | if |block |
|--------|-------------|-----------------------------------------------------|-----------|-----------|-----------|
| GNU | yes |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5480^10-4|0.5480^10-4|0.5480^10-4|
| GNU | no |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.7578^10-3|0.7578^10-3|0.7578^10-3|
| Intel | yes |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5228^10-4|0.5237^10-4|0.5237^10-4|
| Intel | no |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.9449^10-3|0.9550^10-3|0.9550^10-3|

|Compiler |Optimizations|Architecture | goto | if |block |
|----------------------|-------------|-----------------------------------------------------|-----------|-----------|-----------|
| GNU (6.2.0, 64bit) | -O3 |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5480^10-4|0.5480^10-4|0.5480^10-4|
| GNU (6.2.0, 64bit) | -Og |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.7578^10-3|0.7578^10-3|0.7578^10-3|
| Intel (16.0.3, 64bit)| -O3 |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5228^10-4|0.5237^10-4|0.5237^10-4|
| Intel (16.0.3, 64bit)| -O0 |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.9449^10-3|0.9550^10-3|0.9550^10-3|
| GNU (7.0.0, 32bit) | -?? |Intel Core [email protected], 4GB RAM, Windows 64-bit |0.1357^10-3|0.1356^10-3|0.1356^10-3|
| Intel (17.0.0, 64bit)| -?? |Intel Core [email protected], 4GB RAM, Windows 64-bit |0.4650^10-4|0.4400^10-4|0.4400^10-4|
@@ -1,6 +1,10 @@
### Goto-if elseif-select case performance comparison, test 1

This test compares (computed) `goto` with `if elseif` and `select case` branching-flow constructs. The selector for the branching-jump is computed pseudo-randomly and the *work* done inside the *workers* called by each branch is not uniform.
This test compares (computed) `goto` with `if elseif` and `select case` branching-flow constructs.

> The selector for the branching-jump is computed pseudo-randomly.
> The *work* done inside the *workers* called by each branch is not uniform; rather, it depends on the keyword value.

### Run test

@@ -13,9 +17,9 @@ Four bash scripts are provided to run the test:

### Results obtained

|Compiler|Optimizations|Architecture | goto | if elseif | select case |
|--------|-------------|-----------------------------------------------------|-----------|-----------|-------------|
| GNU | yes |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.3852^10-4|0.3856^10-4| 0.3857^10-4 |
| GNU | no |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5788^10-3|0.5778^10-3| 0.5783^10-3 |
| Intel | yes |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.3896^10-4|0.3913^10-4| 0.3905^10-4 |
| Intel | no |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5796^10-3|0.5785^10-3| 0.5810^10-3 |

|Compiler |Optimizations|Architecture | goto | if elseif | select case |
|---------------|-------------|-----------------------------------------------------|-----------|-----------|-------------|
| GNU (6.2.0) | -O3 |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.3852^10-4|0.3856^10-4| 0.3857^10-4 |
| GNU (6.2.0) | -Og |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5788^10-3|0.5778^10-3| 0.5783^10-3 |
| Intel (16.0.3)| -O3 |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.3896^10-4|0.3913^10-4| 0.3905^10-4 |
| Intel (16.0.3)| -O0 |Intel Xeon [email protected], 24GB RAM, x86_64 Arch Linux|0.5796^10-3|0.5785^10-3| 0.5810^10-3 |
@@ -1,6 +1,6 @@
### Goto-if elseif-select case performance comparison, test 1

This test compares (computed) `goto` with `if elseif` and `select case` branching-flow constructs.
This test compares (computed) `goto` with the `select case` branching-flow construct.

To be completed.

@@ -15,4 +15,7 @@ Four bash scripts are provided to run the test:

### Results obtained

To be written.
|Compiler |Optimizations|Architecture | goto |select case |
|---------------|-------------|--------------------------------------------------|-----------|------------|
| Intel (16.0.3)| -O3 |Intel Core [email protected], 4GB RAM, x86_64 Ubuntu|2.0460^10-3|2.0394^10-3 |
| Intel (16.0.3)| -O0 |Intel Core [email protected], 4GB RAM, x86_64 Ubuntu|3.4972^10-3|4.0245^10-3 |
@@ -0,0 +1,36 @@
### Goto-if elseif-select case performance comparison, test 3

This test compares (computed) `goto` with `if elseif` and `select case` branching-flow constructs.

The keywords are ordered as follows:

+ keys value:
    + key(1) = 3
    + key(2) = 4
    + key(3) = 1
    + key(4) = 2
+ keys probability:
    + key(1) ~ 36% (10 matches out of 28)
    + key(2) ~ 29% (8 matches out of 28)
    + key(3) ~ 21% (6 matches out of 28)
    + key(4) ~ 14% (4 matches out of 28)
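
The percentages above follow from the match counts, which sum to 10 + 8 + 6 + 4 = 28. A minimal sketch that recomputes them is given below; the program and the `weights` array are illustrative names, not part of the test sources.

```fortran
program key_probabilities_sketch
  ! Illustrative sketch only: recomputes the share of each key from the
  ! match counts listed above (weights is a hypothetical name).
  use iso_fortran_env, only: int32, real64
  implicit none
  integer(int32), parameter :: weights(4) = [10, 8, 6, 4]
  integer(int32)            :: k

  do k = 1, size(weights)
    ! share of key k = its match count divided by the total match count
    print '(A,I0,A,F5.1,A)', 'key(', k, ') ~ ', 100._real64*weights(k)/sum(weights), '%'
  enddo
endprogram key_probabilities_sketch
```

The computed shares are roughly 35.7%, 28.6%, 21.4% and 14.3%, which round to the values listed.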

> The *work* done inside the *workers* called by each branch is not uniform; rather, it depends on the keyword value.

### Run test

Four bash scripts are provided to run the test:

1. `run_gnu.sh`, runs the test with the GNU gfortran compiler without optimizations (`-Og`);
2. `run_gnu_optimized.sh`, runs the test with the GNU gfortran compiler with optimizations (`-O3`);
3. `run_intel.sh`, runs the test with the Intel Fortran Compiler without optimizations (`-O0`);
4. `run_intel_optimized.sh`, runs the test with the Intel Fortran Compiler with optimizations (`-O3`).

### Results obtained

|Compiler |Optimizations|Architecture | goto | if elseif | select case |
|---------------|-------------|--------------------------------------------------|-----------|-----------|-------------|
| GNU (6.2.0) | -O3 |Intel Core [email protected], 4GB RAM, x86_64 Ubuntu|0.1111^10-3|0.1111^10-3|0.1111^10-3 |
| GNU (6.2.0) | -Og |Intel Core [email protected], 4GB RAM, x86_64 Ubuntu|0.2136^10-2|0.2135^10-2|0.2137^10-2 |
| Intel (16.0.3)| -O3 |Intel Core [email protected], 4GB RAM, x86_64 Ubuntu|0.1143^10-3|0.1143^10-3|0.1154^10-3 |
| Intel (16.0.3)| -O0 |Intel Core [email protected], 4GB RAM, x86_64 Ubuntu|0.2691^10-2|0.2691^10-2|0.2691^10-2 |
135 changes: 135 additions & 0 deletions
src/goto_is_fastest/goto_if_select_comparison_3/defy.f90
@@ -0,0 +1,135 @@
! A DEFY (DEmystyfy Fortran mYths) test.
! Author: Stefano Zaghi
! Date: 2016-10-22
!
! License: this file is licensed under the Creative Commons Attribution 4.0 license,
! see http://creativecommons.org/licenses/by/4.0/ .

program defy
  use iso_fortran_env
  implicit none
  integer(int32), parameter :: tests_number = 3000
  integer(int32) :: keyword
  integer(int32) :: keywords(1:4,1:2)
  real(real64), allocatable :: key_work(:)
  integer(int64) :: profiling(1:2)
  integer(int64) :: count_rate
  real(real64) :: system_clocks(1:3)
  integer(int32) :: i
  integer(int32) :: k
  integer(int32) :: p

  keywords = 0
  ! keys value
  keywords(1,1) = 3
  keywords(2,1) = 4
  keywords(3,1) = 1
  keywords(4,1) = 2
  ! keys probability
  keywords(1,2) = 10
  keywords(2,2) = 8
  keywords(3,2) = 6
  keywords(4,2) = 4

  system_clocks = 0._real64
  do i=1, tests_number

    do k=1, size(keywords, dim=1)

      keyword = keywords(k, 1)

      do p=1, keywords(k, 2)

        call system_clock(profiling(1), count_rate)
        select case(keyword)
        case(1)
          call worker1(key=keyword, array=key_work)
        case(2)
          call worker2(key=keyword, array=key_work)
        case(3)
          call worker3(key=keyword, array=key_work)
        case(4)
          call worker4(key=keyword, array=key_work)
        endselect
        call system_clock(profiling(2), count_rate)
        system_clocks(1) = system_clocks(1) + real(profiling(2) - profiling(1), kind=real64)/count_rate

        call system_clock(profiling(1), count_rate)
        if (keyword==1) then
          call worker1(key=keyword, array=key_work)
        elseif (keyword==2) then
          call worker2(key=keyword, array=key_work)
        elseif (keyword==3) then
          call worker3(key=keyword, array=key_work)
        elseif (keyword==4) then
          call worker4(key=keyword, array=key_work)
        endif
        call system_clock(profiling(2), count_rate)
        system_clocks(2) = system_clocks(2) + real(profiling(2) - profiling(1), kind=real64)/count_rate

        call system_clock(profiling(1), count_rate)
        goto (10, 20, 30, 40), keyword
        goto 50
        10 call worker1(key=keyword, array=key_work) ; goto 50
        20 call worker2(key=keyword, array=key_work) ; goto 50
        30 call worker3(key=keyword, array=key_work) ; goto 50
        40 call worker4(key=keyword, array=key_work) ; goto 50
        50 continue
        call system_clock(profiling(2), count_rate)
        system_clocks(3) = system_clocks(3) + real(profiling(2) - profiling(1), kind=real64)/count_rate
      enddo
    enddo
  enddo
  print '(A,E23.15)', ' select case average performance: ', system_clocks(1)/tests_number
  print '(A,E23.15)', ' if elseif average performance: ', system_clocks(2)/tests_number
  print '(A,E23.15)', ' goto average performance: ', system_clocks(3)/tests_number

contains
  pure subroutine worker1(key, array)
    integer(int32), intent(in) :: key
    real(real64), allocatable, intent(out) :: array(:)
    integer(int32) :: j

    allocate(array(1:key*tests_number))
    array = 0._real64
    do j=1, key*tests_number
      array(j) = key**2._real64 * tests_number * j
    enddo
  endsubroutine worker1

  pure subroutine worker2(key, array)
    integer(int32), intent(in) :: key
    real(real64), allocatable, intent(out) :: array(:)
    integer(int32) :: j

    allocate(array(1:key*tests_number))
    array = 0._real64
    do j=1, key*tests_number
      array(j) = key**2._real64 * tests_number * j
    enddo
  endsubroutine worker2

  pure subroutine worker3(key, array)
    integer(int32), intent(in) :: key
    real(real64), allocatable, intent(out) :: array(:)
    integer(int32) :: j

    allocate(array(1:key*tests_number))
    array = 0._real64
    do j=1, key*tests_number
      array(j) = key**2._real64 * tests_number * j
    enddo
  endsubroutine worker3

  pure subroutine worker4(key, array)
    integer(int32), intent(in) :: key
    real(real64), allocatable, intent(out) :: array(:)
    integer(int32) :: j

    allocate(array(1:key*tests_number))
    array = 0._real64
    do j=1, key*tests_number
      array(j) = key**2._real64 * tests_number * j
    enddo
  endsubroutine worker4
endprogram defy
11 changes: 11 additions & 0 deletions
src/goto_is_fastest/goto_if_select_comparison_3/run_gnu.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# script to build and run DEFY tests.
#
# License: this file is licensed under the Creative Commons Attribution 4.0 license,
# see http://creativecommons.org/licenses/by/4.0/ .

test=$(basename $(pwd))/defy.f90
echo "Build and run $test by means of 'gfortran -Og'"
gfortran -Og defy.f90 -o defy
./defy
rm -f defy
11 changes: 11 additions & 0 deletions
src/goto_is_fastest/goto_if_select_comparison_3/run_gnu_optimized.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# script to build and run DEFY tests.
#
# License: this file is licensed under the Creative Commons Attribution 4.0 license,
# see http://creativecommons.org/licenses/by/4.0/ .

test=$(basename $(pwd))/defy.f90
echo "Build and run $test by means of 'gfortran -O3'"
gfortran -O3 defy.f90 -o defy
./defy
rm -f defy
11 changes: 11 additions & 0 deletions
src/goto_is_fastest/goto_if_select_comparison_3/run_intel.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# script to build and run DEFY tests.
#
# License: this file is licensed under the Creative Commons Attribution 4.0 license,
# see http://creativecommons.org/licenses/by/4.0/ .

test=$(basename $(pwd))/defy.f90
echo "Build and run $test by means of 'ifort -O0'"
ifort -O0 defy.f90 -o defy
./defy
rm -f defy
11 changes: 11 additions & 0 deletions
src/goto_is_fastest/goto_if_select_comparison_3/run_intel_optimized.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# script to build and run DEFY tests.
#
# License: this file is licensed under the Creative Commons Attribution 4.0 license,
# see http://creativecommons.org/licenses/by/4.0/ .

test=$(basename $(pwd))/defy.f90
echo "Build and run $test by means of 'ifort -O3'"
ifort -O3 defy.f90 -o defy
./defy
rm -f defy