Minimize the use of floating point arithmetic #15

Open
Alexandre-M opened this issue Feb 7, 2017 · 5 comments

@Alexandre-M

Hi,

We're currently doing development for FreeBSD on the new ARMADA380 architecture. While checking the L2 cache activation, we use your tool, which is very useful.

However, FreeBSD 10.3 lacks hard-float support, so all of the memory bandwidth computations go through soft-float emulation. This produces bad values (and some stress :) ) and cost us hours of searching for why the memory seemed so slow.

Could you look into minimizing the use of floating point arithmetic as much as possible during the benchmark phases?

Thank you

Alexandre Martins

@ssvb (Owner) commented Feb 7, 2017

But the use of floating point calculations should already be pretty minimal. They are also mostly done outside of the critical loops, except for the gettime() function. Why do you think that it is a problem?
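
For reference, a gettime() helper of this kind is typically a thin wrapper around gettimeofday() that converts the result to seconds as a double; the sketch below is only a guess at that shape, not necessarily the exact code in this repository, and the conversions and multiplication in it are the operations a soft-float toolchain would emulate.

```c
#include <sys/time.h>

/* Hypothetical sketch of a gettimeofday()-based timer returning seconds
 * as a double; the integer-to-double conversions and the multiplication
 * are the floating point operations that soft-float emulates in software. */
static double gettime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}
```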

@ssvb (Owner) commented Feb 7, 2017

Could you please share your current logs? It is quite possible that the reported values are actually normal. Also, as a test, you could try to hack the gettime() function and change it to do twice (or even 10x) as much work, then check whether this affects the reported values.
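
One way to run that experiment, assuming a gettimeofday()-based gettime() like the sketch above, is to perform the timestamp conversion twice and discard the first result, so the floating point work per call roughly doubles while the returned value stays the same (a hypothetical test variant, not a suggested permanent change):

```c
/* Hypothetical test variant of gettime(): does the float conversion twice.
 * If soft-float overhead in gettime() were skewing the benchmark, the
 * reported bandwidth should visibly change when this version is used. */
static double gettime_2x(void)
{
    struct timeval tv;
    volatile double dummy;

    gettimeofday(&tv, NULL);
    dummy = (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;  /* extra work */
    (void)dummy;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}
```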

@Alexandre-M (Author)

Hi

I made a small patch to check whether soft-float really has a performance cost.

patch.txt

We gained a few MB/s, but not that much.

Feel free to integrate it (or not :) ).

@ssvb (Owner) commented Feb 8, 2017

How big was the difference? I just don't like integers for this kind of calculation because they tend to overflow. And if that happens, then we get really bogus results.
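
The overflow concern is easy to demonstrate with a small standalone example (not code from this project): in 32-bit arithmetic, an intermediate product like bytes * 1000000 wraps around for quite ordinary buffer sizes, whereas widening to 64 bits first keeps the result correct.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 256 MiB copied in 0.5 s should report ~512 MiB/s, but the
     * intermediate product bytes * 1000000 does not fit in 32 bits. */
    uint32_t bytes = 256u * 1024 * 1024;
    uint32_t usecs = 500000;

    uint32_t bogus = bytes * 1000000u / usecs;            /* wraps around */
    uint64_t ok    = (uint64_t)bytes * 1000000u / usecs;  /* widened first */

    printf("32-bit: %u bytes/s, 64-bit: %llu bytes/s\n",
           bogus, (unsigned long long)ok);
    return 0;
}
```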

Edit: BTW, you are using ualarm() in your patch, and it is not exactly accurate. See https://linux.die.net/man/3/ualarm

The ualarm() function causes the signal SIGALRM to be sent to
the invoking process after (not less than) usecs microseconds.
The delay may be lengthened slightly by any system activity or
by the time spent processing the call or by the granularity of
system timers. 

@Alexandre-M (Author)

The difference was about ~3% (going from 380 MB/s to 390 MB/s).

The fact that ualarm is not accurate is not an issue. In my case, the loop runs for about 0.6 seconds (far from the 0.5 requested) in both cases. The timestamps t1 and t2 give us the true start and stop times, so the value is still accurate.
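
For the record, the pattern described here is roughly the following (a hypothetical sketch of the general idea, not the actual contents of patch.txt): ualarm() only decides when the loop stops, while the bandwidth is computed from timestamps taken immediately around the loop, kept in 64-bit integer microseconds.

```c
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

static volatile sig_atomic_t expired;

static void on_alarm(int sig)
{
    (void)sig;
    expired = 1;
}

/* Hypothetical sketch: run the workload until SIGALRM fires, but compute
 * the result from the actual elapsed time between t1 and t2, so the
 * documented inaccuracy of ualarm() does not affect the reported value.
 * now_usec() stands for any monotonic microsecond timestamp source. */
static uint64_t measure_bytes_per_sec(void (*workload)(void),
                                      size_t bytes_per_iter,
                                      uint64_t (*now_usec)(void))
{
    uint64_t t1, t2, iterations = 0;

    signal(SIGALRM, on_alarm);
    expired = 0;
    ualarm(500000, 0);          /* ask for ~0.5 s; may overshoot */

    t1 = now_usec();
    while (!expired) {
        workload();
        iterations++;
    }
    t2 = now_usec();            /* true elapsed time, not the 0.5 s asked */

    /* 64-bit integer math throughout to avoid overflow */
    return bytes_per_iter * iterations * 1000000u / (t2 - t1);
}
```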
