Adopting xmr-stak for CryptoNight variant 2 #19
Comments
Same question from this side. Is this something you are looking at updating? 👍 |
I'm also waiting ^^ |
Yeah, I started working on this today. |
@nioroso-x3 I've got a working xmrig repo that I've been performance tuning the past few days alongside SChernykh. Which CPUs are you going to be tuning against? Unlike CNv7, we will need to decide which square root algorithm to use based on the expected utilization of an SMT4 core. |
Hi nioroso-x3, |
Same, multiple POWER SMT4 systems sitting idle.
On Wed, Oct 17, 2018, 01:05 gcy000 wrote:
Hi nioroso-x3,
I'm currently using a POWER8 CPU for XMR... any repo for that?
|
@madscientist159 is it possible for you to share your working repo, even if performance drops by 50%? Once the tuning is finished, we can update again to get the best performance. Thanks in advance. |
https://github.com/nioroso-x3/xmr-stak OK, I've updated it to a just-working state; it still needs to be optimized. |
Darn, you beat me to it! 😃 Here's my optimized xmrig variant, it gets basically the same speed as CNv7 on the > 8 core chips, and ~62% of CNv7 on the 4 and 8 core devices. |
A lookup table for sqrt? You didn't try to do two square roots at once using vec_sqrt? |
@nioroso-x3 Of course I did -- you can see vestiges of that in how the xmrig source was restructured to allow two square roots right up next to one another. I also tried many other arcane instructions and methods for getting the square root (even going so far as trying to fill multiple execution pipelines with combinations of xsrsqrtesp, xssqrtsp, and xssqrtqp) -- empirically, on POWER9, using a dual 8 core test machine, the given LUT was the fastest by a significant margin (over 10%). What it all comes down to is that square root, alongside division, is probably the weakest instruction in the entire POWER9 design. Not only is the issue time for the square root instruction terrible, but the VSX units are shared among SMT threads -- the LUT allows greater parallelism. Don't take my word alone for it -- run it through perf and see for yourself! 😉 |
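For readers unfamiliar with the LUT approach discussed above, here is a minimal, generic sketch in C of a table-seeded integer square root. It is not the table or code from the xmrig repo mentioned in this thread; the names `sqrt_lut`, `init_sqrt_lut`, and `isqrt32_lut` are hypothetical, and the 32-bit input width and 256-entry table size are assumptions chosen for illustration only. The idea is to replace the hardware square root instruction with a table lookup for a first estimate, followed by cheap integer refinement.

```c
#include <stdint.h>

/* Hypothetical table: sqrt_lut[i] = floor(sqrt(i * 2^24)), indexed by the top 8 bits. */
static uint16_t sqrt_lut[256];

/* Build the table once, off the hot path, with a plain bit-by-bit integer root. */
static void init_sqrt_lut(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint64_t n = (uint64_t)i << 24;
        uint64_t r = 0;
        for (uint64_t bit = 1u << 15; bit != 0; bit >>= 1) {
            uint64_t t = r | bit;
            if (t * t <= n)
                r = t;
        }
        sqrt_lut[i] = (uint16_t)r;
    }
}

/* floor(sqrt(x)) using the table for the estimate; no sqrt instruction is issued. */
static uint32_t isqrt32_lut(uint32_t x)
{
    if (x == 0)
        return 0;

    /* Normalise: shift left by pairs of bits until one of the top two bits is set. */
    uint32_t n = x;
    int shift = 0;
    while ((n & 0xC0000000u) == 0) {
        n <<= 2;
        shift++;
    }

    /* Table estimate of sqrt(n), scaled back: sqrt(x) = sqrt(n) / 2^shift. */
    uint64_t r = (uint64_t)sqrt_lut[n >> 24] >> shift;
    if (r == 0)
        r = 1;

    /* One Newton step, then exact correction so the result is the true floor. */
    r = (r + x / r) >> 1;
    while (r * r > x)
        r--;
    while ((r + 1) * (r + 1) <= x)
        r++;
    return (uint32_t)r;
}
```

Call `init_sqrt_lut()` once at startup, then `isqrt32_lut(x)` in the hot loop. Whether a table like this beats the hardware instruction depends on the chip and on how many SMT threads share the execution units, so it should be measured with perf on the target machine, as suggested above.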
Ok, I added your fast square root for the 1x threads. |
hi ****
it show Sith |
I tested it in the killallasics.moneroworld.com pool, you have to set cryptonight_v8 explicitly |
POWER8 + Ubuntu 16.04 + at11.0: Scanning dependencies of target xmr-stak-c |
You are missing a library; it compiles fine on CentOS 7. |
Ugh, performance is pretty terrible: POWER8, 74 threads, 1400 H/s, versus about 3k on CNv7. I can tell by the quiet machine room that the system is not working very hard. |
@rem260 Another Firestone user from the description of the sound? 😉 Can you give my xmrig version a try? On a 36 core POWER9 box it's running at around 68% of CNv7 and I'm actively working to try to boost that further. There's a tremendous amount of cache thrashing going on and POWER is very weak on both square root and division (basically the two worst instructions in the entire design), making things significantly worse. |
I haven't implemented the big-endian AES optimization I had for CNv7, which was the bottleneck for POWER8. |
Hi |
Any updates on the optimization? |
I think there hasn't been any update, and performance is very poor ^^ |
Hello Nioroso,
As Monero is changing its PoW on 18.10.2018, I was wondering:
Are you planning to adopt a new version of xmr-stak for Power supporting CryptoNight variant 2?