SILENTARMY v5: Zcash miner, 115 sol/s on R9 Nano, 70 sol/s on GTX 1070 - page 67.

laik2

sr. member

Activity: 652

Merit: 266

Quote from: zawawa on November 11, 2016, 03:16:59 PM

I was able to build SILENTARMY v5 for Windows, but the performance is suboptimal.
If I manage to squeeze the advertised speed on my RX 480's, I will release Windows binaries.

Any project files would be helpful, so I can strip down my current code and optimize it for pre-Maxwell v2 cards.
Thank you.

Mugatu

member

Activity: 93

Merit: 10

Quote from: nerdralph on November 11, 2016, 03:54:56 PM

Quote from: yslyung on November 11, 2016, 03:23:09 PM

WINDOWS version please . . .

i'm sure you'll get donations instead of a closed source with fixed fees ..

thx for the great work mrb

Genoil already tried that, and said donations slowed to a trickle after Claymore released his miner.

Genoil's miner was/is unstable as hell

mrada1204

newbie

Activity: 28

Merit: 0

Quote from: zawawa on November 11, 2016, 03:53:51 PM

I am getting pretty good speeds with SILENTARMY v5 and 3 RX 480's on Windows 10.
This is amazing...

Code:

Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares

Could you please share the Windows version?

mrb

legendary

Activity: 1512

Merit: 1028

Quote from: zawawa on November 11, 2016, 03:53:51 PM

I am getting pretty good speeds with SILENTARMY v5 and 3 RX 480's on Windows 10.

Please do submit your changes adding Windows support

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: yslyung on November 11, 2016, 03:23:09 PM

WINDOWS version please . . .

i'm sure you'll get donations instead of a closed source with fixed fees ..

thx for the great work mrb

Genoil already tried that, and said donations slowed to a trickle after Claymore released his miner.

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

I am getting pretty good speeds with SILENTARMY v5 and 3 RX 480's on Windows 10.
This is amazing...

Code:

Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares
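
For anyone who wants to average status lines like the ones above instead of eyeballing them, here is a minimal sketch. It assumes the log format is exactly what is pasted in this post; `parse_line` and `mean_total` are made-up helper names, not part of SILENTARMY.

```python
import re

# Matches lines such as:
# Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
LINE_RE = re.compile(r"Total\s+([\d.]+)\s+sol/s\s+\[(.*?)\]")

def parse_line(line):
    """Return (total, {device: rate}) or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    total = float(m.group(1))
    devices = {}
    for item in m.group(2).split(","):
        name, rate = item.split()
        devices[name] = float(rate)
    return total, devices

def mean_total(lines):
    """Average the 'Total' column over all lines that parse."""
    totals = [p[0] for p in (parse_line(l) for l in lines) if p is not None]
    return sum(totals) / len(totals)

LOG = """\
Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares
""".splitlines()

print(f"mean total: {mean_total(LOG):.1f} sol/s")
```

On these eight lines the mean works out to roughly 264 sol/s for the three cards combined.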

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: adaseb on November 11, 2016, 03:41:11 PM

Anyone running this on a Tahiti ?

No, but I'm getting 45-50 sol/s on Pitcairn clocked at 1100/1550 (core/memory) on the 1375 Samsung strap. It's looking like these cards will never go back to ETH mining...

nevermind41

full member

Activity: 186

Merit: 100

Great work, thank you. The only problem is that I can't enable the overclocking feature with this driver. I set Coolbits to 8 but it had no effect. Here are the default speeds with 5x GTX 1070:

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: mrb on November 11, 2016, 02:45:14 PM

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck.

So is the L2 cache thrashing in ht_store your next optimization target? I know eXtremal is already working on my idea to improve the read performance in equihash_round by using 4x256-byte strides.
A fully optimized implementation should average one cache line read per equihash_round and 2-3 cache lines of read/write in ht_store. For an RX 470 with 7 GHz RAM that's 78 iterations per second, or ~13 ms per iteration. Add ~1 ms for the blake2b initialization for round 0 to get a total of 14 ms, or 71 iterations per second. If you are correct about 1.9 sols/iteration being optimal, that gives a theoretical 135 solutions/sec, or almost double the current speed.
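
The arithmetic in the estimate above can be checked with a small sketch. It takes the post's figures as given inputs (78 memory-bound iterations per second, about 1 ms of blake2b setup, 1.9 solutions per iteration); `estimate_sol_rate` is a made-up name, and the sketch does not try to derive the 78 figure from memory bandwidth.

```python
def estimate_sol_rate(mem_iters_per_sec, blake2b_ms, sols_per_iter):
    """Return (ms_per_iter, iters_per_sec, sols_per_sec) for one GPU."""
    mem_ms = 1000.0 / mem_iters_per_sec   # time per iteration spent on memory traffic
    ms_per_iter = mem_ms + blake2b_ms     # add the round-0 blake2b initialization
    iters_per_sec = 1000.0 / ms_per_iter
    return ms_per_iter, iters_per_sec, iters_per_sec * sols_per_iter

# Inputs are the numbers quoted in the post, not measurements.
ms, iters, sols = estimate_sol_rate(78, 1.0, 1.9)
print(f"{ms:.1f} ms/iter, {iters:.1f} iter/s, {sols:.1f} sol/s")
```

Without the intermediate rounding to 14 ms the result comes out a little higher (about 137 sol/s instead of 135), so the post's figure is, if anything, slightly conservative.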

mrb

legendary

Activity: 1512

Merit: 1028

Quote from: adamvp on November 11, 2016, 03:35:40 PM

For me, eXtremal's mod seems to be a little better on the R9 380X.
With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...

Let it warm up. AMD cards are sensitive to temperature and seem to need a few minutes to stabilize.

mrb

legendary

Activity: 1512

Merit: 1028

Quote from: ioglnx on November 11, 2016, 03:15:44 PM

Quote from: mrb on November 11, 2016, 03:07:53 PM

Quote from: mrada1204 on November 11, 2016, 03:03:57 PM

dev said v5 will be a windows version .... any news about windows release?

Sorry, I'm still working on more optimizations. Windows support has been delayed for now.

Why not merge the changes Genoil submitted to make a Windows build possible? The longer you postpone the merge, the less will be left of his efforts.

To my knowledge, his last pull request was breaking things. And neither he nor I had the time to fix them.

I would merge in a heartbeat if someone, anyone, provided a pull request that doesn't break silentarmy.

adaseb

legendary

Activity: 3808

Merit: 1723

Anyone running this on a Tahiti ?

mrb

legendary

Activity: 1512

Merit: 1028

Quote from: nerdralph on November 11, 2016, 03:35:45 PM

I just realized this uses eXtremal's 4-way first_words hack. When I previously tested it on AMD it didn't provide any speed increase.
I'm going to try going back the way it was with OPTIM_SIMPLIFY_ROUND to see if it is any faster with the latest changes.

Yes, this loop unrolling does not increase performance. I only merged it in the interest of saving time.

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: adamvp on November 11, 2016, 03:35:40 PM

Quote from: mrb on November 11, 2016, 02:45:14 PM

Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:

102 sol/s on R9 Nano (up from 54 sol/s)
72 sol/s on RX 480
64 sol/s on GTX 1070

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint, and reducing branch divergence, but eXtremal was done before me ;)

That's the benefit of open source; anyone can improve the code for all.

For me, eXtremal's mod seems to be a little better on the R9 380X.
With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...

My 380x with modified Hynix timing gives me almost 50.

nerdralph

sr. member

Activity: 588

Merit: 251

I just realized this uses eXtremal's 4-way first_words hack. When I previously tested it on AMD it didn't provide any speed increase.
I'm going to try going back the way it was with OPTIM_SIMPLIFY_ROUND to see if it is any faster with the latest changes.

adamvp

hero member

Activity: 1246

Merit: 708

Quote from: mrb on November 11, 2016, 02:45:14 PM

Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:

102 sol/s on R9 Nano (up from 54 sol/s)
72 sol/s on RX 480
64 sol/s on GTX 1070

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint, and reducing branch divergence, but eXtremal was done before me ;)

That's the benefit of open source; anyone can improve the code for all.

For me, eXtremal's mod seems to be a little better on the R9 380X.
With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...
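
For readers wondering what "packing 8 counters per uint" in the quoted post could look like, here is a rough sketch. It is a plain Python simulation, not the SILENTARMY kernel code: it assumes 4 bits per counter (32 bits / 8 counters), the names are hypothetical, and the read-modify-write that a GPU would perform as a single atomic add is done non-atomically here.

```python
BITS_PER_COUNTER = 4                      # assumption: 32-bit word / 8 counters
COUNTERS_PER_WORD = 32 // BITS_PER_COUNTER
COUNTER_MASK = (1 << BITS_PER_COUNTER) - 1

def increment(words, row):
    """Add 1 to the counter for `row` and return its previous value.

    On a GPU this would be one atomic add on words[word_idx]; the caller
    uses the returned value as the slot index inside the row.
    """
    word_idx = row // COUNTERS_PER_WORD
    shift = BITS_PER_COUNTER * (row % COUNTERS_PER_WORD)
    old = words[word_idx]
    words[word_idx] = (old + (1 << shift)) & 0xFFFFFFFF
    return (old >> shift) & COUNTER_MASK

def read(words, row):
    """Return the current counter value for `row`."""
    word_idx = row // COUNTERS_PER_WORD
    shift = BITS_PER_COUNTER * (row % COUNTERS_PER_WORD)
    return (words[word_idx] >> shift) & COUNTER_MASK

# 16 rows need only 2 words instead of 16.
counters = [0, 0]
increment(counters, 0)
increment(counters, 0)
increment(counters, 9)
print(read(counters, 0), read(counters, 9))
```

One caveat of this layout: a counter that goes past 15 carries into its neighbour, so real code has to cap or check the count before trusting it.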

yslyung

legendary

Activity: 1500

Merit: 1002

Mine Mine Mine

WINDOWS version please . . .

i'm sure you'll get donations instead of a closed source with fixed fees ..

thx for the great work mrb

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: jiggytom on November 11, 2016, 01:57:32 PM

nerdralph, any way we can get a binary with the slow CPU fixes? I believe someone linked to it earlier. The current one is struggling on my celeron.

I can't test a fix for a problem I can't reproduce. Ubuntu 14.04 with fglrx has less than 2% CPU use for each sa-solver instance on my G1840.

hagie

hero member

Activity: 793

Merit: 501

Quote from: mrb on November 11, 2016, 02:45:14 PM

Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:

102 sol/s on R9 Nano (up from 54 sol/s)
72 sol/s on RX 480
64 sol/s on GTX 1070

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint, and reducing branch divergence, but eXtremal was done before me ;)

That's the benefit of open source; anyone can improve the code for all.

Sorry, in my case the new version gets only about 50% of the sol/s. Maybe an SM 3.0 problem?

Code:

./silentarmy --list
Devices on platform "NVIDIA CUDA":
  ID 0: GRID K520

V4 with only param.h changed:

Code:

~/silentarmy.v4$ ./silentarmy
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 18.0 sol/s [dev0 18.0] 0 shares
Total 15.5 sol/s [dev0 15.5] 0 shares
Total 14.3 sol/s [dev0 14.3] 0 shares
Total 14.2 sol/s [dev0 14.2] 0 shares
Total 16.0 sol/s [dev0 16.0] 0 shares
Total 13.8 sol/s [dev0 13.8] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares

~/silentarmy.v4$ ./sa-solver
Solving default all-zero 140-byte header
Building program
Hash tables will use 805.3 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Total 2 solutions in 135.0 ms (14.8 Sol/s)

V5:

Code:

~/silentarmy$ ./silentarmy
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 7.0 sol/s [dev0 7.0] 0 shares
Total 6.5 sol/s [dev0 6.5] 0 shares
Total 6.3 sol/s [dev0 6.3] 0 shares
Total 8.0 sol/s [dev0 8.0] 0 shares
Total 7.2 sol/s [dev0 7.2] 0 shares
Total 6.8 sol/s [dev0 6.8] 0 shares
Total 7.3 sol/s [dev0 7.3] 0 shares

~/silentarmy$ ./sa-solver
Solving default all-zero 140-byte header
Building program
Hash tables will use 805.3 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Total 2 solutions in 220.8 ms (9.1 Sol/s)

Any idea?

Regards

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

I was able to build SILENTARMY v5 for Windows, but the performance is suboptimal.
If I manage to squeeze the advertised speed on my RX 480's, I will release Windows binaries.
