
Topic: SILENTARMY v5: Zcash miner, 115 sol/s on R9 Nano, 70 sol/s on GTX 1070 - page 67. (Read 209309 times)

sr. member
Activity: 652
Merit: 266
I was able to build SILENTARMY v5 for Windows, but the performance is suboptimal.
If I manage to squeeze the advertised speed out of my RX 480s, I will release Windows binaries.

Any project files would be helpful, so I can strip my current code and optimize it for pre-Maxwell v2 cards.
Thank you.
member
Activity: 93
Merit: 10
Windows version, please...

I'm sure you'll get donations, as opposed to a closed-source miner with fixed fees.

Thanks for the great work, mrb.

Genoil already tried that, and said donations slowed to a trickle after Claymore released his miner.


Genoil's miner was/is unstable as hell
newbie
Activity: 28
Merit: 0
I am getting pretty good speeds with SILENTARMY v5 and three RX 480s on Windows 10.
This is amazing...

Code:
Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares

Could you please share the Windows version?
mrb
legendary
Activity: 1512
Merit: 1028
I am getting pretty good speeds with SILENTARMY v5 and three RX 480s on Windows 10.

Please do submit your changes adding Windows support :)
sr. member
Activity: 588
Merit: 251
Windows version, please...

I'm sure you'll get donations, as opposed to a closed-source miner with fixed fees.

Thanks for the great work, mrb.

Genoil already tried that, and said donations slowed to a trickle after Claymore released his miner.
sr. member
Activity: 728
Merit: 304
Miner Developer
I am getting pretty good speeds with SILENTARMY v5 and three RX 480s on Windows 10.
This is amazing...

Code:
Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares
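To get a single number out of status lines like these, you can average them with a short script. The sketch below is a hypothetical helper, not part of silentarmy. It assumes the `Total X sol/s [devN rate, ...] N shares` format shown in this log and averages the total and per-device rates. An optional `skip` argument drops the first few samples, because readings taken right after start-up can be unrepresentative.

```python
import re

# Pattern for status lines like the ones in the log above (format assumed from
# that output, not taken from the silentarmy source), e.g.
#   Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
STATUS_RE = re.compile(r"Total ([\d.]+) sol/s \[([^\]]*)\] (\d+) shares?")
DEVICE_RE = re.compile(r"dev(\d+) ([\d.]+)")


def parse_status(line):
    """Return (total, {device_id: sol_per_s}, shares), or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    devices = {int(d): float(r) for d, r in DEVICE_RE.findall(m.group(2))}
    return float(m.group(1)), devices, int(m.group(3))


def summarize(lines, skip=0):
    """Average total and per-device sol/s, ignoring the first `skip` samples
    (e.g. the 0.0 readings or warm-up period right after start-up)."""
    samples = [s for s in map(parse_status, lines) if s is not None][skip:]
    if not samples:
        raise ValueError("no status lines to summarize")
    n = len(samples)
    avg_total = sum(total for total, _, _ in samples) / n
    avg_dev = {
        d: sum(devs[d] for _, devs, _ in samples) / n
        for d in sorted(samples[0][1])
    }
    return avg_total, avg_dev


LOG = """\
Total 277.6 sol/s [dev0 93.5, dev1 97.5, dev2 86.6] 0 shares
Total 251.1 sol/s [dev0 85.0, dev1 84.0, dev2 82.1] 1 share
Total 263.8 sol/s [dev0 88.8, dev1 90.8, dev2 84.2] 1 share
Total 261.8 sol/s [dev0 87.9, dev1 88.7, dev2 85.2] 1 share
Total 261.1 sol/s [dev0 84.9, dev1 88.8, dev2 87.4] 1 share
Total 263.7 sol/s [dev0 86.3, dev1 89.7, dev2 87.7] 1 share
Total 268.3 sol/s [dev0 87.3, dev1 93.2, dev2 87.8] 3 shares
Total 266.7 sol/s [dev0 86.7, dev1 91.7, dev2 88.4] 3 shares
"""

if __name__ == "__main__":
    total, per_dev = summarize(LOG.splitlines())
    print(f"average total: {total:.1f} sol/s")
    for dev, rate in per_dev.items():
        print(f"  dev{dev}: {rate:.2f} sol/s")
```

To summarize a live run, you could pass the miner's output lines (for example `sys.stdin`) to `summarize()` in place of the hard-coded `LOG` string.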
sr. member
Activity: 588
Merit: 251
Is anyone running this on a Tahiti?

No, but I'm getting 45-50 sol/s on a Pitcairn clocked at 1100/1550 MHz (core/memory) with the 1375 MHz Samsung memory timing strap. It looks like these cards will never be going back to ETH mining...
full member
Activity: 186
Merit: 100
Great work, thank you. The only problem is that I can't enable the overclocking feature with this driver: I set Coolbits to 8, but it had no effect. Here are the default speeds with 5x GTX 1070.
sr. member
Activity: 588
Merit: 251
The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck.

So is the L2 cache thrashing in ht_store your next optimization target? I know eXtremal is already working on my idea to improve the read performance in equihash_round by using 4x256-byte strides.
A fully optimized implementation should average one cache-line read per equihash_round and 2-3 cache lines of read/write in ht_store. For an RX 470 with 7 GHz RAM, that's 78 iterations per second, or ~13 ms per iteration. Add ~1 ms for the BLAKE2b initialization in round 0 to get a total of 14 ms, or 71 iterations per second. If you are correct about 1.9 sols/iteration being optimal, that gives a theoretical 135 sol/s, almost double the current speed.
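The arithmetic in this estimate is easy to check with a few lines of Python. The inputs are the figures quoted in the post: 78 memory-bound iterations/s, ~1 ms of BLAKE2b setup, and 1.9 sols/iteration. The helper name is made up for this sketch, and the result is only as good as those estimates.

```python
def theoretical_sol_rate(memory_ms, init_ms, sols_per_iteration):
    """Return (iterations/s, sol/s) for a per-iteration time budget in ms.

    memory_ms: time spent on memory traffic per iteration (estimate)
    init_ms: extra fixed cost per iteration, e.g. BLAKE2b setup for round 0
    """
    total_ms = memory_ms + init_ms
    iterations_per_s = 1000.0 / total_ms
    return iterations_per_s, iterations_per_s * sols_per_iteration


# 78 iterations/s is about 12.8 ms each; the post rounds this to ~13 ms.
memory_ms = 1000.0 / 78
its, sols = theoretical_sol_rate(round(memory_ms), 1.0, 1.9)
print(f"{its:.1f} iterations/s, {sols:.1f} sol/s")  # prints: 71.4 iterations/s, 135.7 sol/s
```

If you keep the unrounded 12.8 ms instead, the same inputs give about 72 iterations/s and roughly 137 sol/s, so the post's 135 sol/s is the slightly more conservative figure.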

mrb
legendary
Activity: 1512
Merit: 1028
For me, eXtremal's mod seems to be a little better on an R9 380X card.
With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...

Let it warm up. AMD cards are sensitive to temperature and seem to need a few minutes to stabilize.
mrb
legendary
Activity: 1512
Merit: 1028
The dev said v5 would be a Windows version... any news about the Windows release?

Sorry, I'm still working on more optimizations. Windows support has been delayed for now.

Why not merge the changes Genoil submitted to make a Windows build possible? The longer you postpone the merge, the less is left of his efforts.

To my knowledge, his last pull request was breaking things. And neither he nor I had the time to fix them.

I would merge in a heartbeat if someone, anyone, provided a pull request that doesn't break silentarmy.
legendary
Activity: 3808
Merit: 1723
Is anyone running this on a Tahiti?
mrb
legendary
Activity: 1512
Merit: 1028
I just realized this uses eXtremal's 4-way first_words hack. When I previously tested it on AMD, it didn't provide any speed increase.
I'm going to try going back to the way it was with OPTIM_SIMPLIFY_ROUND to see if it is any faster with the latest changes.

Yes, this loop unrolling does not increase performance. I only merged it in the interest of saving time.
sr. member
Activity: 588
Merit: 251
Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:
  • 102 sol/s on R9 Nano (up from 54 sol/s)
  • 72 sol/s on RX 480
  • 64 sol/s on GTX 1070

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint and reducing branch divergence, but eXtremal was done before me ;) That's the benefit of open source: anyone can improve the code for all.
For me, eXtremal's mod seems to be a little better on an R9 380X card.
With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...

My R9 380X with modified Hynix timings gives me almost 50 sol/s.
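mrb's remark about packing 8 counters per uint is easier to picture with a small emulation. This is a hypothetical sketch in Python, not silentarmy's OpenCL kernel code. It assumes 4 bits per counter, so 8 counters fit in one 32-bit word, and it mimics what a single `atomic_add` on the packed word would do. All names and the row count are made up for illustration.

```python
# Hypothetical layout: 8 row counters packed into each 32-bit word,
# 4 bits per counter, so a single counter can hold at most 15.
COUNTERS_PER_WORD = 8
BITS_PER_COUNTER = 32 // COUNTERS_PER_WORD   # 4
COUNTER_MASK = (1 << BITS_PER_COUNTER) - 1   # 0xF
WORD_MASK = 0xFFFFFFFF


def locate(row):
    """Map a row index to (word index, bit shift) in the packed array."""
    return row // COUNTERS_PER_WORD, (row % COUNTERS_PER_WORD) * BITS_PER_COUNTER


def increment(words, row):
    """Emulate atomic_add(&words[w], 1 << shift) and return the old count.

    The add touches the whole 32-bit word, so if this counter is already 15
    the carry spills into the neighbouring counter. This sketch does not
    guard against that; a real kernel would have to keep counts below 16.
    """
    w, shift = locate(row)
    old = words[w]
    words[w] = (old + (1 << shift)) & WORD_MASK
    return (old >> shift) & COUNTER_MASK


def count(words, row):
    """Read one counter without modifying it."""
    w, shift = locate(row)
    return (words[w] >> shift) & COUNTER_MASK


if __name__ == "__main__":
    nr_rows = 16  # hypothetical, tiny for illustration
    words = [0] * (nr_rows // COUNTERS_PER_WORD)  # 2 words instead of 16
    increment(words, 3)
    increment(words, 3)
    increment(words, 9)
    print([hex(w) for w in words], count(words, 3), count(words, 9))
```

Packing shrinks the counter array to one eighth of its size, which presumably makes it cheaper to clear and keeps more of it in cache. The trade-off is that each counter saturates at 15, and eight rows share one word, so more work-items contend on the same atomic.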
sr. member
Activity: 588
Merit: 251
I just realized this uses eXtremal's 4-way first_words hack. When I previously tested it on AMD, it didn't provide any speed increase.
I'm going to try going back to the way it was with OPTIM_SIMPLIFY_ROUND to see if it is any faster with the latest changes.
hero member
Activity: 1246
Merit: 708
Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:
  • 102 sol/s on R9 Nano (up from 54 sol/s)
  • 72 sol/s on RX 480
  • 64 sol/s on GTX 1070

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint and reducing branch divergence, but eXtremal was done before me ;) That's the benefit of open source: anyone can improve the code for all.
For me, eXtremal's mod seems to be a little better on an R9 380X card.
With his 3 modded files I get about 42 sol/s; your merge gives me about 39 sol/s...
legendary
Activity: 1500
Merit: 1002
Mine Mine Mine
Windows version, please...

I'm sure you'll get donations, as opposed to a closed-source miner with fixed fees.

Thanks for the great work, mrb.
sr. member
Activity: 588
Merit: 251
nerdralph, is there any way we can get a binary with the slow-CPU fixes? I believe someone linked to it earlier. The current one is struggling on my Celeron.

I can't test a fix for a problem I can't reproduce.  Ubuntu 14.04 with fglrx has less than 2% CPU use for each sa-solver instance on my G1840.
hero member
Activity: 792
Merit: 501
Huge thanks to eXtremal for these optimizations. I merged them and released SILENTARMY v5: https://github.com/mbevand/silentarmy/blob/master/CHANGELOG.md
I measured a 2x speedup on some cards like the R9 Nano:
  • 102 sol/s on R9 Nano (up from 54 sol/s)
  • 72 sol/s on RX 480
  • 64 sol/s on GTX 1070

The atomic row counters and branch divergence in equihash_solve have always been the main bottleneck. I was working on packing 8 counters per uint and reducing branch divergence, but eXtremal was done before me ;) That's the benefit of open source: anyone can improve the code for all.

Sorry, but in my case the new version only gets about 50% of the previous sol/s. Maybe an SM 3.0 problem?

Code:
./silentarmy --list
Devices on platform "NVIDIA CUDA":
  ID 0: GRID K520


V4 with only param.h changed:

Code:
~/silentarmy.v4$ ./silentarmy
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 18.0 sol/s [dev0 18.0] 0 shares
Total 15.5 sol/s [dev0 15.5] 0 shares
Total 14.3 sol/s [dev0 14.3] 0 shares
Total 14.2 sol/s [dev0 14.2] 0 shares
Total 16.0 sol/s [dev0 16.0] 0 shares
Total 13.8 sol/s [dev0 13.8] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares
Total 14.1 sol/s [dev0 14.1] 0 shares

~/silentarmy.v4$ ./sa-solver
Solving default all-zero 140-byte header
Building program
Hash tables will use 805.3 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Total 2 solutions in 135.0 ms (14.8 Sol/s)

Code:
~/silentarmy$ ./silentarmy
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device
Total 0.0 sol/s [dev0 0.0] 0 shares
Total 7.0 sol/s [dev0 7.0] 0 shares
Total 6.5 sol/s [dev0 6.5] 0 shares
Total 6.3 sol/s [dev0 6.3] 0 shares
Total 8.0 sol/s [dev0 8.0] 0 shares
Total 7.2 sol/s [dev0 7.2] 0 shares
Total 6.8 sol/s [dev0 6.8] 0 shares
Total 7.3 sol/s [dev0 7.3] 0 shares

~/silentarmy$ ./sa-solver
Solving default all-zero 140-byte header
Building program
Hash tables will use 805.3 MB
Running...
Nonce 0000000000000000000000000000000000000000000000000000000000000000: 2 sols
Total 2 solutions in 220.8 ms (9.1 Sol/s)


Any ideas?

Regards
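You can quantify the v4 vs v5 gap by recomputing the rate from the sa-solver summary lines above. This is a hypothetical helper that assumes the `Total N solutions in X ms (Y Sol/s)` format shown in that output. Both runs solve the same all-zero header and find the same 2 solutions, so the ratio of their elapsed times is a fair comparison.

```python
import re

# Matches the sa-solver summary line in the output above, e.g.
#   Total 2 solutions in 135.0 ms (14.8 Sol/s)
# (format taken from that output; other versions may print it differently)
SUMMARY_RE = re.compile(r"Total (\d+) solutions in ([\d.]+) ms \(([\d.]+) Sol/s\)")


def parse_summary(line):
    """Return (solutions, milliseconds, reported Sol/s), or None if no match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


def rate(line):
    """Recompute Sol/s from the solution count and the elapsed time."""
    sols, ms, _ = parse_summary(line)
    return sols / (ms / 1000.0)


v4 = "Total 2 solutions in 135.0 ms (14.8 Sol/s)"
v5 = "Total 2 solutions in 220.8 ms (9.1 Sol/s)"
print(f"v4: {rate(v4):.1f} Sol/s, v5: {rate(v5):.1f} Sol/s, "
      f"v5/v4 = {rate(v5) / rate(v4):.0%}")
```

For these two runs, the script puts v5 at about 61% of v4's rate on the single-nonce benchmark. The stratum status lines show a larger drop, roughly 14-18 sol/s versus 6-8 sol/s. A single nonce is a small sample, so averaging over more nonces would give a steadier figure.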
sr. member
Activity: 728
Merit: 304
Miner Developer
I was able to build SILENTARMY v5 for Windows, but the performance is suboptimal.
If I manage to squeeze the advertised speed out of my RX 480s, I will release Windows binaries.