Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 190. (Read 214410 times)

full member
Activity: 243
Merit: 105
I just updated the repository. 170 sol/s on RX 480 and 162 sol/s on GTX 1060 3GB. kernel_sol has been split up and is now 200% faster with a parallelized duplicate search. NR_ROWS_LOG=12 is finally usable on RX 480. More speedups are coming soon.

Code:
main.c:1434:12: error: too few arguments to function ‘solve_equihash’
   total += solve_equihash(dev_id, ctx, queue, k_init_ht, k_rounds, k_sols, buf_ht,

Code:
main.c:1489:3: error: too many arguments to function ‘mining_mode’
   mining_mode(*dev_id, program, ctx, queue, k_init_ht, k_rounds, k_potential_sols, k_sols, buf_ht,
sr. member
Activity: 728
Merit: 304
Miner Developer
I just updated the repository. 170 sol/s on RX 480 and 162 sol/s on GTX 1060 3GB. kernel_sol has been split up and is now 200% faster with a parallelized duplicate search. NR_ROWS_LOG=12 is finally usable on RX 480. More speedups are coming soon.
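
For readers unfamiliar with the duplicate search mentioned above: an Equihash solution is a list of input indices, and a candidate in which the same index appears twice is invalid. A minimal sequential sketch in C of such a check follows. The function name has_duplicate_index and the sort-then-scan approach are illustrative assumptions, not the code in kernel_sol, which does this work in parallel on the GPU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for 32-bit unsigned indices. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper, not taken from the miner.
 * Returns 1 if any index occurs more than once, 0 if all are distinct,
 * and -1 if the scratch copy could not be allocated.
 * The caller's array is left untouched. */
int has_duplicate_index(const uint32_t *indices, size_t n)
{
    assert(indices != NULL || n == 0);
    if (n < 2)
        return 0;

    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, indices, n * sizeof *tmp);

    /* After sorting, equal indices end up next to each other. */
    qsort(tmp, n, sizeof *tmp, cmp_u32);

    int dup = 0;
    for (size_t i = 1; i < n; i++) {
        if (tmp[i] == tmp[i - 1]) {
            dup = 1;
            break;
        }
    }
    free(tmp);
    return dup;
}
```

Sorting costs O(n log n) per candidate on a single thread. A GPU version would more likely spread the comparisons across work-items, which is presumably where the reported speedup comes from.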
sr. member
Activity: 728
Merit: 304
Miner Developer
6x1070

Code:
Total 1585.0 sol/s [dev0 265.9, dev1 268.7, dev2 270.8, dev3 263.0, dev4 260.2, dev5 267.8] 104 shares
Total 1585.4 sol/s [dev0 265.6, dev1 268.9, dev2 270.9, dev3 264.2, dev4 260.4, dev5 268.6] 104 shares
Total 1585.2 sol/s [dev0 264.9, dev1 269.8, dev2 270.2, dev3 263.2, dev4 260.0, dev5 267.9] 105 shares
Total 1585.1 sol/s [dev0 264.5, dev1 270.4, dev2 269.6, dev3 263.6, dev4 260.2, dev5 267.4] 105 shares
Total 1584.6 sol/s [dev0 265.5, dev1 270.4, dev2 268.8, dev3 261.8, dev4 259.0, dev5 267.9] 105 shares
Total 1584.9 sol/s [dev0 264.3, dev1 270.7, dev2 269.1, dev3 262.1, dev4 259.8, dev5 268.2] 106 shares
Total 1585.1 sol/s [dev0 262.9, dev1 271.3, dev2 268.6, dev3 262.4, dev4 261.0, dev5 269.3] 107 shares
Total 1584.8 sol/s [dev0 262.9, dev1 270.8, dev2 267.2, dev3 261.5, dev4 260.6, dev5 269.2] 107 shares

Note: the CPU fix for Linux seems to be broken. I tested with the dirty fix (LD_PRELOAD libtime.so); with the CPU fix it will be 1-2% faster.

Those are very nice numbers to see... Thanks for your report.
I don't think the CPU fix for Linux was even applied to my fork to begin with.
I will fix this problem soon.
legendary
Activity: 1498
Merit: 1030
The slow speed is probably due either to the modded BIOS or to the driver. Mods for Claymore's do not necessarily work with Gateless Gate/SILENTARMY. I would try the stock BIOS first. Also, I only tested the miner with Crimson drivers. I suppose I need to be more clear about requirements...
This is a memory timings patch. I'm not sure it can be the reason for such a low sol rate.
But in the next few days I will try installing the latest Crimson drivers and reflashing the stock BIOS. We will see what changes.
I just tried installing the latest drivers (16.11.2) for the 280X and testing GG v0.0.2, but the miner doesn't show a hashrate. Here is a screenshot of what I got.
On GG v0.0.1 everything works well and the program shows the hashrate.

 Try 15.12 drivers.

 The "Relive" junk does NOT work well for mining in particular, and is a major step backwards for pre-RX series cards in general while being a SERIOUSLY bad case of massive bloatware.
 It, like many of the Crimson 16.x series drivers, also refuses to work with modded BIOS on many cards.

 The Crimson stuff past 15.12 offered NO improvement for mining on pre-RX series cards.
sr. member
Activity: 728
Merit: 304
Miner Developer
The whole idea of splitting rounds into optimal sizes is fascinating...
I guess my Christmas break from programming is over already :)

I can send you Claymore's kernel .il files. Email me if you want them.


Thanks for the offer, but I am pretty sure I can figure things out on my own at this point.
Besides, I greatly admire Claymore's coding skills.
Peeking into his code against his will would be disrespectful, and I don't want to do that.
Now I'm 100% confident that I can catch up with Claymore's, Optiminer, and Eqminer.
It's just a matter of time and effort.
sr. member
Activity: 588
Merit: 251
The whole idea of splitting rounds into optimal sizes is fascinating...
I guess my Christmas break from programming is over already :)

I can send you Claymore's kernel .il files. Email me if you want them.
member
Activity: 78
Merit: 10
The slow speed is probably due either to the modded BIOS or to the driver. Mods for Claymore's do not necessarily work with Gateless Gate/SILENTARMY. I would try the stock BIOS first. Also, I only tested the miner with Crimson drivers. I suppose I need to be more clear about requirements...
This is a memory timings patch. I'm not sure it can be the reason for such a low sol rate.
But in the next few days I will try installing the latest Crimson drivers and reflashing the stock BIOS. We will see what changes.
I just tried installing the latest drivers (16.11.2) for the 280X and testing GG v0.0.2, but the miner doesn't show a hashrate. Here is a screenshot of what I got.
On GG v0.0.1 everything works well and the program shows the hashrate.
sr. member
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
6x1070

Code:
Total 1585.0 sol/s [dev0 265.9, dev1 268.7, dev2 270.8, dev3 263.0, dev4 260.2, dev5 267.8] 104 shares
Total 1585.4 sol/s [dev0 265.6, dev1 268.9, dev2 270.9, dev3 264.2, dev4 260.4, dev5 268.6] 104 shares
Total 1585.2 sol/s [dev0 264.9, dev1 269.8, dev2 270.2, dev3 263.2, dev4 260.0, dev5 267.9] 105 shares
Total 1585.1 sol/s [dev0 264.5, dev1 270.4, dev2 269.6, dev3 263.6, dev4 260.2, dev5 267.4] 105 shares
Total 1584.6 sol/s [dev0 265.5, dev1 270.4, dev2 268.8, dev3 261.8, dev4 259.0, dev5 267.9] 105 shares
Total 1584.9 sol/s [dev0 264.3, dev1 270.7, dev2 269.1, dev3 262.1, dev4 259.8, dev5 268.2] 106 shares
Total 1585.1 sol/s [dev0 262.9, dev1 271.3, dev2 268.6, dev3 262.4, dev4 261.0, dev5 269.3] 107 shares
Total 1584.8 sol/s [dev0 262.9, dev1 270.8, dev2 267.2, dev3 261.5, dev4 260.6, dev5 269.2] 107 shares

Note: the CPU fix for Linux seems to be broken. I tested with the dirty fix (LD_PRELOAD libtime.so); with the CPU fix it will be 1-2% faster.

That the current Windows build?

Latest git, Linux.

Speeds are looking okay given that they're double those of the old SA5. Zawawa is on the right track, I think.
full member
Activity: 243
Merit: 105
6x1070

Code:
Total 1585.0 sol/s [dev0 265.9, dev1 268.7, dev2 270.8, dev3 263.0, dev4 260.2, dev5 267.8] 104 shares
Total 1585.4 sol/s [dev0 265.6, dev1 268.9, dev2 270.9, dev3 264.2, dev4 260.4, dev5 268.6] 104 shares
Total 1585.2 sol/s [dev0 264.9, dev1 269.8, dev2 270.2, dev3 263.2, dev4 260.0, dev5 267.9] 105 shares
Total 1585.1 sol/s [dev0 264.5, dev1 270.4, dev2 269.6, dev3 263.6, dev4 260.2, dev5 267.4] 105 shares
Total 1584.6 sol/s [dev0 265.5, dev1 270.4, dev2 268.8, dev3 261.8, dev4 259.0, dev5 267.9] 105 shares
Total 1584.9 sol/s [dev0 264.3, dev1 270.7, dev2 269.1, dev3 262.1, dev4 259.8, dev5 268.2] 106 shares
Total 1585.1 sol/s [dev0 262.9, dev1 271.3, dev2 268.6, dev3 262.4, dev4 261.0, dev5 269.3] 107 shares
Total 1584.8 sol/s [dev0 262.9, dev1 270.8, dev2 267.2, dev3 261.5, dev4 260.6, dev5 269.2] 107 shares

Note: the CPU fix for Linux seems to be broken. I tested with the dirty fix (LD_PRELOAD libtime.so); with the CPU fix it will be 1-2% faster.

That the current Windows build?

Latest git, Linux.
sr. member
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
6x1070

Code:
Total 1585.0 sol/s [dev0 265.9, dev1 268.7, dev2 270.8, dev3 263.0, dev4 260.2, dev5 267.8] 104 shares
Total 1585.4 sol/s [dev0 265.6, dev1 268.9, dev2 270.9, dev3 264.2, dev4 260.4, dev5 268.6] 104 shares
Total 1585.2 sol/s [dev0 264.9, dev1 269.8, dev2 270.2, dev3 263.2, dev4 260.0, dev5 267.9] 105 shares
Total 1585.1 sol/s [dev0 264.5, dev1 270.4, dev2 269.6, dev3 263.6, dev4 260.2, dev5 267.4] 105 shares
Total 1584.6 sol/s [dev0 265.5, dev1 270.4, dev2 268.8, dev3 261.8, dev4 259.0, dev5 267.9] 105 shares
Total 1584.9 sol/s [dev0 264.3, dev1 270.7, dev2 269.1, dev3 262.1, dev4 259.8, dev5 268.2] 106 shares
Total 1585.1 sol/s [dev0 262.9, dev1 271.3, dev2 268.6, dev3 262.4, dev4 261.0, dev5 269.3] 107 shares
Total 1584.8 sol/s [dev0 262.9, dev1 270.8, dev2 267.2, dev3 261.5, dev4 260.6, dev5 269.2] 107 shares

Note: the CPU fix for Linux seems to be broken. I tested with the dirty fix (LD_PRELOAD libtime.so); with the CPU fix it will be 1-2% faster.

That the current Windows build?
full member
Activity: 243
Merit: 105
6x1070

Code:
Total 1585.0 sol/s [dev0 265.9, dev1 268.7, dev2 270.8, dev3 263.0, dev4 260.2, dev5 267.8] 104 shares
Total 1585.4 sol/s [dev0 265.6, dev1 268.9, dev2 270.9, dev3 264.2, dev4 260.4, dev5 268.6] 104 shares
Total 1585.2 sol/s [dev0 264.9, dev1 269.8, dev2 270.2, dev3 263.2, dev4 260.0, dev5 267.9] 105 shares
Total 1585.1 sol/s [dev0 264.5, dev1 270.4, dev2 269.6, dev3 263.6, dev4 260.2, dev5 267.4] 105 shares
Total 1584.6 sol/s [dev0 265.5, dev1 270.4, dev2 268.8, dev3 261.8, dev4 259.0, dev5 267.9] 105 shares
Total 1584.9 sol/s [dev0 264.3, dev1 270.7, dev2 269.1, dev3 262.1, dev4 259.8, dev5 268.2] 106 shares
Total 1585.1 sol/s [dev0 262.9, dev1 271.3, dev2 268.6, dev3 262.4, dev4 261.0, dev5 269.3] 107 shares
Total 1584.8 sol/s [dev0 262.9, dev1 270.8, dev2 267.2, dev3 261.5, dev4 260.6, dev5 269.2] 107 shares

Note: the CPU fix for Linux seems to be broken. I tested with the dirty fix (LD_PRELOAD libtime.so); with the CPU fix it will be 1-2% faster.
sr. member
Activity: 652
Merit: 266
Good luck... Christmas time can be more stressful than coding. Ahje
Indeed it is... my stomach is about to burst. Instead of harvesting Zcash, try to harvest some decent Christmas meals :D
Merry Christmas to you all!
sr. member
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
Good luck... Christmas time can be more stressful than coding. Ahje
sr. member
Activity: 728
Merit: 304
Miner Developer
The whole idea of splitting rounds into optimal sizes is fascinating...
I guess my Christmas break from programming is over already :)
sr. member
Activity: 728
Merit: 304
Miner Developer
Note to self before taking a Christmas break. NR_ROWS_LOG=12 still suffers from low occupancy in rounds 1 through 4 due to high shared memory usage.

Claymore's 9.2 kernel seems to have found an optimization of the Equihash algorithm by using 11 rounds. Rounds up to 3 use 32K of LDS, rounds up to 6 use 21K and 68 VGPRs, rounds 7-9 use 16K and 56 VGPRs, and round 10 uses 7K.

Ah, that's how he overcame the limitations of the small amount of shared memory!
He must be splitting up some of the rounds of Wagner's so that all the slots in each row would fit into LDS.
That's very clever of him, I must say.
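
To make the occupancy argument concrete, here is a rough C sketch of how LDS and VGPR usage cap the number of resident work-groups and wavefronts. The hardware limits in it (64 KB of LDS per compute unit, 256 VGPRs per SIMD lane, at most 10 wavefronts per SIMD) are assumed values for GCN cards such as the RX 480, the helper names are made up for illustration, and reading "32K" as 32 * 1024 bytes per work-group is also an assumption.

```c
#include <assert.h>

enum {
    LDS_BYTES_PER_CU   = 64 * 1024, /* assumed LDS size per compute unit */
    VGPRS_PER_SIMD     = 256,       /* assumed per-lane VGPR budget per SIMD */
    MAX_WAVES_PER_SIMD = 10         /* assumed hardware cap on wavefronts */
};

/* Work-groups that fit into one compute unit's LDS at the same time. */
int groups_per_cu_by_lds(int lds_bytes_per_group)
{
    assert(lds_bytes_per_group > 0);
    return LDS_BYTES_PER_CU / lds_bytes_per_group;
}

/* Wavefronts per SIMD permitted by per-work-item register usage. */
int waves_per_simd_by_vgprs(int vgprs_per_work_item)
{
    assert(vgprs_per_work_item > 0);
    int waves = VGPRS_PER_SIMD / vgprs_per_work_item;
    return waves > MAX_WAVES_PER_SIMD ? MAX_WAVES_PER_SIMD : waves;
}
```

Under these assumptions, 32K of LDS per work-group allows 2 work-groups per compute unit, 21K allows 3, 16K allows 4, and 7K allows 9, while 68 and 56 VGPRs allow 3 and 4 wavefronts per SIMD. Real occupancy also depends on work-group size and allocation granularity, which this sketch ignores.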
sr. member
Activity: 588
Merit: 251
Note to self before taking a Christmas break. NR_ROWS_LOG=12 still suffers from low occupancy in rounds 1 through 4 due to high shared memory usage.

Claymore's 9.2 kernel seems to have found an optimization of the Equihash algorithm by using 11 rounds. Rounds up to 3 use 32K of LDS, rounds up to 6 use 21K and 68 VGPRs, rounds 7-9 use 16K and 56 VGPRs, and round 10 uses 7K.
sr. member
Activity: 728
Merit: 304
Miner Developer
Note to self before taking a Christmas break. NR_ROWS_LOG=12 still suffers from low occupancy in rounds 1 through 4 due to high shared memory usage.

It would be interesting to see what happens when I reach the top of the occupancy hill.

sr. member
Activity: 728
Merit: 304
Miner Developer
Good work. Keep it up. Merry Christmas, Zawawa. Thanks on behalf of the community for all your efforts, even on Christmas Eve.
Now enjoy the time with your wife and don't mention you bought a GTX 1060 :-D

That's very nice of you. Merry Christmas to you, too!
sr. member
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
Good work. Keep it up. Merry Christmas, Zawawa. Thanks on behalf of the community for all your efforts, even on Christmas Eve.
Now enjoy the time with your wife and don't mention you bought a GTX 1060 :-D
sr. member
Activity: 728
Merit: 304
Miner Developer
I just updated the repository. Although GTX 1060 3GB is already doing better with the new sorting algorithm at 155 sol/s, more work is needed for AMD cards to take advantage of it.