
Topic: limits of ZEC mining - page 3. (Read 10069 times)

legendary
Activity: 1000
Merit: 1120
November 15, 2016, 09:21:20 AM
#22
On a more serious note: if you state the performance is now at 80% of theoretical maximum, we're basically there, right? ETH miners also peak at about 80-85% of the theoretical maximum. Does the same rule apply here?

I still have some work to do before I write my own miner from scratch.  I like to *really* understand the problem before I start writing code, and there's still some parts of the GCN architecture that I'm figuring out.


Hi, nerdralph! Would you be interested in going after my Cuckoo Cycle bounties?

https://github.com/tromp/cuckoo
legendary
Activity: 1498
Merit: 1030
November 15, 2016, 05:56:21 AM
#21
And the power usage on ETH for the 1070 vs the RX 480 is also very similar - pretty much a dead heat on a hash/watt basis.

Unfortunately for ETH or ZEC miners, the 1070 costs almost twice as much as the RX 480 while not offering comparable hash/$.



full member
Activity: 243
Merit: 105
November 15, 2016, 02:54:28 AM
#20
Do you think you can start developing miners for Nvidia?
There is a lack of skilled and honest developers.

Kind of hard when I don't have any Nvidia cards.  Even if I did, I'd stick to OpenCL, as I have no interest in learning CUDA.


There is no difference in speed between the CUDA and OpenCL implementations of silentarmy. CUDA has an advantage in compute-bound algorithms, where its inline assembly (LOP3 and other instructions) can be used. In a memory-hard algorithm it does not matter whether you use CUDA or OpenCL.

I'm getting ~590 sol/s from 6 heavily overclocked 1070s (Samsung memory), which is about 97-98 sol/s per card. eXtremal got 90 sol/s from a BIOS-tuned RX 480.
I don't have a 480, but I do have a 470. In Ethereum the 470 gets 27 MH/s, while an overclocked 1070 with Samsung memory gets 31-32 MH/s. That is about 15-18% more than AMD. So 92-98 sol/s on the 1070 versus ~80-85 sol/s on the 470 is proportional to the Ethereum hashrate difference.
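
As a rough check on that proportionality, here is a minimal Python sketch. It only uses the figures quoted in this post (nothing here is a new measurement), and the helper name `ratio_range` is made up for illustration.

```python
# Figures quoted in this post; ranges are (low, high).
eth_470 = (27.0, 27.0)     # MH/s, RX 470 on Ethereum
eth_1070 = (31.0, 32.0)    # MH/s, overclocked GTX 1070 on Ethereum
zec_470 = (80.0, 85.0)     # sol/s, RX 470 on ZEC
zec_1070 = (92.0, 98.0)    # sol/s, overclocked GTX 1070 on ZEC

def ratio_range(num, den):
    """Smallest and largest possible ratio between two (low, high) ranges."""
    return (num[0] / den[1], num[1] / den[0])

eth_ratio = ratio_range(eth_1070, eth_470)
zec_ratio = ratio_range(zec_1070, zec_470)
per_card = 590 / 6         # sol/s per card from the 6-card rig

print(f"ETH 1070/470: {eth_ratio[0]:.2f}x to {eth_ratio[1]:.2f}x")
print(f"ZEC 1070/470: {zec_ratio[0]:.2f}x to {zec_ratio[1]:.2f}x")
print(f"per card: {per_card:.1f} sol/s")
```

The two ranges overlap, which is what the "proportional" claim amounts to. It does not by itself show that both algorithms are limited by the same thing.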
legendary
Activity: 3248
Merit: 1072
November 15, 2016, 02:41:03 AM
#19
Do you think you can start developing miners for Nvidia?
There is a lack of skilled and honest developers.

Kind of hard when I don't have any Nvidia cards.  Even if I did, I'd stick to OpenCL, as I have no interest in learning CUDA.


Even if the potential might be higher? Or maybe you already know that this will never be the case?
sr. member
Activity: 588
Merit: 251
November 14, 2016, 08:28:15 PM
#18
p.s. I also have another idea that should work on 4GB cards.  The miner could use 12-slot bins of 32 bytes, just like silentarmy, but use a new table every round instead of using 2 tables in a double-buffered fashion.  This would use 384MB * 9 =~ 3.5GB, but then your first write to any row could write 32-bytes of dummy data along with the 32-byte collision record.  This would avoid the read-before-write. You could do this with the 2nd through 6th write by filling the even slots before the odd ones.  This would reduce the average IO per round to 2^20 * 3 * 64-bytes, or 192MB per round and 1.728GB per iteration.  That would be a theoretical max of 130 iterations per second on a Rx 470 with a 7Gbps memory clock, which would be around 240 solutions per second.  Using 93% of the theoretical limit taken from eth mining, that would give real-world performance of 225 sols/s.


So 225 would be the max for both the 4GB and the 8GB cards?

Yes.  I'm pretty sure with 3.5GB for the table data that the remaining 0.5GB on a 4GB card would be enough for the row counters and any other small data structures required.
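
For anyone who wants to check the 3.5GB figure, here is a minimal sketch of the memory budget in Python. The row count of 2^20 is taken from the IO estimate in the quoted post. Treating a "4GB" card as exactly 4GiB of usable memory is an assumption; in practice the driver keeps some of it.

```python
ROWS = 2**20           # rows per table (from the 2^20 in the IO estimate)
SLOTS_PER_ROW = 12     # 12-slot bins, as in silentarmy
SLOT_BYTES = 32        # bytes per slot
ROUNDS = 9             # one fresh table per round

MIB = 2**20
GIB = 2**30

table_bytes = ROWS * SLOTS_PER_ROW * SLOT_BYTES   # one table
total_bytes = table_bytes * ROUNDS                # all nine tables
remaining = 4 * GIB - total_bytes                 # assumes a full 4GiB is usable

table_mib = table_bytes // MIB
total_gib = total_bytes / GIB
remaining_mib = remaining // MIB

print(f"one table: {table_mib} MiB")
print(f"nine tables: {total_gib} GiB")
print(f"left on a 4GiB card: {remaining_mib} MiB")
```

Counted in binary units the tables come to a little under the rounded 3.5GB, so under these assumptions the space left for row counters is slightly more than 0.5GB.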
sr. member
Activity: 588
Merit: 251
November 14, 2016, 08:23:59 PM
#17
Do you think you can start developing miners for Nvidia?
There is a lack of skilled and honest developers.

Kind of hard when I don't have any Nvidia cards.  Even if I did, I'd stick to OpenCL, as I have no interest in learning CUDA.
legendary
Activity: 4354
Merit: 9201
'The right to privacy matters'
November 14, 2016, 08:15:30 PM
#16
On a more serious note: if you state the performance is now at 80% of theoretical maximum, we're basically there, right? ETH miners also peak at about 80-85% of the theoretical maximum. Does the same rule apply here?

I still have some work to do before I write my own miner from scratch.  I like to *really* understand the problem before I start writing code, and there's still some parts of the GCN architecture that I'm figuring out.

Eth miners max out at around 93% of the theoretical maximum.  24Mh/s is the theoretical max for an R9 380 with 6Gbps memory, and I've been able to get 22.3Mh/s out of a couple of cards.  You'll never reach 100% because DRAM refresh consumes some of the bandwidth, perhaps as much as 5%.

p.s. I also have another idea that should work on 4GB cards.  The miner could use 12-slot bins of 32 bytes, just like silentarmy, but use a new table every round instead of using 2 tables in a double-buffered fashion.  This would use 384MB * 9 =~ 3.5GB, but then your first write to any row could write 32-bytes of dummy data along with the 32-byte collision record.  This would avoid the read-before-write. You could do this with the 2nd through 6th write by filling the even slots before the odd ones.  This would reduce the average IO per round to 2^20 * 3 * 64-bytes, or 192MB per round and 1.728GB per iteration.  That would be a theoretical max of 130 iterations per second on a Rx 470 with a 7Gbps memory clock, which would be around 240 solutions per second.  Using 93% of the theoretical limit taken from eth mining, that would give real-world performance of 225 sols/s.


So 225 would be the max for both the 4GB and the 8GB cards?
legendary
Activity: 1108
Merit: 1005
November 14, 2016, 06:23:42 PM
#15
Do you think you can start developing miners for Nvidia?
There is a lack of skilled and honest developers.
hero member
Activity: 751
Merit: 517
Fail to plan, and you plan to fail.
November 14, 2016, 06:06:00 PM
#14
Fascinating stuff, really. Thank you for trying to explain to us laymen how this stuff works. I'm not much of a programmer myself, but I've always wanted to understand better how miners work, what kind of data is processed and how, etc. I'll be following this thread closely :)
Also I have huge respect for people like you, genoil, mrvb etc who work hard on these complex problems and still release stuff for free.
sr. member
Activity: 588
Merit: 251
November 14, 2016, 03:30:40 PM
#13
On a more serious note: if you state the performance is now at 80% of theoretical maximum, we're basically there, right? ETH miners also peak at about 80-85% of the theoretical maximum. Does the same rule apply here?

I still have some work to do before I write my own miner from scratch.  I like to *really* understand the problem before I start writing code, and there's still some parts of the GCN architecture that I'm figuring out.

Eth miners max out at around 93% of the theoretical maximum.  24Mh/s is the theoretical max for an R9 380 with 6Gbps memory, and I've been able to get 22.3Mh/s out of a couple of cards.  You'll never reach 100% because DRAM refresh consumes some of the bandwidth, perhaps as much as 5%.
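
A quick sketch of where numbers like these come from, in Python. The 256-bit bus width for the R9 380 and the Ethash access pattern (64 DAG reads of 128 bytes per hash) are assumptions taken from the card and algorithm specs, not from this thread.

```python
BUS_BITS = 256           # assumed R9 380 memory bus width
GBPS_PER_PIN = 6.0       # effective memory data rate quoted above
ACCESSES = 64            # assumed Ethash DAG reads per hash
BYTES_PER_ACCESS = 128   # assumed bytes fetched per DAG read

# Peak bandwidth in bytes/s, then the hashrate it could feed at most.
bandwidth = BUS_BITS * GBPS_PER_PIN / 8 * 1e9
ceiling_mhs = bandwidth / (ACCESSES * BYTES_PER_ACCESS) / 1e6

quoted_max_mhs = 24.0    # theoretical max as quoted above
measured_mhs = 22.3      # best result quoted above
efficiency = measured_mhs / quoted_max_mhs

print(f"bandwidth ceiling: {ceiling_mhs:.1f} MH/s")
print(f"efficiency vs quoted max: {efficiency:.1%}")
```

Under these assumptions the bandwidth-only ceiling lands close to the rounded 24Mh/s figure, and 22.3Mh/s works out to roughly 93% of it.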

p.s. I also have another idea that should work on 4GB cards.  The miner could use 12-slot bins of 32 bytes, just like silentarmy, but use a new table every round instead of using 2 tables in a double-buffered fashion.  This would use 384MB * 9 =~ 3.5GB, but then your first write to any row could write 32-bytes of dummy data along with the 32-byte collision record.  This would avoid the read-before-write. You could do this with the 2nd through 6th write by filling the even slots before the odd ones.  This would reduce the average IO per round to 2^20 * 3 * 64-bytes, or 192MB per round and 1.728GB per iteration.  That would be a theoretical max of 130 iterations per second on a Rx 470 with a 7Gbps memory clock, which would be around 240 solutions per second.  Using 93% of the theoretical limit taken from eth mining, that would give real-world performance of 225 sols/s.
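
Here is the arithmetic above as a minimal Python sketch. The 1.88 solutions per iteration is the average quoted earlier in the thread. The function name is made up for illustration, and rounding 2^20 rows to 1 million is what reproduces the 130 and 240 figures; using exactly 2^20 rows gives results about 5% lower.

```python
def sols_per_sec(bandwidth, rows, bytes_per_row, rounds=9, sols_per_iter=1.88):
    """Bandwidth-limited solution rate for a given IO cost per row per round."""
    io_per_iter = rows * bytes_per_row * rounds   # bytes moved per iteration
    return bandwidth / io_per_iter * sols_per_iter

BANDWIDTH = 224e9        # bytes/s: RX 470, 256-bit bus at 7Gbps
BYTES_PER_ROW = 3 * 64   # average IO per row per round in the proposed scheme

approx = sols_per_sec(BANDWIDTH, 1_000_000, BYTES_PER_ROW)   # rows rounded to 1M
exact = sols_per_sec(BANDWIDTH, 2**20, BYTES_PER_ROW)        # rows exactly 2^20
real_world = 0.93 * approx                                   # eth-style efficiency

print(f"theoretical, 1M rows:   {approx:.0f} sol/s")
print(f"theoretical, 2^20 rows: {exact:.0f} sol/s")
print(f"at 93% efficiency:      {real_world:.0f} sol/s")
```

Swapping `BYTES_PER_ROW` for `5 * 64` gives the ceiling for the read-before-write layout, which makes it easy to see how much the dummy-data trick buys.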
sr. member
Activity: 588
Merit: 251
November 14, 2016, 03:21:30 PM
#12
What role does the memory bus width play in the speeds? Because many of these old 7950s are getting almost the same speeds as the 470/390.

That makes sense, since a 384-bit wide memory bus at 1.5GHz (6Gbps) has a bit more bandwidth (288GB/s) than a 256-bit wide bus at 8Gbps (256GB/s).
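
In code form, a minimal Python sketch of that comparison (the function name is just for illustration). Peak bandwidth is the bus width times the per-pin data rate, divided by 8 to convert bits to bytes.

```python
def bandwidth_gb_s(bus_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_bits * gbps_per_pin / 8

hd7950 = bandwidth_gb_s(384, 6.0)   # 384-bit bus, memory at 1.5GHz (6Gbps)
rx470 = bandwidth_gb_s(256, 8.0)    # 256-bit bus at 8Gbps

print(f"384-bit @ 6Gbps: {hd7950:.0f} GB/s")
print(f"256-bit @ 8Gbps: {rx470:.0f} GB/s")
```

This is only the peak figure. How much of it a miner actually uses depends on the access pattern and memory timings.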
sr. member
Activity: 438
Merit: 250
November 14, 2016, 01:59:23 PM
#11
Dude, where is your own miner? :D

Next coin I expect you to be one of the top dogs in the pit :-*

On a more serious note: if you state the performance is now at 80% of theoretical maximum, we're basically there, right? ETH miners also peak at about 80-85% of the theoretical maximum. Does the same rule apply here?
newbie
Activity: 58
Merit: 0
November 14, 2016, 12:27:48 PM
#10
What role does the memory bus width play in the speeds? Because many of these old 7950s are getting almost the same speeds as the 470/390.

I just bought an older 3GB version of the 7950 to see how it performs with your optimized memory straps.
I also have the 470 4GB Nitro, which makes 110-120 sols.

I was wondering about the memory bus as well.
legendary
Activity: 3808
Merit: 1723
November 14, 2016, 12:22:14 PM
#9
What role does the memory bus width play in the speeds? Because many of these old 7950s are getting almost the same speeds as the 470/390.

sr. member
Activity: 588
Merit: 251
November 14, 2016, 12:17:32 PM
#8
Does this mean the R9 390, which has a 512-bit memory bus at 1500MHz, should be faster than the 470?

An optimal implementation should be faster.
newbie
Activity: 18
Merit: 0
November 14, 2016, 12:05:02 PM
#7
Does this mean the R9 390, which has a 512-bit memory bus at 1500MHz, should be faster than the 470?
sr. member
Activity: 588
Merit: 251
November 14, 2016, 08:55:19 AM
#6
...

Therefore a reasonably efficient equihash implementation will do 5 * 64 * 1 million bytes (320MB) of IO per round.  With 9 rounds that means 2.88GB per iteration, or 77.8 iterations per second on a Rx 470 with RAM clocked at 7Gbps (224GB/s memory bandwidth).  At 1.88 solutions per iteration, that's an average of 146 solutions/second, or about 25% faster than Claymore v5.

The theoretical equihash performance limit on a Rx 470 is likely about 25% faster than 146 solutions, but it involves using 64-byte data structures that require a lot more memory.  So much memory that I think it will not be possible with 4GB cards.  At least it will be something for owners of 8GB Rx 480 cards to be happy about.

A few noob questions if you don't mind.
What's the theoretical limit on the RX 470 8G Nitro cards with RAM clocked at 8Gbps (256GB/s)? Also, does overclocking the memory result in a linear increase in performance?
Does this all mean that equihash solving isn't GPU compute limited, but rather memory limited? If so, I wonder why GPU-Z shows 100% GPU load vs sub-40% memory controller load (whereas mining Eth fully loads both core and mem controller...)

Fascinating stuff. Thanks in advance.

A Rx 470 at 8Gbps would have a theoretical limit 8/7 times faster than one at 7Gbps.
The only part of equihash that is compute limited is the blake2b initialization.  The intention of the authors was for the algorithm to be limited by memory bandwidth.
https://www.internetsociety.org/sites/default/files/blogs-media/equihash-asymmetric-proof-of-work-based-generalized-birthday-problem.pdf

As for what GPU-z shows, you'll have to figure out how to correctly interpret what it reports on your own.  I do my OpenCL development on Linux, and even if there was a Linux version, I don't consider GPU-z a useful tool for kernel developers.
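
To make the 8/7 scaling concrete, here is a minimal Python sketch using the IO model from the quoted post (5 * 64 bytes per row, about 1 million rows, 9 rounds, 1.88 solutions per iteration). The function and parameter names are hypothetical, and this is only the bandwidth ceiling: it ignores the blake2b compute and any real-world losses.

```python
def theoretical_sols(gbps_per_pin, bus_bits=256, bytes_per_row=5 * 64,
                     rows=1_000_000, rounds=9, sols_per_iter=1.88):
    """Bandwidth-limited sol/s for a given memory data rate (Gbps per pin)."""
    bandwidth = bus_bits * gbps_per_pin / 8 * 1e9   # bytes per second
    io_per_iter = rows * bytes_per_row * rounds     # bytes moved per iteration
    return bandwidth / io_per_iter * sols_per_iter

at_7 = theoretical_sols(7.0)   # 224GB/s
at_8 = theoretical_sols(8.0)   # 256GB/s
ratio = at_8 / at_7

print(f"7Gbps: {at_7:.0f} sol/s")
print(f"8Gbps: {at_8:.0f} sol/s")
print(f"ratio: {ratio:.3f}")
```

Since the memory clock only enters through the bandwidth term, the ceiling scales linearly with it in this model. Whether a real miner scales that cleanly depends on memory timings at the higher clock.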
full member
Activity: 157
Merit: 100
November 14, 2016, 12:21:51 AM
#5
...

Therefore a reasonably efficient equihash implementation will do 5 * 64 * 1 million bytes (320MB) of IO per round.  With 9 rounds that means 2.88GB per iteration, or 77.8 iterations per second on a Rx 470 with RAM clocked at 7Gbps (224GB/s memory bandwidth).  At 1.88 solutions per iteration, that's an average of 146 solutions/second, or about 25% faster than Claymore v5.

The theoretical equihash performance limit on a Rx 470 is likely about 25% faster than 146 solutions, but it involves using 64-byte data structures that require a lot more memory.  So much memory that I think it will not be possible with 4GB cards.  At least it will be something for owners of 8GB Rx 480 cards to be happy about.

A few noob questions if you don't mind.
What's the theoretical limit on the RX 470 8G Nitro cards with RAM clocked at 8Gbps (256GB/s)? Also, does overclocking the memory result in a linear increase in performance?
Does this all mean that equihash solving isn't GPU compute limited, but rather memory limited? If so, I wonder why GPU-Z shows 100% GPU load vs sub-40% memory controller load (whereas mining Eth fully loads both core and mem controller...)

Fascinating stuff. Thanks in advance.
sr. member
Activity: 588
Merit: 251
November 13, 2016, 10:22:34 PM
#4
"it involves using 64-byte data structures"

How much change/coding would a transition to 64-byte data structures require?

Someone like eXtremal could probably do it in a week, re-using parts of silentarmy.  It would take me 2-3 times longer.  I can write top-quality code, but I don't pump it out as fast as some other coders.
hero member
Activity: 1008
Merit: 1000
November 13, 2016, 08:59:35 PM
#3
Interesting to see that after 2 weeks we are fairly close to the limits.