On a more serious note: if you state the performance is now at 80% of theoretical maximum, we're basically there, right? ETH miners also peak at about 80-85% of the theoretical maximum. Does the same rule apply here?
I still have some work to do before I write my own miner from scratch. I like to *really* understand the problem before I start writing code, and there are still some parts of the GCN architecture that I'm figuring out.
Eth miners max out at around 93% of the theoretical maximum. 24Mh/s is the theoretical max for a R9 380 with 6Gbps memory, and I've been able to get 22.3Mh out of a couple cards. You'll never reach 100% because DRAM refresh consumes some of the bandwidth, perhaps as much as 5%.
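As a rough sanity check of that figure, here's the usual back-of-the-envelope Ethash calculation: each hash does 64 random reads of a 128-byte DAG page, so memory bandwidth bounds the hashrate. The bus width and per-pin rate below are the R9 380's stock numbers; the result lands just under the 24Mh/s cited, with the gap down to rounding.

```python
# Bandwidth-bound Ethash hashrate estimate for a R9 380 (stock memory).
bus_bits = 256                 # memory bus width in bits
gbps = 6                       # per-pin data rate in Gbps
bandwidth = bus_bits / 8 * gbps * 1e9   # bytes/s -> 192 GB/s

# Each Ethash hash reads 64 DAG pages of 128 bytes = 8 KiB of traffic.
bytes_per_hash = 64 * 128
max_mhs = bandwidth / bytes_per_hash / 1e6   # ~23.4 MH/s, near the 24 MH/s above
```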
p.s. I also have another idea that should work on 4GB cards. The miner could use 12-slot bins of 32 bytes, just like silentarmy, but start a fresh table every round instead of using 2 tables in a double-buffered fashion. This would use 384MB * 9 =~ 3.5GB, but then the first write to any row could write 32 bytes of dummy data along with the 32-byte collision record, which avoids the read-before-write. The same trick works for the 2nd through 6th writes by filling the even slots before the odd ones.

This would reduce the average IO per round to 2^20 * 3 * 64 bytes, or 192MB per round and 1.728GB per iteration. That would be a theoretical max of 130 iterations per second on a Rx 470 with a 7Gbps memory clock, which would be around 240 solutions per second. Applying the 93% of theoretical that eth miners achieve, that would give real-world performance of about 225 sols/s.