Zcash Equihash PoW Released - page 2.

MaxDZ8

hero member

Activity: 672

Merit: 500

Quote from: TPTB_need_war on April 18, 2016, 02:26:37 AM

The GPU uses some coalescing to minimize latency and maximize utilization of memory bandwidth. Isn't that likely to merge accesses and thus amortize?

Excuse me we are getting in area of hardware that I am not well researched.

Coalescing patterns are variable across implementations a common point across most hardware is occupancy ratio or hardware-based, context-switch-free multi-threading of sort. Intel calls it HyperThreading®, it's guaranteed to work on two threads whereas GPUs accomodate as many threads they can fit. In my experience is considerably more consistent and easy to use than exploiting coalescing patterns.

TPTB_need_war

sr. member

Activity: 420

Merit: 262

Quote from: tromp on April 17, 2016, 08:56:29 PM

Quote from: TPTB_need_war on April 17, 2016, 01:49:58 PM

What about multiple processors sharing the same memory to amortize the memory electrical cost?

If they share the same memory bank then they likely cause more row switches and there is negligible amortization.

The GPU uses some coalescing to minimize latency and maximize utilization of memory bandwidth. Isn't that likely to merge accesses and thus amortize?

Excuse me we are getting in area of hardware that I am not well researched.

tromp

legendary

Activity: 1000

Merit: 1120

Quote from: TPTB_need_war on April 17, 2016, 01:49:58 PM

Quote from: tromp on April 17, 2016, 09:47:39 AM

Cuckoo spends only 1/3 of its time doing hash computations, which take little power compared to DRAM switching rows.
So the possible gain by ASIC hashing is less than 33%.

Sorry not trying to make more work for you, but would you happen to have a reference for that claim? I am curious how you measured that? I remember you told smooth and I that you don't have a Kill-A-Watt meter to measure GPU power consumption.

I didn't measure that. I estimated that doing 90 64-bit operations (one siphash call) is cheaper than operating the (typically 16k)
sense amplifiers to read a new bitline. But I could be wrong about that...
I'm assuming a Cuckoo size much larger than 16k of course.

Quote

What about multiple processors sharing the same memory to amortize the memory electrical cost?

If they share the same memory bank then they likely cause more row switches and there is negligible amortization.

Quote

I thought Wagner's algorithm sorted. I'll have to reread the paper more carefully.

That puzzled me too. I don't see a need for sorting, just for binning to find collisions on the next chunk of n/(k+1) bits...,
and I will only believe sorting is more performant when I see it (i.e. when optimised code for that is made public).

Quote

If lookup is in the static RAM cache, then ratio of computation to memory power consumption should increase.

Their stated goal is a 1GB memory footprint, so we can rule out fitting in cache.

coins101

legendary

Activity: 1456

Merit: 1000

I'm interested to give this a test drive and get this working on some rigs.

TPTB_need_war

sr. member

Activity: 420

Merit: 262

Quote from: tromp on April 17, 2016, 09:47:39 AM

Quote from: TPTB_need_war on April 17, 2016, 03:20:02 AM

1. You didn't address my electrical efficiency point, which I think is probably the most damning. I very much doubt that the memory power consumption will be even 1/10 of the computational power consumption. Thus an ASIC will be at least 10 times more power efficient, Don't I remember pointing out the same issue with your Cuckoo hash? What was your retort again?

Cuckoo spends only 1/3 of its time doing hash computations, which take little power compared to DRAM switching rows.
So the possible gain by ASIC hashing is less than 33%.

Sorry not trying to make more work for you, but would you happen to have a reference for that claim? I am curious how you measured that? I remember you told smooth and I that you don't have a Kill-A-Watt meter to measure GPU power consumption.

What about multiple processors sharing the same memory to amortize the memory electrical cost?

Quote from: tromp on April 17, 2016, 09:47:39 AM

Quote from: TPTB_need_war on April 17, 2016, 03:20:02 AM

2. I also very much doubt it will be latency bound (even after the 10X speed up of the computation), because optimized sorting doesn't have to be random access.

Again, we must wait to see the final parameters and optimized implementations. It seems likely they'll pick
n=144 and k=5, which means they need to find collisions on 16 bits at a time. That is just a lookup in a 64K entry table;
I don't expect to see any real sorting there. It's less clear how the actual items are represented, and if they are moved
themselves, or by pointer.

I thought Wagner's algorithm sorted. I'll have to reread the paper more carefully.

If lookup is in the static RAM cache, then ratio of computation to memory power consumption should increase.

Yeah we need to analyze the specifics, which is why I am thinking their white paper is myopic. They didn't present any of this analysis afaics on quick perusal. Perhaps they just assumed a memory bandwidth bound.

tromp

legendary

Activity: 1000

Merit: 1120

Quote from: TPTB_need_war on April 17, 2016, 03:20:02 AM

1. You didn't address my electrical efficiency point, which I think is probably the most damning. I very much doubt that the memory power consumption will be even 1/10 of the computational power consumption. Thus an ASIC will be at least 10 times more power efficient, Don't I remember pointing out the same issue with your Cuckoo hash? What was your retort again?

Cuckoo spends only 1/3 of its time doing hash computations, which take little power compared to DRAM switching rows.
So the possible gain by ASIC hashing is less than 33%.

Quote

2. I also very much doubt it will be latency bound (even after the 10X speed up of the computation), because optimized sorting doesn't have to be random access.

Again, we must wait to see the final parameters and optimized implementations. It seems likely they'll pick
n=144 and k=5, which means they need to find collisions on 16 bits at a time. That is just a lookup in a 64K entry table;
I don't expect to see any real sorting there. It's less clear how the actual items are represented, and if they are moved
themselves, or by pointer.

TPTB_need_war

sr. member

Activity: 420

Merit: 262

Quote from: tromp on April 16, 2016, 04:47:54 PM

Quote from: TPTB_need_war on April 16, 2016, 12:53:56 PM

Additionally I was thinking that this Equihash can be trivially sped up on an ASIC because the Equihash algorithm is not memory latency bound and thus is bound on the sorting and computation speed and/or the memory bandwidth, which can be optimized with specialized hardware.

There is no publicly available optimized code for Equihash, so we don't really know to what extent it
is bandwidth or latency bound. There may be subtle trade offs between the two. We really need to see the memory behaviour of actual running optimized code.

1. You didn't address my electrical efficiency point, which I think is probably the most damning. I very much doubt that the memory power consumption will be even 1/10 of the computational power consumption. Thus an ASIC will be at least 10 times more power efficient, Don't I remember pointing out the same issue with your Cuckoo hash? What was your retort again?

2. I also very much doubt it will be latency bound (even after the 10X speed up of the computation), because optimized sorting doesn't have to be random access.

For any memory bandwidth bound, there is this:

Quote from: Hueristic on April 16, 2016, 07:06:35 PM

WAT? Not very forward thinking.

https://en.wikipedia.org/wiki/High_Bandwidth_Memory

arielbit

legendary

Activity: 3444

Merit: 1061

Quote from: tromp on April 16, 2016, 04:47:54 PM

There is no publicly available optimized code for Equihash, so we don't really know to what extent it
is bandwidth or latency bound. There may be subtle trade offs between the two. We really need to see the memory behaviour of actual running optimized code.

Even once zcash gets around to publishing more optimized cpu and gpu code,
it won't be clear for a while whether further optimizations are possible, considering that the mining
algorithm is rather nontrivial.

I'm somewhat worried by this statement on their optimization page at https://github.com/zcash/zcash/issues/857

Quote

We don't have to do every possible optimization, but enough that it's feasible to run a miner with the right parameters.

Leaving optimizations on the table means that some miners will be able to run more efficiently than others.

calling our miner developers in the house Cool

tromp

legendary

Activity: 1000

Merit: 1120

Quote from: TPTB_need_war on April 16, 2016, 12:53:56 PM

Re: Zcash Equihash PoW Released
If you look on page 115 of Bernstein's Post Quantum Cryptography, it confirms a sqrt(N) speedup for Wagner's solution to the Generalized Birthday Problem for the hypothetical quantum computer.

You seem to have have misinterpreted the book. Page 115 says

Quote

If both lists have the same same size sqrt(N) , this means that the merging can be done in sqrt(N) operations instead of N , which is the same speed-up that can be achieved by Grover’s
algorithm. Thus, even with a quantum computer we can not expect to get attacks for FSB or CFS

That says there is *no* known quantum speedup for the Generalized Birthday Problem.

Quote

Additionally I was thinking that this Equihash can be trivially sped up on an ASIC because the Equihash algorithm is not memory latency bound and thus is bound on the sorting and computation speed and/or the memory bandwidth, which can be optimized with specialized hardware.

There is no publicly available optimized code for Equihash, so we don't really know to what extent it
is bandwidth or latency bound. There may be subtle trade offs between the two. We really need to see the memory behaviour of actual running optimized code.

Even once zcash gets around to publishing more optimized cpu and gpu code,
it won't be clear for a while whether further optimizations are possible, considering that the mining
algorithm is rather nontrivial.

I'm somewhat worried by this statement on their optimization page at https://github.com/zcash/zcash/issues/857

Quote

We don't have to do every possible optimization, but enough that it's feasible to run a miner with the right parameters.

Leaving optimizations on the table means that some miners will be able to run more efficiently than others.

TPTB_need_war

sr. member

Activity: 420

Merit: 262

Re: Zcash Equihash PoW Released

https://z.cash/blog/why-equihash.html

Well some Monero folks seem to very interested, so I decided to take a look because I suddenly remembered something from Iota's whitepaper:

http://188.138.57.93/tangle.pdf#page=20 (4.3 Resistance to quantum computations)

If you look on page 115 of Bernstein's Post Quantum Cryptography, it confirms a sqrt(N) speedup for Wagner's solution to the Generalized Birthday Problem for the hypothetical quantum computer. So if Equihash is using a large list of values to be sorted, then the speedup could be so great that a quantum computer could rewrite the entire block chain quite easily by redoing the past proof-of-work exponentially faster than it was originally done.

It appears that a memory hard algorithm such as Cryptonite would not have this problem.

I am just rushing and have not reread the Equihash white paper carefully, so I may have a mistake in my analysis.

Additionally I was thinking that this Equihash can be trivially sped up on an ASIC because the Equihash algorithm is not memory latency bound and thus is bound on the sorting and computation speed and/or the memory bandwidth, which can be optimized with specialized hardware.

Have I missed something in my haste Huh

Surely the Z.cash folks are not this myopic.

The following assumption from the Equihash white paper seems to be wrong:

Quote

For fixed memory size, memory chips with bandwidth
significantly higher than that of typical desktop memory (such
as DDR3) are rare. Assuming that a prover does not have
access to memory with bandwidth higher than certain
Bw
max
,
we can efficiently bound the time-memory (and thus the time-
area) product for such implementations.

...

To the best of our knowledge, the highest reported band-
width in commercial products does not exceed 200 GB/s
(Playstation 4 has 172 GB/s bandwidth [5]), whereas the
desktop DDR3 can have as high as 17 GB/s bandwidth [3].
Thus we conclude that the highest advantage a prover can
get from parallel hardware does not exceed the factor of 15.
This may mean that our proof-of-work is GPU- and desktop-
friendly, whereas it is also ASIC-resistant in the sense that an
ASIC implementation does not yield smaller time-area product.

I am thinking faster memory bandwidth can be obtained by reading and writing to multiple, redundant independent memory banks simultaneously that are interleaved differently to have disjoint collisions on banks.

Or even if not faster, then perhaps orders-of-magnitude more electrically efficient, since the electric consumption will be primarily in the computation and not in the memory. And computation can be radically optimized on an ASIC.

As I predicted when I wrote in the Monero thread about this the other day, it appears the white paper didn't consider electricity as the most important factor. Duh.

TPTB_need_war

sr. member

Activity: 420

Merit: 262

I can't right now. Sorry. tromp is also qualified, but he is opinionated about the value of asymmetric proof-of-work versus memory hard hash algorithms.

rdnkjdi

legendary

Activity: 1256

Merit: 1009

Quote from: bitcoin carpenter on April 15, 2016, 11:29:57 AM

Interesting..

Please elaborate.

I understand very little about PoW and I understand my capacity enough to know that learning will be less fruitful than reading from educated observers. Wolf0, tbtb_need_war, claymore, Gregory Maxwell, etc would all be able to give good insight

AngusCanine

legendary

Activity: 1414

Merit: 1001

To weird to live To rare to die

Well on page 6 part of the equation is infected with HIV and a first glance with out reading everything it seems they propose a band aid to solve the problem

bitcoin carpenter

legendary

Activity: 1582

Merit: 1001

Interesting..

Please elaborate.

rdnkjdi

legendary

Activity: 1256

Merit: 1009

https://www.internetsociety.org/sites/default/files/blogs-media/equihash-asymmetric-proof-of-work-based-generalized-birthday-problem.pdf

Topic: Zcash Equihash PoW Released - page 2. (Read 7791 times)