Theoretical minimum # of logic operations to perform double iterated SHA256?

jvanname

member

Activity: 691

Merit: 51

Here is completely reversible code in the completely reversible programming language Janus that solves a mining problem. The goal of the following POW problem is to find an input i where w=12321 after we perform the assignments w=i; x=((i+214)^(i+142211))+(w&1231); w-=(x&13321).

procedure proofofwork(int i,int w,int x)
w+=i
x^=((i+214)^(i+142211))+(w&1231)
w-=(x&13321)

procedure main()
int i
int w
int x

call proofofwork(i,w,x)
from i=0 do
uncall proofofwork(i,w,x)
i+=1
call proofofwork(i,w,x)
until (w=12321)
uncall proofofwork(i,w,x)

The output of this program for solving a POW problem (and this output includes all possible garbage information) is

i = 20513
w = 0
x = 0

and i=20513 is a solution to this POW problem.

You can test this code here at http://topps.diku.dk/pirc/?id=janusP

That being said, if we want to make cryptocurrency mining the very first free market application of reversible computing hardware, we need to use a much better algorithm than SHA-256. SHA-256 is somewhat reversibility friendly but only by accident (I have designed my own mining algorithm specifically to motivate the construction of reversible computers, but people still use SHA-256 or some other poor mining algorithm for some reason. They don't know any better.).

smolen

hero member

Activity: 524

Merit: 500

Quote from: Cryddit on April 18, 2015, 11:04:13 PM

SHA256D is an interesting choice, actually; usually you don't see it except in a context where someone is worried about an extension attack - which doesn't really apply to the way Bitcoin uses it.

Bitcoin mining would be vulnerable to "constant elimination attack"

were plain SHA256 used instead of double one.

EDIT:
Geremia, take a look at Bitfury trick
This line saves some memory at the expense of extra computation:

Code:

ds <= er - cr;

Geremia

sr. member

Activity: 507

Merit: 253

Quote from: btchris on April 19, 2015, 06:45:24 PM

that's a great diagram

These blog posts were very helpful for me:
http://www.righto.com/2014/09/mining-bitcoin-with-pencil-and-paper.html
http://www.righto.com/2014/02/bitcoin-mining-hard-way-algorithms.html

Very impressive that he shows SHA-256 with pen and pencil!

Geremia

sr. member

Activity: 507

Merit: 253

Quote from: 2112 on April 21, 2015, 11:01:01 PM

It is hard to guess what was the original intention of the question: mathematical/algebraic minimum or some sort of minimum for a defined implementation technology.

either

Quote from: 2112 on April 21, 2015, 11:01:01 PM

The typical goal for a implementation using silicon CMOS process would be timing closure, i.e. minimizing the time to compute the result. This is the reason why everyone is using carry-look-ahead adders that add 32 bits in parallel in a single clock cycle.

The other approach would be to use serial adders that add 32 bits in 32 clock cycles. Such an adder is just a pair of XOR gates. The problem with this approach is that the resultant clock rate is too high for the practical silicon-based semiconductor manufacturing processes. But if somebody considers e.g. GaAs manufacturing then the very deeply pipelined serial implementation is a viable alternative.

interesting
thanks

2112

legendary

Activity: 2128

Merit: 1074

Quote from: Sergio_Demian_Lerner on April 19, 2015, 11:03:59 PM

Cryddit gave an estimation on the number of standard gate building blocks required for a Bitcoin ASIC (adders, logic gates)
However, adders require more space than OR gates, so generally the number of gates will be dominated by the number of adders. Also adders can be implemented in several ways, with different delay/space trade-offs, so even if there could be a theoretical minimum number of gates, practically all implementations would use much more to reduce the delay.

More interesting, you can:

- Compute SHA^2 approximately, and get a better practically good SHA^2 ASIC for mining.
See https://bitslog.wordpress.com/2015/02/17/faster-sha-256-asics-using-carry-reduced-adders.

- Compute SHA^2 asynchronously (e.g. using asynchronous adders)

Last, it has not been proven that performing a complete SHA^2 evaluation is required on average to check that a changing header has a SHA^2 hash that is below the target value. In fact, several widely known optimizations have disprove it.

It is hard to guess what was the original intention of the question: mathematical/algebraic minimum or some sort of minimum for a defined implementation technology.

The typical goal for a implementation using silicon CMOS process would be timing closure, i.e. minimizing the time to compute the result. This is the reason why everyone is using carry-look-ahead adders that add 32 bits in parallel in a single clock cycle.

The other approach would be to use serial adders that add 32 bits in 32 clock cycles. Such an adder is just a pair of XOR gates. The problem with this approach is that the resultant clock rate is too high for the practical silicon-based semiconductor manufacturing processes. But if somebody considers e.g. GaAs manufacturing then the very deeply pipelined serial implementation is a viable alternative.

DeathAndTaxes

donator

Activity: 1218

Merit: 1080

Gerald Davis

Quote from: Cryddit on April 20, 2015, 06:27:44 PM

As I understand it, (and I could be wrong here) what is actually absolutely required to spend energy for, is the output. IOW, you could at least in theory design a system that answers the one-bit question, "is there a nonce meeting the difficulty target within " by actually spending the energy to write exactly one bit. Everything else can be reversible, so the greater the amount of computation you can do without any external effects required the more of it can be done "free" (albeit at ridiculously high complexity) but no matter what, you have to write the output.

In theory yes but even proponents of reversible computing don't believe leakage will be that low. There is the energy cost required to perfectly isolate the circuit from the outside environment so even if your raw circuit was perfectly reversible the total system energy cost will be much higher.

Also theory is just theory. In theory it is possible for someone to make a miner with 5,000,000 G/J (instead of 1 G/J) using plain boring classical computing. Granted you aren't going to do it with 20nm silicon but in theory it can be done. Now 5,000,0000 PH/s well we know that is not possible without a massive reduction in the "work" needed to complete a single hash. "In theory" is a nice way of saying nobody has proved it impossible.

DeathAndTaxes

donator

Activity: 1218

Merit: 1080

Gerald Davis

Quote from: solex on April 20, 2015, 02:16:33 PM

Indeed. It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology. I have been looking forward to seeing informed comment on it.

Maybe in 50-80 years. Nobody has successfully implemented a 32 bit adder using reversible computing. If you think quantum computing is in its infancy well reversible computing hasn't even been born yet. Despite the theory being published in 1973 to date there has been pretty much no practical demonstration of implementing the most trivial of reversible circuits.

Part of the problem is that the circuit must be very insulated from the outside environment in order to remain reversible. To date this means a lot of very expensive near zero superconductors but even esoteric problems like a stray cosmic ray striking your circuit can leak to large scale leakage. To say the future of mining obviously involves reversible computing is sort of like suggesting Honda should stop researching hybrids and start researching hyperdrives because obviously on a long enough timeline some form of faster than light travel is an obvious requirement for a spacefaring civilization.

There is a lot of potential improvement in classical computing. We something like a million times less efficient than the thermodynamic limit.

Also as Peter points out there is still an energy time tradeoff. If I gave you today a reversible miner which ran on near zero energy but had a hardware cost $10,000 of per GH would you be interested? If the technology is more expensive on an amortized per hash total lifecycle cost than classical computing well it doesn't really matter how little energy it uses.

Cryddit

legendary

Activity: 924

Merit: 1132

As I understand it, (and I could be wrong here) what is actually absolutely required to spend energy for, is the output. IOW, you could at least in theory design a system that answers the one-bit question, "is there a nonce meeting the difficulty target within " by actually spending the energy to write exactly one bit. Everything else can be reversible, so the greater the amount of computation you can do without any external effects required the more of it can be done "free" (albeit at ridiculously high complexity) but no matter what, you have to write the output.

solex

legendary

Activity: 1078

Merit: 1006

100 satoshis -> ISO code

Quote from: Peter R on April 20, 2015, 06:05:11 PM

Quote from: Geremia on April 20, 2015, 05:50:42 PM

Quote from: solex on April 20, 2015, 02:16:33 PM

It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology.

From the Reddit page you linked: "No. Hashing by definition is irreversible and results in loss of information and therefore increased entropy."

It's irreversible because the normal circuit for SHA256d discards information (and thus requires an energy input to move in the forward direction). For example, we hash the 640 bit blockheader to get a 256 bit digest; there's no way to work backwards with only 256 bits to reconstruct the 640 bit input (the output contains less information than the input). But I don't see why there wouldn't be another circuit that performs the hash function in a reversible way, by tracking all the information that is normally discarded. This circuit would output the hash value PLUS enough of other information that one could work backwards to reconstruct the inputs.

Yes, that's what I was thinking. The final hash is irreversible, but not if the results of each step are known. The efficiency is gained by recycling logic gate inputs as they occur. Certainly a challenging design problem though.

Peter R

legendary

Activity: 1162

Merit: 1010

Quote from: Geremia on April 20, 2015, 05:50:42 PM

Quote from: solex on April 20, 2015, 02:16:33 PM

It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology.

From the Reddit page you linked: "No. Hashing by definition is irreversible and results in loss of information and therefore increased entropy."

It's irreversible because the normal circuit for SHA256d discards information (and thus requires an energy input to move in the forward direction). For example, we hash the 640 bit blockheader to get a 256 bit digest; there's no way to work backwards with only 256 bits to reconstruct the 640 bit input (the output contains less information than the input). But I don't see why there wouldn't be another circuit that performs the hash function in a reversible way, by tracking all the information that is normally discarded. This circuit would output the hash value PLUS enough of other information that one could work backwards to reconstruct the inputs.

Geremia

sr. member

Activity: 507

Merit: 253

Quote from: solex on April 20, 2015, 02:16:33 PM

It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology.

From the Reddit page you linked: "No. Hashing by definition is irreversible and results in loss of information and therefore increased entropy."

solex

legendary

Activity: 1078

Merit: 1006

100 satoshis -> ISO code

Quote from: Meni Rosenfeld on April 19, 2015, 06:27:52 AM

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.

Indeed. It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology. I have been looking forward to seeing informed comment on it.

When I suggested this on reddit the idea got shot down.
http://www.reddit.com/r/Bitcoin/comments/22tegd/do_we_have_an_idea_of_the_power_in_watt_necessary/cgq6ydq

tl121

sr. member

Activity: 278

Merit: 254

Quote from: Peter R on April 20, 2015, 11:33:26 AM

Quote from: Meni Rosenfeld on April 19, 2015, 06:27:52 AM

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.

Interesting. Let's imagine that we made a circuit to test nonces for bitcoin mining using reversible gates.

Code:

        ______________
      -|            |-
      -|            |-
 i    -|            |-   o
 n    -|            |-   u
 p    -|  circuit   |-   t
 u    -|            |-   p
 t    -|            |-   u
      -|            |-   t
      -|            |-
       --------------

Assume that the input to the circuit is the blockheader which consists of 608 bits + the 32-bit nonce. So we need 640 wires (bits) coming into our circuit at the input side.

At the output, all we really care about is a yea or nay on whether the nonce satisfies the difficulty target, which could be represented by a single bit. But that's not reversible because from one bit we can't go backwards and reproduce the 640-bit input. To make the computation reversible, our circuit must also have at least 640 bits coming out of it.

To use the circuit in practice, we apply our first nonce to the input, wait for the output to settle, and determine if the difficulty target was satisfied. The answer is probably NO...

So, now comes the irreversible part. We need test another nonce, which means we must flip at least one bit in our input (the input includes the nonce). So, as per Landauer's principle, this costs us energy¹

E > kT ln 2.

We have to do this many, many times until we find a nonce that satisfies the difficult target, expending energy at each step.

Another interesting property of our reversible circuit above is that it only performs the computation for infinitesimal energy input if we're willing to wait an infinitely long time to observe the output. In general, even for reversible gates, there is an energy-time tradeoff² for performing a specific computation:

(energy consumed)*(computation time) > some constant

A successful Bitcoin miner is concerned with more than finding the correct nonce with the least energy expenditure. He also wants to find it quickly! And this will always cost him energy, regardless of whether he use reversible or irreversible gates.

¹If this causes M bits to get flipped at the output, does this then mean the circuit required an energy input E > M kT ln 2 to test the next nonce? I'm not sure...

²Discovered by Charles Bennet (working at IBM with Rolf Landauer), refer to Feynman Lectures on Computation, Chapter 5.

There is no need to output lots of nonce values for unsuccessful searches. For example, one could build a reversible engine that would fully search a defined range and output a single bit that indicates whether the range contains a solution. This information could be used in a binary search to zero in on an exact solution. Alternatively, the identity of promising ranges could be passed on to a conventional hash engine which would search only guaranteed successful ranges.

Engineering details await detailed cost/performance specs for the "unobtanium" reversible devices.

Peter R

legendary

Activity: 1162

Merit: 1010

Quote from: Meni Rosenfeld on April 19, 2015, 06:27:52 AM

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.

Interesting. Let's imagine that we made a circuit to test nonces for bitcoin mining using reversible gates.

Code:

        ______________
      -|            |-
      -|            |-
 i    -|            |-   o
 n    -|            |-   u
 p    -|  circuit   |-   t
 u    -|            |-   p
 t    -|            |-   u
      -|            |-   t
      -|            |-
       --------------

Assume that the input to the circuit is the blockheader which consists of 608 bits + the 32-bit nonce. So we need 640 wires (bits) coming into our circuit at the input side.

At the output, all we really care about is a yea or nay on whether the nonce satisfies the difficulty target, which could be represented by a single bit. But that's not reversible because from one bit we can't go backwards and reproduce the 640-bit input. To make the computation reversible, our circuit must also have at least 640 bits coming out of it.

To use the circuit in practice, we apply our first nonce to the input, wait for the output to settle, and determine if the difficulty target was satisfied. The answer is probably NO...

So, now comes the irreversible part. We need test another nonce, which means we must flip at least one bit in our input (the input includes the nonce). So, as per Landauer's principle, this costs us energy¹

E > kT ln 2.

We have to do this many, many times until we find a nonce that satisfies the difficult target, expending energy at each step.

Another interesting property of our reversible circuit above is that it only performs the computation for infinitesimal energy input if we're willing to wait an infinitely long time to observe the output. In general, even for reversible gates, there is an energy-time tradeoff² for performing a specific computation:

(energy consumed)*(computation time) > some constant

A successful Bitcoin miner is concerned with more than finding the correct nonce with the least energy expenditure. He also wants to find it quickly! And this will always cost him energy, regardless of whether he use reversible or irreversible gates.

¹If this causes M bits to get flipped at the output, does this then mean the circuit required an energy input E > M kT ln 2 to test the next nonce? I'm not sure...

²Discovered by Charles Bennet (working at IBM with Rolf Landauer), refer to Feynman Lectures on Computation, Chapter 5.

Geremia

sr. member

Activity: 507

Merit: 253

Quote from: Meni Rosenfeld on April 19, 2015, 06:27:52 AM

The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.

It has been experimentally verified:

Bérut, Antoine, Artak Arakelyan, Artyom Petrosyan, Sergio Ciliberto, Raoul Dillenschneider, and Eric Lutz. “Experimental Verification of Landauer/’s Principle Linking Information and Thermodynamics.” Nature 483, no. 7388 (March 8, 2012): 187–89. doi:10.1038/nature10872. (non-paywalled version)

Sergio_Demian_Lerner

hero member

Activity: 555

Merit: 654

Quote from: Geremia on April 18, 2015, 05:55:55 PM

What is the theoretical minimum number of logical operations an ASIC needs to perform to compute double iterated SHA256, i.e., sha(sha(•))?

(cf. the Bitcoin StackExchange question)

Cryddit gave an estimation on the number of standard gate building blocks required for a Bitcoin ASIC (adders, logic gates)
However, adders require more space than OR gates, so generally the number of gates will be dominated by the number of adders. Also adders can be implemented in several ways, with different delay/space trade-offs, so even if there could be a theoretical minimum number of gates, practically all implementations would use much more to reduce the delay.

More interesting, you can:

- Compute SHA^2 approximately, and get a better practically good SHA^2 ASIC for mining.
See https://bitslog.wordpress.com/2015/02/17/faster-sha-256-asics-using-carry-reduced-adders.

- Compute SHA^2 asynchronously (e.g. using asynchronous adders)

Last, it has not been proven that performing a complete SHA^2 evaluation is required on average to check that a changing header has a SHA^2 hash that is below the target value. In fact, several widely known optimizations have disprove it.

Cryddit

legendary

Activity: 924

Merit: 1132

Quote from: btchris on April 19, 2015, 06:45:24 PM

Please excuse my ignorance, but does the plus to the right of Ch have three inputs, and is this considered a single operation on an ASIC? I think that's the 6 vs 7 additions per round discrepancy.

Yes, I think that's probably it. And yes, you can construct a circuit to do an addition of three values in one step on an ASIC. And no, this is not you being ignorant, this is fairly obscure stuff. ASICs follow their own weird set of rules and it's not quite the same ones that software follows.

Quote from: btchris on April 19, 2015, 06:45:24 PM

Also, should the message expansion step be included (creating W_t, the 48*3 from above)?
Should the final hash value creation be included (the 8 from above)?

I believe that the message expansion step can be a near-NOP like bitshifting on an ASIC; you are right about the final hash value though, so I was off by eight.

altcoinex

sr. member

Activity: 293

Merit: 272

Director - www.cubeform.io

Wanted to throw in my tip of the hat to Cryddit as well for such a well detailed response.

btchris

hero member

Activity: 672

Merit: 504

a.k.a. gurnec on GitHub

Quote from: Cryddit on April 19, 2015, 01:21:04 PM

Quote from: btchris on April 19, 2015, 10:14:43 AM

48*3 + 64*7 + 8 = 600 additions for a single SHA-256 block

Here's one round of SHA256, expressed as a pseudo-circuit diagram:

Thanks, that's a great diagram.

Please excuse my ignorance, but does the plus to the right of Ch have three inputs, and is this considered a single operation on an ASIC? I think that's the 6 vs 7 additions per round discrepancy.

Also, should the message expansion step be included (creating W_t, the 48*3 from above)?
Should the final hash value creation be included (the 8 from above)?

Cryddit

legendary

Activity: 924

Merit: 1132

Quote from: btchris on April 19, 2015, 10:14:43 AM

Quote from: Cryddit on April 18, 2015, 11:04:13 PM

SHA256 is sixty-four rounds comprising

384 32-bit additions (6 per round)
320 32-bit ORs (5 per round)
448 32-bit XORs (7 per round)

And a bunch of bit shifts, but bit shifts are free on an ASIC.

Are you sure of those numbers? For example in a naive implementation, I'm counting 48*3 + 64*7 + 8 = 600 additions for a single SHA-256 block. There could be better ways of doing SHA-256 that don't naively follow the standard though... I wouldn't know....

Here's one round of SHA256, expressed as a pseudo-circuit diagram:

where the red pluses indicate 32-bit addition and the dark-blue boxes are defined as

,

(and of course the plus inside the circle is xor).

It looks like the implementation you linked to is using extra additions to feed into Ch and Ma - which is an artifact of not being able to directly split the circuit traces that come off the EFG and ABC registers respectively. It, or something very like it, is probably the best you can do in software.

Topic: Theoretical minimum # of logic operations to perform double iterated SHA256? (Read 3951 times)