Author

Topic: Potentially faster method for mining on the CPU (Read 14431 times)

member
Activity: 152
Merit: 25
I wouldn't believe all the announcements in the news. There were a lot of such announcements, including from Intel, but mining equipment from Bitmain is the most widespread in this market and therefore the best. I remember only 1 case when a private company released an ASIC for Ethereum, which was very good, but the supply was limited.

I think this is a software based solution rather than ASIC hardware. With the advancement of AI and ML over recent years including things like chat GPT its certainly possible there is software which can feed better inputs to an ASIC to solve a block

perhaps now with bitcoin ETF and bitcoin becoming a bit more mainstream we may see companies outside of china/russia which step up in the ASIC game, would be nice to have a competitor to bitmain, also one that doesnt totally scalp their prices
legendary
Activity: 1834
Merit: 1136
I wouldn't believe all the announcements in the news. There were a lot of such announcements, including from Intel, but mining equipment from Bitmain is the most widespread in this market and therefore the best. I remember only 1 case when a private company released an ASIC for Ethereum, which was very good, but the supply was limited.
member
Activity: 152
Merit: 25
Quote
Quantum announced a new proprietary "Method C" for more efficient BTC mining, based on machine learning and predictive AI technology.
QBT
 said the new method was yielding consistent results, and had favourably demonstrated predictive ability in c. 30% of cases if an input to SHA-256 would produce a winning hash.

Quantum Blockchain has announced a new proprietary method of Bitcoin mining, this time implemented in hardware, promising significant energy savings compared to traditional methods. The company's "Method B" made headlines last year as it promised to increase the rate of successful Bitcoin mining by 2.6 times - entirely achieved in software.
Well how about starting a new thread on this in the Development & Technical Discussion area where it belongs ? Hijacking an ancient thread with off-topic information is just - pointless - and does nothing to expose your information to those who are better qualified to look into it and assess its viability.
Its not off topic, the topic is faster bitcoin mining (with CPU), the guy posted earlier this year mentioning quantum and asked for some more info on literature, unlike you instead of saying its not going to work i gave the info to the (maybe) 30% faster solution. Also, Kano said he was attempted something and ran out of ram quickly. Maybe faster RAM and more cores (from the next-gen threadripper workstation) can help solve this seemingly high-resource-demand software problem...

You are confusing what is being said with mining directly on the CPU. I think here is a software solution to feed potentially better input to ASICs.
legendary
Activity: 3822
Merit: 2703
Evil beware: We have waffles!
Quote
Quantum announced a new proprietary "Method C" for more efficient BTC mining, based on machine learning and predictive AI technology.
QBT
 said the new method was yielding consistent results, and had favourably demonstrated predictive ability in c. 30% of cases if an input to SHA-256 would produce a winning hash.

Quantum Blockchain has announced a new proprietary method of Bitcoin mining, this time implemented in hardware, promising significant energy savings compared to traditional methods. The company's "Method B" made headlines last year as it promised to increase the rate of successful Bitcoin mining by 2.6 times - entirely achieved in software.
Well how about starting a new thread on this in the Development & Technical Discussion area where it belongs ? Hijacking an ancient thread with off-topic information is just - pointless - and does nothing to expose your information to those who are better qualified to look into it and assess its viability.
member
Activity: 152
Merit: 25
I found the article


https://www.voxmarkets.co.uk/articles/quantum-blockchain-s-new-bitcoin-mining-method-promises-to-slash-energy-usage-8f70e57/


Quote
Think those new 96 core threadrippers pro workstations with 8 channel ddr5 2TB ram could handle it?  Grin the ram is ECC too
No it can't. Period.
There are no 'complex equations' or patterns to solve therefore applying possible mathematical shortcuts is not possible because there are none. What part of 'random' do you folks not get?

i would like to see your thoughts on this

Quote
Quantum announced a new proprietary "Method C" for more efficient BTC mining, based on machine learning and predictive AI technology.
QBT
 said the new method was yielding consistent results, and had favourably demonstrated predictive ability in c. 30% of cases if an input to SHA-256 would produce a winning hash.

Quantum Blockchain has announced a new proprietary method of Bitcoin mining, this time implemented in hardware, promising significant energy savings compared to traditional methods. The company's "Method B" made headlines last year as it promised to increase the rate of successful Bitcoin mining by 2.6 times - entirely achieved in software.

The new Method C uses predictive AI to accelerate the cumbersome BTC mining process. Specifically, the technology tries to predict whether an input to SHA-256, the core algorithm for Bitcoin mining, is likely to generate a winning hash, or not. The underlying assumption for Method C is that an "oracle" decision is materially less computationally demanding than the SHA-256 calculation for the same input string.
legendary
Activity: 3822
Merit: 2703
Evil beware: We have waffles!
Quote
Think those new 96 core threadrippers pro workstations with 8 channel ddr5 2TB ram could handle it?  Grin the ram is ECC too
No it can't. Period.
There are no 'complex equations' or patterns to solve therefore applying possible mathematical shortcuts is not possible because there are none. What part of 'random' do you folks not get?
member
Activity: 152
Merit: 25
I did recently see that a company with quantum in their name had figured a method to more efficiently brute force/ solve a block, i think they had 3 methods and had completed testing to prove that their nethods were up to something like 30% quicker than existing block solving

What ive said here isn't quite accurate as i cant remember exactly what they found out or if its even true but its interesting... This kind of tech could be worth alot of money and i bet its extremely well protected. Imagine if one of the big pools or players were using this. "30% more blocks? Damn!!!"

Wrote code to do the code generation, and optimise the results automatically at each step, long ago with 40 cores of CPU and 512GB of ram.
Ran out of ram real fast Smiley

Think those new 96 core threadrippers pro workstations with 8 channel ddr5 2TB ram could handle it?  Grin the ram is ECC too
legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
Wrote code to do the code generation, and optimise the results automatically at each step, long ago with 40 cores of CPU and 512GB of ram.
Ran out of ram real fast Smiley
legendary
Activity: 3822
Merit: 2703
Evil beware: We have waffles!
This thread and the link you gave are from the very early years of BTC and were long ago proven to be irrelevant when compared to using ASIC's to just brute-force the process.
newbie
Activity: 1
Merit: 0
hi! im kind of new on the subject and brings me to this post and i think that this is a very intresting discussion, any news about this? everyone disapear because they found the way? haha

I recently found that an algebraic fault attack could be effective to get a input of a sha256 .
Here is the link if anyone want to check.
https://archive.org/details/algebraic-fault-attack-on-the-sha-256-compression-function/page/n4/mode/1up

I hope you found something intresting in the papper and i hope to hear some news about you guys.

newbie
Activity: 46
Merit: 0
exploring which input bits effect which output bits (1, 2 or 3 at a time) might be interesting
I don't expect SHA2 to be broken by the ways discussed in this thread, but it's a good thing that someone tries it. A good result from such approach would be not analytical solution (seems impossible, but there always is a hope that equations system will collapse Smiley), but representation of double SHA2 better (by operations count or height) that ones currently used in ASICs. I think I know how to chop 10% (or so) off from ~120000 binary operations, but let's hope that this thread will bring some fresh ideas Smiley

logred_v8.zip ready - faster computation of !a which is the most expensive operation.

ri seems to have almost exactly 2i-1 terms.  So the last bit r31 will have 230 terms.  I've confirmed this with ri i=0..12 and it's bang on.  The Quine-McClumskey reduction sure does a number on the complexity but it's not enough.

botnet - to take this further, you need to demonstrate that one of the input bits of the sha256 compression function does not pass through the Least Significant Bit inputs of the adds.

Or show one of the compression function output bits do not depend on the Most Significant Bit outputs of the 32bit adds.

Or that chained 32-bit adds have some reductions.

Cheers
hero member
Activity: 524
Merit: 500
exploring which input bits effect which output bits (1, 2 or 3 at a time) might be interesting
I don't expect SHA2 to be broken by the ways discussed in this thread, but it's a good thing that someone tries it. A good result from such approach would be not analytical solution (seems impossible, but there always is a hope that equations system will collapse Smiley), but representation of double SHA2 better (by operations count or height) that ones currently used in ASICs. I think I know how to chop 10% (or so) off from ~120000 binary operations, but let's hope that this thread will bring some fresh ideas Smiley
newbie
Activity: 46
Merit: 0
But I still think you haven't quite grasped the magnitude of the P/NP problem yet! Then again, if the impossible ever gets done, it'll be as always, by someone who didn't know it was impossible.

Also botnet - If you've got the crypto-bug - exploring which input bits effect which output bits (1, 2 or 3 at a time) might be interesting.  This kind of differential analysis is what took down MD5 and cost me $10,000 (http://en.wikipedia.org/wiki/MD5CRK) Smiley

Eg:
Code:
SHA256(000000...0000) = A
SHA256(000000...0001) = B
SHA256(000000...0011) = C
SHA256(000000...0010) = D
SHA256(000000...0110) = E
SHA256(000000...0111) = F
SHA256(000000...0101) = G
...

then

Code:
A^B = ?
A^C = ?
A^D = ?
...
B^C = ?
B^D = ?
...
newbie
Activity: 46
Merit: 0
Note: working on Quine-McClusky now.  Smiley  Always wanted a high speed lib for logic reductions ...

Alright, got Quine-McCluskey implemented.  Also implements logred_set_equiv(a,b) which goes through all possible input permutations and confirms "a" behaves the same as "b".

Q-McC is an expensive operation (Q-McC on r05 takes a full second when compiled with -s -O6), so I have not included it in logred_set_reduce().

It reduces r05 (the 6th bit in 32bit addition) from
Code:
r05 (terms=  1120, symbols=   8842, uniqueSymbols=12)
to
Code:
r05 (terms=   156, symbols=   1080, uniqueSymbols=12)

It has no effect on c00,c01,..ci

Still don't think you'll find what you're looking for.  But have fun!

logred_v7.zip

hero member
Activity: 492
Merit: 503
Best of luck with this, botnet, but I feel it's a wild goose chase. I first heard about bitcoins in autumn 2011 and immediately set about trying to 'simplify' sha256(sha256(x)) in the same way as you, by trying to find boolean expressions for the first bits of the output in terms of the 640 bits of input. As you've found out, it's the seemingly simple addition of two 32-bit numbers that's the big problem. It simply doesn't lend itself to being simplfied, as it's recursive. As you crawl along the bits from right to left the expression balloons exponentially. All the other operations in SHA-256, the logical connectives and the rotations/shifts are trivial by comparison.

You might want to look into the structure of the block header to see how the 640 bits change at different rates. Fr'instance I believe the version number (first 32 bits of the header, but in stupid-endian) has been 1 for most of bitcoin's history but may be 2 now or soon. That's 30 zero bits right there. Then you have the first (or, goddammit, probably last) 32 bits of the previous hash guaranteed to be zero. The difficulty changes every two weeks, the merkle hash every block, and so on.

But I still think you haven't quite grasped the magnitude of the P/NP problem yet! Then again, if the impossible ever gets done, it'll be as always, by someone who didn't know it was impossible.
newbie
Activity: 35
Merit: 0
1)    logred_set_scanf(a, "a00 + a00");
   logred_set_reduce(a);
   logred_set_printf(a);

Shouldn't that always produce 0?   for me it outputs a00
Nope.  A or A == A.   A or !A == 1  and A!A = 0

Ahh, there was misunderstanding about the + operator.
I've been thinking about arithmetic operations mod 2, where "(a+b) mod 2" is equivalent to XOR
0+0 = 0, 0 mod 2 = 0
0+1 = 1, 1 mod 2 = 1
1+0 = 1, 1 mod 2 = 1
1+1 = 2, 2 mod 2 = 0
= XOR
newbie
Activity: 46
Merit: 0
Note: working on Quine-McClusky now.  Smiley  Always wanted a high speed lib for logic reductions ...

Been trying this out, so many questions...

1)    logred_set_scanf(a, "a00 + a00");
   logred_set_reduce(a);
   logred_set_printf(a);

Shouldn't that always produce 0?   for me it outputs a00

Nope.  A or A == A.   A or !A == 1  and A!A = 0

2) Similarly I would expect (a + !a) = 1
   logred_set_scanf(a, "a00");
   logred_set_not(b, a);
   logred_set_xor(c, a, b);
= crash

a = "a00"
b = "!a00"
c = "a00" ^ "!a00"
  = a00!(!a00) + !(a00)!a00
  = a00 + !a00
  = 1

Yup, it crashes.  Comments out "logred_set_reduce_4_A_nAB_n(src)" in "logred_set_reduce()" to fix this.

It doesn't resolve down to "1" yet.  soon.

3) Haven't debugged enough, but part 2 could be related to this?:
   logred_set_scanf(a, "!a00!b00"); = works
    logred_set_scanf(a, "!a00"); = crash

Version I'm working on doesn't crash on this. 

4) Actually, is there support for numeric literals (0, 1)?  c = a+b is purely symbolic, but during SHA256 you add in those K-Constants which can sometimes help reduce things further.

Humm, good point.  Will not be ready for this release. Soon.

4) x64 only?  This is giving me heap problems:
#define VAR_T           size_t
#define VAR_T_SIZE_BITS ( 8 * sizeof(VAR_T) )
#define VAR_A_BITS      256
#define VAR_A_LEN       4L   

(worked around setting VAR_A_BITS = 128... maybe this is the source of all my problems? Smiley)

The code should be x32/x64 agnostic.  logred_v6.zip is a snapshot of what I have so far.  Ignore the logred_set_reduce_5_Quine_McCluskey() functions.

The Makefile now lets you define a bunch of flags like VAR_A_BITS.  But it should work for values up to 960.  It may work beyond that with some tweeks to the variable string lenghts ("a960" and "a1024" are different lengths).

Cheers.
newbie
Activity: 35
Merit: 0
Playing a bit here, turns out the A=A+AB and A+B=A+!AB reductions have no effect in 32bit add.  Guess that's what makes it a nice 32bit bijection.

Download logred_v4.zip which will be faster for your application.

You can disable those reduction algorithms by commenting out logred_set_reduce_2_A_AB() and logred_set_reduce_4_A_nAB_n() inside logred_set_reduce()

Been trying this out, so many questions...

1)    logred_set_scanf(a, "a00 + a00");
   logred_set_reduce(a);
   logred_set_printf(a);

Shouldn't that always produce 0?   for me it outputs a00

2) Similarly I would expect (a + !a) = 1
   logred_set_scanf(a, "a00");
   logred_set_not(b, a);
   logred_set_xor(c, a, b);
= crash

3) Haven't debugged enough, but part 2 could be related to this?:
   logred_set_scanf(a, "!a00!b00"); = works
    logred_set_scanf(a, "!a00"); = crash

4) Actually, is there support for numeric literals (0, 1)?  c = a+b is purely symbolic, but during SHA256 you add in those K-Constants which can sometimes help reduce things further.

4) x64 only?  This is giving me heap problems:
#define VAR_T           size_t
#define VAR_T_SIZE_BITS ( 8 * sizeof(VAR_T) )
#define VAR_A_BITS      256
#define VAR_A_LEN       4L   

(worked around setting VAR_A_BITS = 128... maybe this is the source of all my problems? Smiley)
newbie
Activity: 46
Merit: 0
I'll PM you a link to my C code for logic reduction in a few mins.  You can compile C right?  And know how to code?  It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.

This is excellent by the way Smiley I will start integrating with my code.   Right now my stuff is all C#, so I will have to compile your code into a lib and do marshaling -  though at first glance of your API I don't think marshaling will have too bad of a perf impact.   I'll try with c=a+b to test the runtime and validate the results match yours.

Do you know how your de Morgan approach compares to this Quine McCluskey one I keep reading about? http://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm


Quine-McCluskey is the analytical twin of Karnaugh Maps (http://en.wikipedia.org/wiki/Karnaugh_map)

Basically, QMC/KMap takes I/O and creates formulas.  My lib lets you build the formulas directly and simplify them.  You *could* create the formulas based on I/Os I guess.  But since we know the SHA-256 formulas you could skip that step.

In fact you really want to, otherwise you will never have the correct formula unless you iterate all possible inputs for the SHA256 compression function (2^256 + 2^(W-array inputs) ).

Even 2^32 addition is going to be massive.  The nth (0..31) output bit of 32bit-add will be 2^(n+1) terms long.  You're only hope is multiple 32bit-adds start simplifying.

Download logred_v3.zip - there was a bug in performing (A+BCD+EFG+...)(HI+!JK+...)(...)... operations.

Cheers

Edit:
Playing a bit here, turns out the A=A+AB and A+B=A+!AB reductions have no effect in 32bit add.  Guess that's what makes it a nice 32bit bijection.

Download logred_v4.zip which will be faster for your application.

You can disable those reduction algorithms by commenting out logred_set_reduce_2_A_AB() and logred_set_reduce_4_A_nAB_n() inside logred_set_reduce()
newbie
Activity: 35
Merit: 0
I'll PM you a link to my C code for logic reduction in a few mins.  You can compile C right?  And know how to code?  It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.

This is excellent by the way Smiley I will start integrating with my code.   Right now my stuff is all C#, so I will have to compile your code into a lib and do marshaling -  though at first glance of your API I don't think marshaling will have too bad of a perf impact.   I'll try with c=a+b to test the runtime and validate the results match yours.

Do you know how your de Morgan approach compares to this Quine McCluskey one I keep reading about? http://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm
newbie
Activity: 46
Merit: 0
To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica.  BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better.   Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact.  It also consumed a modest 300 megs of memory for 8 bits.  Still learning the maple syntax so things might improve.


OK I think I see now.

I'll PM you a link to my C code for logic reduction in a few mins.  You can compile C right?  And know how to code?  It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.
 
  • !(AB) = !A + !B,
  • A = A + AB, and
  • A + B = A + !AB

It takes about 4 hours to complete a single 32-bit addition.  The API has all the basic operations you want (AND, OR, XOR, NOT, etc).
newbie
Activity: 35
Merit: 0
Cool.  How are you generating the formulas?  Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits.  Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.

To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica.  BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better.   Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact.  It also consumed a modest 300 megs of memory for 8 bits.  Still learning the maple syntax so things might improve.
newbie
Activity: 46
Merit: 0
Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel.  I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached;  It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree;  I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children.  This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption.   Any suggestions? Smiley



Cool.  How are you generating the formulas?  Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits.  Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.
newbie
Activity: 35
Merit: 0
Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel.  I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached;  It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree;  I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children.  This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption.   Any suggestions? Smiley

newbie
Activity: 46
Merit: 0
Thoughts? Make sure you have 32GB of RAM installed   Grin

Do you guys know how often 'Version' changes in the block header?  Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley

Any updates?

hero member
Activity: 524
Merit: 500
newbie
Activity: 46
Merit: 0
I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley

http://jheusser.github.io/2013/02/03/satcoin.html

SAT solvers tried, didn't work.  But I'd still like to see a set of 256 monster logic formula with 2048 bit inputs for the SHA256 round function.  Smiley
newbie
Activity: 46
Merit: 0
Thoughts? Make sure you have 32GB of RAM installed   Grin

Do you guys know how often 'Version' changes in the block header?  Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley

The last version change IIRC was at the last 50%+1 blockchain split earlier in the year.
newbie
Activity: 46
Merit: 0
your c1...c31 bits can be reduced by one instruction Smiley

c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0

Yes, but at the expense of 1 extra depth in the circuit.  The form on the site is for shortest path - ideal for implementation.


c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0
or
   == a1b1 + c0(a1 + b1)
or
   == b1(a1 + c0) + a1c0
hero member
Activity: 524
Merit: 500
can any bits in the block header can be treated as constant
hashMerkleRoot Grin
newbie
Activity: 35
Merit: 0
Thoughts? Make sure you have 32GB of RAM installed   Grin

Do you guys know how often 'Version' changes in the block header?  Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley
sr. member
Activity: 420
Merit: 250
★☆★777Coin★☆★
Minning could return to normal people, minning Undecided
full member
Activity: 286
Merit: 100
Hehe.  I like encouraging people to explore this.  Good learning "exorcize" (pun).  32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce!  Tongue

your c1...c31 bits can be reduced by one instruction Smiley

c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0




Replacing ADD with MUL makes no difference. It does indeed, since I didn't see that a1b1 was a multiplication, not a value.

Matthew:out
newbie
Activity: 35
Merit: 0
Hehe.  I like encouraging people to explore this.  Good learning "exorcize" (pun).  32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce!  Tongue

your c1...c31 bits can be reduced by one instruction Smiley

c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0


full member
Activity: 238
Merit: 100
Bitcoin For All
Well if you just want more parallel processors to tackle this...
http://www.parallella.org/board/

You'll wait till October (Sound familiar?) -- but what the heck -- you can build up a stack -- 16 cores at a time.

I've worked with NP optimization programs for many years -- there is usually someway to organize the problem to get faster results -- I own some IP in this area and I'll leave it at that.
full member
Activity: 196
Merit: 100
ooooHHHH Math, we meet again Undecided

Goodness if that Pic is you, We need to meet...   Shocked
sr. member
Activity: 354
Merit: 250
ooooHHHH Math, we meet again Undecided
full member
Activity: 196
Merit: 100
Interesting..  The Math is over my head..   Shocked
hero member
Activity: 524
Merit: 500
I rewrote the code so that it performs the double SHA256 first, constructing an in-memory tree of all the operations and variables (without actually evaluating them, arithmetically or symbolically.)
I did similar thing for GPU miner, but on 32bit words level. First compiled with my own class instead of int, thus stripping off all the preprocessing and inlining stuff, then generated C code, compiled it with -O3 -S -masm=intel, lifted assembly code back to C, gave a final test run on CPU and fed resulting mess to OpenCL compiler  Grin Grin Grin
hero member
Activity: 524
Merit: 500
32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html
Just do that about 768 times and reduce!  Tongue
You seems to be inspired by my signature Smiley Yse, skipping optimized n-ary additions while in 32 bit world would keep established ASIC manufacturers happy Smiley
newbie
Activity: 46
Merit: 0
Are you saying that an algorithm that cannot be simplified cannot exist?
I'd be happy with good approximation of double SHA-2 Smiley

Even if you find that a (SHA-256)^2 function can be implemented with depth=2 and width=1,000,000,000 - it would be useful work that could be built upon.
Quite unrealistic, but if done - SAT-solvers will step in.

Hehe.  I like encouraging people to explore this.  Good learning "exorcize" (pun).  32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce!  Tongue
hero member
Activity: 524
Merit: 500
Are you saying that an algorithm that cannot be simplified cannot exist?
I'd be happy with good approximation of double SHA-2 Smiley

Even if you find that a (SHA-256)^2 function can be implemented with depth=2 and width=1,000,000,000 - it would be useful work that could be built upon.
Quite unrealistic, but if done - SAT-solvers will step in.
newbie
Activity: 46
Merit: 0
Let's let the OP try to write up the logic formula for SHA256's first 32 bits of input and 1 bit of output. 

Thanks for the links.

I rewrote the code so that it performs the double SHA256 first, constructing an in-memory tree of all the operations and variables (without actually evaluating them, arithmetically or symbolically.)  So basically for each bit in the final hash, I have a tree representing all the operations that have to occur (using an in-order walk) to compute that bit.

...

If everything looks good, the longer term plan is to assign branch simplification to different datacenter machines, so that I can have a few thousand of them simplifying in parallel instead of 8 on my desktop.


Fun!  This work could be published in crypto journals, you know.  Even if you find that a (SHA-256)^2 function can be implemented with depth=2 and width=1,000,000,000 - it would be useful work that could be built upon.  And you code to do this could be modified to examine SHA-3, whirlpool, and other hash functions.

For the record - (SHA-256)^2 has 64+64=128 rounds with 6 32-bit additions per round.
newbie
Activity: 35
Merit: 0
Let's let the OP try to write up the logic formula for SHA256's first 32 bits of input and 1 bit of output. 

Thanks for the links.

I rewrote the code so that it performs the double SHA256 first, constructing an in-memory tree of all the operations and variables (without actually evaluating them, arithmetically or symbolically.)  So basically for each bit in the final hash, I have a tree representing all the operations that have to occur (using an in-order walk) to compute that bit.

What this allows me to do is simplify different branches of the trees independently, and in parallel on different threads.  The other thing it gives me is the ability to see the current depth of the tree I'm trying to simplify, which will give me some "order of magnitude" guess of how large the final equation will be. 
There's thousands of these branches to simplify and they do take a very long time, but watching them in the debugger, things are reducing and it's not ballooning out of control (yet.)

If everything looks good, the longer term plan is to assign branch simplification to different datacenter machines, so that I can have a few thousand of them simplifying in parallel instead of 8 on my desktop.
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
There HAS to be a mathematical shortcut that simplifies the encryption algorithm instead of randomly cracking with brute force. It wouldn't surprise me if there was already a solution out there.
The greatest mathematical minds on the planet specifically designed the algorithm so that there wouldn't be. Are you saying that an algorithm that cannot be simplified cannot exist? (If you think about it, that's obviously incorrect.)
legendary
Activity: 1039
Merit: 1005
There HAS to be a mathematical shortcut...

Wishing does not work well in reality unless you address someone who can fulfill your wish. Adressing the universe wishing for a mathematical solution to a problem that's *designed* to have no easily computable mathematical solution won't work.

Onkel Paul
legendary
Activity: 2062
Merit: 1035
Fill Your Barrel with Bitcoins!
There HAS to be a mathematical shortcut that simplifies the encryption algorithm instead of randomly cracking with brute force. It wouldn't surprise me if there was already a solution out there.
newbie
Activity: 46
Merit: 0
The big problem is that there isn't enough space on all the computers in the universe to store the equation for just one of the bits, even if fully simplified. That and even if every particle in the universe were a supercomputer, the age of the universe wouldn't be long enough to solve the equation. The equations are not linear and so they don't simplify very much.

Cryptographers know of this symbolic reduction technique - and SHA-256 and AES and RSA and about eveyr single ASIC design process uses SymRed to optimize their circuits.

Let's let the OP try to write up the logic formula for SHA256's first 32 bits of input and 1 bit of output.  Then try to use Karnaugh Maps (http://en.wikipedia.org/wiki/Karnaugh_map) or other technique to simplify it.  I predict it will educate them on how hard a problem this is and what kinds of things cryptographers think of when designing crypto functions.

Edit: Also look at: http://en.wikipedia.org/wiki/Circuit_minimization
legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
The big problem is that there isn't enough space on all the computers in the universe to store the equation for just one of the bits, even if fully simplified. That and even if every particle in the universe were a supercomputer, the age of the universe wouldn't be long enough to solve the equation. The equations are not linear and so they don't simplify very much.
newbie
Activity: 28
Merit: 0
I have to say on a CPU brute force mode mining will always be king in my eyes. I like that you're trying to innovate, never stop looking for ways to improve. But your method may need some tweaking if it's going to be the best way to mine for you. You raise some extremely interesting points though, we should chat some time.
newbie
Activity: 35
Merit: 0
a colleague of mine noted the symbolic addition follows a recurrence relation pattern and hence might be simplified by these means: http://en.wikipedia.org/wiki/Recurrence_relation
full member
Activity: 159
Merit: 100
@botnet
Now I understand your idea better. But you are possibly wrong: The main complexity of your approach is not because of the degree of the polynomials but because of the referencing of the literals (you called them 'variables') to each other.

At this point you have just two choices:
1. Try random input (you know this solution under the (wrong) name 'brute force')
2. Try to approximate to a good solution (but then your result would only be partly correct: Who is interested in a 90% correct nonce to the given block and targeted result?) and it is not proven in this case that an approximation exists.

I think there is some misunderstanding of the technique that I will try and clear up with an example.  It should in fact give an exact solution, not an approximate one.  
What I saw is that your problem is possibly some sort of NP-hard. So I referenced to the only viable solution to get feasible results out of your algorithm: random input or approximation.

In my proposed approach, this is done symbolically.   First, symbolically do the addition and Xor operations to produce a set of equations that represents each of the 8 bits of 'target' in terms of the 8 bits of 'nonce'

t0: (!n0)
t1: (Xor[1, n0, n1])
t2: (Xor[n2, n1 && n0])
t3: (Xor[1, !n3, n2 || (n1 && n0)])
t4: (Xor[n4, n3 || n2 || (n1 && n0)])
t5: (Xor[1, n4 && (n3 || n2 || (n1 && n0)), !n5])
t6: (Xor[1, n6, n5 || (n4 && (n3 || n2 || (n1 && n0)))])
t7: (Xor[n6 && (n5 || (n4 && (n3 || n2 || (n1 && n0)))), !n7])
I do not have the time right now to write a proof. So I feel not fully confident about the hardness of your problem at the moment. But I see at the first glance a starting point to prove that your algorithm could solve the 'Linear Equations Problem' (often referenced as 'LIN'), too.

In this case your algorithm would be at least NP-hard and you are out of luck to find a good exact solution for it.

This kind of algorithms (if your algorithm turns out to be NP-hard) is a little bit creepy: If you have only few (testing) data to solve they feel like some sort of good solution. But if you feed the 'real data' workload to them then they will not terminate in about a couple of years (or sometimes millions of years). Surely they will deliver a result (if correctly planned and programmed, i.e. they terminate) but nobody will ever profit from the output.

A good example is the Travelling Salesperson Problem (TSP) (yeah, I know that TSP is NP-complete and not only NP-hard, but this doesn't make a difference at this point). If you have just five or six cities in your input then it has good solutions and is acceptable fast. But if you give all the cities of a small country to it then it will not terminate within some hundred of years waiting time.

Please keep in mind that there is a tiny chance that I could have overseen anything and your algorithm is in fact not NP-hard. Only a polynomial reduction from LIN to your algorithm (or from SAT to your algorithm) can exclude all possibility of doubt.
full member
Activity: 286
Merit: 100
OK, I see what you are trying to do here.

There are still some problems, though, namely:

a) Parsing is going to be a b****
b) Not everyone happens to have a copy of Wolfram Mathematica lying around
c) Precomputation is still very computationally intensive.

In response to a) and b), IMO, you might as well include the symbolic solver in the miner, since you really don't need the computational power of Mathematica to check if a number is 1 or 0.

Matthew:out
newbie
Activity: 35
Merit: 0
Now I understand your idea better. But you are possibly wrong: The main complexity of your approach is not because of the degree of the polynomials but because of the referencing of the literals (you called them 'variables') to each other.

At this point you have just two choices:
1. Try random input (you know this solution under the (wrong) name 'brute force')
2. Try to approximate to a good solution (but then your result would only be partly correct: Who is interested in a 90% correct nonce to the given block and targeted result?) and it is not proven in this case that an approximation exists.

I think there is some misunderstanding of the technique that I will try and clear up with an example.  It should in fact give an exact solution, not an approximate one.   

As mentioned, traditional bitcoin mining performs target=SHA256(SHA256(header+nonce)), looking for a very specific solution (top 4 bytes of 'target' are 0.)    Mining today uses a brute force approach, iterating through every possible nonce value in the 32 bit address space, performing the double SHA256, and looking at the result to check if those top 4 bytes are 0.

In my example here, rather than the function target=SHA256(SHA256(header+nonce)), let's start with a very simple function:

target = (nonce + 0xAD) ^ 0x6E;

(Assume we're working with 8 bit numbers for this example.)


Just like mining, let's try to find a value for 'nonce' such that target = 0;  Mining today would try and brute force try all possible values for 'nonce' until that condition is satisfied, e.g:

for(nonce = 0; nonce <= 255; nonce++)
{
   target = (nonce + 0xAD) ^ 0x6E;
   if(target == 0)
      return nonce;
}


In my proposed approach, this is done symbolically.   First, symbolically do the addition and Xor operations to produce a set of equations that represents each of the 8 bits of 'target' in terms of the 8 bits of 'nonce'

t0: (!n0)
t1: (Xor[1, n0, n1])
t2: (Xor[n2, n1 && n0])
t3: (Xor[1, !n3, n2 || (n1 && n0)])
t4: (Xor[n4, n3 || n2 || (n1 && n0)])
t5: (Xor[1, n4 && (n3 || n2 || (n1 && n0)), !n5])
t6: (Xor[1, n6, n5 || (n4 && (n3 || n2 || (n1 && n0)))])
t7: (Xor[n6 && (n5 || (n4 && (n3 || n2 || (n1 && n0)))), !n7])

Now we can solve the system of equations, which will give us values for n0...n7 which satisfy the condition that t0...t7 == 0;  This will not be an approximation, we will get exact values for each bit.

As mentioned earlier, we convert the logical operations to arithmetic operations, modulo 2:

t0: (1+n0)
t1: (1+n0+n1)
t2: (n0*n1+n2)
t3: (n2+n0*n1*(1+n2)+n3)
t4: (n2+n3+n2*n3+n0*n1*(1+n2)*(1+n3)+n4)
t5: ((n2+n3+n2*n3+n0*n1*(1+n2)*(1+n3))*n4+n5)
t6: (1+n5+(n2+n3+n2*n3+n0*n1*(1+n2)*(1+n3))*n4*(1+n5)+n6)
t7: (1+(n5+(n2+n3+n2*n3+n0*n1*(1+n2)*(1+n3))*n4*(1+n5))*n6+n7)

And now we can solve (done using wolfram mathematica):

In[1] := Solve[(1 + n0) == 0 && (1 + n0 + n1) == 0 && (n0*n1 + n2) == 0 && (n2 + n0*n1*(1 + n2) + n3) == 0 && (n2 + n3 + n2*n3 + n0*n1*(1 + n2)*(1 + n3) + n4) == 0 && ((n2 + n3 + n2*n3 + n0*n1*(1 + n2)*(1 + n3))*n4 + n5) == 0 && (1 + n5 + (n2 + n3 + n2*n3 + n0*n1*(1 + n2)*(1 + n3))*n4*(1 + n5) + n6) == 0 && (1 + (n5 + (n2 + n3 + n2*n3 + n0*n1*(1 + n2)*(1 + n3))*n4*(1 + n5))*n6 + n7) == 0 && n0*n0 - n0 == 0 && n1*n1 - n1 == 0 && n2*n2 - n2 == 0 && n3*n3 - n3 == 0 && n4*n4 - n4 == 0 && n5*n5 - n5 == 0 && n6*n6 - n6 == 0 && n7*n7 - n7 == 0, {n0, n1, n2, n3, n4, n5, n6, n7}, Modulus -> 2]

Out[1] = {{n0 -> 1, n1 -> 0, n2 -> 0, n3 -> 0, n4 -> 0, n5 -> 0, n6 -> 1, n7 -> 1}}

You can see this produced the exact solution:  11000001 == 0xC1 == 193,  which does in fact validate:

(0xC1 + 0xAD) ^ 0x6E == 0


Now we just expand this for bitcoin mining.   Do everything I did above, but with the symbolic version of SHA256(SHA256(block_header)) rather than my simple example.  Take the first 32 equations from this result set (corresponding to the 4 bytes we want to be 0), and fill in literal values from the block header for everything except for the 32 bits of nonce we want to solve for.    Now solve the system of equations; we get the exact bits for nonce that produces a target with  the first 4 bytes zeroed.

Again, the symbolic SHA256 stuff can all be computed offline and saved - the computation part of mining now will just be solving that system of 32 equations, which mathematica (or another  symbolic math module) can do incredibly fast.   Much faster than my GPU can brute-force every possible nonce combination.
full member
Activity: 159
Merit: 100
@botnet
What do you mean with the word PolynomialReduce? Do you mean polynomial reduction?

In my case I literally mean the PolynomialReduce function in Wolfram Mathematica:  http://reference.wolfram.com/mathematica/ref/PolynomialReduce.html

So my TUInt32 class creates a symbolic representation of every mathematical or logical operator that is applied to it, and after each operation, it feeds that equation through Wolfram Mathematica to reduce and simplify the equation, for example: FullSimplify[PolynomialReduce[a*b, a0^2 - a0, b0^2 - b0, Modulus -> 2]]

(recall that logical operations are being transformed to numeric operations in the modulo 2 number ring)
Now I understand your idea better. But you are possibly wrong: The main complexity of your approach is not because of the degree of the polynomials but because of the referencing of the literals (you called them 'variables') to each other.

At this point you have just two choices:
1. Try random input (you know this solution under the (wrong) name 'brute force')
2. Try to approximate to a good solution (but then your result would only be partly correct: Who is interested in a 90% correct nonce to the given block and targeted result?) and it is not proven in this case that an approximation exists.

@c789
1.21 Gigawatts!

I thank you. My tip address is below.
YMMD! Grin

Really, what level of course material are you guys talking here? I only had the first level of CS classes and I'm having a tough time following. Looks cool though.
In my case: Halting problem, P, NP, some other part of the complexity zoo and SAT were basic stuff. Second year.
ILP came in higher classes.

Are you trying to get a degree in computer science?

@cheater123
I am just new here and just watching all about mining what is this but currently so confused here
Welcome to the forum!

A very short introduction to mining
To find a new block the network gives every miner a challenge (i.e. a target value). The higher the difficulty the lower the desired value. Now every miner reads the last block and tries to guess a nonce for it. If the computed result is lower than the target value then a new block has been found and the next challenge uses the newly found block.

Maybe one could try addition as basic algorithm for mining:
lastblock + nonce = result
But that would be pointless: Subtraction would find the nonce by simple computation.
result - lastblock = nonce
That is because of the fact that addition and subtraction are each fast to compute.

So the chosen basic algorithm is SHA256 which is known for a speciality: computing a result by
sha256(original_input)=result
is fast to compute.
But computing the original_input by using only the result is really time consuming.

These algorithms are known as one way function (with trap door).

For reversing to the nonce in a feasible time one would need a theoretical machine which guesses the right value in the first step.  Unfortunately there is no known way to produce such a machine in reality.

So every miner uses a random value as nonce and computes:
sha256(sha256(nonce, last_block))=result
and evaluates whether result < target_value.
If so then a new block was found. Otherwise the miner tries another random value as nonce and evaluates again.
newbie
Activity: 28
Merit: 0
I am just new here and just watching all about mining what is this but currently so confused here
hero member
Activity: 850
Merit: 1000
1.21 Gigawatts!

I thank you. My tip address is below.


Really, what level of course material are you guys talking here? I only had the first level of CS classes and I'm having a tough time following. Looks cool though.
newbie
Activity: 35
Merit: 0
What do you mean with the word PolynomialReduce? Do you mean polynomial reduction?

In my case I literally mean the PolynomialReduce function in Wolfram Mathematica:  http://reference.wolfram.com/mathematica/ref/PolynomialReduce.html

So my TUInt32 class creates a symbolic representation of every mathematical or logical operator that is applied to it, and after each operation, it feeds that equation through Wolfram Mathematica to reduce and simplify the equation, for example: FullSimplify[PolynomialReduce[a*b, a0^2 - a0, b0^2 - b0, Modulus -> 2]]

(recall that logical operations are being transformed to numeric operations in the modulo 2 number ring)
full member
Activity: 159
Merit: 100
@botnet
These will obviously be huge, but you can PolynomialReduce after each step of the hashing to reduce the equation size.
What do you mean with the word PolynomialReduce? Do you mean polynomial reduction?

I did not take the tour to understand your thoughts fully because it looks like some sort of ILP problem to me (at least there seems to be a polynomial reduction from your problem to the ILP).

Generally speaking: If there would be a known feasible solution for something like:
sha256^(-1)(sha256^(-1)(result)) = (nonce, previous_block)
then it would be silly to call sha2 an one way function.

SAT and ILP have feasible solutions only under the assumption that P=NP.
full member
Activity: 286
Merit: 100
Right. A 32-bit x86 add operation takes 1 cycle.

Your currently 7-bit symbolic add operation has (roughly - don't trust my counting skills) 66 XOR ops and 67 AND ops. You could have calculated the 133rd fibonacci number by the time you had added 7 bits together.

Another advantage brute-force has is parallelism. Because of the varying brackets, you can't even execute most of your instructions in parallel.

Matthew:out

legendary
Activity: 1596
Merit: 1012
Democracy is vulnerable to a 51% attack.
This will be many, many orders of magnitude harder than brute force.
newbie
Activity: 35
Merit: 0
Clever implementation.  Lends itself well to complete regression testing / provability.

On first inspection, I'm not convinced that the reduced form is correct.  By definition, doesn't c31 depend on a30 if b30 = 1?

Looking at it another way, the third term of c31 will be zero in all but 2 out of 2^32 cases for all possible values of b and in all cases if a0 = 0.


Thank you dentldir, it turns out there was a bug in the module feeding stuff to wolfram,

new bits for (c = a + b) are looking like this:

c0: (a0^b0)
c1: (a1^a0&b0^b1)
c2: (a2^a1&b1^a0&b0&(a1^b1)^b2)
c3: (a3^a2&b2^a1&b1&(a2^b2)^a0&b0&(a1^b1)&(a2^b2)^b3)
c4: (a4^a2&a3&b2^a3&b3^a2&b2&b3^a1&b1&(a2^b2)&(a3^b3)^a0&b0&(a1^b1)&(a2^b2)&(a3^b3)^b4)
c5: (a5^a4&(a3&b3^a2&b2&(a3^b3))^(a4^a3&b3^a2&b2&(a3^b3))&b4^a1&b1&(a2^b2)&(a3^b3)&(a4^b4)^a0&b0&(a1^b1)&(a2^b2)&(a3^b3)&(a4^b4)^b5)
c6: (a6^a4&a5&(a3&b3^a2&b2&(a3^b3))^a5&(a4^a3&b3^a2&b2&(a3^b3))&b4^(a5^a4&b4^a3&b3&(a4^b4)^a2&b2&(a3^b3)&(a4^b4))&b5^a1&b1&(a2^b2)&(a3^b3)&(a4^b4)&(a5^b5)^a0&b0&(a1^b1)&(a2^b2)&(a3^b3)&(a4^b4)&(a5^b5)^b6)

at first glance they verified in an excel sheet, other bits are still simplifying.   
sr. member
Activity: 333
Merit: 250
Ants Rock
Thoughts? Make sure you have 32GB of RAM installed   Grin
member
Activity: 73
Merit: 10
As you've shown it becomes monstrously complex very quickly and so the only viable solution is a brute force. The only hope is that you manage to cancel some terms out so that you effectively find a short cut but crypto-analysts spend an awful lot of time studying all the common crypto and hashing algorithms and I don't believe they've found anything that indicates a weakness with SHA256 yet - so I wouldn't hold your breath on this one.

Probably a better hope would be to have an analog circuit that operates backwards so that for a given hash it could indicate a list of possible inputs but, I suspect, that having built this it won't indicate any weakness - these things have been thoroughly studied using a range of techniques already.

Good luck, though.
full member
Activity: 168
Merit: 100
I am sooooo confused
JLM
full member
Activity: 164
Merit: 100
Mathematical Minning could return to normal people, minning.

Watching.
Go foward!!!
sr. member
Activity: 333
Merit: 250
Clever implementation.  Lends itself well to complete regression testing / provability.

On first inspection, I'm not convinced that the reduced form is correct.  By definition, doesn't c31 depend on a30 if b30 = 1?

Looking at it another way, the third term of c31 will be zero in all but 2 out of 2^32 cases for all possible values of b and in all cases if a0 = 0.












newbie
Activity: 35
Merit: 0
Quote
Next, perform the double SHA256, and produce a set of equations that represents each bit in the final hash.

Wouldn't the stateful 32-bit operations (ROTR, + mod 32) radically complicate creating a reduceable linear polynomial?  ROTR I can see since its just assignment in your model, but where do all the carry flags go in + mod 32?


Here is what my symbolic carry bit code looks like for + (note, TUInt32 is my class which acts as an UInt32 but symbolically tracks all operations applied to it.)

      public static TUInt32 operator +(TUInt32 b1, TUInt32 b2)
      {
         var r = new TUInt32((uint)0);

         TBit carryBit = new TBit("0");
         for (int i = 0; i < 32; i++)
         {
            var tmp = new TBit(b1._bits, "^", b2._bits);
            r._bits = new TBit(tmp, "^", carryBit);

            var op1 = new TBit(b2._bits, "|", carryBit);
            var op2 = new TBit(b1._bits, "&", op1);
            var op3 = new TBit(b2._bits, "&", carryBit);
            carryBit = new TBit(op2, "|", op3);
         }

         r._value = (uint)(b1._value + b2._value);

         return r;
      }

The equations do get quite complicated, but as I mentioned previously, you can PolynomialReduce after each step of the procedure;

For symbolic unsigned integers "a" and "b", having symbolic bits a0-a31 and b0-b31, here is the result for each bit of "c" in the operation "c = (a + b)" after PolynomialReduce:


bit0:  (a0^b0)
bit1:  (a1^a0&b0^b1)
bit2:  (a2^a0&b0&b1^b2)
bit3:  (a3^a0&b0&b1&b2^b3)
bit4:  (a4^a0&b0&b1&b2&b3^b4)
bit5:  (a5^a0&b0&b1&b2&b3&b4^b5)
bit6:  (a6^a0&b0&b1&b2&b3&b4&b5^b6)
bit7:  (a7^a0&b0&b1&b2&b3&b4&b5&b6^b7)
bit8:  (a8^a0&b0&b1&b2&b3&b4&b5&b6&b7^b8)
bit9:  (a9^a0&b0&b1&b2&b3&b4&b5&b6&b7&b8^b9)
bit10: (a10^b10^a0&b0&b1&b2&b3&b4&b5&b6&b7&b8&b9)
bit11: (a11^b11^a0&b0&b1&b10&b2&b3&b4&b5&b6&b7&b8&b9)
bit12: (a12^b12^a0&b0&b1&b10&b11&b2&b3&b4&b5&b6&b7&b8&b9)
bit13: (a13^b13^a0&b0&b1&b10&b11&b12&b2&b3&b4&b5&b6&b7&b8&b9)
bit14: (a14^b14^a0&b0&b1&b10&b11&b12&b13&b2&b3&b4&b5&b6&b7&b8&b9)
bit15: (a15^b15^a0&b0&b1&b10&b11&b12&b13&b14&b2&b3&b4&b5&b6&b7&b8&b9)
bit16: (a16^b16^a0&b0&b1&b10&b11&b12&b13&b14&b15&b2&b3&b4&b5&b6&b7&b8&b9)
bit17: (a17^b17^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b2&b3&b4&b5&b6&b7&b8&b9)
bit18: (a18^b18^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b2&b3&b4&b5&b6&b7&b8&b9)
bit19: (a19^b19^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b2&b3&b4&b5&b6&b7&b8&b9)
bit20: (a20^b20^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b3&b4&b5&b6&b7&b8&b9)
bit21: (a21^b21^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b3&b4&b5&b6&b7&b8&b9)
bit22: (a22^b22^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b3&b4&b5&b6&b7&b8&b9)
bit23: (a23^b23^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b22&b3&b4&b5&b6&b7&b8&b9)
bit24: (a24^b24^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b22&b23&b3&b4&b5&b6&b7&b8&b9)
bit25: (a25^b25^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b22&b23&b24&b3&b4&b5&b6&b7&b8&b9)
bit26: (a26^b26^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b22&b23&b24&b25&b3&b4&b5&b6&b7&b8&b9)
bit27: (a27^b27^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b22&b23&b24&b25&b26&b3&b4&b5&b6&b7&b8&b9)
bit28: (a28^b28^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b22&b23&b24&b25&b26&b27&b3&b4&b5&b6&b7&b8&b9)
bit29: (a29^b29^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b22&b23&b24&b25&b26&b27&b28&b3&b4&b5&b6&b7&b8&b9)
bit30: (a30^b30^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b22&b23&b24&b25&b26&b27&b28&b29&b3&b4&b5&b6&b7&b8&b9)
bit31: (a31^b31^a0&b0&b1&b10&b11&b12&b13&b14&b15&b16&b17&b18&b19&b2&b20&b21&b22&b23&b24&b25&b26&b27&b28&b29&b3&b30&b4&b5&b6&b7&b8&b9)

legendary
Activity: 2058
Merit: 1452
Thoughts?
I'll get back to you when you can show a working implementation that is faster than the current SSE2 one.
sr. member
Activity: 333
Merit: 250
Quote
Next, perform the double SHA256, and produce a set of equations that represents each bit in the final hash.

Wouldn't the stateful 32-bit operations (ROTR, + mod 32) radically complicate creating a reduceable linear polynomial?  ROTR I can see since its just assignment in your model, but where do all the carry flags go in + mod 32?

BTW, this topic is a great match for your nickname.  Cool

full member
Activity: 140
Merit: 100
It wont be faster than brute force method. read this for more details: http://jheusser.github.io/2013/02/03/satcoin.html

This is different than a SAT solver approach in my view,

What I'm proposing is coming up with equations that represent doing the double SHA256 symbolically.   Each equation will have no more logical operations than the double hash itself (sans the symbolic carry bits from addition) and no more than 640 input variables corresponding to the input bits of the block header.

All of this can be computed offline.   Then it's about solving a system of 32 equations with 32 unknowns to derive the correct nonce.  I propose this can be done faster than trying to hash the full nonce range on the CPU, maybe faster than doing it on a GPU;  Wolfram Mathematica for example can do it in a few seconds depending on the equation complexity.


Found this old talk for you:
https://bitcointalksearch.org/topic/sha-256-as-a-boolean-function-55888
newbie
Activity: 35
Merit: 0
It wont be faster than brute force method. read this for more details: http://jheusser.github.io/2013/02/03/satcoin.html

This is different than a SAT solver approach in my view,

What I'm proposing is coming up with equations that represent doing the double SHA256 symbolically.   Each equation will have no more logical operations than the double hash itself (sans the symbolic carry bits from addition) and no more than 640 input variables corresponding to the input bits of the block header.

All of this can be computed offline.   Then it's about solving a system of 32 equations with 32 unknowns to derive the correct nonce.  I propose this can be done faster than trying to hash the full nonce range on the CPU, maybe faster than doing it on a GPU;  Wolfram Mathematica for example can do it in a few seconds depending on the equation complexity.

full member
Activity: 140
Merit: 100
It wont be faster than brute force method. read this for more details: http://jheusser.github.io/2013/02/03/satcoin.html
newbie
Activity: 35
Merit: 0
Rather than computationally solving for SHA256(SHA256(block_header)), the approach is to solve it symbolically.   What I mean by this is, rather than treating the block header input bits as 0 or 1, treat each input bit as an individual symbolic variable (640 in total).   Next, perform the double SHA256, and produce a set of equations that represents each bit in the final hash.  These will obviously be huge, but you can PolynomialReduce after each step of the hashing to reduce the equation size.

The SHA256 output is 32 bytes, so what you end up with is a system of 256 equations, consisting of 640 variables each (which include all the logical operators applied from the double hash.)  For the purposes of mining at difficulty 1, you can discard all but the first 32 equations.  

Now to mine using this method:  You have to collapse the equations and simplify.   Take the current, real block_header in question, and fill in actual values in your 32 equations for every bit except for the 32-bit nonce (608 in total.)  Reduce.  

Now you have a system of 32 equations, and 32 unknowns.   Set all 32 equations equal to one another, and solve the system of equations.    

What you end up with are 3 possible solutions: A nonce that produces 32 "0" bits (what we want), a nonce that produces all "1" bits (discard) or no solution (increment extraNonce.)

Note to make reduction and solving the system of equations simpler, convert the logical operators (xor, or, and) into arithmetic operations and do the reduction and solving in the modulo 2 number ring, for example:

a^b == (a+b)%2  
(a | b) == ( (a*b)%2 + a%2 + b%2 );  
(a & b) == (a * b) % 2  

Thoughts?
Jump to: