Pages:
Author

Topic: Potentially faster method for mining on the CPU - page 2. (Read 14431 times)

newbie
Activity: 46
Merit: 0
To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica.  BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better.   Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact.  It also consumed a modest 300 megs of memory for 8 bits.  Still learning the maple syntax so things might improve.


OK I think I see now.

I'll PM you a link to my C code for logic reduction in a few mins.  You can compile C right?  And know how to code?  It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.
 
  • !(AB) = !A + !B,
  • A = A + AB, and
  • A + B = A + !AB

It takes about 4 hours to complete a single 32-bit addition.  The API has all the basic operations you want (AND, OR, XOR, NOT, etc).
newbie
Activity: 35
Merit: 0
Cool.  How are you generating the formulas?  Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits.  Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.

To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica.  BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better.   Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact.  It also consumed a modest 300 megs of memory for 8 bits.  Still learning the maple syntax so things might improve.
newbie
Activity: 46
Merit: 0
Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel.  I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached;  It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree;  I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children.  This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption.   Any suggestions? Smiley



Cool.  How are you generating the formulas?  Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits.  Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.
newbie
Activity: 35
Merit: 0
Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel.  I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached;  It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree;  I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children.  This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption.   Any suggestions? Smiley

newbie
Activity: 46
Merit: 0
Thoughts? Make sure you have 32GB of RAM installed   Grin

Do you guys know how often 'Version' changes in the block header?  Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley

Any updates?

hero member
Activity: 524
Merit: 500
newbie
Activity: 46
Merit: 0
I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley

http://jheusser.github.io/2013/02/03/satcoin.html

SAT solvers tried, didn't work.  But I'd still like to see a set of 256 monster logic formula with 2048 bit inputs for the SHA256 round function.  Smiley
newbie
Activity: 46
Merit: 0
Thoughts? Make sure you have 32GB of RAM installed   Grin

Do you guys know how often 'Version' changes in the block header?  Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley

The last version change IIRC was at the last 50%+1 blockchain split earlier in the year.
newbie
Activity: 46
Merit: 0
your c1...c31 bits can be reduced by one instruction Smiley

c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0

Yes, but at the expense of 1 extra depth in the circuit.  The form on the site is for shortest path - ideal for implementation.


c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0
or
   == a1b1 + c0(a1 + b1)
or
   == b1(a1 + c0) + a1c0
hero member
Activity: 524
Merit: 500
can any bits in the block header can be treated as constant
hashMerkleRoot Grin
newbie
Activity: 35
Merit: 0
Thoughts? Make sure you have 32GB of RAM installed   Grin

Do you guys know how often 'Version' changes in the block header?  Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley
sr. member
Activity: 420
Merit: 250
★☆★777Coin★☆★
Minning could return to normal people, minning Undecided
full member
Activity: 286
Merit: 100
Hehe.  I like encouraging people to explore this.  Good learning "exorcize" (pun).  32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce!  Tongue

your c1...c31 bits can be reduced by one instruction Smiley

c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0




Replacing ADD with MUL makes no difference. It does indeed, since I didn't see that a1b1 was a multiplication, not a value.

Matthew:out
newbie
Activity: 35
Merit: 0
Hehe.  I like encouraging people to explore this.  Good learning "exorcize" (pun).  32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce!  Tongue

your c1...c31 bits can be reduced by one instruction Smiley

c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0


full member
Activity: 238
Merit: 100
Bitcoin For All
Well if you just want more parallel processors to tackle this...
http://www.parallella.org/board/

You'll wait till October (Sound familiar?) -- but what the heck -- you can build up a stack -- 16 cores at a time.

I've worked with NP optimization programs for many years -- there is usually someway to organize the problem to get faster results -- I own some IP in this area and I'll leave it at that.
full member
Activity: 196
Merit: 100
ooooHHHH Math, we meet again Undecided

Goodness if that Pic is you, We need to meet...   Shocked
sr. member
Activity: 354
Merit: 250
ooooHHHH Math, we meet again Undecided
full member
Activity: 196
Merit: 100
Interesting..  The Math is over my head..   Shocked
hero member
Activity: 524
Merit: 500
I rewrote the code so that it performs the double SHA256 first, constructing an in-memory tree of all the operations and variables (without actually evaluating them, arithmetically or symbolically.)
I did similar thing for GPU miner, but on 32bit words level. First compiled with my own class instead of int, thus stripping off all the preprocessing and inlining stuff, then generated C code, compiled it with -O3 -S -masm=intel, lifted assembly code back to C, gave a final test run on CPU and fed resulting mess to OpenCL compiler  Grin Grin Grin
hero member
Activity: 524
Merit: 500
32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html
Just do that about 768 times and reduce!  Tongue
You seems to be inspired by my signature Smiley Yse, skipping optimized n-ary additions while in 32 bit world would keep established ASIC manufacturers happy Smiley
Pages:
Jump to: