Potentially faster method for mining on the CPU - page 2.

jlcooke

newbie

Activity: 46

Merit: 0

Quote from: botnet on August 26, 2013, 01:40:30 AM

To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica. BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better. Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact. It also consumed a modest 300 megs of memory for 8 bits. Still learning the maple syntax so things might improve.

OK I think I see now.

I'll PM you a link to my C code for logic reduction in a few mins. You can compile C right? And know how to code? It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.

!(AB) = !A + !B,
A = A + AB, and
A + B = A + !AB

It takes about 4 hours to complete a single 32-bit addition. The API has all the basic operations you want (AND, OR, XOR, NOT, etc).

botnet

newbie

Activity: 35

Merit: 0

Quote from: jlcooke on August 23, 2013, 12:06:21 PM

Cool. How are you generating the formulas? Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits. Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.

To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica. BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better. Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact. It also consumed a modest 300 megs of memory for 8 bits. Still learning the maple syntax so things might improve.

jlcooke

newbie

Activity: 46

Merit: 0

Quote from: botnet on August 23, 2013, 11:51:08 AM

Quote from: jlcooke on August 22, 2013, 01:57:07 PM

Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel. I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached; It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree; I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children. This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption. Any suggestions?

Cool. How are you generating the formulas? Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits. Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.

botnet

newbie

Activity: 35

Merit: 0

Quote from: jlcooke on August 22, 2013, 01:57:07 PM

Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel. I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached; It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree; I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children. This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption. Any suggestions?

jlcooke

newbie

Activity: 46

Merit: 0

Quote from: botnet on August 19, 2013, 11:03:09 AM

Quote from: jesse11 on August 02, 2013, 04:54:00 PM

Thoughts? Make sure you have 32GB of RAM installed Grin

Do you guys know how often 'Version' changes in the block header? Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization. The fewer variables in these equations the better

Any updates?

smolen

hero member

Activity: 524

Merit: 500

Quote from: jlcooke on August 20, 2013, 04:12:13 PM

http://jheusser.github.io/2013/02/03/satcoin.html
SAT solvers tried, didn't work

...in such a naïve way

jlcooke

newbie

Activity: 46

Merit: 0

Quote from: botnet on August 19, 2013, 11:03:09 AM

I am running into some memory concerns, mostly due to all the parallelization. The fewer variables in these equations the better

http://jheusser.github.io/2013/02/03/satcoin.html

SAT solvers tried, didn't work. But I'd still like to see a set of 256 monster logic formula with 2048 bit inputs for the SHA256 round function.

jlcooke

newbie

Activity: 46

Merit: 0

Quote from: botnet on August 19, 2013, 11:03:09 AM

Quote from: jesse11 on August 02, 2013, 04:54:00 PM

Thoughts? Make sure you have 32GB of RAM installed Grin

Do you guys know how often 'Version' changes in the block header? Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization. The fewer variables in these equations the better

The last version change IIRC was at the last 50%+1 blockchain split earlier in the year.

jlcooke

newbie

Activity: 46

Merit: 0

Quote from: botnet on August 17, 2013, 04:31:37 AM

your c1...c31 bits can be reduced by one instruction

c1 == a1b1 + a1c0 + b1c0
== a1(b1+c0) + b1c0

Yes, but at the expense of 1 extra depth in the circuit. The form on the site is for shortest path - ideal for implementation.

c1 == a1b1 + a1c0 + b1c0
== a1(b1+c0) + b1c0
or
== a1b1 + c0(a1 + b1)
or
== b1(a1 + c0) + a1c0

smolen

hero member

Activity: 524

Merit: 500

Quote from: botnet on August 19, 2013, 11:03:09 AM

can any bits in the block header can be treated as constant

hashMerkleRoot Grin

botnet

newbie

Activity: 35

Merit: 0

Quote from: jesse11 on August 02, 2013, 04:54:00 PM

Thoughts? Make sure you have 32GB of RAM installed Grin

Do you guys know how often 'Version' changes in the block header? Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization. The fewer variables in these equations the better

faiza1990

sr. member

Activity: 420

Merit: 250

★☆★777Coin★☆★

Minning could return to normal people, minning Undecided

ZirconiumX

full member

Activity: 286

Merit: 100

Quote from: botnet on August 17, 2013, 04:31:37 AM

Quote from: jlcooke on August 15, 2013, 03:41:58 PM

Hehe. I like encouraging people to explore this. Good learning "exorcize" (pun). 32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce! Tongue

your c1...c31 bits can be reduced by one instruction

c1 == a1b1 + a1c0 + b1c0
== a1(b1+c0) + b1c0

~~Replacing ADD with MUL makes no difference.~~ It does indeed, since I didn't see that a₁b₁ was a multiplication, not a value.

Matthew:out

botnet

newbie

Activity: 35

Merit: 0

Quote from: jlcooke on August 15, 2013, 03:41:58 PM

Hehe. I like encouraging people to explore this. Good learning "exorcize" (pun). 32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce! Tongue

your c1...c31 bits can be reduced by one instruction

c1 == a1b1 + a1c0 + b1c0
== a1(b1+c0) + b1c0

davewr2013

full member

Activity: 238

Merit: 100

Bitcoin For All

Well if you just want more parallel processors to tackle this...
http://www.parallella.org/board/

You'll wait till October (Sound familiar?) -- but what the heck -- you can build up a stack -- 16 cores at a time.

I've worked with NP optimization programs for many years -- there is usually someway to organize the problem to get faster results -- I own some IP in this area and I'll leave it at that.

Kuroth

full member

Activity: 196

Merit: 100

Quote from: cassieheart on August 16, 2013, 09:14:32 AM

ooooHHHH Math, we meet again Undecided

Goodness if that Pic is you, We need to meet... Shocked

cassieheart

sr. member

Activity: 354

Merit: 250

ooooHHHH Math, we meet again Undecided

Kuroth

full member

Activity: 196

Merit: 100

Interesting.. The Math is over my head.. Shocked

smolen

hero member

Activity: 524

Merit: 500

Quote from: botnet on August 15, 2013, 01:02:03 AM

I rewrote the code so that it performs the double SHA256 first, constructing an in-memory tree of all the operations and variables (without actually evaluating them, arithmetically or symbolically.)

I did similar thing for GPU miner, but on 32bit words level. First compiled with my own class instead of int, thus stripping off all the preprocessing and inlining stuff, then generated C code, compiled it with -O3 -S -masm=intel, lifted assembly code back to C, gave a final test run on CPU and fed resulting mess to OpenCL compiler Grin

smolen

hero member

Activity: 524

Merit: 500

Quote from: jlcooke on August 15, 2013, 03:41:58 PM

32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html
Just do that about 768 times and reduce! Tongue

You seems to be inspired by my signature

Yse, skipping optimized n-ary additions while in 32 bit world would keep established ASIC manufacturers happy

Topic: Potentially faster method for mining on the CPU - page 2. (Read 14431 times)