4 hashes parallel on SSE2 CPUs for 0.3.6 - page 5.

tcatm

sr. member

Activity: 337

Merit: 285

Quote from: satoshi on July 30, 2010, 07:29:20 PM

That's amazing...

So are you saying you use 128-bit registers to SIMD four 32-bit data at once? I've wondered about that for a long time, but I didn't think it would be possible due to addition carrying into the neighbour's value.

That's how it works. Four 32 bit values in a 128 bit vector. They're calculated independently, but at the same time.
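
To illustrate the point about carries, here is a minimal sketch (not taken from the patch) of adding four 32-bit values held in one 128-bit register. The helper name `add4_u32` is made up for this example. It assumes an x86 compiler that defines `__SSE2__` and falls back to a plain loop otherwise. The packed add `_mm_add_epi32` wraps each lane on its own, so an overflow in one lane does not carry into its neighbour.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: adds four independent 32-bit lanes of a and b.
// Each lane wraps modulo 2^32 by itself.
inline void add4_u32(const std::uint32_t a[4], const std::uint32_t b[4],
                     std::uint32_t out[4]) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // Packed 32-bit add: no carry crosses a lane boundary.
    __m128i vs = _mm_add_epi32(va, vb);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), vs);
#else
    // Scalar fallback for targets without SSE2.
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}
```

With `a = {0xFFFFFFFF, 1, ...}` and `b = {1, 2, ...}`, lane 0 wraps to 0 while lane 1 is still 3.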

Btw, why are you using this alignup<16> function when __attribute__ ((aligned (16))) will tell the compiler to align at compile time?
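
For readers comparing the two approaches in that question, here is a rough sketch under some assumptions. `align_up<N>` is a hypothetical stand-in for a runtime helper that rounds a pointer up inside an oversized buffer; it is not the actual `alignup<16>` from the Bitcoin source. `alignas(16)` is the standard C++11 way to request alignment at compile time and is used here in place of the GCC-specific `__attribute__ ((aligned (16)))`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical runtime helper: rounds p up to the next multiple of N.
// N is assumed to be a power of two.
template <std::size_t N>
inline unsigned char* align_up(unsigned char* p) {
    std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = static_cast<std::uintptr_t>(N - 1);
    return reinterpret_cast<unsigned char*>((v + mask) & ~mask);
}

// Compile-time alternative: the compiler places every Block on a
// 16-byte boundary.
struct alignas(16) Block {
    unsigned char bytes[64];
};

// True when p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

The runtime version needs a buffer with up to 15 spare bytes, while the compile-time version relies on the compiler and ABI honouring the requested alignment.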

knightmb

sr. member

Activity: 308

Merit: 258

Darn, it means the next release, the difficulty is going to have to increase to 1000 or so to keep up, LOL

satoshi

founder

Activity: 364

Merit: 7553

That's amazing...

So are you saying you use 128-bit registers to SIMD four 32-bit data at once? I've wondered about that for a long time, but I didn't think it would be possible due to addition carrying into the neighbour's value.

tcatm

sr. member

Activity: 337

Merit: 285

Tell me if it works

Donations are welcome. 17asVKkzRGTFvvGH9dMGQaHe78xzfvgSSA

knightmb

sr. member

Activity: 308

Merit: 258

Awesome, I'll have to give it a try myself then.

tcatm

sr. member

Activity: 337

Merit: 285

Performance of stock code (as measured by my test/benchmark program) is about 1500khash/s.
My code does 3500khash/s. Both figures are for one core. It scales well because I do 128 hashes at once and keep the data structures small enough to fit in the CPU cache.

I have two local collision attacks which will squeeze another 300khash/s out, but they are not stable yet.

knightmb

sr. member

Activity: 308

Merit: 258

I take it that you've already tested the hash limit before performance starts to suffer against the stock code? I'm just curious myself.

tcatm

sr. member

Activity: 337

Merit: 285

This patch will calculate four hashes on one core using vector instructions. There's a test program included that validates the new hash function against the old one, so it should be correct.
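
The idea of checking a vectorized routine against the scalar one can be sketched as follows. This is not the test program from the patch. It is an illustrative example that covers a single SHA-256 building block (the Σ0 function, rotations by 2, 13 and 22) rather than the full hash. The names `big_sigma0`, `big_sigma0_x4` and `lanes_match_reference` are invented for this sketch, and the SSE2 path assumes a compiler that defines `__SSE2__`.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Scalar reference: rotate right, then SHA-256 "big sigma 0" on one word.
inline std::uint32_t rotr32(std::uint32_t x, int n) {
    return (x >> n) | (x << (32 - n));
}
inline std::uint32_t big_sigma0(std::uint32_t x) {
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}

// Four-lane version of the same function.
inline void big_sigma0_x4(const std::uint32_t in[4], std::uint32_t out[4]) {
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    // SSE2 has no rotate instruction, so each rotate is two shifts and an OR.
    __m128i r2  = _mm_or_si128(_mm_srli_epi32(x, 2),  _mm_slli_epi32(x, 30));
    __m128i r13 = _mm_or_si128(_mm_srli_epi32(x, 13), _mm_slli_epi32(x, 19));
    __m128i r22 = _mm_or_si128(_mm_srli_epi32(x, 22), _mm_slli_epi32(x, 10));
    __m128i s = _mm_xor_si128(_mm_xor_si128(r2, r13), r22);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), s);
#else
    // Scalar fallback for targets without SSE2.
    for (int i = 0; i < 4; ++i) out[i] = big_sigma0(in[i]);
#endif
}

// Returns true when every lane of the four-lane result matches the
// scalar reference for the same input word.
inline bool lanes_match_reference(const std::uint32_t in[4]) {
    std::uint32_t out[4];
    big_sigma0_x4(in, out);
    for (int i = 0; i < 4; ++i) {
        if (out[i] != big_sigma0(in[i])) return false;
    }
    return true;
}
```

A real validation would run the complete old and new hash functions over many inputs and compare the digests. The sketch only shows the comparison pattern.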

The patch is against 0.3.6. Improves khash/s by roughly 115%.

http://pastebin.com/XN1JDb53

Topic: 4 hashes parallel on SSE2 CPUs for 0.3.6 - page 5. (Read 22072 times)