Topic: 4 hashes parallel on SSE2 CPUs for 0.3.6 - page 5. (Read 22072 times)

sr. member
Activity: 337
Merit: 285
That's amazing...

So are you saying you use 128-bit registers to SIMD four 32-bit data at once?  I've wondered about that for a long time, but I didn't think it would be possible due to addition carrying into the neighbour's value.
That's how it works. Four 32-bit values sit in one 128-bit vector. Each value is calculated independently of its neighbours, but all four are processed at the same time.
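To illustrate the point about carries, here is a minimal sketch (not taken from the patch) using the SSE2 intrinsic _mm_add_epi32. The function names add4_u32 and lanes_are_independent are made up for this example. The packed add wraps each 32-bit lane modulo 2^32, so an overflow in one lane does not carry into its neighbour.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add four pairs of 32-bit values with one packed add. */
void add4_u32(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i sum = _mm_add_epi32(va, vb);  /* per-lane add, each lane wraps mod 2^32 */
    _mm_storeu_si128((__m128i *)out, sum);
}

/* Returns 1 if an overflow in lane 0 leaves the other lanes untouched. */
int lanes_are_independent(void)
{
    uint32_t a[4] = {0xFFFFFFFFu, 1u, 2u, 3u};
    uint32_t b[4] = {1u, 1u, 1u, 1u};
    uint32_t out[4];

    add4_u32(a, b, out);
    /* Lane 0 wraps to 0; lane 1 is 1 + 1 = 2, with no carry from lane 0. */
    return out[0] == 0u && out[1] == 2u && out[2] == 3u && out[3] == 4u;
}
```

Calling lanes_are_independent() on an SSE2-capable CPU should return 1, which is the behaviour the reply above describes.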

Btw, why are you using this alignup<16> function when __attribute__ ((aligned (16))) tells the compiler to align at compile time?
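For readers comparing the two approaches, here is a rough sketch. The alignup<16> implementation is not shown in this thread, so align_up_16 below is only an assumption about what such a helper typically does (round a pointer up to the next 16-byte boundary at run time). The names state_static, align_up_16 and is_aligned_16 are hypothetical. The attribute syntax is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time alignment: the compiler places this array on a 16-byte boundary. */
uint32_t state_static[8] __attribute__((aligned(16)));

/* Hypothetical run-time helper: round a pointer up to the next multiple of 16.
 * The caller must over-allocate by up to 15 bytes for the result to stay in bounds. */
void *align_up_16(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + 15u) & ~(uintptr_t)15u);
}

/* Returns 1 if p is 16-byte aligned, 0 otherwise. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

The run-time version is useful when the buffer comes from somewhere the attribute cannot reach, for example a heap allocation. Whether that is the reason it was used in the patch is not stated here.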
sr. member
Activity: 308
Merit: 258
Darn, this means that by the next release the difficulty will have to increase to 1000 or so to keep up, LOL.
founder
Activity: 364
Merit: 7553
That's amazing...

So are you saying you use 128-bit registers to SIMD four 32-bit data at once?  I've wondered about that for a long time, but I didn't think it would be possible due to addition carrying into the neighbour's value.
sr. member
Activity: 337
Merit: 285
Tell me if it works. :)
Donations are welcome. 17asVKkzRGTFvvGH9dMGQaHe78xzfvgSSA
sr. member
Activity: 308
Merit: 258
Awesome, I'll have to give it a try myself then.
sr. member
Activity: 337
Merit: 285
Performance of the stock code (as measured by my test/benchmark program) is about 1500 khash/s.
My code does 3500 khash/s. Both figures are for one core. It scales well because I do 128 hashes at once and keep the data structures small enough to fit in the CPU cache.

I have two local collision attacks which will squeeze out another 300 khash/s, but they are not stable yet.
sr. member
Activity: 308
Merit: 258
I take it that you've already tested the hash limit before performance starts to suffer against the stock code? I'm just curious myself.
sr. member
Activity: 337
Merit: 285
This patch calculates four hashes at once on one core using vector instructions. There's a test program included that validates the new hash function against the old one, so it should be correct.
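The idea of validating a vectorised function against a scalar reference can be sketched as follows. This is not the test program from the patch: a toy mixing step stands in for SHA-256, and the names mix_scalar, mix_4way and validate_4way are invented for the example. The constant is borrowed from the first SHA-256 round constant purely as an arbitrary value.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: a toy mixing step (NOT SHA-256). */
uint32_t mix_scalar(uint32_t x)
{
    uint32_t rot = (x >> 7) | (x << 25);    /* rotate right by 7 */
    return (rot ^ (x >> 3)) + 0x428a2f98u;  /* xor with a shift, then add a constant */
}

/* 4-way version: the same step applied to four lanes at once.
 * SSE2 has no packed rotate, so it is built from two shifts and an OR. */
void mix_4way(const uint32_t in[4], uint32_t out[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    __m128i rot = _mm_or_si128(_mm_srli_epi32(x, 7), _mm_slli_epi32(x, 25));
    __m128i r = _mm_xor_si128(rot, _mm_srli_epi32(x, 3));
    r = _mm_add_epi32(r, _mm_set1_epi32((int)0x428a2f98u));
    _mm_storeu_si128((__m128i *)out, r);
}

/* Feeds `groups` groups of four pseudo-random inputs through both versions
 * and returns the number of lanes where the results differ. */
unsigned validate_4way(uint32_t groups)
{
    unsigned mismatches = 0;
    uint32_t seed = 12345u;

    for (uint32_t g = 0; g < groups; g++) {
        uint32_t in[4], out[4];
        for (int i = 0; i < 4; i++) {
            seed = seed * 1664525u + 1013904223u;  /* simple LCG for test inputs */
            in[i] = seed;
        }
        mix_4way(in, out);
        for (int i = 0; i < 4; i++) {
            if (out[i] != mix_scalar(in[i]))
                mismatches++;
        }
    }
    return mismatches;
}
```

A return value of 0 from validate_4way means the vector version agreed with the scalar one on every tested input. A real harness would compare full hash outputs in the same way.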

The patch is against 0.3.6 and improves khash/s by roughly 115%.

http://pastebin.com/XN1JDb53