
Topic: An (even more) optimized version of cpuminer (pooler's cpuminer, CPU-only) - page 70.

hero member
Activity: 838
Merit: 507
In case you're wondering why the SSE2 version sucks on K8 and K10, the reason is rather simple.
The salsa20 function is a long string of data-dependent 4x32-bit vector integer operations (i.e. the output of one operation is used as the input to the next).
And the execution latencies of the most frequently used instructions in the salsa20 core (immediate shift left/right, add, xor) are all 2 clocks on K8/K10, but only 1 clock on Atom/Core/Core2/Nehalem/SB.
End result: SSE2 salsa20 needs roughly twice as many clocks per round on AMD as on any modern Intel.

Thank you for your insight, ArtForz!
Yes, I think I have read somewhere that since the Core microarchitecture, Intel CPUs can process a full 128-bit SSE register in a single operation.
I have never been too fond of Intel, but it's nice to see that sometimes you get what you pay for! :)
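A minimal C sketch of the dependency chain ArtForz describes, assuming an x86-64 compiler with SSE2 intrinsics (`<emmintrin.h>`). It is not pooler's assembly: the names `salsa20_8_ref`, `salsa20_8_sse2`, `QSTEP`, `lane` and `salsa20_8_sse2_matches_ref` are hypothetical. The four 128-bit vectors hold the 16 state words in diagonal order, so each `QSTEP` (paddd, pslld, psrld, two pxor) performs four quarter-round steps at once, but every `QSTEP` reads the vector written by the one before it, so the loop is limited by instruction latency rather than throughput.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; x86/x86-64 only */
#include <stdint.h>
#include <string.h>

#define ROTL32(a, b) (((a) << (b)) | ((a) >> (32 - (b))))

/* Scalar Salsa20/8 core in the style of the scrypt reference code: B += rounds(B). */
static void salsa20_8_ref(uint32_t B[16])
{
    uint32_t x[16];
    memcpy(x, B, sizeof(x));
    for (int i = 0; i < 8; i += 2) {
        /* Column round. */
        x[ 4] ^= ROTL32(x[ 0] + x[12],  7); x[ 8] ^= ROTL32(x[ 4] + x[ 0],  9);
        x[12] ^= ROTL32(x[ 8] + x[ 4], 13); x[ 0] ^= ROTL32(x[12] + x[ 8], 18);
        x[ 9] ^= ROTL32(x[ 5] + x[ 1],  7); x[13] ^= ROTL32(x[ 9] + x[ 5],  9);
        x[ 1] ^= ROTL32(x[13] + x[ 9], 13); x[ 5] ^= ROTL32(x[ 1] + x[13], 18);
        x[14] ^= ROTL32(x[10] + x[ 6],  7); x[ 2] ^= ROTL32(x[14] + x[10],  9);
        x[ 6] ^= ROTL32(x[ 2] + x[14], 13); x[10] ^= ROTL32(x[ 6] + x[ 2], 18);
        x[ 3] ^= ROTL32(x[15] + x[11],  7); x[ 7] ^= ROTL32(x[ 3] + x[15],  9);
        x[11] ^= ROTL32(x[ 7] + x[ 3], 13); x[15] ^= ROTL32(x[11] + x[ 7], 18);
        /* Row round. */
        x[ 1] ^= ROTL32(x[ 0] + x[ 3],  7); x[ 2] ^= ROTL32(x[ 1] + x[ 0],  9);
        x[ 3] ^= ROTL32(x[ 2] + x[ 1], 13); x[ 0] ^= ROTL32(x[ 3] + x[ 2], 18);
        x[ 6] ^= ROTL32(x[ 5] + x[ 4],  7); x[ 7] ^= ROTL32(x[ 6] + x[ 5],  9);
        x[ 4] ^= ROTL32(x[ 7] + x[ 6], 13); x[ 5] ^= ROTL32(x[ 4] + x[ 7], 18);
        x[11] ^= ROTL32(x[10] + x[ 9],  7); x[ 8] ^= ROTL32(x[11] + x[10],  9);
        x[ 9] ^= ROTL32(x[ 8] + x[11], 13); x[10] ^= ROTL32(x[ 9] + x[ 8], 18);
        x[12] ^= ROTL32(x[15] + x[14],  7); x[13] ^= ROTL32(x[12] + x[15],  9);
        x[14] ^= ROTL32(x[13] + x[12], 13); x[15] ^= ROTL32(x[14] + x[13], 18);
    }
    for (int i = 0; i < 16; i++)
        B[i] += x[i];
}

/* x ^= rotl(a + b, k) using only paddd, pslld, psrld and pxor. SSE2 has no
   vector rotate; the two shifted halves never overlap, so XOR-ing both into
   x equals XOR-ing in the rotated value. */
#define QSTEP(x, a, b, k) do {                                  \
        __m128i t_ = _mm_add_epi32((a), (b));                   \
        (x) = _mm_xor_si128((x), _mm_slli_epi32(t_, (k)));      \
        (x) = _mm_xor_si128((x), _mm_srli_epi32(t_, 32 - (k))); \
    } while (0)

/* Diagonal word order: vector r, lane c holds B[lane[r][c]]. */
static const int lane[4][4] = {
    {0, 5, 10, 15}, {4, 9, 14, 3}, {8, 13, 2, 7}, {12, 1, 6, 11}};

static void salsa20_8_sse2(uint32_t B[16])
{
    uint32_t t[4][4];
    __m128i x[4], orig[4];
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++)
            t[r][c] = B[lane[r][c]];
        x[r] = orig[r] = _mm_loadu_si128((const __m128i *)t[r]);
    }
    for (int i = 0; i < 8; i += 2) {
        /* Column round: each step needs the result of the previous one. */
        QSTEP(x[1], x[0], x[3], 7);
        QSTEP(x[2], x[1], x[0], 9);
        QSTEP(x[3], x[2], x[1], 13);
        QSTEP(x[0], x[3], x[2], 18);
        /* Rotate lanes (pshufd) so the rows line up across vectors. */
        x[1] = _mm_shuffle_epi32(x[1], 0x93);
        x[2] = _mm_shuffle_epi32(x[2], 0x4e);
        x[3] = _mm_shuffle_epi32(x[3], 0x39);
        /* Row round: again a strict chain. */
        QSTEP(x[3], x[0], x[1], 7);
        QSTEP(x[2], x[3], x[0], 9);
        QSTEP(x[1], x[2], x[3], 13);
        QSTEP(x[0], x[1], x[2], 18);
        /* Undo the lane rotation. */
        x[1] = _mm_shuffle_epi32(x[1], 0x39);
        x[2] = _mm_shuffle_epi32(x[2], 0x4e);
        x[3] = _mm_shuffle_epi32(x[3], 0x93);
    }
    for (int r = 0; r < 4; r++) {
        _mm_storeu_si128((__m128i *)t[r], _mm_add_epi32(x[r], orig[r]));
        for (int c = 0; c < 4; c++)
            B[lane[r][c]] = t[r][c];
    }
}

/* Returns 1 if the SSE2 sketch agrees with the scalar version on `in`. */
static int salsa20_8_sse2_matches_ref(const uint32_t in[16])
{
    uint32_t a[16], b[16];
    memcpy(a, in, sizeof(a));
    memcpy(b, in, sizeof(b));
    salsa20_8_ref(a);
    salsa20_8_sse2(b);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

The vector version does the same work as the scalar one, four words per instruction, but it cannot start a step until the previous step's final pxor has retired, which is why a 2-clock latency per instruction roughly doubles the time per round.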
full member
Activity: 128
Merit: 100
Awesome:
Intel i7-2600K @ 4.2 GHz, 4 threads: old 2.8 khash/s, new 5.3-5.5 khash/s
sr. member
Activity: 309
Merit: 250
Binaries for Windows now available, thanks diki!
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win32.zip
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win64.zip
(Please let me know if the packages miss any dynamic libraries.)
Yeah, on Win7 64, I can't find libeay32.dll. I grabbed one from a version of OpenSSL, but it complains about missing functions, etc.

Thank you for reporting this, I have added the DLL to the package.

This works fantastically now with that DLL:

Core i7, Win7 64-bit / 6 threads / old: 2.6 kH/s per thread / new: 5.3 kH/s per thread
full member
Activity: 154
Merit: 101
Bitcoin!
I went from 6.20 MH/s to 6.52 MH/s on my AMD box. Not a *huge* improvement, but I'll definitely take it. :) I'll test on an Intel box next.
vip
Activity: 980
Merit: 1001
Binaries for Windows now available, thanks diki!
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win32.zip
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win64.zip
(Please let me know if the packages miss any dynamic libraries.)
Yeah, on Win7 64, I can't find libeay32.dll. I grabbed one from a version of OpenSSL, but it complains about missing functions, etc.

Thank you for reporting this, I have added the DLL to the package.
Thanks for the quick fix :)
sr. member
Activity: 309
Merit: 250
Binaries for Windows now available, thanks diki!
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win32.zip
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win64.zip
(Please let me know if the packages miss any dynamic libraries.)
Yeah, on Win7 64, I can't find libeay32.dll. I grabbed one from a version of OpenSSL, but it complains about missing functions, etc.

The same DLL is missing here on W7-64...

W7-32 works very well:
i7 920 / 7 threads / from 1.5 kH/s per thread to 3.1 kH/s per thread

WOW - more than doubled
hero member
Activity: 838
Merit: 507
Binaries for Windows now available, thanks diki!
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win32.zip
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win64.zip
(Please let me know if the packages miss any dynamic libraries.)
Yeah, on Win7 64, I can't find libeay32.dll. I grabbed one from a version of OpenSSL, but it complains about missing functions, etc.

Thank you for reporting this, I have added the DLL to the package.
newbie
Activity: 21
Merit: 0
Binaries for Windows now available, thanks diki!
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win32.zip
https://github.com/downloads/pooler/cpuminer/pooler-cpuminer-win64.zip
(Please let me know if the packages miss any dynamic libraries.)
Yeah, on Win7 64, I can't find libeay32.dll. I grabbed one from a version of OpenSSL, but it complains about missing functions, etc.
sr. member
Activity: 406
Merit: 257
In case you're wondering why the SSE2 version sucks on K8 and K10, the reason is rather simple.
The salsa20 function is a long string of data-dependent 4x32-bit vector integer operations (i.e. the output of one operation is used as the input to the next).
And the execution latencies of the most frequently used instructions in the salsa20 core (immediate shift left/right, add, xor) are all 2 clocks on K8/K10, but only 1 clock on Atom/Core/Core2/Nehalem/SB.
End result: SSE2 salsa20 needs roughly twice as many clocks per round on AMD as on any modern Intel.
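The latency argument can be turned into rough arithmetic. A hedged C sketch, assuming an SSE2 layout in which each round is four chained steps of paddd, pslld/psrld and two pxor; the constant and function names are hypothetical, and shuffles, loads and memory traffic are deliberately left out, so this is an illustration rather than a benchmark prediction.

```c
#include <assert.h>

/* Latency-bound model of one Salsa20/8 core: 8 rounds, 4 chained steps per
   round, and 4 instructions on the critical path of each step
   (paddd -> pslld -> pxor -> pxor; psrld runs alongside pslld). */
enum { ROUNDS = 8, STEPS_PER_ROUND = 4, CHAIN_OPS_PER_STEP = 4 };

static long salsa20_8_chain_clocks(long op_latency)
{
    return (long)ROUNDS * STEPS_PER_ROUND * CHAIN_OPS_PER_STEP * op_latency;
}

/* scrypt's ROMix makes 2*N BlockMix calls and each BlockMix runs Salsa20/8
   2*r times; Litecoin's scrypt uses N = 1024, r = 1. */
static long salsa20_8_calls_per_hash(long N, long r)
{
    return 2 * N * 2 * r;
}
```

With ArtForz's figures, 1 clock per op gives 128 clocks per Salsa20/8 core and 2 clocks gives 256, which matches the "roughly twice" estimate. Multiplied by the 4096 calls per Litecoin hash, that is 524,288 versus 1,048,576 clocks of pure dependency chain for a single, non-interleaved hash.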
hero member
Activity: 838
Merit: 507
newbie
Activity: 21
Merit: 0
I'm trying to compile this for Mac and got this error:
Coblee, yeah -- those are some of the issues we worked through last night. It took hours, because pooler doesn't have a Mac, and I don't have any assembly skillz. :)

The short version is: all macros need to be expanded (i.e., eliminated), and then, if he hasn't changed them, a few MOVQ ops need to be changed to the incorrect-but-still-works-and-makes-Apple-happy MOVD. :)

The reason I'm still having problems with 10.4 is that we didn't do any work on scrypt-x86.S, and it has even more macros than the 64-bit one. :)

BTW: SockPuppet = trunkboy = shawnp0wers = Shawn Powers from Linux Journal, for those playing at home...
hero member
Activity: 838
Merit: 507
I'm trying to compile this for Mac and got this error:

Quote
gcc -DHAVE_CONFIG_H -I. -pthread -fno-strict-aliasing -I./compat/jansson    -O3 -Wall -msse2 -msse3 -msse4.1 -msse4.2 -msse4 -g -march=core2 -MT minerd-scrypt-x64.o -MD -MP -MF .deps/minerd-scrypt-x64.Tpo -c -o minerd-scrypt-x64.o `test -f 'scrypt-x64.S' || echo './'`scrypt-x64.S
scrypt-x64.S:131:Alignment too large: 15. assumed.
scrypt-x64.S:11:expecting operand before ','; got nothing
scrypt-x64.S:11:expecting operand before ','; got nothing
scrypt-x64.S:11:expecting operand before ','; got nothing
scrypt-x64.S:11:expecting operand before ','; got nothing
scrypt-x64.S:11:suffix or operands invalid for `rol'
scrypt-x64.S:11:suffix or operands invalid for `rol'

... snip ...

scrypt-x64.S:566:expecting operand before ','; got nothing
scrypt-x64.S:566:expecting operand before ','; got nothing
scrypt-x64.S:566:suffix or operands invalid for `pshufd'
scrypt-x64.S:566:suffix or operands invalid for `pshufd'
scrypt-x64.S:566:suffix or operands invalid for `pshufd'
make[2]: *** [minerd-scrypt-x64.o] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Any ideas?

That's exactly the same problem SockPuppet and I faced yesterday. I still don't know why, but apparently the assembler available on Mac OS X doesn't like my macros.
We finally got it to compile by expanding all macros in the source, but don't ask me to do that again :) (OK, if you insist, I can send you the temporary patched file.)
I will try to solve the issue with SockPuppet as soon as possible; I am really curious about where the problem actually lies.
legendary
Activity: 1204
Merit: 1000
฿itcoin: Currency of Resistance!
WOW!!!! From 1.73 to 2.68 khash/s !!!!!!

This is one of my Intel CPUs...

The next is one of my AMD CPUs (AM3):

from 2.85 to 3.27!!

Awesome work!!! I want to donate some litecoins to you, pooler!! :-D
donator
Activity: 1653
Merit: 1286
Creator of Litecoin. Cryptocurrency enthusiast.
I'm trying to compile this for Mac and got this error:

Quote
gcc -DHAVE_CONFIG_H -I. -pthread -fno-strict-aliasing -I./compat/jansson    -O3 -Wall -msse2 -msse3 -msse4.1 -msse4.2 -msse4 -g -march=core2 -MT minerd-scrypt-x64.o -MD -MP -MF .deps/minerd-scrypt-x64.Tpo -c -o minerd-scrypt-x64.o `test -f 'scrypt-x64.S' || echo './'`scrypt-x64.S
scrypt-x64.S:131:Alignment too large: 15. assumed.
scrypt-x64.S:11:expecting operand before ','; got nothing
scrypt-x64.S:11:expecting operand before ','; got nothing
scrypt-x64.S:11:expecting operand before ','; got nothing
scrypt-x64.S:11:expecting operand before ','; got nothing
scrypt-x64.S:11:suffix or operands invalid for `rol'
scrypt-x64.S:11:suffix or operands invalid for `rol'

... snip ...

scrypt-x64.S:566:expecting operand before ','; got nothing
scrypt-x64.S:566:expecting operand before ','; got nothing
scrypt-x64.S:566:suffix or operands invalid for `pshufd'
scrypt-x64.S:566:suffix or operands invalid for `pshufd'
scrypt-x64.S:566:suffix or operands invalid for `pshufd'
make[2]: *** [minerd-scrypt-x64.o] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Any ideas?
full member
Activity: 147
Merit: 100
PooL-X.eu
Ahh, good news :) My stock Phenom 940 (3 GHz) went from 3.05 to 3.25 kH/s
newbie
Activity: 21
Merit: 0
My part was little more than that of a trained monkey, but I was happy to help troubleshoot.

One huge thing we hashed out just last night was the OS X compatibility stuff. I'm not sure if pooler has updated the GitHub repo with the changes required for OS X compilation, but if nothing else, I can post an OS X 10.6-compatible binary here. (It works with 10.7 too; 10.4 requires a bit more work and will be available later...)

http://dl.dropbox.com/u/828037/minerd_for_OSX_10.6-7.zip
legendary
Activity: 889
Merit: 1000
Bitcoin calls me an Orphan
Hi!!

 For Intel CPUs, what should the CFLAGS options be?!

 CFLAGS = -g -O3

 --

 And for AMD?

 CFLAGS = -mtune=amdfam10 -O3 -ffast-math -mabm -msse4a -pipe

Thanks!
Thiago

I used CFLAGS="-O3 -Wall -msse2" ./configure for Intel; it worked well.
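As a hedged illustration of what a flag like -msse2 actually changes: GCC and Clang predefine macros such as `__SSE2__`, `__SSE3__`, `__SSE4_1__` and `__SSE4_2__` for the instruction sets they are allowed to emit, and a C file can inspect them. The enum and function names below are hypothetical. These macros describe only the compiler's target, not the CPU the binary later runs on.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit flags for the SSE levels the compiler may target. */
enum { CC_SSE2 = 1, CC_SSE3 = 2, CC_SSE4_1 = 4, CC_SSE4_2 = 8 };

static unsigned compiled_sse_flags(void)
{
    unsigned f = 0;
#ifdef __SSE2__
    f |= CC_SSE2;
#endif
#ifdef __SSE3__
    f |= CC_SSE3;
#endif
#ifdef __SSE4_1__
    f |= CC_SSE4_1;
#endif
#ifdef __SSE4_2__
    f |= CC_SSE4_2;
#endif
    return f;
}

static void print_compiled_sse_flags(void)
{
    unsigned f = compiled_sse_flags();
    printf("SSE2=%d SSE3=%d SSE4.1=%d SSE4.2=%d\n",
           !!(f & CC_SSE2), !!(f & CC_SSE3), !!(f & CC_SSE4_1), !!(f & CC_SSE4_2));
}
```

On x86-64, SSE2 is part of the baseline instruction set, so the SSE2 bit is set even without -msse2; the flag matters mainly for 32-bit builds.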
hero member
Activity: 838
Merit: 507
Hi!!

 For Intel CPUs, what should the CFLAGS options be?!

 CFLAGS = -g -O3

 --

 And for AMD?

 CFLAGS = -mtune=amdfam10 -O3 -ffast-math -mabm -msse4a -pipe

Thanks!
Thiago

Good news: you don't need to worry too much about CFLAGS.
Just use "-O3". The scrypt core is hand-written assembly (scrypt-x64.S / scrypt-x86.S), and gcc cannot optimize assembly code anyway.
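Because the hot loop is hand-written assembly, a program can choose which variant to run by querying the CPU at run time instead of relying on CFLAGS. A minimal sketch using the GCC/Clang x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`; the enum, table and function names are hypothetical, and this is not necessarily how cpuminer selects its code path.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical labels for alternative scrypt back ends. */
enum scrypt_path { PATH_GENERIC, PATH_SSE2, PATH_SSE41 };

static const char *const path_names[] = { "generic C", "SSE2", "SSE4.1" };

static enum scrypt_path pick_scrypt_path(void)
{
    /* Strictly needed only when called before constructors have run;
       calling it here is harmless. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return PATH_SSE41;
    if (__builtin_cpu_supports("sse2"))
        return PATH_SSE2;
    return PATH_GENERIC;
}

static void report_scrypt_path(void)
{
    printf("using the %s path\n", path_names[pick_scrypt_path()]);
}
```

The design point is that one binary can then run on older CPUs, unlike compile-time flags such as -march=core2, which may let gcc emit instructions those CPUs lack.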
newbie
Activity: 22
Merit: 0
Atom 330: 2 kH/s -> 4.8 kH/s

Nice work!