If anybody is interested, this is the assembly code from the Intel Compiler which gives me the fastest 4-way code on AMD K10: 1.13
MH/s per 1 GHz per physical core.
https://gist.github.com/853566
Compile cpuminer-0.7.1, download the above code, run
gcc -c sha256_4way.s
and then run
make
again to link the object file into the executable. It's about 7% faster than the gcc version.
This is what I get:
sha256_4way.s:11221: Error: bad register name `%rbp'
sha256_4way.s:11223: Error: bad register name `%rbx'
sha256_4way.s:11225: Error: bad register name `%r15'
sha256_4way.s:11227: Error: bad register name `%r14'
sha256_4way.s:11229: Error: bad register name `%r13'
sha256_4way.s:11231: Error: bad register name `%r12'
There are a lot of these when I run
gcc -c sha256_4way.s
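
A note on the errors above: the GNU assembler reports "bad register name" for %rbp, %rbx and %r12 to %r15 when it is assembling in 32-bit mode, because those registers exist only in x86-64. One likely cause, though this is an assumption about the setup, is a gcc toolchain that targets 32-bit x86. The sketch below checks what gcc reports as its target; the helper name is_x86_64_target is made up for illustration, and the triple patterns are a guess at the common cases rather than a complete list.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a target triple looks like x86-64.
is_x86_64_target() {
  case "$1" in
    x86_64-*|amd64-*) return 0 ;;
    *) return 1 ;;
  esac
}

# gcc -dumpmachine prints the target triple gcc was built for.
# Fall back to "unknown" if gcc is not installed.
target=$(gcc -dumpmachine 2>/dev/null || echo unknown)

if is_x86_64_target "$target"; then
  echo "gcc targets x86-64 ($target)"
else
  echo "gcc target is $target; 64-bit registers such as %rbp will be rejected"
fi
```

If the target turns out to be 32-bit, the 64-bit assembly file cannot be used as it is; it would need a 64-bit toolchain and a 64-bit build of cpuminer.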