(compiled from source https://github.com/uncle-bob/quarkcoin-cpuminer (without CHEAT option))
http://rghost.ru/48483424
On an AMD processor, I did not notice much of a difference.
Can someone help me out with some test cases? I have three different generations of AMD chips and no Intel. While there is code that successfully uses SSE4, I personally do not see an improvement on the hardware I have. Intel may be a different story. The largest obstacle to increasing the speed, with what I have to test on, is size: there is a lot of code and initialization data. At any rate, for the most part I am modifying code originally designed to run fast on considerably larger data. The SSE4 code does an expensive startup and finish just to do one round. The version in the original post may easily be faster until that is worked on. It's still really close, but is faster.
Anyone who feels like being a guinea pig, run the "makeprof.sh" script and send me the "a.txt" that it produces. It'll just time the hash function and list the CPU used. It would be greatly appreciated.
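For anyone curious what that kind of measurement looks like, here is a rough sketch in Python. It is not the contents of "makeprof.sh", which I am not reproducing here. The names `cpu_model`, `time_hash` and `stand_in_hash` are made up for illustration, SHA-256 is only a placeholder for the real hash, and the 80-byte zeroed input is an assumed dummy header.

```python
import hashlib
import platform
import time


def cpu_model():
    """Best-effort CPU name: /proc/cpuinfo on Linux, else platform.processor()."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.lower().startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass  # not Linux, or /proc is unavailable
    return platform.processor() or "unknown"


def time_hash(hash_fn, data, rounds=10000):
    """Return the average seconds per call of hash_fn(data) over `rounds` calls."""
    start = time.perf_counter()
    for _ in range(rounds):
        hash_fn(data)
    return (time.perf_counter() - start) / rounds


def stand_in_hash(data):
    # Placeholder only: SHA-256 is NOT the hash the miner uses.
    return hashlib.sha256(data).digest()


if __name__ == "__main__":
    header = bytes(80)  # assumed 80-byte dummy input, all zeros
    per_hash = time_hash(stand_in_hash, header)
    print("cpu:", cpu_model())
    print("avg per hash: %.3f us" % (per_hash * 1e6))
```

Averaging over many rounds matters here because a single call is too short to time reliably, and it also smooths out one-off startup costs.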
Oh, and the CHEAT option is disabled by default. It's a great boost for some old processors that I should just retire, but not so good most of the time.