the github repo contains something quite significant: SHA256 was moved onto the GPU! It isn't fully optimized code yet, but it works. Not only does it give an effective speed-up, it also lowers the CPU load to near zero.
Now when I upgrade the PSU on my dedicated mining rig to 1.1 kW, I should be able to power 3 GTX 780 TI cards, hopefully. At the moment the "800W" PSU craps out when I run two cards under load. The 12V Rail of that thing is just so under-dimensioned, it's not funny.
I also found the cause for validation problems on some newer cards (the code to detect the card's ability for overlapping kernel execution was broken)
the brave can try github.... The not so brave have to wait for a new release...
-H 0 : single threaded CPU SHA256 hashing
-H 1 : multi threaded CPU SHA256 hashing
-H 2 : GPU based SHA256 hashing (now the default)
I also found out that my code to overlap memory transfers and kernels was completely NOT working. Which is why moving the SHA256 part to the GPU results in an effective speed-up (there's now only memory copies from the GPU to the CPU - and it is much less data!). I will fix mentioned problem when I am in a fixing mood
Christian