---snipped to save space---
Setting -C 2 and omitting the -m 1 option bumps the numbers up very slightly (to around 122 khash/s). I still can't get over 300 khash/s for some reason. Any additional thoughts would be appreciated; I'm glad to test any option(s) you can think of or recompile in some other way. Heck, I'll even dive through some of the code if you have a logical way for me to debug it!
-----
I'm all for experimentation to find the optimal configuration, but I'm not sure any configuration of the code/build I have right now is superior to the 12-18 release from last month. The only other explanation I can come up with is that I built the wrong executable (Release|x64) or copied in the wrong executable/DLLs when performing my test(s).
-----
EDIT: I just noticed you made a commit about 30 minutes ago, so I pulled that down and compiled it as well, just in case there would be any change. Sadly the numbers above still stand even with the newest build.
Firestar,
If you think the executable you built might be suspect, try one of the later ones posted in this thread. This is what I'm able to get with a GTX 680: ~410 khash/s with
-d 0 -H 1 -m 1 -l Y8x32
The GTX 680 is overclocked to 1345 MHz, and it runs at ~71 °C.
I just tried the cudaminer build provided here and I'm closer to 360 khash/s now, which is a 1-1.5% improvement over the K kernel. I have no idea why my build is so different (yet works)! I noticed that the build I linked was built for x86; I might try doing that on my machine and see how it turns out.
EDIT: Just tried it with your settings, wow... that's quite the improvement! Up to about 380 khash/s now. Thanks for the suggestions! Specifying -C 2 was the culprit: that option shaves off about 25-30 khash/s on my rig.
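For anyone comparing runs the same way, here is a rough sketch of the arithmetic behind those figures. The helper name percent_change is made up for illustration, and it assumes the 360 and 380 khash/s readings came from the same build, with roughly 380 khash/s as the baseline without -C 2.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in this thread, in khash/s.
rate_default_settings = 360.0    # linked build, before changing settings
rate_suggested_settings = 380.0  # same build with -d 0 -H 1 -m 1 -l Y8x32

gain = percent_change(rate_default_settings, rate_suggested_settings)
print(f"suggested settings: {gain:+.1f}%")

# Assumption: the 25-30 khash/s cost of -C 2 is measured against ~380 khash/s.
for loss in (25.0, 30.0):
    change = percent_change(rate_suggested_settings, rate_suggested_settings - loss)
    print(f"-C 2 costing {loss:.0f} khash/s: {change:+.1f}%")
```

Plug in your own before/after readings to see whether a flag change is worth keeping. Hash rates tend to wobble between runs, so small differences may just be noise.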