With the 12-18 build I get around 350 khash/s, and my command line is: cudaminer -r 10 -R 30 -T 30 -d 0 -H 1 -i 0 -C 2 -l K8x32
With the 01-20 build (which I compiled on my machine using VS2012) I get about 120 khash/s. This time I ran this command line: cudaminer -d 0 -H 1 -i 0 -C 2 -m 1 -l K32x32
Have you tried -C 0 together with -m 1 in the second case? That would mimic the behavior of the 2013-12-18 version.
Also, why not use the new Y kernel that nVidia submitted recently? Autotune it and you might find that it is faster.
Thanks for the suggestions! Running with -m 1, -C 0 and -l auto gave these results:
*** CudaMiner for nVidia GPUs by Christian Buchner ***
This is version 2014-01-20 (beta)
based on pooler-cpuminer 2.3.2 (c) 2010 Jeff Garzik, 2012 pooler
Cuda additions Copyright 2013,2014 Christian Buchner
My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm
[2014-02-01 09:12:19] 1 miner threads started, using 'scrypt' algorithm.
[2014-02-01 09:12:19] Starting Stratum on stratum+tcp://[address-omitted.org]
[2014-02-01 09:12:20] Stratum detected new block
[2014-02-01 09:12:20] GPU #0: GeForce GTX 680 with compute capability 3.0
[2014-02-01 09:12:20] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-02-01 09:12:20] GPU #0: 8 hashes / 1.0 MB per warp.
[2014-02-01 09:12:21] GPU #0: Performing auto-tuning (Patience...)
[2014-02-01 09:12:21] GPU #0: maximum total warps (BxW): 1126
[2014-02-01 09:12:50] Stratum detected new block
[2014-02-01 09:15:18] Stratum detected new block
[2014-02-01 09:16:06] Stratum detected new block
[2014-02-01 09:16:40] GPU #0: 175048.53 hash/s with configuration K372x3
[2014-02-01 09:16:40] GPU #0: using launch configuration K372x3
[2014-02-01 09:16:40] GPU #0: GeForce GTX 680, 35.01 khash/s
[2014-02-01 09:16:57] GPU #0: GeForce GTX 680, 127.79 khash/s
[2014-02-01 09:17:38] GPU #0: GeForce GTX 680, 123.76 khash/s
[2014-02-01 09:17:39] accepted: 1/1 (100.00%), 123.76 khash/s (yay!!!)
[2014-02-01 09:18:24] GPU #0: GeForce GTX 680, 124.87 khash/s
[2014-02-01 09:18:24] accepted: 2/2 (100.00%), 124.87 khash/s (yay!!!)
Not sure why auto-tune is reaching so far with its launch configuration... last time it picked 64x16, this time it's a whopping 372x3! It's also still selecting the K kernel instead of the Y kernel, but I'm guessing that's just autotune concluding that the K kernel is the better choice.
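To compare runs like this one without eyeballing the log, here is a small Python sketch that pulls the hashrate readings and the auto-tune result out of the console output. It is not part of cudaminer; the function names are made up for illustration, and it assumes the log lines look exactly like the ones pasted above.

```python
import re

# Periodic hashrate lines, e.g.
# "[2014-02-01 09:16:57] GPU #0: GeForce GTX 680, 127.79 khash/s".
# "accepted:" lines also mention khash/s but lack the "GPU #n:" prefix.
RATE_LINE = re.compile(r"\] GPU #(\d+): .+, ([\d.]+) khash/s\s*$")

# Auto-tune summary line, which reports plain hash/s, e.g.
# "[...] GPU #0: 175048.53 hash/s with configuration K372x3".
TUNE_LINE = re.compile(r"\] GPU #(\d+): ([\d.]+) hash/s with configuration (\S+)")

SAMPLE_LOG = """
[2014-02-01 09:16:40] GPU #0: 175048.53 hash/s with configuration K372x3
[2014-02-01 09:16:40] GPU #0: using launch configuration K372x3
[2014-02-01 09:16:40] GPU #0: GeForce GTX 680, 35.01 khash/s
[2014-02-01 09:16:57] GPU #0: GeForce GTX 680, 127.79 khash/s
[2014-02-01 09:17:38] GPU #0: GeForce GTX 680, 123.76 khash/s
[2014-02-01 09:17:39] accepted: 1/1 (100.00%), 123.76 khash/s (yay!!!)
[2014-02-01 09:18:24] GPU #0: GeForce GTX 680, 124.87 khash/s
[2014-02-01 09:18:24] accepted: 2/2 (100.00%), 124.87 khash/s (yay!!!)
"""


def parse_rates(log_text):
    """Return every periodic GPU hashrate reading, in khash/s."""
    rates = []
    for line in log_text.splitlines():
        match = RATE_LINE.search(line)
        if match:
            rates.append(float(match.group(2)))
    return rates


def parse_autotune(log_text):
    """Return (configuration, khash/s) from the auto-tune line, or None."""
    for line in log_text.splitlines():
        match = TUNE_LINE.search(line)
        if match:
            return match.group(3), float(match.group(2)) / 1000.0
    return None


def steady_state_mean(rates, skip=1):
    """Average the readings after dropping the first `skip` warm-up values."""
    steady = rates[skip:]
    if not steady:
        return None
    return sum(steady) / len(steady)


if __name__ == "__main__":
    rates = parse_rates(SAMPLE_LOG)
    print("readings:", rates)
    print("steady-state mean: %.2f khash/s" % steady_state_mean(rates))
    print("auto-tune:", parse_autotune(SAMPLE_LOG))
```

On the log above, the readings after the first warm-up value average roughly 125 khash/s, while auto-tune itself measured about 175 khash/s for K372x3, so the two numbers do not agree even within the same run.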
-----
I decided to specify the Y kernel. I tried several configurations, including Y64x16, Y64x8, Y32x32, Y16x32 and Y32x16... all of them crashed. Finally I specified Y8x32 and got this:
*** CudaMiner for nVidia GPUs by Christian Buchner ***
This is version 2014-01-20 (beta)
based on pooler-cpuminer 2.3.2 (c) 2010 Jeff Garzik, 2012 pooler
Cuda additions Copyright 2013,2014 Christian Buchner
My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm
[2014-02-01 09:28:49] 1 miner threads started, using 'scrypt' algorithm.
[2014-02-01 09:28:49] Starting Stratum on stratum+tcp://[address-omitted.org]
[2014-02-01 09:28:50] GPU #0: GeForce GTX 680 with compute capability 3.0
[2014-02-01 09:28:50] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-02-01 09:28:50] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-02-01 09:28:50] GPU #0: using launch configuration Y8x32
[2014-02-01 09:28:50] GPU #0: GeForce GTX 680, 33.85 khash/s
[2014-02-01 09:29:07] GPU #0: GeForce GTX 680, 121.87 khash/s
[2014-02-01 09:29:13] GPU #0: GeForce GTX 680, 117.39 khash/s
[2014-02-01 09:29:13] accepted: 1/1 (100.00%), 117.39 khash/s (yay!!!)
[2014-02-01 09:30:11] GPU #0: GeForce GTX 680, 121.75 khash/s
[2014-02-01 09:30:15] GPU #0: GeForce GTX 680, 119.19 khash/s
[2014-02-01 09:30:16] accepted: 2/2 (100.00%), 119.19 khash/s (yay!!!)
[2014-02-01 09:30:23] GPU #0: GeForce GTX 680, 118.83 khash/s
[2014-02-01 09:30:23] accepted: 3/3 (100.00%), 118.83 khash/s (yay!!!)
Setting -C 2 and omitting the -m 1 option bumps the numbers up very slightly (to around 122 khash/s). I still can't get above 300 khash/s for some reason. Any additional thoughts would be appreciated. I'm glad to test any option(s) you can think of, or to recompile in some other way. Heck, I'll even dive through some of the code if you have a logical way for me to debug it!
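One possible reason the larger Y configurations crashed, as a rough back-of-the-envelope check in Python: the Y kernel log reports 4.0 MB per warp, and the earlier auto-tune run reported a maximum of 1126 total warps (BxW) at 1.0 MB per warp. The sketch below assumes that this limit reflects usable GPU memory (about 1126 MB) and that a configuration needs blocks x warps x MB-per-warp of scratch space. Both are my assumptions, not something the logs confirm, and the helper names are hypothetical.

```python
import re


def parse_launch_config(cfg):
    """Split a string such as 'Y8x32' into (kernel letter, blocks, warps per block)."""
    match = re.fullmatch(r"([A-Za-z])(\d+)x(\d+)", cfg)
    if match is None:
        raise ValueError("unrecognised launch configuration: %r" % cfg)
    return match.group(1), int(match.group(2)), int(match.group(3))


def total_warps(cfg):
    """Total warps is B x W, matching the 'maximum total warps (BxW)' log line."""
    _, blocks, warps = parse_launch_config(cfg)
    return blocks * warps


def scratch_mb(cfg, mb_per_warp):
    """Estimated scratch memory in MB, assuming a fixed amount per warp."""
    return total_warps(cfg) * mb_per_warp


# Assumption: auto-tune's limit of 1126 warps at 1.0 MB per warp is a memory budget.
BUDGET_MB = 1126 * 1.0


def fits(cfg, mb_per_warp, budget_mb=BUDGET_MB):
    """True if the estimated scratch memory stays within the assumed budget."""
    return scratch_mb(cfg, mb_per_warp) <= budget_mb


# The Y configurations tried in this post, in the order listed.
TRIED = ["Y64x16", "Y64x8", "Y32x32", "Y16x32", "Y32x16", "Y8x32"]

if __name__ == "__main__":
    for cfg in TRIED:
        needed = scratch_mb(cfg, 4.0)  # 4.0 MB per warp, from the Y kernel log
        verdict = "fits" if fits(cfg, 4.0) else "over budget"
        print("%-7s %4d warps %7.0f MB  %s" % (cfg, total_warps(cfg), needed, verdict))
```

Under those assumptions every configuration that crashed needs 2048 MB or more, while Y8x32 needs 1024 MB and K372x3 (at 1.0 MB per warp) needs 1116 MB, both just under the budget. That would explain the crashes, but not the gap to the 12-18 build's hashrate.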
-----
I'm all for experimentation to find the optimal configuration, but I'm not sure any configuration of the code/build I have right now can beat the 12-18 release from last month. The only other explanation I can come up with is that I built the wrong executable (Release|x64) or copied the wrong executable/DLLs in when performing my test(s).
-----
EDIT: I just noticed you made a commit about 30 minutes ago, so I pulled that down and compiled it as well, just in case there would be any change. Sadly the numbers above still stand even with the newest build.