I've been playing with the lookup gap. My GTX 780 went from 3.7 kH/s to 5.0 kH/s on Yacoin with -L 5. (-L 4 produces 4.948 kH/s, -L 3 produces 4.535 kH/s, and -L 2 gave almost no improvement.)
My GT 640s received no benefit at N Factor 14. At N Factor 15 they produce 0.684 kH/s. If I recall correctly, they maxed out at 0.6 kH/s previously.
I've been mining an N Factor 13 coin lately, and my 780 went from 10.7 to 16.0 kH/s with -L 3. Cudaminer does not fare well at NF=13 when I try to autotune with -L 3 unless I also specify -lT.
Masochist:CudaMiner mark$ ./cudaminer --algo=scrypt-jane:13 -d0 -m1 -i0 -L3 --benchmark -D
*** CudaMiner for nVidia GPUs by Christian Buchner ***
This is version 2013-12-18 (beta)
based on pooler-cpuminer 2.3.2 (c) 2010 Jeff Garzik, 2012 pooler
Cuda additions Copyright 2013 Christian Buchner
My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm
[2014-01-18 22:09:13] 1 miner threads started, using 'scrypt-jane' algorithm.
[2014-01-18 22:09:13] DEBUG: got new work in 1 ms
[2014-01-18 22:09:13] Given scrypt-jane parameters: 13
[2014-01-18 22:09:13] Nfactor is 13 (N=16384)!
[2014-01-18 22:09:20] GPU #0: GeForce GTX 780 with compute capability 3.5
[2014-01-18 22:09:20] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-01-18 22:09:20] GPU #0: 8 hashes / 5.3 MB per warp.
[2014-01-18 22:09:22] GPU #0: Performing auto-tuning (Patience...)
[2014-01-18 22:09:22] GPU #0: cudaError 2 (out of memory) calling 'cudaMalloc((void **) &d_idata, mem_size)' (salsa_kernel.cu line 499)
[2014-01-18 22:09:22] GPU #0: cudaError 2 (out of memory) calling 'cudaMalloc((void **) &d_odata, mem_size)' (salsa_kernel.cu line 501)
[2014-01-18 22:09:22] GPU #0: cudaError 11 (invalid argument) calling 'cudaMemcpy(d_idata, h_idata, mem_size, cudaMemcpyHostToDevice)' (salsa_kernel.cu line 506)
[2014-01-18 22:09:22] GPU #0: maximum warps: 527
[2014-01-18 22:09:22] GPU #0: cudaError 4 (unspecified launch failure) calling 'cudaDeviceSynchronize()' (salsa_kernel.cu line 534)
It proceeds to repeat the last line for each warp in every block. That's more than a little spammy, though I guess I'm asking for it with the debug flag on. Still, I'm not really interested in waiting several hours for autotune to complete just to find out what my optimal config is, particularly when I'm testing multiple lookup gaps with multiple N factors.
It seems to autotune fine with -L3 if I specify -lT. It will not autotune with -L4 in any case and fails in the same fashion as above, although if I just give it a config, -L4 seems to work fine. I'm not too worried about it; looking at the -L2 and -L3 charts makes it pretty clear I was hitting diminishing returns.
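For anyone wondering where the "8 hashes / 5.3 MB per warp" figure in the log comes from, here is a rough back-of-the-envelope sketch in Python. It assumes that N = 2^(Nfactor+1) (which matches the "Nfactor is 13 (N=16384)" line), that each scratchpad entry is 128 bytes, and that a lookup gap of L means only every L-th entry is stored. The 128-byte entry size and the storage model are my assumptions, not something taken from the cudaminer source, and the function name is made up for illustration. Under those assumptions the numbers do line up with the log.

```python
def scratchpad_mib_per_warp(nfactor, lookup_gap, hashes_per_warp=8, entry_bytes=128):
    """Estimate scratchpad memory per warp in MiB.

    Assumptions (not taken from cudaminer itself):
      - N = 2 ** (nfactor + 1)
      - each scratchpad entry is entry_bytes long
      - with lookup gap L, only every L-th entry is kept in memory
    """
    n = 2 ** (nfactor + 1)
    stored_entries = -(-n // lookup_gap)  # ceiling division
    return hashes_per_warp * stored_entries * entry_bytes / 2 ** 20


if __name__ == "__main__":
    # Compare lookup gaps for the N Factor 13 case from the log.
    for gap in (1, 2, 3, 4):
        mib = scratchpad_mib_per_warp(13, gap)
        print(f"NF=13, -L {gap}: {mib:.1f} MiB per warp")
```

With -L 3 this comes out at about 5.3 MiB per warp, so the 527 "maximum warps" reported in the log would add up to roughly 2.7 GiB of scratchpad if they were all allocated at once. It is only an estimate of the scratchpad itself and ignores whatever other buffers the miner allocates.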