For the kernel launch configuration: Take the total number of CUDA cores on your card, multiply by 2, and append x32 for the K kernel and x16 for the F kernel. Don't bother with the T kernel for the moment.
This should give somewhat higher performance than the original 1000x32 or 1024x32 configurations suggested in the README, especially on the bigger cards like the GTX 660Ti, GTX 780, etc.
Use -L 512 with these huge launch configurations, otherwise an out-of-memory error is guaranteed. On the other hand, these out-of-memory errors don't prevent keccak hashing (the program just continues...). Cleaning up these unnecessary memory allocations is the next item on my "finish the keccak feature" TODO list.
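To make the rule of thumb above concrete, here is a small sketch that builds the option string from a core count. The function name launch_args is made up for illustration and is not part of cudaminer; the "2 x cores" rule, the x32/x16 suffixes and -L 512 are taken straight from this post, and the kernel letter prefix follows the usual -l syntax.

```python
def launch_args(cuda_cores: int, kernel: str, lookup_gap: int = 512) -> str:
    """Build '-l ... -L ...' options from the rule of thumb in this post.

    Hypothetical helper, not part of cudaminer itself.
    """
    # Second number of the launch config: x32 for the K kernel, x16 for F.
    suffix = {"K": 32, "F": 16}
    if kernel not in suffix:
        # The T kernel is deliberately left out for the moment.
        raise ValueError("only the K and F kernels are covered by this rule")
    if cuda_cores <= 0:
        raise ValueError("cuda_cores must be positive")

    # First number: total CUDA cores multiplied by 2.
    blocks = cuda_cores * 2
    return f"-l {kernel}{blocks}x{suffix[kernel]} -L {lookup_gap}"


if __name__ == "__main__":
    # Example: a card with 192 CUDA cores using the F kernel.
    print(launch_args(192, "F"))
```

Treat the result as a starting point only; the best setting can differ from card to card, so it is worth benchmarking against whatever you were using before.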
Christian
Heh, it's strange, but my old settings "-l F1536x16 -L 256" still give me 2-3 MHash more than the advised "-l F384x16 -L 512": 39 MHash (old) vs 36-37 MHash (new)?
GTX 550 Ti, 1 GB VRAM (192 cores), latest cudaminer build from 09.02, WinXP 32-bit
P.S. The coin is Maxcoin, on a stratum pool.