This is in the README; hope it helps:
Currently there is just one prefix, which is "S". Later releases may
introduce more kernel variants using other letters.
Examples:
S27x3 is a launch configuration that works well on GeForce GTX 260
28x4 is a launch configuration that works on GeForce GTX 460
290x2 is a launch configuration that works on GeForce GTX 660 Ti
You should wait through the autotune to see which kernel it finds best
for your current hardware configuration.
The choice between Non-Titan and Titan CUDA kernels is automatically
made based on your device's compute capability. Titans cost around
a thousand dollars, so you probably don't have one.
Prefix | Non-Titan                          | Titan
-------------------------------------------------------------------------------
(none) | low shared memory optimized kernel | default kernel with funnel shifter
S      | spinlock kernel for Kepler GPUs    | spinlock kernel with funnel shifter
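If it helps to see the table as a lookup: the prefix picks the row and the card's compute capability picks the column. This is a hypothetical Python sketch (describe_kernel is a made-up name). The README only says the Titan/non-Titan choice depends on compute capability, so the 3.5 cutoff below is an assumption based on the GTX Titan being a compute capability 3.5 card.

```python
# Hypothetical lookup mirroring the README table above.
# ASSUMPTION: Titan kernels are chosen from compute capability 3.5 upward;
# the README only says the choice is based on compute capability.
TITAN_MIN_CC = (3, 5)

KERNELS = {
    ("", False): "low shared memory optimized kernel",
    ("", True): "default kernel with funnel shifter",
    ("S", False): "spinlock kernel for Kepler GPUs",
    ("S", True): "spinlock kernel with funnel shifter",
}


def describe_kernel(prefix: str, compute_capability: tuple[int, int]) -> str:
    """Return the table entry for a prefix ("" or "S") on a given device."""
    titan = compute_capability >= TITAN_MIN_CC  # tuples compare element-wise
    try:
        return KERNELS[(prefix, titan)]
    except KeyError:
        raise ValueError(f"unknown kernel prefix: {prefix!r}") from None


print(describe_kernel("S", (3, 0)))  # spinlock kernel for Kepler GPUs
print(describe_kernel("", (3, 5)))   # default kernel with funnel shifter
```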
Can anyone explain like I'm 5? If my card used to autotune to the default kernel and is now autotuning to the spinlock kernel, will it actually make a difference?
Here's a quote from Christian that explains the -l flag format in more detail. It helped me understand it much better than the README did:
Can someone explain this "64x2" and "S27x3,28x4" thing to me, or point me in the right direction for reading up on it?
I have a GTX 580 and I'm trying to figure out the best setup.
Well, I didn't figure out the meaning, but if you run it and let it autotune, it will automatically choose the best one. Then you can add that value as a flag in your batch file (for me it's -l 112x2).
112x2 means it throws 112 blocks at CUDA, and each consists of 2 warps. A warp is a group of 32 threads.
So in total it computes 112*2*32 = 7168 hashes in parallel in a single CUDA kernel launch.
And because the scrypt scratchpad is 131072 bytes long, this would consume 7168*131072 bytes of memory
on the card. That's 939,524,096 bytes, or about 896 MiB (roughly 940 MB).
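To make Christian's arithmetic concrete, here is the same calculation as a small Python sketch. The function name launch_footprint is made up for illustration; its only inputs are the block and warp counts from the -l value, and the constants come straight from the explanation above.

```python
# Memory used by one kernel launch, following the arithmetic above.
THREADS_PER_WARP = 32
SCRATCHPAD_BYTES = 131072  # scrypt scratchpad per hash (128 KiB)


def launch_footprint(blocks: int, warps: int) -> tuple[int, int]:
    """Return (hashes computed in parallel, scratchpad bytes used)."""
    hashes = blocks * warps * THREADS_PER_WARP
    return hashes, hashes * SCRATCHPAD_BYTES


hashes, nbytes = launch_footprint(112, 2)
print(hashes)                 # 7168
print(nbytes)                 # 939524096
print(nbytes / 2**20, "MiB")  # 896.0 MiB
```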
So, as I understand it...
The kernel selection options (set by autotune or the -l flag) before the 2013-04-30 release had the format:
- "S" or "" (no value) - where S is optimized for older devices (compute capability < 2.0?) and no value is for all other devices
- #b - number of blocks
- x
- #w - number of warps (groups of 32 threads)
And the kernel selection options (set by autotune or the -l flag) from the 2013-04-30 release onward have the format:
- "S" or "" (no value) - where S is optimized for Kepler devices (some GTX 6xx GPUs) and no value is for all other devices
- #b - number of blocks
- x
- #w - number of warps (groups of 32 threads)
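A hypothetical Python sketch of splitting a value in the format above into its parts (parse_launch_config and parse_config_list are made-up names). It assumes each comma-separated entry, as in "S27x3,28x4", is a separate configuration, and it only accepts the optional "S" prefix described here; the miner's own parser may accept more than this.

```python
import re

# Optional "S" prefix, a block count, a literal "x", then a warp count.
CONFIG_RE = re.compile(r"(S?)(\d+)x(\d+)")


def parse_launch_config(text: str) -> tuple[str, int, int]:
    """Split e.g. "S27x3" into ("S", 27, 3)."""
    m = CONFIG_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a launch config: {text!r}")
    prefix, blocks, warps = m.groups()
    return prefix, int(blocks), int(warps)


def parse_config_list(text: str) -> list[tuple[str, int, int]]:
    """Parse a comma-separated list such as "S27x3,28x4"."""
    return [parse_launch_config(part) for part in text.split(",")]


print(parse_config_list("S27x3,28x4"))  # [('S', 27, 3), ('', 28, 4)]
```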
An important consideration is that #b x #w x 32 x 131072 should be less than the RAM (in bytes) on the card in question.
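The same rule of thumb as a quick Python check, plus the largest block count it allows for a given warp count. The helper names are hypothetical, and the 1.5 GB figure is an assumption for the standard GTX 580 mentioned above. The driver and display also use some card memory, so it's wise to stay somewhat below the computed maximum.

```python
THREADS_PER_WARP = 32
SCRATCHPAD_BYTES = 131072  # scrypt scratchpad per hash


def fits_in_memory(blocks: int, warps: int, card_ram_bytes: int) -> bool:
    """Rule of thumb: blocks * warps * 32 * 131072 < card RAM."""
    return blocks * warps * THREADS_PER_WARP * SCRATCHPAD_BYTES < card_ram_bytes


def max_blocks(warps: int, card_ram_bytes: int) -> int:
    """Largest block count that still satisfies the rule for this warp count."""
    per_block = warps * THREADS_PER_WARP * SCRATCHPAD_BYTES
    # b * per_block < ram  <=>  b * per_block <= ram - 1
    return (card_ram_bytes - 1) // per_block


GTX_580_RAM = 1536 * 2**20  # ASSUMPTION: the common 1.5 GB GTX 580 model
print(fits_in_memory(112, 2, GTX_580_RAM))  # True
print(max_blocks(2, GTX_580_RAM))           # 191
```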
I'm not really sure how to break it down to explain-like-I'm-five level without resorting to some MS Paint-esque diagramming, which would probably do more harm than good.
And the answer to your question is...
maybe. I have a mobile Kepler card, and I saw equivalent performance when I added the S prefix while keeping the same block, warp, and texture caching settings. I haven't had a chance to do any extensive testing or autotuning yet, though.