A 4GB GT 630 for $30: wow, good price! I have yet to enable the Fermi kernel for scrypt-jane, though.
Tesla kernels don't need to explicitly enable a texture for cached reading, as they automatically pull their data through the read-only data cache (look up what the __ldg intrinsic does in the latest CUDA Programming Guide).
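For anyone curious, here is a minimal sketch of what that looks like in device code: on compute capability 3.5+ parts (GK110, i.e. Titan / 780Ti / Tesla K20), __ldg() routes a load through the read-only data cache with no texture binding at all. The kernel and names below are purely illustrative, not actual cudaminer code.

```cuda
#include <stdint.h>

// Illustrative only: XOR-reduce a read-only scratchpad, routing every
// load through the read-only data cache via __ldg() (sm_35 and up).
// No texture reference or texture object has to be bound beforehand.
__global__ void xor_scratchpad(const uint32_t * __restrict__ scratch,
                               uint32_t *out, int n)
{
    uint32_t acc = 0;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        acc ^= __ldg(&scratch[i]);   // cached read-only load
    atomicXor(out, acc);             // crude reduction, fine for a sketch
}
```

The __restrict__ qualifier also lets the compiler prove the pointer is read-only, so on sm_35 it can often generate the same cached loads even without an explicit __ldg().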
I might try to figure out a way to chop the scrypt-jane kernels into a series of smaller kernel launches, which may make them less taxing on the display and also allow the use of interactive mode again.
Titan kernels are now scrypt-jane enabled! I get 3.2 kHash/s on GTX 780Ti using -l T7x3 now. And power use is cut in half compared to LTC mining. What a pity the 780Ti doesn't have 6 Gigs of RAM, or I could use -l T14x3, doubling the speed. Someone should try this launch config with the 6 GB Geforce Titan models though. Could yield some 6 kHash/s.
I also have a crazy idea that would basically remove the memory limitations for scrypt-jane mining. It requires joining the A and B kernels into a single kernel again and re-using the scratchpad memory on the GPU. So instead of giving each thread a unique 4 MB scratchpad, we may be able to reuse the same scratchpad memory for all non-concurrently executing thread blocks. I think this is similar to the concept that the "intensity" parameter controls on ATI cards running cgminer. Unfortunately this idea might be incompatible with the texture cache, as that cache does not guarantee read/write coherency within a single kernel invocation. But hey, it could get my 780Ti's to 6 kHash/s... maybe.
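To make the reuse idea concrete, here is a hypothetical sketch of the addressing scheme (none of these names exist in cudaminer): blocks would claim a pad from a small pool sized to the number of concurrently resident blocks, instead of each thread owning a unique pad.

```cuda
#include <stdint.h>

// Hypothetical sketch of the scratchpad-reuse idea. 'next_slot' and
// 'scrypt_core_sketch' are illustrative names, not cudaminer code.
__device__ int next_slot;   // host zeroes this before each launch

__global__ void scrypt_core_sketch(uint8_t *scratch_pool,
                                   size_t pad_bytes, int num_slots)
{
    __shared__ int slot;
    if (threadIdx.x == 0)
        slot = atomicAdd(&next_slot, 1) % num_slots;  // claim a pad
    __syncthreads();

    uint8_t *pad = scratch_pool + (size_t)slot * pad_bytes;
    // ... run the scrypt V-array mixing inside 'pad' ...
    // Only correct if two blocks holding the same slot never execute
    // concurrently -- which fails once all launched blocks are
    // resident on the GPU at the same time.
}
```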
EDIT: Okay, I made a mistake in my thinking here. With so few thread blocks running on the GPU, ALL of them would be executing concurrently, and hence the memory-reuse concept falls flat.
Christian
Well this is pretty awesome. I do have a question though.
What is the significance of values that show up in autotune vs. those that don't? Autotune selected T7x2 for my 780. I had been using K9x2 previously, but T9x2 was blank in autotune. I gave it a shot anyway and it worked, so I tried a few others. Previously, K10x2 gave me memory warnings and wouldn't verify on the CPU; T10x2 now gives me 3.26 kHash/s, T21x1 gives me 3.4 kHash/s, and I'm getting shares accepted like nobody's business!
According to Wikipedia, some 630s have Kepler cores in them. I'm not sure whether the particular one I'm after does, but I'm trying to find out. Either way, I can't pass it up for $30.
Thanks for the info about Titan kernels. I'm not current with CUDA at all. I wrote a very poor automated satisfiability theorem prover with CUDA in 2009, and I've forgotten most of it since. I've spent most of the day looking through CUDA code, though, when I ought to be working on my own completely unrelated code. Perhaps one of these days I'll be caught up enough to contribute.
Regarding your memory brainstorming: how much overhead is involved in creating thread blocks, and is there any concept of synchronization, i.e. locking, between thread blocks? I'm guessing no, but on the off chance there is, can you queue up some blocks and have them wait on the currently executing blocks before they start using the scratchpad?
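For what it's worth, CUDA itself offers no built-in locking between thread blocks; the programming model assumes blocks are independent and schedulable in any order. People sometimes improvise a lock by spinning on a global atomic, but that only works if every participating block is resident on the GPU at once; otherwise the spinning blocks can starve the block holding the lock and the kernel deadlocks. A purely illustrative sketch, not cudaminer code:

```cuda
#include <stdint.h>

// DANGEROUS illustration only: blocks serialize access to a shared
// region by spinning on a global flag. This deadlocks unless all
// launched blocks are simultaneously resident, so it is not a safe
// general-purpose mechanism.
__device__ volatile int lock_flag;   // 0 = free; host-initialized

__global__ void use_shared_scratch(uint32_t *scratch)
{
    if (threadIdx.x == 0) {
        while (atomicCAS((int *)&lock_flag, 0, 1) != 0)
            ;                        // spin until the pad is free
    }
    __syncthreads();
    // ... this block now owns 'scratch' exclusively ...
    __syncthreads();
    if (threadIdx.x == 0) {
        __threadfence();             // publish writes before releasing
        atomicExch((int *)&lock_flag, 0);
    }
}
```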
I will have to try it. I have a second mining rig being connected tomorrow, which will double my production, so my 780 could mine these and save me some power, since I won't be paying for power on the new rig.
If anyone can PM me with instructions on setting up scrypt-jane for my 780, plus some good pool suggestions, that would be great, and I can give it a try tomorrow once my second machine is running. As long as I can make 0.01 BTC in 3 days or so, I'm happy to mine it and be a tester for the cudaminer code.
With my GTX 780 I'm currently on track to make 0.014 BTC in 24 hours. You need the latest commit from git, then just pass the argument "--algo=scrypt-jane". As for compiling: I don't know much about the Windows world, but on Linux or OS X you just run ./autogen.sh; ./configure; make. If that doesn't work, read the error messages and try to deduce what you're missing.
That reminds me: I always have to modify a few includes when I compile on OS X; perhaps I'll submit a pull request with a few #ifdefs.