Topic: doubling Litecoin mining efficiency on nVidia

hero member
Activity: 914
Merit: 500

My reason for hope is that the 20% reduction refers only to instruction count, not actual execution time.
full member
Activity: 185
Merit: 100
Awesome! Will you be releasing binaries? (And maybe a donation address?)
hero member
Activity: 756
Merit: 502
I've just compiled pooler's cpuminer on Windows (without the assembly bits) and I think I will be adding my CUDA stuff there.

Christian
hero member
Activity: 756
Merit: 502

This thread on the nvidia forums discusses the possible benefits from the Compute 3.5 architecture for SHA-256 hashing https://devtalk.nvidia.com/default/topic/496471/cuda-programming-and-performance/amd-radeon-3x-faster-on-bitcoin-mining-sha-256-hashing-performance/3/

Don't be too optimistic: the instruction count is "only" reduced by about 20%, so the speed boost is expected to be moderate at best.
hero member
Activity: 914
Merit: 500
I would be willing to invest in a GeForce TITAN card (GK110) so I can collaborate with you on a more efficient pure CUDA mining routine for both SHA-256 and scrypt. I think there's huge potential there, because the funnel shifter brings to the table performance gains along the same lines as BFI_INT did for OpenCL.
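Some background on the BFI_INT comparison: in SHA-256, the Ch and Maj functions can each be written as a single bitwise select. AMD's BFI_INT instruction performs that select in one step, and this is usually where the OpenCL speedup is attributed. The minimal C sketch below checks both identities against the textbook definitions from FIPS 180-4. It is an illustration, not code from this thread. All function names are hypothetical, and the operand order of `bitselect` is an assumption rather than BFI_INT's actual encoding.

```c
#include <stdint.h>

/* Hypothetical helper names; illustrative sketch only. */

/* Bitwise select: each result bit comes from b where selector a has a 1,
 * and from c where a has a 0 (operand order is illustrative). */
static uint32_t bitselect(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (~a & c);
}

/* SHA-256 Ch and Maj as written in the specification. */
static uint32_t ch_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

static uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

/* The same two functions built around one bit select each. */
static uint32_t ch_bfi(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect(x, y, z);        /* x chooses between y and z */
}

static uint32_t maj_bfi(uint32_t x, uint32_t y, uint32_t z)
{
    /* Where x and y agree the majority is y; where they differ, z decides. */
    return bitselect(x ^ y, z, y);
}

/* Compares both forms on pseudorandom inputs from a xorshift32 generator;
 * returns the number of mismatches (0 if the identities hold). */
static unsigned count_bfi_mismatches(unsigned trials)
{
    uint32_t s = 0x9E3779B9u;
    unsigned bad = 0;
    for (unsigned i = 0; i < trials; i++) {
        uint32_t v[3];
        for (int k = 0; k < 3; k++) {
            s ^= s << 13; s ^= s >> 17; s ^= s << 5;
            v[k] = s;
        }
        if (ch_bfi(v[0], v[1], v[2]) != ch_ref(v[0], v[1], v[2])) bad++;
        if (maj_bfi(v[0], v[1], v[2]) != maj_ref(v[0], v[1], v[2])) bad++;
    }
    return bad;
}
```

Maj needs one extra XOR to build its selector. With that, each function drops from several logical operations to one or two.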

Let me know how you would like to collaborate on this. Again, I'm not a seasoned CUDA developer, but I feel I can still be of assistance.
full member
Activity: 185
Merit: 100
Have you looked at cgminer?
hero member
Activity: 756
Merit: 502
For integrating my CUDA code, I briefly looked at the source code of the reaper GPU miner, but I do not really like it. Looks like a hack.

I only know about a SHFL instruction (not "SHLFT"), which is for intra-warp data exchange (a warp shuffle). I do not see it speeding up hashing at the moment.

But the new 64-bit funnel shifter is only available in compute capability 3.5, as offered by the Titan card or the GK110-based Teslas. Lesser GeForce cards like your GTX 690 have compute 3.0 only. A question on Stack Overflow deals with this feature:
http://stackoverflow.com/questions/12767113/funnel-shift-what-is-it.  It has the potential to somewhat speed up SHA-256 and scrypt hashing, but the price of the cards is just way too high.
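Why a funnel shifter helps: SHA-256 and Salsa20/8 both rely heavily on 32-bit rotations, and a funnel shift whose two inputs are the same word is exactly a rotation. The C sketch below emulates, on the CPU, the semantics of CUDA's `__funnelshift_l(lo, hi, shift)` as described in the CUDA Math API. That is: concatenate hi:lo, shift left by shift & 31, and keep the upper 32 bits. It then compares the result with the portable two-shifts-plus-OR rotate. The helper names are hypothetical, and the sketch says nothing about actual instruction counts or timings on any particular GPU.

```c
#include <stdint.h>

/* Hypothetical helper names; a CPU-side illustration only. */

/* Emulates the documented semantics of CUDA's __funnelshift_l(lo, hi, shift):
 * concatenate hi:lo into 64 bits, shift left by (shift & 31), and return
 * the upper 32 bits. */
static uint32_t funnelshift_l(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((v << (shift & 31)) >> 32);
}

/* Portable 32-bit rotate: two shifts plus an OR. The masks keep both shift
 * counts below 32, so shift == 0 is well-defined. */
static uint32_t rotl_shifts(uint32_t x, uint32_t shift)
{
    return (x << (shift & 31)) | (x >> ((32 - shift) & 31));
}

/* A rotate is a funnel shift with both halves set to the same word. */
static uint32_t rotl_funnel(uint32_t x, uint32_t shift)
{
    return funnelshift_l(x, x, shift);
}

/* Returns 1 if both rotate forms agree for every shift amount 0..31. */
static int rotations_agree(uint32_t x)
{
    for (uint32_t s = 0; s < 32; s++)
        if (rotl_shifts(x, s) != rotl_funnel(x, s))
            return 0;
    return 1;
}
```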
hero member
Activity: 914
Merit: 500
Christian,

I was curious if there were any more serious CUDA developers out there looking to tackle this issue. Thank you for stepping up for the community!

As far as CUDA miners are concerned, the only one I know of that actually uses CUDA rather than OpenCL is the ooooold version of RPCMiner (here: https://bitcointalksearch.org/topic/rpc-miners-cpu4waycudaopencl-2444)

I'd be interested in helping you test your code, as I have a GTX 690 that is begging to be spun up for Litecoins!

Additionally, I'm very much a novice at CUDA development (as in, I went through "CUDA by Example" six months ago as research for work), but I'd like to help any way I can. One thing I was curious about is whether the new "Shift Left" (SHLFT) instructions in the Kepler architecture (more info: http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0642-GTC2012-Inside-Kepler.pdf) might be used to juice even more performance out of a pure CUDA-based miner.

Thanks again and keep the updates coming!
hero member
Activity: 756
Merit: 502
Hi,

I was always a bit dismayed at the state of OpenCL mining on nVidia cards. Getting 20 kHash/sec on an nVidia GTX 260 (the 216-shader version) was always rather disappointing. Even my GTX 660 Ti peaked at 80 kHash/sec.

I sat down and rewrote the entire computationally heavy part in CUDA: the two loops of 1024 iterations that call the Salsa20/8 core (the part in between the two PBKDF2_SHA256() calls).
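The structure of that part can be sketched in plain C for Litecoin's parameters (r = 1, N = 1024). This is a minimal rewrite modeled on the layout of the litecoin scrypt.c linked in this post. It is not the CUDA port described here. The function names and signatures are assumptions, and both the byte-order decoding and the two PBKDF2_SHA256() calls are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and signatures; an illustrative sketch only. */

#define ROTL32(a, b) (((a) << (b)) | ((a) >> (32 - (b))))

/* B ^= Bx, then B += (Salsa20/8 rounds applied to the new B).
 * This is the mixing step scrypt uses for r = 1. */
static void xor_salsa8(uint32_t B[16], const uint32_t Bx[16])
{
    uint32_t x[16];
    for (int i = 0; i < 16; i++)
        x[i] = (B[i] ^= Bx[i]);

    for (int i = 0; i < 8; i += 2) {   /* 4 double rounds = 8 rounds */
        /* column round */
        x[ 4] ^= ROTL32(x[ 0] + x[12],  7);  x[ 8] ^= ROTL32(x[ 4] + x[ 0],  9);
        x[12] ^= ROTL32(x[ 8] + x[ 4], 13);  x[ 0] ^= ROTL32(x[12] + x[ 8], 18);
        x[ 9] ^= ROTL32(x[ 5] + x[ 1],  7);  x[13] ^= ROTL32(x[ 9] + x[ 5],  9);
        x[ 1] ^= ROTL32(x[13] + x[ 9], 13);  x[ 5] ^= ROTL32(x[ 1] + x[13], 18);
        x[14] ^= ROTL32(x[10] + x[ 6],  7);  x[ 2] ^= ROTL32(x[14] + x[10],  9);
        x[ 6] ^= ROTL32(x[ 2] + x[14], 13);  x[10] ^= ROTL32(x[ 6] + x[ 2], 18);
        x[ 3] ^= ROTL32(x[15] + x[11],  7);  x[ 7] ^= ROTL32(x[ 3] + x[15],  9);
        x[11] ^= ROTL32(x[ 7] + x[ 3], 13);  x[15] ^= ROTL32(x[11] + x[ 7], 18);
        /* row round */
        x[ 1] ^= ROTL32(x[ 0] + x[ 3],  7);  x[ 2] ^= ROTL32(x[ 1] + x[ 0],  9);
        x[ 3] ^= ROTL32(x[ 2] + x[ 1], 13);  x[ 0] ^= ROTL32(x[ 3] + x[ 2], 18);
        x[ 6] ^= ROTL32(x[ 5] + x[ 4],  7);  x[ 7] ^= ROTL32(x[ 6] + x[ 5],  9);
        x[ 4] ^= ROTL32(x[ 7] + x[ 6], 13);  x[ 5] ^= ROTL32(x[ 4] + x[ 7], 18);
        x[11] ^= ROTL32(x[10] + x[ 9],  7);  x[ 8] ^= ROTL32(x[11] + x[10],  9);
        x[ 9] ^= ROTL32(x[ 8] + x[11], 13);  x[10] ^= ROTL32(x[ 9] + x[ 8], 18);
        x[12] ^= ROTL32(x[15] + x[14],  7);  x[13] ^= ROTL32(x[12] + x[15],  9);
        x[14] ^= ROTL32(x[13] + x[12], 13);  x[15] ^= ROTL32(x[14] + x[13], 18);
    }
    for (int i = 0; i < 16; i++)
        B[i] += x[i];
}

/* The part between the two PBKDF2_SHA256() calls, for r = 1.
 * X: 32 words (128 bytes) from the first PBKDF2 call, already decoded to
 *    host-order words; on return it holds the input to the second call.
 * V: scratchpad of N * 32 words (128 KiB when N = 1024).
 * N: must be a power of two (Litecoin uses 1024). */
static void scrypt_core(uint32_t X[32], uint32_t *V, uint32_t N)
{
    assert(N != 0 && (N & (N - 1)) == 0);  /* index masking needs a power of two */

    /* Loop 1: fill the scratchpad sequentially. */
    for (uint32_t i = 0; i < N; i++) {
        memcpy(&V[(size_t)i * 32], X, 32 * sizeof(uint32_t));
        xor_salsa8(&X[0], &X[16]);
        xor_salsa8(&X[16], &X[0]);
    }
    /* Loop 2: read N entries back at data-dependent positions. */
    for (uint32_t i = 0; i < N; i++) {
        size_t j = (size_t)32 * (X[16] & (N - 1));
        for (int k = 0; k < 32; k++)
            X[k] ^= V[j + k];
        xor_salsa8(&X[0], &X[16]);
        xor_salsa8(&X[16], &X[0]);
    }
}
```

The first loop writes a 128 KiB scratchpad sequentially. The second reads it back at data-dependent offsets (X[16] & (N - 1)), which is why the memory access pattern matters so much in a GPU port.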

My reference code is this clean and concise implementation: https://github.com/litecoin-project/litecoin/blob/master/src/scrypt.c
I ported it to CUDA, making heavy use of shared memory and making sure that memory accesses are close to optimal.

Now I am getting about 45 kHash/sec from the GTX 260. That is still no match for any ATI card, but it is roughly the hashing power of a high-end CPU miner. I still need to figure out which existing miner application I can integrate this new CUDA code into.
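For perspective, 45 versus 20 kHash/sec on the same GTX 260 is a 2.25x gain. Below is a rough C estimate of the memory traffic implied by the 45 kHash/sec figure. It rests on these assumptions:

* N = 1024 and r = 1.
* The full scratchpad is stored in memory, with no lookup-gap trade-off.
* Every entry is written once in the first loop, and N entries are read in the second.

The helper names are hypothetical, and caching effects are ignored.

```c
#include <stdint.h>

/* Hypothetical helper names; a back-of-the-envelope estimate only. */

/* Scratchpad size per hash: N entries of 128 * r bytes each. */
static uint64_t scrypt_scratchpad_bytes(uint64_t N, uint64_t r)
{
    return N * 128u * r;
}

/* Bytes moved per hash, assuming the whole scratchpad is written once in
 * the first loop and N entries (same total size) are read in the second,
 * with no lookup-gap or caching effects. */
static uint64_t scrypt_traffic_bytes(uint64_t N, uint64_t r)
{
    return 2u * scrypt_scratchpad_bytes(N, r);
}

/* Implied memory traffic in bytes per second at a given hash rate. */
static double scrypt_bytes_per_sec(double hashes_per_sec, uint64_t N, uint64_t r)
{
    return hashes_per_sec * (double)scrypt_traffic_bytes(N, r);
}
```

Under these assumptions, the scratchpad is 131072 bytes (128 KiB) and the traffic is 262144 bytes per hash. At 45 kHash/sec that corresponds to roughly 11.8 GB/s.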

Christian