The amount of greed surrounding a possible GPU primecoin miner seems to be phenomenal.
Too many people seem to fall for "send me bitcoin and get early access". Sorry, but soliciting payment for early access is just dirty.
Let's stop that bullshit: I'm going to share my code, even though my miner isn't fully functional right now (though I think I've already taken it quite far).
Github link:
http://github.com/primedigger/primecoin

I repeat: THE CODE DOESN'T CURRENTLY RUN CORRECTLY AND CRASHES. This is just for developers who want to join in and help, and to show that development is happening transparently.
My general plan for porting this to CUDA:
I couldn't get getblocktemplate to work, so I added CUDA directly to the qt-client.
Let the CPU handle the candidate search; candidates get sent to the GPU, which acts as a co-processor that only does "ProbablePrimeChainTest", very fast. This makes it easier to get a proof-of-concept GPU miner out soon.
What I've done so far:
I started with the latest high performance client (hp4).
- Ported the code path in "ProbablePrimeChainTest" so that it runs with pure mpz_t-style functions, and minimised the number of functions that are needed. I compiled this successfully against the big integer library in
https://github.com/dmatlack/cuda-rsa.
- Changed the code in "MineProbablePrimeChain" so that candidates are collected for "ProbablePrimeChainTest". I measured that candidate collection is much faster than testing the candidates in "ProbablePrimeChainTest", and that ratio puts a theoretical limit on the speedup. If I didn't make a mistake while measuring, the upper limit is somewhere in the 100x-1000x range, so this should be a viable approach.
- Candidates are transferred to the GPU as hex char* strings. The big integer library in cuda-rsa has "mpz_set_str", but unfortunately it doesn't work on the GPU as of now. It might be better to produce the mpz format the GPU needs directly on the CPU instead of parsing strings. Note: later on, transfers to the GPU can be made asynchronous, so that the CPU and GPU mine in tandem.
(I also changed the sieve of Eratosthenes to a sieve of Atkin, since I had that code lying around anyway, but that has nothing to do with the GPU.)
The hardest part is having a reliable big integer library for CUDA; that's why there is no working GPU miner yet. The one in
https://github.com/dmatlack/cuda-rsa needs more testing, and someone could work on it independently of this project. It was the best library I could find for big integer + modulo arithmetic. If you know a better one, let me know.
What I like: One way or another we will end up with a highly optimised big integer library for GPUs. That is something big on its own!
Stop sending your money to someone claiming to give you early access. My guess is that the first functional GPU miners won't be very fast anyway (e.g. a single-digit speedup compared to hp4); an unoptimised big integer library won't outperform GMP by much.
P.S. You should call the binary with "-printtoconsole -printmining -testnet" to debug. Also, I'm developing this under 64-bit Linux with standard paths. I only updated the qtcreator project file, so if you have CUDA working under 64-bit Linux you should be able to build the project with qtcreator.