
Topic: [XPM] CUDA enabled qt client miner for primecoins. Source code inside. WIP - page 4. (Read 31718 times)

member
Activity: 75
Merit: 10
Good job primedigger! Just looking at your 5-chains score, I think that means that this is slower than most CPUs though, right? My crappy 4-core AMD does 400-800 5-chains/hour (around 3000 PPS).

Yes, it's an extremely crappy score, but it's a start. The high performance client does a good job of squeezing as much as possible out of your CPU though!
hero member
Activity: 820
Merit: 1000
Good job primedigger! Just looking at your 5-chains score, I think that means that this is slower than most CPUs though, right? My crappy 4-core AMD does 400-800 5-chains/hour (around 3000 PPS).
member
Activity: 75
Merit: 10
My 2 cents: mining entirely on the GPU won't be easy and is impractical, but tandem mining with interleaved CPU+GPU computations may very well give good speedups.

Some feedback from knowledgeable people indicates that mod_exp probably would not speed up as well on the GPU. However, I think if the GPU can do the sieve much more efficiently, it could generate far fewer candidates for the Fermat test, which could speed things up quite a bit.

There is indeed a problem with the speed of Fermat tests on the GPU. GNU GMP uses the most sophisticated algorithms available, while the student library I found and started to extend uses the most basic ones.

Modular exponentiation (mpz_powm in GMP) needs fast multiplication of big ints. GMP's multiplication is sub-quadratic for large operands (most likely approaching O(n log n)), while the schoolbook multiplication the GPU uses now is O(n^2). I hoped that for the ~400 bit numbers involved it wouldn't make such a difference. Currently, the new version in my repo does Fermat tests on the GPU (rate is 10 per second), but my CPU is still faster due to better algorithms and a better bignum implementation.
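To make the O(n^2) point concrete, here is a minimal sketch of schoolbook multiplication on fixed-size limbs, written in Python for readability rather than CUDA. The 32-bit limb width and all function names are assumptions for illustration, not the layout of the library in the repo. Every limb of one operand is multiplied with every limb of the other, so a 400-bit number split into 13 limbs costs 13 * 13 = 169 limb products per multiplication.

```python
# Illustrative sketch only: limb width and names are assumptions,
# not taken from the miner's actual CUDA bignum code.
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, n_limbs):
    """Split a non-negative integer into n_limbs little-endian 32-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n_limbs)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def schoolbook_mul(a, b):
    """Multiply two limb arrays with len(a) * len(b) limb products."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # Partial product plus what is already stored plus the carry
            # always fits in 64 bits for 32-bit limbs.
            t = out[i + j] + ai * bj + carry
            out[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        # This slot has not been written by earlier rows yet.
        out[i + len(b)] = carry
    return out
```

Usage would look like `from_limbs(schoolbook_mul(to_limbs(x, 13), to_limbs(y, 13)))`, which should equal `x * y` for any x and y below 2^416.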

But don't worry, I won't give up so fast! The current situation is that I need to either port better algorithms to the GPU or do something other than Fermat tests on the GPU to sieve candidates (e.g. trial division by the most common small primes).
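A rough sketch of the second option, in Python for brevity: cheap trial division by a handful of small primes throws out most composites before the expensive Fermat test runs. The prime list, the base 2 and the function names are assumptions for illustration, not what the client actually does.

```python
# Illustrative sketch only: the prime list, the base and the names are assumptions.
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def survives_trial_division(n):
    """True if no small prime divides n (n itself may be one of them)."""
    return all(n % p != 0 or n == p for p in SMALL_PRIMES)


def fermat_probable_prime(n):
    """Base-2 Fermat test for odd n > 2: 2^(n-1) mod n == 1."""
    return pow(2, n - 1, n) == 1


def filter_candidates(candidates):
    """Run the cheap test first, then the expensive one on the survivors.

    Returns the probable primes and the number of Fermat tests performed.
    """
    survivors = [n for n in candidates if survives_trial_division(n)]
    probable = [n for n in survivors if fermat_probable_prime(n)]
    return probable, len(survivors)
```

For example, 341 = 11 * 31 passes the base-2 Fermat test even though it is composite, but the trial division step already rejects it, so the prefilter saves work and also catches some pseudoprimes.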

Anybody with a better GPU than the Geforce 570 TI I own, please test this! My stats (base version is still hp4):

2013-07-24 21:53:38 primemeter     24303 prime/h    490729 test/h        47 5-chains/h

prime/h and test/h seem to fluctuate enormously and seem to be rather meaningless. As most tests are on the GPU, I have no idea if this is even measuring the tests right. 5-chains is accurate though.

You have to use setgenerate true 1, i.e. one CPU thread for mining. 
legendary
Activity: 1205
Merit: 1010
My 2 cents: mining entirely on the GPU won't be easy and is impractical, but tandem mining with interleaved CPU+GPU computations may very well give good speedups.

Some feedback from knowledgeable people indicates that mod_exp probably would not speed up as well on the GPU. However, I think if the GPU can do the sieve much more efficiently, it could generate far fewer candidates for the Fermat test, which could speed things up quite a bit.
newbie
Activity: 26
Merit: 0
member
Activity: 75
Merit: 10
Progress update:

- The GPU does the Fermat tests now and the CPU does the rest. Fermat tests seem to work fine now.
- I can find prime chains, but couldn't find a block on testnet in a timely manner

Todo:
- Transferring mpz types as strings is slow, so I will convert GMP mpz values directly to my CUDA format on the CPU.
- A lot happened with the high performance client, I will update my codebase to hp7
- The changes in hp6 could also be useful on the GPU ( -> fast divisibility tests before doing the expensive Fermat test)
- Interleave CPU+GPU computations and async memory copies. Without this, my client won't be very fast.

I don't like to put an ETA on this, it's done when it's done. But I hope to have something by next week which outperforms my old Intel Core 2 Quad.

My 2 cents: mining entirely on the GPU won't be easy and is impractical, but tandem mining with interleaved CPU+GPU computations may very well give good speedups.

Edit: And thanks for the donations / pledges guys!

I donated 0.2 BTC. Thanks for the update.
full member
Activity: 122
Merit: 100
Progress update:

- The GPU does the Fermat tests now and the CPU does the rest. Fermat tests seem to work fine now.
- I can find prime chains, but couldn't find a block on testnet in a timely manner

Todo:
- Transferring mpz types as strings is slow, so I will convert GMP mpz values directly to my CUDA format on the CPU.
- A lot happened with the high performance client, I will update my codebase to hp7
- The changes in hp6 could also be useful on the GPU ( -> fast divisibility tests before doing the expensive Fermat test)
- Interleave CPU+GPU computations and async memory copies. Without this, my client won't be very fast.

I don't like to put an ETA on this, it's done when it's done. But I hope to have something by next week which outperforms my old Intel Core 2 Quad.

My 2 cents: mining entirely on the GPU won't be easy and is impractical, but tandem mining with interleaved CPU+GPU computations may very well give good speedups.

Edit: And thanks for the donations / pledges guys!

The fact that you are finding blocks on testnet is quite an achievement. Keep up the good work :)
member
Activity: 75
Merit: 10
Progress update:

- The GPU does the Fermat tests now and the CPU does the rest. Fermat tests seem to work fine now.
- I can find prime chains, but couldn't find a block on testnet in a timely manner

Todo:
- Transferring mpz types as strings is slow, so I will convert GMP mpz values directly to my CUDA format on the CPU.
- A lot happened with the high performance client, I will update my codebase to hp7
- The changes in hp6 could also be useful on the GPU ( -> fast divisibility tests before doing the expensive Fermat test)
- Interleave CPU+GPU computations and async memory copies. Without this, my client won't be very fast.
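For the first todo item, the idea can be sketched as follows (Python here, just to show the data layout): instead of printing each number as a string and parsing it again, pack it straight into a fixed-width array of little-endian 32-bit limbs that can be copied to the device as one flat buffer. The 16-limb width and the function names are hypothetical; in C, GMP's mpz_export would be the natural way to get the raw limbs out of an mpz.

```python
# Illustrative sketch only: the fixed width and all names are hypothetical.
import struct

LIMBS = 16  # 16 x 32 bits = 512 bits, enough headroom for ~400-bit numbers


def pack_limbs(x, n_limbs=LIMBS):
    """Convert a non-negative integer to a tuple of little-endian 32-bit limbs.

    Raises OverflowError if x does not fit into n_limbs limbs.
    """
    raw = x.to_bytes(4 * n_limbs, "little")
    return struct.unpack("<%dI" % n_limbs, raw)


def unpack_limbs(limbs):
    """Inverse of pack_limbs."""
    raw = struct.pack("<%dI" % len(limbs), *limbs)
    return int.from_bytes(raw, "little")


def pack_batch(values, n_limbs=LIMBS):
    """Pack many candidates into one contiguous buffer for a single copy."""
    return b"".join(v.to_bytes(4 * n_limbs, "little") for v in values)
```

A fixed width means every candidate sits at a known offset in the buffer, so one thread per candidate can index straight into it without any parsing on the GPU side.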

I don't like to put an ETA on this, it's done when it's done. But I hope to have something by next week which outperforms my old Intel Core 2 Quad.

My 2 cents: mining entirely on the GPU won't be easy and is impractical, but tandem mining with interleaved CPU+GPU computations may very well give good speedups.

Edit: And thanks for the donations / pledges guys!
sr. member
Activity: 294
Merit: 250
How is progress so far? CUDA is a must :)
sr. member
Activity: 519
Merit: 252
555
Based on previous prime-number-based-research projects, CUDA has outperformed OpenCL.

AFAIK, there are two reasons why the Lucas-Lehmer test for Mersenne primes has been done in CUDA rather than OpenCL. It uses floating point math, and there are better FFT libraries available for CUDA. (For the huge numbers involved, multiplication is more efficient via Fourier transform and convolution.)
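For reference, the Lucas-Lehmer test itself is tiny. This is a plain Python sketch using built-in big integers, not the FFT-based squaring that the CUDA programs rely on; the function name is just for illustration. Almost all of the work is the repeated squaring modulo 2^p - 1, which is why fast multiplication matters so much there.

```python
# Illustrative sketch only: uses Python's built-in big integers, no FFT.
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # One big squaring per iteration dominates the cost.
        s = (s * s - 2) % m
    return s == 0
```

For example, filtering the prime exponents up to 31 with this function leaves 2, 3, 5, 7, 13, 17, 19 and 31.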

On the other hand, trial division (another important part of Mersenne prime search) seems to be more efficient with OpenCL on AMD, as it uses integer math. I assume Primecoin would work fine with just integer math.

Sources: http://mersenneforum.org/forumdisplay.php?f=92

Of course, there is a more ideological point about using a language that works across many platforms (CPUs and GPUs from several vendors) than tying yourself to Nvidia, especially in an open source project.
member
Activity: 75
Merit: 10
If I do mine with this, I'll pledge to give 10% of the total XPM I mine for the first 5 days to primedigger. I have two CUDA compute capability 2.0 cards and one 3.0.
member
Activity: 63
Merit: 10
Donated some BTC just now, keep up the good work :)

BTW, are there any guides on how to compile this for Windows (or how to cross-compile on Linux), or should the Bitcoin-Qt build guides apply to this as well?
sr. member
Activity: 243
Merit: 250
I would definitely donate if the CUDA miner can outperform the AMD OCL miner (on GPUs in the same price range).

Best of luck to you.
full member
Activity: 383
Merit: 100
Precompiled Windows version please! Not all of us have compilers handy, you know.

Why do you want a miner that can't mine yet?
Oh yeah, didn't see that until now.
full member
Activity: 156
Merit: 100
Firstbits: 1dithi
Precompiled Windows version please! Not all of us have compilers handy, you know.

Why do you want a miner that can't mine yet?
full member
Activity: 383
Merit: 100
Precompiled Windows version please! Not all of us have compilers handy, you know.
member
Activity: 75
Merit: 10
Hi,
if you guys don't mind, I would also try it on an Nvidia card ...
Thx

Of course, if you want to hack on this - go ahead! Just to clarify: the project isn't at a stage where you can start mining with this, but we're not far away from that either.
newbie
Activity: 18
Merit: 0
Hi,
if you guys don't mind, I would also try it on an Nvidia card ...
Thx
sr. member
Activity: 294
Merit: 250
I may want to try this.
legendary
Activity: 1064
Merit: 1000
I specifically bought an Nvidia card for my main computer so I would not have to worry about mining with it when not using it.

Now this... dammit :D