Topic: [XPM] CUDA enabled qt client miner for primecoins. Source code inside. WIP - page 2. (Read 31806 times)

sr. member
Activity: 363
Merit: 250
I think there might be a bug somewhere in hp-9, so that the trial division doesn't work quite right in that version and discards the wrong candidates. I will need to confirm this against the hp-9 sources without my changes, to be sure I didn't introduce the bug myself. The CUDA trial division seems to be doing the right thing and doesn't find any candidates to discard because, if I understand it correctly, the sieve has already removed them. In other words: this idea is likely a dead end.

I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug.

So uhh... should we not be using HP9 then?
sr. member
Activity: 301
Merit: 250
I think there might be a bug somewhere in hp-9, so that the trial division doesn't work quite right in that version and discards the wrong candidates. I will need to confirm this against the hp-9 sources without my changes, to be sure I didn't introduce the bug myself. The CUDA trial division seems to be doing the right thing and doesn't find any candidates to discard because, if I understand it correctly, the sieve has already removed them. In other words: this idea is likely a dead end.

I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug.
full member
Activity: 213
Merit: 100
Keep on keeping on, I guess.
member
Activity: 75
Merit: 10
There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf); however, these implementations usually assume that n fits in 32 or 64 bits, which makes the test much easier.

I just skimmed over that paper. Their results are novel, but almost useless in application.
If you're doing Fermat tests, there's a good chance the numbers you want to analyze are greater than 2^64. :(

Exactly, and they are much greater than 2^64 in primecoin.

As for CUMP, it is completely useless for primecoin, as it only implements floating-point arithmetic, and even then only addition, subtraction and multiplication.
full member
Activity: 168
Merit: 100
There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf); however, these implementations usually assume that n fits in 32 or 64 bits, which makes the test much easier.

I just skimmed over that paper. Their results are novel, but almost useless in application.
If you're doing Fermat tests, there's a good chance the numbers you want to analyze are greater than 2^64. :(
full member
Activity: 168
Merit: 100
I take it CUMP didn't have what you needed?
full member
Activity: 178
Merit: 100
member
Activity: 75
Merit: 10
So, sad news:

I think there might be a bug somewhere in hp-9, so that the trial division doesn't work quite right in that version and discards the wrong candidates. I will need to confirm this against the hp-9 sources without my changes, to be sure I didn't introduce the bug myself. The CUDA trial division seems to be doing the right thing and doesn't find any candidates to discard because, if I understand it correctly, the sieve has already removed them. In other words: this idea is likely a dead end.

I pushed my changes for anyone who wants to play with them. There is also still a very slow ported version of the Fermat test, which is easily outperformed by GMP's implementation on the CPU. I think there is no easy way to avoid doing Fermat tests on the GPU. So for now, there is sadly nothing for the GPU that is faster than hp-9 on the CPU.

I will have a close look at Mtrlt's project, but it seems he might have similar problems. It would be a major achievement if he gets a GPU Fermat test working with a good speed-up. That would mean a fast GPU "exponentiation by squaring" algorithm becomes available to the research community, and prime research in general would benefit from it, as most primality tests (not only Fermat's test) need that. There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf); however, these implementations usually assume that n fits in 32 or 64 bits, which makes the test much easier.

Also, if Mtrlt succeeds, he really deserves his prize money... it's really not an easy task, and I doubt other GPU implementations are in the wild. I would then also port his method over to my CUDA project.
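
For readers unfamiliar with the terms, here is a minimal CPU-side sketch in Python of a base-2 Fermat test built on exponentiation by squaring. It is only an illustration of the technique, not code from this project, hp-9 or GMP; the function names `modpow` and `fermat_probable_prime` are made up for this example. Python integers have arbitrary precision, which hides exactly the multi-word arithmetic that makes the GPU version hard.

```python
def modpow(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by exponentiation by squaring."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            # Current exponent bit is set: multiply the result by this power.
            result = (result * base) % modulus
        # Square the base for the next (more significant) exponent bit.
        base = (base * base) % modulus
        exponent >>= 1
    return result


def fermat_probable_prime(n, base=2):
    """Return True if n passes the Fermat test for the given base.

    Passing means base**(n-1) % n == 1. Every prime that does not divide
    the base passes, but some composites (pseudoprimes) pass as well.
    """
    if n < 3:
        return n == 2
    return modpow(base, n - 1, n) == 1


if __name__ == "__main__":
    print(fermat_probable_prime(2**89 - 1))  # a prime far above 2^64
    print(fermat_probable_prime(2**89 + 1))  # a composite (divisible by 3)
```

The loop runs once per bit of the exponent, and each step is a big-number multiplication followed by a reduction modulo n. For numbers that fit in 64 bits those are single machine operations; for primecoin-sized numbers each one is a multi-word routine, which is where the effort goes. Python's built-in `pow(base, exponent, modulus)` computes the same value.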
newbie
Activity: 20
Merit: 0

I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.

Status:

I will push my latest changes soon. I have updated my code base to hp-9 and implemented a fast big-number trial division by small primes for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU then computes the Fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains with small prime factors, but apparently the high-performance client still filters out some candidates with trial division before doing Fermat tests.

If a fast Fermat test for the GPU surfaces, then filtering + Fermat tests could be chained directly on the GPU to give a better speed-up.

To clarify: I haven't pushed my changes yet because I still have a silly bug somewhere, so apparently not all prime divisors are found. But doing more trial divisions than the high-performance client does by default already yields better speed-ups directly on the CPU.

I only wish that I knew enough to help, but as it is, I know nothing. Thanks for the update!
member
Activity: 63
Merit: 10
I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.

Status:

I will push my latest changes soon. I have updated my code base to hp-9 and implemented a fast big-number trial division by small primes for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU then computes the Fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains with small prime factors, but apparently the high-performance client still filters out some candidates with trial division before doing Fermat tests.

If a fast Fermat test for the GPU surfaces, then filtering + Fermat tests could be chained directly on the GPU to give a better speed-up.

To clarify: I haven't pushed my changes yet because I still have a silly bug somewhere, so apparently not all prime divisors are found. But doing more trial divisions than the high-performance client does by default already yields better speed-ups directly on the CPU.

Thanks for the update and keep up the good work :)
legendary
Activity: 2674
Merit: 3000
Terminated.
Good news, glad to see you're still working on it.
I wondered if he had quit, so this is good news :)
legendary
Activity: 1540
Merit: 1011
FUD Philanthropist™
I am watching you ;)

CUDA !!!!!!
newbie
Activity: 44
Merit: 0
Good news, glad to see you're still working on it.
member
Activity: 75
Merit: 10
I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.

Status:

I will push my latest changes soon. I have updated my code base to hp-9 and implemented a fast big-number trial division by small primes for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU then computes the Fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains with small prime factors, but apparently the high-performance client still filters out some candidates with trial division before doing Fermat tests.

If a fast Fermat test for the GPU surfaces, then filtering + Fermat tests could be chained directly on the GPU to give a better speed-up.

To clarify: I haven't pushed my changes yet because I still have a silly bug somewhere, so apparently not all prime divisors are found. But doing more trial divisions than the high-performance client does by default already yields better speed-ups directly on the CPU.
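
To illustrate the filtering step described above, here is a rough Python sketch of trial division of a big number by small primes. It is not the project's CUDA kernel: the limb layout (32-bit words, least significant first), the prime list and all function names are assumptions made for this example. The point is that the remainder of a multi-word number modulo a small prime can be computed one word at a time, which is the kind of work that maps well to a GPU thread.

```python
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]  # illustrative list


def to_limbs(n, bits=32):
    """Split a non-negative integer into fixed-width limbs, least significant first."""
    mask = (1 << bits) - 1
    limbs = []
    while n:
        limbs.append(n & mask)
        n >>= bits
    return limbs or [0]


def mod_small(limbs, p, bits=32):
    """Remainder of a multi-limb number modulo a small prime p (p < 2**bits).

    Works from the most significant limb down. Since the running remainder
    stays below p, each step would fit in a 64-bit register for 32-bit limbs.
    """
    remainder = 0
    for limb in reversed(limbs):
        remainder = ((remainder << bits) | limb) % p
    return remainder


def survives_trial_division(candidate, primes=SMALL_PRIMES):
    """True if none of the small primes divides the candidate.

    Assumes the candidate is much larger than every prime in the list.
    """
    limbs = to_limbs(candidate)
    return all(mod_small(limbs, p) != 0 for p in primes)


def filter_candidates(candidates, primes=SMALL_PRIMES):
    """Keep only the candidates that still need a Fermat test."""
    return [c for c in candidates if survives_trial_division(c, primes)]
```

In a pipeline like the one described, the survivors of `filter_candidates` would be the numbers handed to the CPU for Fermat testing. How many candidates get discarded depends on how many primes are tried and on what the sieve has already removed.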
full member
Activity: 213
Merit: 100
Forget about it guys, this miner was never gonna happen. Obviously the people who started and promoted this thread had no idea what they were getting into.

The reaper guy's miner is probably the only GPU miner we will EVER be seeing.

So it would seem, sadly. I hope that once mtrlt's miner is open-sourced we can get some real community development going for the good of the coin. Hopefully by that point I will have finished the crunch at work and can try to dive in as well.
sr. member
Activity: 406
Merit: 250
Forget about it guys, this miner was never gonna happen. Obviously the people who started and promoted this thread had no idea what they were getting into.

The reaper guy's miner is probably the only GPU miner we will EVER be seeing.
legendary
Activity: 2674
Merit: 3000
Terminated.
It's abandoned. Lol. Probably everyone figured out that this is too difficult. Heck, even mtrlt was having trouble.

LIES! He's managed to have the same efficiency as an AMD multi-core.

With an AMD multi-core + a HD6990.


Wait... that's better?
It's not..
member
Activity: 63
Merit: 10
What's the progress so far?
sr. member
Activity: 406
Merit: 250
It's abandoned. Lol. Probably everyone figured out that this is too difficult. Heck, even mtrlt was having trouble.

LIES! He's managed to have the same efficiency as an AMD multi-core.

With an AMD multi-core + a HD6990.


Wait... that's better?
full member
Activity: 213
Merit: 100
I'm still on it - with a different idea. As it turns out, doing Fermat tests on the GPU is not a no-brainer, and making them fast requires too much effort for now, so I'll try to port something else to the GPU.

I'm still sure a GPU miner is possible, but right now I would say it's a lot harder than for the other coins. The other OpenCL miner project is (amusingly!) also having problems.

As I'm sure you are already aware, mtrlt ported the sieve to the GPU. Is that what you are going after?
