[XPM] CUDA enabled qt client miner for primecoins. Source code inside. WIP - page 2.

crendore

sr. member

Activity: 363

Merit: 250

Quote from: mikaelh on August 16, 2013, 01:51:20 AM

Quote from: primedigger on August 15, 2013, 12:10:59 PM

I think there might be a bug somewhere in hp-9, so that the trial division doesn't work quite right in that version and discards the wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right thing, and it doesn't find any candidates to discard because, if I understand it right, the sieve already did that. In other words: this idea is likely a dead end.

I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug.

So uhh... should we not be using HP9 then?

mikaelh

sr. member

Activity: 301

Merit: 250

Quote from: primedigger on August 15, 2013, 12:10:59 PM

I think there might be a bug somewhere in hp-9, so that the trial division doesn't work quite right in that version and discards the wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right thing, and it doesn't find any candidates to discard because, if I understand it right, the sieve already did that. In other words: this idea is likely a dead end.

I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug.

refer_2_me

full member

Activity: 213

Merit: 100

Keep on keeping on, I guess.

primedigger

member

Activity: 75

Merit: 10

Quote from: gigawatt on August 15, 2013, 01:25:27 PM

Quote from: primedigger on August 15, 2013, 12:10:59 PM

There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf); however, these implementations usually assume that n is smaller than 32 or 64 bits, which makes the test much easier.

I just skimmed over that paper. Their results are novel, but almost useless in application.
If you're doing Fermat tests, there's a good chance the numbers you want to analyze are greater than 2^64. :(

Exactly, and they are much greater than 2^64 in primecoin.

As for CUMP, it is completely useless for primecoin, as it only implements floating point arithmetic and then only addition, multiplication and subtraction.

gigawatt

full member

Activity: 168

Merit: 100

Quote from: primedigger on August 15, 2013, 12:10:59 PM

There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf); however, these implementations usually assume that n is smaller than 32 or 64 bits, which makes the test much easier.

I just skimmed over that paper. Their results are novel, but almost useless in application.
If you're doing Fermat tests, there's a good chance the numbers you want to analyze are greater than 2^64. :(

gigawatt

full member

Activity: 168

Merit: 100

I take it CUMP didn't have what you needed?

lemons

full member

Activity: 178

Merit: 100

CUDA ++1

primedigger

member

Activity: 75

Merit: 10

So, sad news:

I think there might be a bug somewhere in hp-9, so that the trial division doesn't work quite right in that version and discards the wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right thing, and it doesn't find any candidates to discard because, if I understand it right, the sieve already did that. In other words: this idea is likely a dead end.

I pushed my changes for anyone who wants to play with them. There is also still a very slow ported version of the Fermat test, which is easily outperformed by GMP's implementation on the CPU. I think there is no easy way to avoid doing Fermat tests on the GPU. So for now, there is sadly nothing for the GPU which is faster than hp-9 on the CPU.

I will have a close look at Mtrlt's project, but it seems he might have similar problems. It would be a major achievement if he gets a GPU Fermat test working with a good speedup. That would mean a fast GPU "exponentiation by squaring" algorithm is available to the research community, and prime research would benefit from that in general, as most primality tests (not only Fermat's test) need it. There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf); however, these implementations usually assume that n is smaller than 32 or 64 bits, which makes the test much easier.

Also, if Mtrlt succeeds, he really deserves his prize money... it's really not an easy task and I doubt other GPU implementations are in the wild. I would then also port his method over to my CUDA project.
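For readers who have not seen it spelled out, here is a minimal sketch of the Fermat test built on "exponentiation by squaring", as discussed in the post above. It is plain Python on the CPU, not code from this project, hp-9, or Mtrlt's miner; the function names `modpow` and `fermat_probable_prime` are made up for illustration. Python integers are arbitrary precision, so the operands can be far larger than 2^64. That is exactly the part that is hard on a GPU, where every multiply and reduce in the loop has to be done on multi-word numbers.

```python
def modpow(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by right-to-left binary
    exponentiation (square-and-multiply)."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            # Current exponent bit is set: multiply it into the result.
            result = (result * base) % modulus
        # Square the base for the next bit.
        base = (base * base) % modulus
        exponent >>= 1
    return result


def fermat_probable_prime(n, base=2):
    """Fermat test: an odd n > 3 passes if base^(n-1) == 1 (mod n).
    Passing makes n a probable prime only; some composites pass too."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    return modpow(base, n - 1, n) == 1


if __name__ == "__main__":
    candidate = 2**89 - 1  # a known Mersenne prime, well above 2^64
    print(fermat_probable_prime(candidate))
```

The loop runs once per bit of the exponent, so a candidate with a few hundred bits needs a few hundred modular squarings of numbers that size. Note that 341 = 11 * 31 passes the base-2 test even though it is composite, which is why the result is only ever "probably prime".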

SynergyCores

newbie

Activity: 20

Merit: 0

Quote from: primedigger on August 13, 2013, 10:00:41 AM

I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.

Status:

I will push my latest changes soon. I have updated my code base to hp-9 and implemented a fast bignum trial division by small primes for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU then computes the Fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains with small prime factors, but apparently the high performance client still filters out some candidates with trial divisions, and does this before doing Fermat tests.

If a fast Fermat test for the GPU surfaces, then filtering + Fermat tests could be chained directly on the GPU to give a better speedup.

To clarify: I didn't push my changes because I still have a silly bug somewhere, so that apparently not all prime divisors are found. But doing more trial division tests than the high performance client does by default already yields better speedups directly on the CPU.

I only wish that I knew enough to help, but as it is, I know nothing. Thanks for the update!

jaakkop

member

Activity: 63

Merit: 10

Quote from: primedigger on August 13, 2013, 10:00:41 AM

I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.

Status:

I will push my latest changes soon. I have updated my code base to hp-9 and implemented a fast bignum trial division by small primes for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU then computes the Fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains with small prime factors, but apparently the high performance client still filters out some candidates with trial divisions, and does this before doing Fermat tests.

If a fast Fermat test for the GPU surfaces, then filtering + Fermat tests could be chained directly on the GPU to give a better speedup.

To clarify: I didn't push my changes because I still have a silly bug somewhere, so that apparently not all prime divisors are found. But doing more trial division tests than the high performance client does by default already yields better speedups directly on the CPU.

Thanks for the update and keep up the good work

Lauda

legendary

Activity: 2674

Merit: 3000

Terminated.

Quote from: ncr1pt0r on August 13, 2013, 10:25:52 AM

good news, glad to see you're still working on it

I wondered if he quit it, so we got some good news

Spoetnik

legendary

Activity: 1540

Merit: 1011

FUD Philanthropist™

i am watching you ;)

CUDA !!!!!!

ncr1pt0r

newbie

Activity: 44

Merit: 0

good news, glad to see you're still working on it

primedigger

member

Activity: 75

Merit: 10

I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.

Status:

I will push my latest changes soon. I have updated my code base to hp-9 and implemented a fast bignum trial division by small primes for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU then computes the Fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains with small prime factors, but apparently the high performance client still filters out some candidates with trial divisions, and does this before doing Fermat tests.

If a fast Fermat test for the GPU surfaces, then filtering + Fermat tests could be chained directly on the GPU to give a better speedup.

To clarify: I didn't push my changes because I still have a silly bug somewhere, so that apparently not all prime divisors are found. But doing more trial division tests than the high performance client does by default already yields better speedups directly on the CPU.
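To make the "bignum by small prime" trial division in the status update concrete, here is a rough sketch of one common way to do it. It is an assumption about the general technique, not the actual kernel from this project or from hp-9. The candidate is stored as 32-bit limbs, and the remainder modulo a small prime p is built up limb by limb, so no bignum division is needed. The intermediate value stays below p * 2^32, which fits in 64 bits for any p below 2^32. Each (candidate, prime) pair is independent of the others, which is what makes the work easy to spread over GPU threads. All names below (`to_limbs`, `limb_remainder`, `small_primes`, `survives_trial_division`) are hypothetical.

```python
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n):
    """Split a non-negative integer into 32-bit limbs, most significant first."""
    limbs = []
    while n:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
    return limbs[::-1] or [0]


def limb_remainder(limbs, p):
    """Remainder of the limb-encoded number modulo a small prime p,
    folding in one limb at a time (Horner's rule in base 2^32)."""
    r = 0
    for limb in limbs:
        r = ((r << LIMB_BITS) | limb) % p
    return r


def small_primes(limit):
    """All primes below `limit`, via a sieve of Eratosthenes."""
    is_prime = [True] * limit
    is_prime[:2] = [False] * min(2, limit)
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def survives_trial_division(n, primes):
    """True if none of the given primes is a proper divisor of n."""
    limbs = to_limbs(n)
    for p in primes:
        if n != p and limb_remainder(limbs, p) == 0:
            return False
    return True


if __name__ == "__main__":
    primes = small_primes(1000)
    print(survives_trial_division(2**89 - 1, primes))          # prime candidate
    print(survives_trial_division((2**89 - 1) * 997, primes))  # has factor 997
```

Only the candidates that survive this filter would then go on to the much more expensive Fermat test.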

refer_2_me

full member

Activity: 213

Merit: 100

Quote from: ReCat on August 07, 2013, 04:39:41 PM

Forget about it guys, this miner was never gonna happen. Obviously the people who started and promoted this thread had no idea what they were getting into.

The reaper guy's miner is probably the only GPU miner we will EVER be seeing.

So it would seem, sadly. I hope once mtrlt's miner is open sourced we can get some real community development going for the good of the coin. Hopefully by that point, I will have finished the crunch at work and I can try to dive in as well.

ReCat

sr. member

Activity: 406

Merit: 250

Forget about it guys, this miner was never gonna happen. Obviously the people who started and promoted this thread had no idea what they were getting into.

The reaper guy's miner is probably the only GPU miner we will EVER be seeing.

Lauda

legendary

Activity: 2674

Merit: 3000

Terminated.

Quote from: ReCat on July 31, 2013, 10:39:23 AM

Quote from: Kouye on July 30, 2013, 05:26:10 PM

Quote from: ReCat on July 30, 2013, 04:55:06 PM

It's abandoned. Lol. Probably everyone figured out that this is too difficult. Heck, even mtrlt was having trouble.

LIES! He's managed to have the same efficiency as an AMD multi-core.

With an AMD multi-core + a HD6990.

Wait... that's better?

It's not..

jaakkop

member

Activity: 63

Merit: 10

What's the progress so far?

ReCat

sr. member

Activity: 406

Merit: 250

Quote from: Kouye on July 30, 2013, 05:26:10 PM

Quote from: ReCat on July 30, 2013, 04:55:06 PM

It's abandoned. Lol. Probably everyone figured out that this is too difficult. Heck, even mtrlt was having trouble.

LIES! He's managed to have the same efficiency as an AMD multi-core.

With an AMD multi-core + a HD6990.

Wait... that's better?

refer_2_me

full member

Activity: 213

Merit: 100

Quote from: primedigger on July 31, 2013, 04:52:58 AM

I'm still on it - with a different idea. As it turns out, doing Fermat tests on the GPU is not a no-brainer, and getting them fast requires too much effort for now, so I'll try to port something else to the GPU.

I'm still sure a GPU miner is possible, but right now I would say it's a lot harder than for the other coins. The other OpenCL miner project is (amusingly!) also having problems.

As I'm sure you are already aware, mtrlt ported the sieve to the GPU. Is that what you are going after?

Topic: [XPM] CUDA enabled qt client miner for primecoins. Source code inside. WIP - page 2. (Read 31806 times)