Topic: [XPM] CUDA-enabled Qt client miner for Primecoin. Source code inside. WIP - page 3.

member
Activity: 75
Merit: 10
I'm still on it, with a different idea. As it turns out, doing Fermat tests on the GPU is not a no-brainer, and getting them fast requires too much effort for now, so I'll try to port something else to the GPU.

I'm still sure a GPU miner is possible, but right now I would say it's a lot harder than for the other coins. The other OpenCL miner project is (amusingly!) also having problems.
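For anyone wondering what a Fermat test kernel even looks like, here is a minimal sketch, assuming one candidate per thread and plain 64-bit arithmetic. All the names are made up for illustration; the real miner has to work on ~400-bit numbers with a multi-precision library, which is exactly the part that is hard to make fast on the GPU.

Code:
// Hedged sketch only: base-2 Fermat test, one candidate per thread.
// Assumes candidates fit in 64 bits and moduli stay below 2^63 so the
// double-and-add modular multiplication cannot overflow. The real miner
// needs ~400-bit multi-precision arithmetic instead.
#include <stdint.h>

__device__ uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t m) {
    // Double-and-add modular multiplication; avoids a 128-bit intermediate.
    uint64_t result = 0;
    a %= m;
    while (b > 0) {
        if (b & 1) {
            result += a;
            if (result >= m) result -= m;
        }
        a <<= 1;
        if (a >= m) a -= m;
        b >>= 1;
    }
    return result;
}

__device__ uint64_t powmod64(uint64_t base, uint64_t exp, uint64_t m) {
    // Square-and-multiply exponentiation, the core of any Fermat test.
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = mulmod64(result, base, m);
        base = mulmod64(base, base, m);
        exp >>= 1;
    }
    return result;
}

// Marks candidates that pass the base-2 Fermat test: 2^(n-1) == 1 (mod n).
__global__ void fermatTestKernel(const uint64_t *candidates, int *isProbablePrime, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    uint64_t n = candidates[i];
    isProbablePrime[i] = (n > 2) && (powmod64(2, n - 1, n) == 1);
}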
sr. member
Activity: 294
Merit: 250
Has anyone tested this yet? Is it working?
hero member
Activity: 532
Merit: 500
It's abandoned. Lol. Probably everyone figured out that this is too difficult. Heck, even mlmrt was having trouble.

LIES! He's managed to have the same efficiency as an AMD multi-core.

With an AMD multi-core + a HD6990.

WOW, we can spend ~194 watts running an AMD multi-core, or ~525 watts running the AMD and a GPU, and get the same results!!!
I WANT I WANT I WANT I WANT!
sr. member
Activity: 336
Merit: 250
Cuddling, censored, unicorn-shaped troll.
It's abandoned. Lol. Probably everyone figured out that this is too difficult. Heck, even mlmrt was having trouble.

LIES! He's managed to have the same efficiency as an AMD multi-core.

With an AMD multi-core + a HD6990.
sr. member
Activity: 406
Merit: 250
It's abandoned. Lol. Probably everyone figured out that this is too difficult. Heck, even mlmrt was having trouble.
full member
Activity: 122
Merit: 100
Is this project still active or has it been abandoned?
member
Activity: 75
Merit: 10
Please check that you're using the latest SDK. I also encountered memory problems with CUDA 5.0, and I'm using 5.5 now, which works for me.
Just curious, have you looked at the Mfaktc source code at all? While it is used for trial-factoring Mersenne primes, which may not be helpful, the writer did get it to sieve completely on the GPU, which might be.

I looked into it, yes. The code is not very understandable, though...
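For what it's worth, the "sieve completely on the GPU" idea boils down to something like the sketch below: a table of small primes in device memory and one candidate per thread. Everything here (names, 64-bit candidates) is an assumption for illustration; the real Primecoin sieve works on chain origins and bit masks, not on individual candidates like this.

Code:
// Hedged sketch of GPU-side trial division against a small-prime table,
// one candidate per thread. 64-bit candidates and all names are assumptions;
// the actual miner deals with much larger numbers.
#include <stdint.h>

__global__ void trialDivisionKernel(const uint64_t *candidates, int candidateCount,
                                    const uint32_t *smallPrimes, int primeCount,
                                    int *survives) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= candidateCount) return;

    uint64_t n = candidates[i];
    int ok = 1;
    for (int p = 0; p < primeCount; ++p) {
        uint64_t prime = smallPrimes[p];
        if (prime * prime > n) break;            // no divisor found up to sqrt(n)
        if (n % prime == 0) { ok = 0; break; }   // small factor found: composite
    }
    survives[i] = ok;   // only surviving candidates go on to the Fermat test
}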
full member
Activity: 145
Merit: 100
I've updated to cuda-5.5 (and driver 319.21).

Running with cuda-gdb, I get the following error:

Code:
Have 2400 candidates after main loop
Cuda start!
[New Thread 0x7fffacc38700 (LWP 14248)]
[Context Create of context 0x7fff700234f0 on Device 0]
[Launch of CUDA Kernel 0 (runPrimeCandidateSearch<<<(25,1,1),(192,1,1)>>>) on Device 0]

Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
[Switching focus to CUDA kernel 0, grid 1, block (15,0,0), thread (0,0,0), device 0, sm 3, warp 0, lane 0]
0x00007fff7091b760 in long_multiplication(unsigned int * @generic, unsigned int * @generic, unsigned int * @generic, unsigned int, unsigned int) (
    product=0x3fff6b4, op1=0x3fff734, op2=0x3fff634, num_digits=17,
    prod_capacity=1073741824)
    at primecoin/src/cuda/digit.h:406
406     product[i] = 0;
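For context, the faulting frame is the zeroing loop of a schoolbook big-number multiplication. Below is a rough reconstruction from the backtrace alone (I don't have digit.h in front of me, so treat every detail as a guess). prod_capacity arriving as 1073741824 for 17-digit operands looks like a corrupted argument, which would point at a bad struct or pointer being passed in rather than an index running past the end; a sanity check on the arguments would at least make that visible.

Code:
// Hypothetical reconstruction of the routine in the backtrace, NOT the real
// primecoin/src/cuda/digit.h code. Digits are assumed to be 32-bit limbs,
// least significant first; the product buffer must hold 2*num_digits limbs.
#include <stdio.h>

__device__ void long_multiplication(unsigned int *product, unsigned int *op1,
                                    unsigned int *op2, unsigned int num_digits,
                                    unsigned int prod_capacity) {
    // Sanity check: a capacity that is too small, or absurdly large as in the
    // backtrace above, usually means the argument itself is garbage.
    // (1024 limbs is an arbitrary upper bound chosen for this sketch.)
    if (prod_capacity < 2 * num_digits || prod_capacity > 1024) {
        printf("long_multiplication: suspicious capacity %u for %u digits\n",
               prod_capacity, num_digits);
        return;
    }
    // Clear the result buffer: the line that faults in the backtrace above.
    for (unsigned int i = 0; i < 2 * num_digits; ++i)
        product[i] = 0;

    // Schoolbook multiplication: O(num_digits^2) limb products with carries.
    for (unsigned int i = 0; i < num_digits; ++i) {
        unsigned long long carry = 0;
        for (unsigned int j = 0; j < num_digits; ++j) {
            unsigned long long cur = (unsigned long long)op1[i] * op2[j]
                                   + product[i + j] + carry;
            product[i + j] = (unsigned int)cur;
            carry = cur >> 32;
        }
        product[i + num_digits] = (unsigned int)carry;
    }
}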
sr. member
Activity: 406
Merit: 250
Fascinating. So this CUDA miner is already vaguely functional? Now that's some community effort. I wonder what the eventual result of this will be. Will fast CPUs and GPUs working together be the new mining rigs?
full member
Activity: 210
Merit: 100
I do not use any kind of messenger; beware of scammers.
Having the same problem as K1773R (GTX 670, using CUDA 5.5 and the driver it includes). Tried it on mainnet, as I still cannot connect to testnet for some reason.

Code:
Have 101 candidates after main loop
Cuda start!
{... some block messages i.e. getblocks -1 to blah, accept etc}
Have -1 candidates after main loop
Cuda+host test round finished with -1 candidates (0 host chain tests)
Cuda error: cudaMemcpy: cudaMemcpyDeviceToHost, unspecified launch failure
ERROR: PrimecoinMiner() : primorial minimum overflow
ERROR: PrimecoinMiner() : primorial minimum overflow
ERROR: PrimecoinMiner() : primorial minimum overflow
ERROR: PrimecoinMiner() : primorial minimum overflow
ERROR: PrimecoinMiner() : primorial minimum overflow
ERROR: PrimecoinMiner() : primorial minimum overflow
ERROR: PrimecoinMiner() : primorial minimum overflow
ERROR: PrimecoinMiner() : primorial minimum overflow

from GDB
Code:
[0] start! 
sizeof(struct) = 400
mpz_print:mpz_capacity: 0
[0] string candidate is 
[0] N is: mpz_capacity: 30 ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
[0] E is: mpz_capacity: 30 fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe

Edit: This may just be a PEBKAC/RTFM issue on my part. Just saw your note about running with one CPU only.

Still crashed, but it managed to get a few rounds. It used to crash right away:
Code:
2013-07-25 19:51:56 primemeter         0 prime/h    498885 test/h         0 5-chains/h
2013-07-25 19:52:56 primemeter         0 prime/h   8404040 test/h         0 5-chains/h
2013-07-25 19:53:56 primemeter         0 prime/h   4184750 test/h         0 5-chains/h
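Side note on reading that error: an "unspecified launch failure" reported by cudaMemcpy usually means an earlier kernel crashed (the device-side equivalent of a segfault), not that the copy itself is broken. A minimal sketch of host-side checks that localize it, with a placeholder kernel and names (this is not the miner's actual host code):

Code:
// Minimal sketch (placeholder kernel and names) showing where to check for
// errors so a kernel crash is reported at the launch, not at the next memcpy.
#include <stdio.h>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess)                                            \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",                    \
                    __FILE__, __LINE__, cudaGetErrorString(err_));          \
    } while (0)

__global__ void dummyKernel(int *out) { out[threadIdx.x] = threadIdx.x; }

int main() {
    int *devBuf = 0;
    int hostBuf[32];
    CUDA_CHECK(cudaMalloc(&devBuf, sizeof(hostBuf)));

    dummyKernel<<<1, 32>>>(devBuf);
    CUDA_CHECK(cudaGetLastError());       // bad launch configuration shows up here
    CUDA_CHECK(cudaDeviceSynchronize());  // a crash inside the kernel shows up here

    CUDA_CHECK(cudaMemcpy(hostBuf, devBuf, sizeof(hostBuf), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(devBuf));
    return 0;
}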
hero member
Activity: 532
Merit: 500
Please check that you're using the latest SDK. I also encountered memory problems with CUDA 5.0, and I'm using 5.5 now, which works for me.
Just curious, have you looked at the Mfaktc source code at all? While it is used for trial-factoring Mersenne primes, which may not be helpful, the writer did get it to sieve completely on the GPU, which might be.
full member
Activity: 210
Merit: 100
I do not use any kind of messenger; beware of scammers.
Just got this compiled. (Talk about a mess: when my CUDA SDK was installed, the paths were completely different from what they should have been, /nvidia-304 vs. /nvidia-current etc., and then some fun Qt conflicts.)

Anyone have a working node for testnet they can post? Not having any luck connecting.
legendary
Activity: 1792
Merit: 1008
/dev/null
Please check that you're using the latest SDK. I also encountered memory problems with CUDA 5.0, and I'm using 5.5 now, which works for me.
ACK, will do later and report back. ;)
hero member
Activity: 675
Merit: 514
Would it make any difference if we used __restrict__ pointers in the CUDA code?
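For reference, this is what that typically looks like on kernel parameters (a generic sketch, not the miner's code): __restrict__ tells the compiler the buffers never alias, which can let it keep values in registers and reorder loads. Whether it buys anything here depends on whether the big-num kernels are memory-bound at all; it would have to be measured.

Code:
// Generic example of __restrict__ on kernel parameters (not miner code).
// The qualifier is a promise that a, b and out never overlap, so the
// compiler may cache loads in registers and reorder memory accesses.
__global__ void addVectors(const unsigned int *__restrict__ a,
                           const unsigned int *__restrict__ b,
                           unsigned int *__restrict__ out,
                           int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] + b[i];
}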
member
Activity: 75
Merit: 10
Please check that you're using the latest SDK. I also encountered memory problems with CUDA 5.0, and I'm using 5.5 now, which works for me.
full member
Activity: 224
Merit: 100
More than willing to help perform tests as instructed if a Windows binary is posted. Got an old GTX 475 rattling around that I could put to work.
legendary
Activity: 1792
Merit: 1008
/dev/null
My 2 cents: mining entirely on the GPU won't be easy and is impractical, but tandem mining with interleaved CPU+GPU computations may very well give good speed-ups.

Some feedback from knowledgeable people indicates that mod_exp probably would not speed up as well on the GPU. However, I think that if the GPU can do the sieve much more efficiently, it could generate far fewer candidates for the Fermat test, which could speed things up quite a bit.

There is indeed a problem with the speed of Fermat tests on the GPU. GNU GMP uses the most sophisticated algorithms available; the student library I found and started to extend uses the most basic ones.

mpz_powmod needs fast multiplication of big ints: GMP's algorithm is most likely O(n*log(n)), while the schoolbook multiplication the GPU now uses is O(n^2). I had hoped that for the ~400-bit numbers involved it wouldn't make such a difference. Currently, the new version in my repo does Fermat tests on the GPU (at a rate of 10 per second), but my CPU is still faster due to better algorithms and a better big-num implementation.

But don't worry, I won't give up so fast! The current situation is that I either need to look into porting better algorithms to the GPU, or do something other than Fermat tests on the GPU to sieve candidates (e.g. trial division with the most common primes).

Anybody with a better GPU than the GeForce 570 Ti I own, please test this! My stats (base version is still hp4):

2013-07-24 21:53:38 primemeter     24303 prime/h    490729 test/h        47 5-chains/h

prime/h and test/h fluctuate enormously and seem rather meaningless. As most tests are on the GPU, I have no idea if this is even measuring the tests right. 5-chains/h is accurate, though.

You have to use setgenerate true 1, i.e. one CPU thread for mining.  
Running current git (b0c7062f3925482935f5eb352b17737d21b95c5b) and I can't see any usage of my GPU: no heat and no increase in used memory when using the Qt client. Anything special to activate so it mines with the GPU? I've got a powerful GPU to test with. ;)

EDIT:
Code:
2013-07-25 13:35:43 primemeter         0 prime/h   34261932 test/h         0 5-chains/h
Seems the miner thread that should launch the CUDA code is borked?

EDIT2:
Code:
Have 2400 candidates after main loop
Cuda start!
Cuda error: cudaMemcpy: cudaMemcpyDeviceToHost, the launch timed out and was terminated
from debug log

You can also run it with -printmining -printtoconsole to see that output directly. Could you compile the CUDA portion with -G -g (change the Qt project file where it invokes nvcc) and give me the output of cuda-memcheck?

You can also #define CUDA_DEBUG in the .cu file to see the GPU printfs on the console.
Was already running with -g, just waiting for the "Cuda start!" message; stopped it now and recompiled with -D CUDA_DEBUG.
EDIT: it's up and running, waiting for the CUDA init + crash. ;)
EDIT2: why does it take so long until the miner starts the CUDA thread? That seems stupid. :S
EDIT3: here we go, it crashed. :)
debug.log
Code:
Have 2400 candidates after main loop
Cuda start!
Cuda error: cudaMemcpy: cudaMemcpyDeviceToHost, unspecified launch failure
stdout
Code:
[0] start! 
sizeof(struct) = 400
mpz_print:mpz_capacity: 0
[0] string candidate is 
[0] N is: mpz_capacity: 30 ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
[0] E is: mpz_capacity: 30 fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe
gdb: don't want to spam, so I'm sending it per PM; the message is too big. -.-
legendary
Activity: 1713
Merit: 1029
If someone could give me some specific compilation directions (or a Windows binary!), I can test on a 780. :)
member
Activity: 75
Merit: 10
My 2 cents: mining entirely on the GPU won't be easy and is impractical, but tandem mining with interleaved CPU+GPU computations may very well give good speed-ups.

Some feedback from knowledgeable people indicates that mod_exp probably would not speed up as well on the GPU. However, I think that if the GPU can do the sieve much more efficiently, it could generate far fewer candidates for the Fermat test, which could speed things up quite a bit.

There is indeed a problem with the speed of Fermat tests on the GPU. GNU GMP uses the most sophisticated algorithms available; the student library I found and started to extend uses the most basic ones.

mpz_powmod needs fast multiplication of big ints: GMP's algorithm is most likely O(n*log(n)), while the schoolbook multiplication the GPU now uses is O(n^2). I had hoped that for the ~400-bit numbers involved it wouldn't make such a difference. Currently, the new version in my repo does Fermat tests on the GPU (at a rate of 10 per second), but my CPU is still faster due to better algorithms and a better big-num implementation.

But don't worry, I won't give up so fast! The current situation is that I either need to look into porting better algorithms to the GPU, or do something other than Fermat tests on the GPU to sieve candidates (e.g. trial division with the most common primes).

Anybody with a better GPU than the GeForce 570 Ti I own, please test this! My stats (base version is still hp4):

2013-07-24 21:53:38 primemeter     24303 prime/h    490729 test/h        47 5-chains/h

prime/h and test/h fluctuate enormously and seem rather meaningless. As most tests are on the GPU, I have no idea if this is even measuring the tests right. 5-chains/h is accurate, though.

You have to use setgenerate true 1, i.e. one CPU thread for mining.  
Running current git (b0c7062f3925482935f5eb352b17737d21b95c5b) and I can't see any usage of my GPU: no heat and no increase in used memory when using the Qt client. Anything special to activate so it mines with the GPU? I've got a powerful GPU to test with. ;)

EDIT:
Code:
2013-07-25 13:35:43 primemeter         0 prime/h   34261932 test/h         0 5-chains/h
Seems the miner thread that should launch the CUDA code is borked?

EDIT2:
Code:
Have 2400 candidates after main loop
Cuda start!
Cuda error: cudaMemcpy: cudaMemcpyDeviceToHost, the launch timed out and was terminated
from debug log

You can also run it with -printmining -printtoconsole to see that output directly. Could you compile the CUDA portion with -G -g (change the Qt project file where it invokes nvcc) and give me the output of cuda-memcheck?

You can also #define CUDA_DEBUG in the .cu file to see the GPU printfs on the console.
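For anyone following along, the CUDA_DEBUG switch boils down to a pattern like the one below (a sketch of the idea, not the actual miner code): device-side printf guarded by the preprocessor. Device printf needs compute capability 2.0 or newer, and the output only shows up once the kernel finishes or the device is synchronized.

Code:
// Sketch of the CUDA_DEBUG idea (not the actual miner code): device printf
// compiled in only when the symbol is defined, so release builds pay nothing.
#include <stdio.h>

#ifdef CUDA_DEBUG
#define GPU_DBG(...) printf(__VA_ARGS__)
#else
#define GPU_DBG(...) ((void)0)
#endif

__global__ void exampleKernel(const unsigned int *candidates, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    GPU_DBG("[%d] candidate = %u\n", i, candidates[i]);
    // ... real per-candidate work would go here ...
}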
legendary
Activity: 1792
Merit: 1008
/dev/null
My 2 cents: mining entirely on the GPU won't be easy and is impractical, but tandem mining with interleaved CPU+GPU computations may very well give good speed-ups.

Some feedback from knowledgeable people indicates that mod_exp probably would not speed up as well on the GPU. However, I think that if the GPU can do the sieve much more efficiently, it could generate far fewer candidates for the Fermat test, which could speed things up quite a bit.

There is indeed a problem with the speed of Fermat tests on the GPU. GNU GMP uses the most sophisticated algorithms available; the student library I found and started to extend uses the most basic ones.

mpz_powmod needs fast multiplication of big ints: GMP's algorithm is most likely O(n*log(n)), while the schoolbook multiplication the GPU now uses is O(n^2). I had hoped that for the ~400-bit numbers involved it wouldn't make such a difference. Currently, the new version in my repo does Fermat tests on the GPU (at a rate of 10 per second), but my CPU is still faster due to better algorithms and a better big-num implementation.

But don't worry, I won't give up so fast! The current situation is that I either need to look into porting better algorithms to the GPU, or do something other than Fermat tests on the GPU to sieve candidates (e.g. trial division with the most common primes).

Anybody with a better GPU than the GeForce 570 Ti I own, please test this! My stats (base version is still hp4):

2013-07-24 21:53:38 primemeter     24303 prime/h    490729 test/h        47 5-chains/h

prime/h and test/h fluctuate enormously and seem rather meaningless. As most tests are on the GPU, I have no idea if this is even measuring the tests right. 5-chains/h is accurate, though.

You have to use setgenerate true 1, i.e. one CPU thread for mining.  
Running current git (b0c7062f3925482935f5eb352b17737d21b95c5b) and I can't see any usage of my GPU: no heat and no increase in used memory when using the Qt client. Anything special to activate so it mines with the GPU? I've got a powerful GPU to test with. ;)

EDIT:
Code:
2013-07-25 13:35:43 primemeter         0 prime/h   34261932 test/h         0 5-chains/h
Seems the miner thread that should launch the CUDA code is borked?

EDIT2:
Code:
Have 2400 candidates after main loop
Cuda start!
Cuda error: cudaMemcpy: cudaMemcpyDeviceToHost, the launch timed out and was terminated
from debug.log
After the message it segfaults; going to debug with gdb. ;)