[XPM] A GPU miner for XPM is around the corner? (via CUMP).

AstroKev

newbie

Activity: 23

Merit: 0

Agreed that XPM mining is a good addition to BTC/LTC mining! Except for the people who bought the cruddiest processors they could find when building their rigs...

Even though only ADD/SUB/MUL are in CUMP, one might be able to implement DIV with a binomial method (guess/check/converge) without much difficulty, though I don't know how well the operation would perform.
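One way to read the guess/check/converge idea above is as a binary search on the quotient that uses only addition, subtraction, multiplication and comparison. The sketch below is an interpretation, not anything taken from CUMP: the function name `div_by_search` is hypothetical, and Python integers stand in for multi-precision numbers.

```python
def div_by_search(n, d):
    """Return (quotient, remainder) of n / d using only +, -, * and comparisons.

    Hypothetical helper: Python ints stand in for multi-precision values.
    """
    if n < 0 or d <= 0:
        raise ValueError("expects n >= 0 and d > 0")

    # Collect powers of two (built by repeated addition) while p * d still fits in n.
    powers = []
    p = 1
    while p * d <= n:
        powers.append(p)
        p = p + p

    # Guess each power from largest to smallest, keep it if the check passes.
    q = 0
    for p in reversed(powers):
        guess = q + p
        if guess * d <= n:
            q = guess

    return q, n - q * d


print(div_by_search(17, 5))
```

The number of multiplications grows with the bit length of the quotient, so how well this performs on a GPU would depend on how fast the multi-precision multiply is.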

There are quite a few factoring algos/sieves on GitHub using CUDA, though at a cursory glance they all seemed to use fixed-precision routines rather than arbitrary precision. I thought floating point arithmetic had no place in integer factorization?!

mustyoshi

sr. member

Activity: 287

Merit: 250

Quote from: iCEBREAKER on July 17, 2013, 01:01:58 PM

Quote from: mustyoshi on July 17, 2013, 11:57:30 AM

Quote from: mikaelh on July 17, 2013, 03:09:31 AM

I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

Quote

Division is just subtraction with a counter.

That's beautiful. :')

horeaper

newbie

Activity: 49

Merit: 0

I think XPM is a good addition to current GPU rigs, now your CPU won't be sleeping all the time, right?

markm

legendary

Activity: 2940

Merit: 1090

If a GPU is only twice as fast as a CPU, maybe putting more CPUs on one's motherboard might be more cost-effective than putting one or more GPUs in a machine?

Or think blade servers, two or more CPUs per blade, would GPUs really be more cost-effective?

-MarkM-

AstroKev

newbie

Activity: 23

Merit: 0

Here is an example from a few years ago where a GPU implementation of ECM was developed and compared against a standard CPU implementation, and the result was roughly 2x faster. I know we're not talking about ECM here, but again it's suggestive of what one might expect.

http://eecm.cr.yp.to/gpuecm-20090127.pdf

Modular arithmetic is very easy to implement with the four basic arithmetic functions, so I'm not sure what the holdup is there?

iCEBREAKER

legendary

Activity: 2156

Merit: 1072

Crypto is the separation of Power and State.

Quote from: mustyoshi on July 17, 2013, 11:57:30 AM

Quote from: mikaelh on July 17, 2013, 03:09:31 AM

I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

Quote

Division is just subtraction with a counter.

oxfeeefeee

member

Activity: 73

Merit: 10

Quote from: solracx on July 17, 2013, 11:43:05 AM

Quote from: oxfeeefeee on July 17, 2013, 11:31:59 AM

Another paper with a nice table, gives you some idea on how things will go.

http://trone.di.fc.ul.pt/images/e/e2/ASAP11-paper.pdf

Metric             GTS8800 [17]  GTX8800 [10]  GTX260 (This paper)  GTX580 [39]  Intel W3565 [46]  AMD Phenom II 1090T [46]
Cores              112           128           192                  512          4                 6
Frequency (MHz)    1188          1350          1294                 1544         3200              3200
Price (USD)        250           173           100                  500          300               200
TDP (W)            150           155           202                  244          130               125
GFLOPS             399           518           715                  1581         102               153
Modexp/s           6504          11074         41426                149464       32608             77002
Modexp/s (scaled)  13052         15282         41426                46973        N/A               N/A
Modexp/s/W         43            71            205                  612          250               616
Modexp/s/USD       26            64            414                  298          131               385
Table I
COMPARISONS OF MODULAR EXPONENTIATION PERFORMANCES ON VARIOUS CPU AND GPU IMPLEMENTATIONS.

Looking at the numbers, it looks like the GPU is only around 2x the CPU???

Yes, it seems so, and the performance/watt is basically at the same level. But you can't just plug 6 AMD CPUs into a single rig like you would with GPUs.
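The "around 2x" and "same performance/watt" readings can be checked directly from the GTX580 and AMD Phenom II 1090T columns of Table I. This is only a small arithmetic sketch; the dictionary and function names are made up for illustration, and the figures are copied from the table.

```python
# Figures copied from Table I (GTX580 and AMD Phenom II 1090T columns).
gtx580 = {"modexp_per_s": 149464, "tdp_w": 244}
phenom = {"modexp_per_s": 77002, "tdp_w": 125}


def ratio(gpu, cpu, per=None):
    """GPU/CPU ratio of Modexp/s, optionally normalised by another column."""
    g = gpu["modexp_per_s"] / (gpu[per] if per else 1)
    c = cpu["modexp_per_s"] / (cpu[per] if per else 1)
    return g / c


speed_ratio = ratio(gtx580, phenom)               # raw throughput
per_watt_ratio = ratio(gtx580, phenom, "tdp_w")   # throughput per watt of TDP

print(f"speed: {speed_ratio:.2f}x, per watt: {per_watt_ratio:.2f}x")
```

With these two columns the raw throughput ratio comes out just under 2, and the per-watt ratio is very close to 1.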

I'm guessing this is because CPUs have AVX, which is 256-bit, and that makes them very good at dealing with big numbers compared to GPUs, which can only support 64-bit natively.

markm

legendary

Activity: 2940

Merit: 1090

Quote from: mustyoshi on July 17, 2013, 11:57:30 AM

Quote from: mikaelh on July 17, 2013, 03:09:31 AM

I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

Division is just subtraction with a counter.

Unless you like to optimise it, like maybe by using some shifts or whatever (albeit maybe also with counters involved in that part too?)
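To make the difference concrete, here is a minimal sketch of both ideas: plain "subtraction with a counter", and the same division done with shifts (binary long division). The function names are hypothetical and Python integers stand in for big numbers; this is not how CUMP or any miner actually implements it.

```python
def div_by_counting(n, d):
    """Subtraction with a counter: one loop pass per unit of the quotient."""
    if n < 0 or d <= 0:
        raise ValueError("expects n >= 0 and d > 0")
    q = 0
    while n >= d:
        n -= d
        q += 1
    return q, n


def div_shift_subtract(n, d):
    """Binary long division: one loop pass per bit of n."""
    if n < 0 or d <= 0:
        raise ValueError("expects n >= 0 and d > 0")
    q, r = 0, 0
    for i in range(n.bit_length() - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)  # bring down the next bit of n
        q <<= 1
        if r >= d:                     # divisor fits: subtract and set quotient bit
            r -= d
            q |= 1
    return q, r


print(div_by_counting(17, 5), div_shift_subtract(17, 5))
```

The counting version needs as many passes as the quotient is large, which is hopeless for numbers with hundreds of bits; the shifted version needs one pass per bit.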

I have often wondered if the Trachtenberg Speed System of Basic Mathematics is any faster on machines than other approaches? Maybe more useful for base ten than binary though?

-MarkM-

mustyoshi

sr. member

Activity: 287

Merit: 250

Quote from: mikaelh on July 17, 2013, 03:09:31 AM

I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

Division is just subtraction with a counter.

solracx

sr. member

Activity: 294

Merit: 250

Quote from: oxfeeefeee on July 17, 2013, 11:31:59 AM

Another paper with a nice table, gives you some idea on how things will go.

http://trone.di.fc.ul.pt/images/e/e2/ASAP11-paper.pdf

Metric             GTS8800 [17]  GTX8800 [10]  GTX260 (This paper)  GTX580 [39]  Intel W3565 [46]  AMD Phenom II 1090T [46]
Cores              112           128           192                  512          4                 6
Frequency (MHz)    1188          1350          1294                 1544         3200              3200
Price (USD)        250           173           100                  500          300               200
TDP (W)            150           155           202                  244          130               125
GFLOPS             399           518           715                  1581         102               153
Modexp/s           6504          11074         41426                149464       32608             77002
Modexp/s (scaled)  13052         15282         41426                46973        N/A               N/A
Modexp/s/W         43            71            205                  612          250               616
Modexp/s/USD       26            64            414                  298          131               385
Table I
COMPARISONS OF MODULAR EXPONENTIATION PERFORMANCES ON VARIOUS CPU AND GPU IMPLEMENTATIONS.

Looking at the numbers, it looks like the GPU is only around 2x the CPU???

oxfeeefeee

member

Activity: 73

Merit: 10

Another paper with a nice table, gives you some idea on how things will go.

http://trone.di.fc.ul.pt/images/e/e2/ASAP11-paper.pdf

Metric             GTS8800 [17]  GTX8800 [10]  GTX260 (This paper)  GTX580 [39]  Intel W3565 [46]  AMD Phenom II 1090T [46]
Cores              112           128           192                  512          4                 6
Frequency (MHz)    1188          1350          1294                 1544         3200              3200
Price (USD)        250           173           100                  500          300               200
TDP (W)            150           155           202                  244          130               125
GFLOPS             399           518           715                  1581         102               153
Modexp/s           6504          11074         41426                149464       32608             77002
Modexp/s (scaled)  13052         15282         41426                46973        N/A               N/A
Modexp/s/W         43            71            205                  612          250               616
Modexp/s/USD       26            64            414                  298          131               385
Table I
COMPARISONS OF MODULAR EXPONENTIATION PERFORMANCES ON VARIOUS CPU AND GPU IMPLEMENTATIONS.

oxfeeefeee

member

Activity: 73

Merit: 10

Quote from: meta.p02 on July 17, 2013, 08:12:38 AM

Quote from: oxfeeefeee on July 17, 2013, 08:07:17 AM

Thanks, so this is what we are currently doing with our CPUs?

Code:

 (Initialize) Set (x; y; f) = (1; a; e).
 (Loop) While f > 0, do as follows:
    { If f%2 = 0 then replace (x; y; f) by (x; y^2 %n; f/2),
    { otherwise replace (x; y; f) by (xy%n; y; f-1).
 (Terminate) Return x

from http://people.reed.edu/~jerry/361/lectures/bigprimes.pdf

That's about it. Currently we need to find a way to port it to the GPU so that the GPU can run multiple copies of (1; 2; e) in parallel.

It seems this is a very well-studied problem. A search turned up a lot of papers about it, e.g. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.7236&rep=rep1&type=pdf
So now again I'm confident that a GPU miner is coming soon.

meta.p02

full member

Activity: 196

Merit: 100

Quote from: oxfeeefeee on July 17, 2013, 08:07:17 AM

Thanks, so this is what we are currently doing with our CPUs?

Code:

 (Initialize) Set (x; y; f) = (1; a; e).
 (Loop) While f > 0, do as follows:
    { If f%2 = 0 then replace (x; y; f) by (x; y^2 %n; f/2),
    { otherwise replace (x; y; f) by (xy%n; y; f-1).
 (Terminate) Return x

from http://people.reed.edu/~jerry/361/lectures/bigprimes.pdf

That's about it. Currently we need to find a way to port it to the GPU so that the GPU can run multiple copies of (1; 2; e) in parallel.

oxfeeefeee

member

Activity: 73

Merit: 10

Quote from: meta.p02 on July 17, 2013, 07:39:22 AM

Look up fast modular exponentiation.

Basically, you don't want to be storing the entire a^p, because it's going to be on the order of 10^90 digits. So, after every step of exponentiation, you have to reduce it modulo m.

Thanks, so this is what we are currently doing with our CPUs?

Code:

 (Initialize) Set (x; y; f) = (1; a; e).
 (Loop) While f > 0, do as follows:
    { If f%2 = 0 then replace (x; y; f) by (x; y^2 %n; f/2),
    { otherwise replace (x; y; f) by (xy%n; y; f-1).
 (Terminate) Return x

from http://people.reed.edu/~jerry/361/lectures/bigprimes.pdf
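For anyone who wants to try it, the pseudocode above translates almost line for line into Python. This is just a sketch of that loop, not the miner's actual code path (which goes through BN_mod_exp or GMP); the function name `mod_exp` is made up, and the up-front reduction of x and y modulo n is a small addition to the pseudocode.

```python
def mod_exp(a, e, n):
    """Compute a**e % n with the (x; y; f) loop from the pseudocode above."""
    if e < 0 or n <= 0:
        raise ValueError("expects e >= 0 and n > 0")

    # (Initialize) x holds the result so far, y the running square, f the remaining exponent.
    x, y, f = 1 % n, a % n, e

    # (Loop)
    while f > 0:
        if f % 2 == 0:
            y = (y * y) % n   # square the base, halve the exponent
            f //= 2
        else:
            x = (x * y) % n   # fold one factor of y into the result
            f -= 1

    # (Terminate)
    return x


print(mod_exp(2, 10, 1000))  # 2**10 = 1024, so this prints 24
```

Python's built-in `pow(a, e, n)` computes the same value and is handy for checking the result.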

meta.p02

full member

Activity: 196

Merit: 100

Quote from: oxfeeefeee on July 17, 2013, 04:39:53 AM

Quote from: mikaelh on July 17, 2013, 03:09:31 AM

I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

I took a very quick look at the code, and it seems that the majority of the calculation is the BN_mod_exp operation, which is r=a^p%m. While CUMP doesn't support % yet, we can still let the GPU do the a^p part. That would still be a lot faster, right? Unless there is some fast algorithm that requires us to do a^p%m all together.

Again, please correct me if I'm wrong.

Look up fast modular exponentiation.

Basically, you don't want to be storing the entire a^p, because it's going to be on the order of 10^90 digits. So, after every step of exponentiation, you have to reduce it modulo m.
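A quick way to see why reducing at every step matters is to track how large the intermediate products get. The sketch below is illustrative only: the function name `mod_exp_max_bits` is hypothetical and the test values are arbitrary, not Primecoin parameters.

```python
def mod_exp_max_bits(a, p, m):
    """Return (a**p % m, largest bit length of any intermediate product)."""
    if p < 0 or m <= 0:
        raise ValueError("expects p >= 0 and m > 0")

    result, base, max_bits = 1 % m, a % m, 0
    while p > 0:
        if p & 1:
            prod = result * base
            max_bits = max(max_bits, prod.bit_length())
            result = prod % m          # reduce right after multiplying
        square = base * base
        max_bits = max(max_bits, square.bit_length())
        base = square % m              # reduce right after squaring
        p >>= 1
    return result, max_bits


m = 2**127 - 1
r, bits = mod_exp_max_bits(3, 2**100 + 7, m)
print(r == pow(3, 2**100 + 7, m), bits, 2 * m.bit_length())
```

Because both operands are always below m, no product ever needs more than twice the bit length of m, whereas the unreduced a^p needs about p*log2(a) bits.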

oxfeeefeee

member

Activity: 73

Merit: 10

Quote from: mikaelh on July 17, 2013, 03:09:31 AM

I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

I took a very quick look at the code, and it seems that the majority of the calculation is the BN_mod_exp operation, which is r=a^p%m. While CUMP doesn't support % yet, we can still let the GPU do the a^p part. That would still be a lot faster, right? Unless there is some fast algorithm that requires us to do a^p%m all together.

Again, please correct me if I'm wrong.

paulthetafy

hero member

Activity: 820

Merit: 1000

I think it's a fair bet that XPM has already been implemented in both OpenCL and CUDA. They just haven't been publicly released yet.

PTT

oxfeeefeee

member

Activity: 73

Merit: 10

Quote from: mikaelh on July 17, 2013, 03:09:31 AM

I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

Thanks for the answer. It's helpful even if it's "no", because it provides useful information for miners to plan their investments.

mikaelh

sr. member

Activity: 301

Merit: 250

I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

oxfeeefeee

member

Activity: 73

Merit: 10

As you know, mikaelh's build uses GMP to do the BigNum calculations, which gives a huge performance boost.

I googled "GMP GPU" and found CUMP, http://www.hpcs.cs.tsukuba.ac.jp/~nakayama/cump/, which is basically a CUDA version of GMP. So it's reasonable to assume that a build using CUDA is very easy to make. According to this, http://www.hpcs.cs.tsukuba.ac.jp/~nakayama/cump/index.php?CUMP%20Performance%20Evaluation, it wouldn't be as dramatic as the GPU revolution that happened to the other coins.

Has anyone worked with this CUMP library before? Are there any blockers for this? Or is someone already working on it?
