
Topic: [XPM] A GPU miner for XPM is around the corner? (via CUMP). (Read 4937 times)

newbie
Activity: 23
Merit: 0
Agreed that XPM mining is a good addition to BTC/LTC mining!  Except for the people who bought the cruddiest processors they could find when building their rigs...

Even though only ADD/SUB/MUL are in CUMP, one might be able to use a binomial method (guess/check/converge) to implement DIV without much difficulty, though I don't know how well the operation would perform.
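
One way to read "guess/check/converge" is plain bisection on the quotient. The sketch below is only an illustration in Python, not CUMP code: the function name is made up, and it assumes that comparison and a one-bit right shift are available alongside ADD/SUB/MUL.

```python
def div_by_bisection(n: int, d: int) -> tuple:
    """Return (q, r) with n == q*d + r and 0 <= r < d, using only
    add, subtract, multiply, compare and a one-bit shift."""
    if n < 0 or d <= 0:
        raise ValueError("expects n >= 0 and d > 0")
    lo, hi = 0, 1
    # Guess: double the upper bound until hi * d exceeds n.
    while hi * d <= n:
        lo, hi = hi, hi + hi
    # Check and converge: keep lo * d <= n < hi * d and halve the gap.
    while hi - lo > 1:
        mid = (lo + hi) >> 1
        if mid * d <= n:
            lo = mid
        else:
            hi = mid
    return lo, n - lo * d


if __name__ == "__main__":
    print(div_by_bisection(17, 5))
```

Each step costs one multiplication, and the number of steps grows with the bit length of the quotient, so how well it performs would depend heavily on how fast big-number MUL is on the GPU.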

There are quite a few factoring algos/sieves on GitHub using CUDA, though at a cursory glance they appeared to use non-arbitrary-precision routines.  I thought floating point arithmetic has no place in integer factorization?!
sr. member
Activity: 287
Merit: 250
I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.
Quote

Division is just subtraction with a counter.

That's beautiful. :')
newbie
Activity: 49
Merit: 0
I think XPM is a good addition to current GPU rigs. Now your CPU won't be sleeping all the time, right?
legendary
Activity: 2940
Merit: 1090
If a GPU is only twice as fast as a CPU, maybe putting more CPUs on one's motherboard might be more cost-effective than putting one or more GPUs in a machine?

Or think blade servers, two or more CPUs per blade, would GPUs really be more cost-effective?

-MarkM-
newbie
Activity: 23
Merit: 0
Here is an example from a few years ago where a GPU implementation of ECM was developed and compared against a standard CPU, and the result was roughly 2x faster.  I know we're not talking about ECM here, but again it's suggestive of what one might expect.

http://eecm.cr.yp.to/gpuecm-20090127.pdf

Modular arithmetic is very easy to implement with the four basic arithmetic functions, so I'm not sure what the holdup is around that?
legendary
Activity: 2156
Merit: 1072
Crypto is the separation of Power and State.
I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.
Quote

Division is just subtraction with a counter.
member
Activity: 73
Merit: 10
Another paper with a nice table, gives you some idea on how things will go.

http://trone.di.fc.ul.pt/images/e/e2/ASAP11-paper.pdf

                   GTS8800  GTX8800  GTX260        GTX580  Intel W3565  AMD Phenom II 1090T
                   [17]     [10]     (This paper)  [39]    [46]         [46]
Cores              112      128      192           512     4            6
Frequency (MHz)    1188     1350     1294          1544    3200         3200
Price (USD)        250      173      100           500     300          200
TDP (W)            150      155      202           244     130          125
GFLOPS             399      518      715           1581    102          153
Modexp/s           6504     11074    41426         149464  32608        77002
Modexp/s (scaled)  13052    15282    41426         46973   N/A          N/A
Modexp/s/W         43       71       205           612     250          616
Modexp/s/USD       26       64       414           298     131          385
Table I
COMPARISONS OF MODULAR EXPONENTIATION PERFORMANCES ON VARIOUS CPU AND GPU IMPLEMENTATIONS.

looking at the numbers, looks like GPU is only around 2x CPU???

Yes, it seems so, and the performance per watt is basically at the same level. But you can't just plug six AMD CPUs into a single rig the way you would with GPUs.
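
For anyone who wants to check the "around 2x" reading, here is a quick sketch that recomputes the ratios for the fastest GPU and the fastest CPU in the table. The figures are copied from Table I as quoted in this thread; the variable and function names are just illustrative.

```python
# Figures copied from Table I as quoted in the thread.
gtx580 = {"modexp_per_s": 149464, "tdp_w": 244, "price_usd": 500}
phenom = {"modexp_per_s": 77002, "tdp_w": 125, "price_usd": 200}


def per(entry: dict, key: str) -> float:
    """Modexp/s divided by the chosen cost figure (watts or dollars)."""
    return entry["modexp_per_s"] / entry[key]


# Raw throughput ratio, GPU over CPU.
speedup = gtx580["modexp_per_s"] / phenom["modexp_per_s"]

if __name__ == "__main__":
    print(f"raw speedup, GTX580 vs Phenom II 1090T: {speedup:.2f}x")
    print(f"modexp/s per watt: {per(gtx580, 'tdp_w'):.1f} vs {per(phenom, 'tdp_w'):.1f}")
    print(f"modexp/s per USD:  {per(gtx580, 'price_usd'):.1f} vs {per(phenom, 'price_usd'):.1f}")
```

On these numbers the raw speedup comes out a little under 2x, the per-watt figures are nearly identical, and per dollar the CPU is actually ahead.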

I'm guessing this is because CPUs have AVX, which is 256 bits wide, and that makes them very good at dealing with big numbers compared to GPUs, which can only support 64-bit natively.
legendary
Activity: 2940
Merit: 1090
I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.
Division is just subtraction with a counter.

Unless you'd like to optimise it, maybe by using some shifts or whatever (albeit maybe with counters involved in that part too?)
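
To make the two ideas concrete, here is a rough Python sketch (not taken from CUMP or any miner; the function names are made up). The first function is literally "subtraction with a counter"; the second is the usual shift-and-subtract long division, which needs one pass per bit of the dividend instead of one pass per unit of the quotient.

```python
def div_repeated_subtraction(n: int, d: int) -> tuple:
    """Subtraction with a counter: about n // d iterations."""
    if n < 0 or d <= 0:
        raise ValueError("expects n >= 0 and d > 0")
    q = 0
    while n >= d:
        n -= d
        q += 1
    return q, n


def div_shift_subtract(n: int, d: int) -> tuple:
    """Binary long division: one iteration per bit of n."""
    if n < 0 or d <= 0:
        raise ValueError("expects n >= 0 and d > 0")
    q, r = 0, 0
    for i in range(n.bit_length() - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)  # bring down the next bit of n
        q <<= 1
        if r >= d:  # d fits once at this bit position
            r -= d
            q += 1
    return q, r


if __name__ == "__main__":
    print(div_repeated_subtraction(17, 5), div_shift_subtract(17, 5))
```

For numbers hundreds of bits long the first version would never finish in practice, while the second only needs a few hundred shift/compare/subtract steps.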

I have often wondered if the Trachtenberg Speed System of basic mathematics is any faster on machines than other approaches? Maybe more useful for base ten than binary though?

-MarkM-
sr. member
Activity: 287
Merit: 250
I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.
Division is just subtraction with a counter.
sr. member
Activity: 294
Merit: 250
Another paper with a nice table, gives you some idea on how things will go.

http://trone.di.fc.ul.pt/images/e/e2/ASAP11-paper.pdf

looking at the numbers, looks like GPU is only around 2x CPU???
member
Activity: 73
Merit: 10
Another paper with a nice table, gives you some idea on how things will go.

http://trone.di.fc.ul.pt/images/e/e2/ASAP11-paper.pdf

                   GTS8800  GTX8800  GTX260        GTX580  Intel W3565  AMD Phenom II 1090T
                   [17]     [10]     (This paper)  [39]    [46]         [46]
Cores              112      128      192           512     4            6
Frequency (MHz)    1188     1350     1294          1544    3200         3200
Price (USD)        250      173      100           500     300          200
TDP (W)            150      155      202           244     130          125
GFLOPS             399      518      715           1581    102          153
Modexp/s           6504     11074    41426         149464  32608        77002
Modexp/s (scaled)  13052    15282    41426         46973   N/A          N/A
Modexp/s/W         43       71       205           612     250          616
Modexp/s/USD       26       64       414           298     131          385
Table I
COMPARISONS OF MODULAR EXPONENTIATION PERFORMANCES ON VARIOUS CPU AND GPU IMPLEMENTATIONS.
member
Activity: 73
Merit: 10

Thanks, so this is what we are currently doing with our CPUs?
Code:
 (Initialize) Set (x; y; f) = (1; a; e).
 (Loop) While f > 0, do as follows:
     If f%2 = 0 then replace (x; y; f) by (x; y^2 %n; f/2),
     otherwise replace (x; y; f) by (xy%n; y; f-1).
 (Terminate) Return x.
from http://people.reed.edu/~jerry/361/lectures/bigprimes.pdf

That's about it. Currently we need to find a way to port it to the GPU so that the GPU can run multiple copies of (1; 2; e) in parallel.
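
For reference, the quoted loop translates almost line for line into Python. This is only a sketch to show the shape of the work, not the miner's actual code: the names mod_exp and fermat_base2 are made up, and the list comprehension at the end merely stands in for what would be one GPU thread per candidate.

```python
def mod_exp(a: int, e: int, n: int) -> int:
    """Compute a**e % n with the (x; y; f) loop quoted above."""
    x, y, f = 1, a % n, e
    while f > 0:
        if f % 2 == 0:
            y = (y * y) % n  # square step
            f //= 2
        else:
            x = (x * y) % n  # multiply step
            f -= 1
    return x % n


def fermat_base2(n: int) -> bool:
    """Base-2 Fermat check: does 2**(n-1) % n equal 1?"""
    return n > 2 and mod_exp(2, n - 1, n) == 1


if __name__ == "__main__":
    candidates = [91, 101, 341, 7919]
    # Each candidate is independent, which is what makes the job parallel.
    print([fermat_base2(n) for n in candidates])
```

Note that 341 = 11 * 31 passes the base-2 check even though it is composite, so a check like this is a probable-prime test rather than a proof.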


It seems this is a very well-studied problem. A search turned up a lot of papers about it, e.g. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.7236&rep=rep1&type=pdf
So now again I'm confident that a GPU miner is coming soon.
full member
Activity: 196
Merit: 100

Thanks, so this is what we are currently doing with our CPUs?
Code:
 (Initialize) Set (x; y; f) = (1; a; e).
 (Loop) While f > 0, do as follows:
     If f%2 = 0 then replace (x; y; f) by (x; y^2 %n; f/2),
     otherwise replace (x; y; f) by (xy%n; y; f-1).
 (Terminate) Return x.
from http://people.reed.edu/~jerry/361/lectures/bigprimes.pdf

That's about it. Currently we need to find a way to port it to the GPU so that the GPU can run multiple copies of (1; 2; e) in parallel.
member
Activity: 73
Merit: 10
Look up fast modular exponentiation.

Basically, you don't want to be storing the entire a^p, because it's going to be on the order of 10^90 digits. So, after every step of exponentiation, you have to reduce it modulo m.

Thanks, so this is what we are currently doing with our CPUs?
Code:
 (Initialize) Set (x; y; f) = (1; a; e).
 (Loop) While f > 0, do as follows:
     If f%2 = 0 then replace (x; y; f) by (x; y^2 %n; f/2),
     otherwise replace (x; y; f) by (xy%n; y; f-1).
 (Terminate) Return x.
from http://people.reed.edu/~jerry/361/lectures/bigprimes.pdf
full member
Activity: 196
Merit: 100
I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

I took a very quick look at the code, and it seems that the majority of the calculation is the BN_mod_exp operation, which is r=a^p%m. While CUMP doesn't support % yet, we can still let the GPU do the a^p part. That would still be a lot faster, right? Unless there is some fast algorithm that requires doing a^p%m all together.

again, please correct me if I'm wrong.

Look up fast modular exponentiation.

Basically, you don't want to be storing the entire a^p, because it's going to be on the order of 10^90 digits. So, after every step of exponentiation, you have to reduce it modulo m.
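
A small experiment shows the difference. The numbers here are toy-sized and chosen only for illustration (the modulus is the Mersenne prime 2^61 - 1, nothing to do with actual Primecoin candidates), and peak_bits_reduced is a made-up helper that records the largest intermediate value seen when reducing after every step.

```python
def peak_bits_reduced(a: int, p: int, m: int) -> tuple:
    """Return (a**p % m, largest intermediate bit length), for m > 1,
    reducing modulo m after every multiplication."""
    x, y, peak = 1, a % m, 0
    while p > 0:
        if p & 1:
            prod = x * y
            peak = max(peak, prod.bit_length())
            x = prod % m
        prod = y * y
        peak = max(peak, prod.bit_length())
        y = prod % m
        p >>= 1
    return x, peak


if __name__ == "__main__":
    a, p, m = 3, 10_000, 2**61 - 1
    result, peak = peak_bits_reduced(a, p, m)
    print("bits needed for a**p without reduction:", (a ** p).bit_length())
    print("largest intermediate with reduction:   ", peak)
    print("same answer as a**p % m:", result == (a ** p) % m)
```

With reduction, no intermediate ever needs more than twice the bit length of m, whereas the unreduced power grows in proportion to p.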
member
Activity: 73
Merit: 10
I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

I took a very quick look at the code, and it seems that the majority of the calculation is the BN_mod_exp operation, which is r=a^p%m. While CUMP doesn't support % yet, we can still let the GPU do the a^p part. That would still be a lot faster, right? Unless there is some fast algorithm that requires doing a^p%m all together.

again, please correct me if I'm wrong.
hero member
Activity: 820
Merit: 1000
I think it's a fair bet that XPM has already been implemented in both OpenCL and CUDA.  They just haven't been publicly released yet.

PTT
member
Activity: 73
Merit: 10
I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.

Thanks for the answer. It's helpful even if it's "no", because it provides useful information for miners to plan their investments.
sr. member
Activity: 301
Merit: 250
I guess you failed to notice that CUMP only supports addition, subtraction and multiplication. You need division and other special functions for Primecoin mining.
member
Activity: 73
Merit: 10
As you know mikaelh's build uses GMP to do the BigNum calculation, which gives a huge performance boost.

I googled "GMP GPU" and found CUMP, http://www.hpcs.cs.tsukuba.ac.jp/~nakayama/cump/ which is basically a CUDA version of GMP. So it's reasonable to assume that a build using CUDA would be very easy to make. According to this http://www.hpcs.cs.tsukuba.ac.jp/~nakayama/cump/index.php?CUMP%20Performance%20Evaluation, the speedup wouldn't be as dramatic as the GPU revolution that happened to the other coins.

Has anyone worked with this CUMP library before? Are there any blockers for this? Or is someone already working on it? :)