As far as I understand it, the algorithm is split into two parts: building a sieve, then trying each candidate stored in the sieve with three different tests.
Each part has a variable execution time, determined by the number of prime candidates found.
Recent modifications of the code (including Chemisist's proposal) put a timeout on these parts, so that the execution time invested in a possible solution is capped.
By capping these parts, there is a chance of aborting the testing of a valid solution.
Thus a complex trade-off arises: invest more time in the current candidate, or jump to the next candidate after a given period of time.
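The capped-phase idea can be sketched as a generic time-budget loop. The names and budgets below are hypothetical placeholders, not the actual values or functions from prime.cpp:

```cpp
// Sketch of the trade-off described above: each phase (sieving, testing)
// gets a capped time budget; when the budget runs out we move on to the
// next candidate, possibly discarding a valid one.
#include <chrono>
#include <functional>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::milliseconds;

// Run `step` repeatedly until it reports success or the budget expires.
// Returns true on success, false if the phase was aborted by the timeout.
inline bool RunCapped(const std::function<bool()>& step, Ms budget) {
    const auto start = Clock::now();
    while (Clock::now() - start < budget)
        if (step()) return true;
    return false;  // budget exhausted: jump to the next candidate
}
```

Tuning the `budget` values changes how often work-in-progress is thrown away, which is exactly the trade-off discussed above.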
Chemisist suggests an adaptive approach (I only skimmed through your code, so I may have misunderstood), which is very interesting.
However, IMHO, it only marginally improves the chances of finding a solution (read: a block), especially as difficulty rises.
Why? Because the distribution of prime factors is very, very difficult to predict.
Playing with the timeout values can multiply or divide your PPS by 10, but it does not improve your chances of finding a solution.
Anyway, here is my advice: do not focus on the PPS, it is not a reliable measure of performance.
Thumbs up to Sunny King (and his team) for designing this coin. The proof-of-work proposed is brilliant and very interesting to play with.
The high PPS number is due to the very low hard cap on the time allotted to checking the sieve once it has been produced (set to 10 ms in the current master branch on GitHub, line 372 in prime.cpp). With whatever very short weaving time you decide to set, the sieve contains a very large number of prime candidates, most of which satisfy the following check:
nPrimesHit++;
but many of which are not actually prime. Anyway, I'm currently testing my code against Sunny's on the testnet (with the large-thread-count issue potentially fixed, fingers crossed) to see which can find more blocks in 10 minutes on my T9300 laptop. Results to come shortly.
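To illustrate why counting sieve survivors overstates the number of real primes, here is a toy sketch (hypothetical code, not the actual prime.cpp sieve): sieving with only a few small primes leaves composites such as 121 = 11 × 11 among the survivors, yet each survivor would still bump a counter like nPrimesHit.

```cpp
// Count numbers in [2, n) that survive trial division by `smallPrimes`.
// With a short sieve (few primes), composites slip through and inflate
// the counter, analogous to nPrimesHit counting non-primes.
#include <vector>

inline int CountSurvivors(int n, const std::vector<int>& smallPrimes) {
    int nPrimesHit = 0;  // same idea as the counter in prime.cpp
    for (int x = 2; x < n; ++x) {
        bool survives = true;
        for (int p : smallPrimes)
            if (x != p && x % p == 0) { survives = false; break; }
        if (survives) ++nPrimesHit;
    }
    return nPrimesHit;
}
```

For example, sieving [2, 130) with only {2, 3, 5, 7} yields 32 survivors, although there are only 31 primes below 130: the composite 121 survives because 11 was never woven into the sieve.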