
Topic: [XPM] Primecoin Built-in Miner Sieve Performance Issue - page 8. (Read 69150 times)

newbie
Activity: 48
Merit: 0
Why make it a weave timing parameter and not just a weave count parameter? I think that would be a better metric, as a change in CPU load means the timing parameter's results will change a lot.


Agreed.
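
For anyone following along, here is a rough sketch of what a weave count limit could look like. It is a toy sieve over plain integers, not the actual Primecoin sieve, and the names WeaveByCount and CountCandidates are made up for illustration. The point is only that a count-based stopping rule gives the same sieve regardless of CPU load, which a wall-clock limit cannot promise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy sieve: marks multiples of the first `weaveCount` primes
// as composite. The stopping rule is a fixed count rather than a timer,
// so the result does not depend on how busy the CPU is.
std::vector<bool> WeaveByCount(std::size_t sieveSize,
                               const std::vector<unsigned>& primes,
                               std::size_t weaveCount)
{
    assert(weaveCount <= primes.size());
    std::vector<bool> composite(sieveSize, false);
    for (std::size_t i = 0; i < weaveCount; ++i) {
        const std::size_t p = primes[i];
        assert(p >= 2);
        // Start at 2*p so the prime itself stays unmarked.
        for (std::size_t n = 2 * p; n < sieveSize; n += p)
            composite[n] = true;
    }
    return composite;
}

// Counts positions >= 2 that are still unmarked after weaving.
std::size_t CountCandidates(const std::vector<bool>& composite)
{
    std::size_t count = 0;
    for (std::size_t n = 2; n < composite.size(); ++n)
        if (!composite[n]) ++count;
    return count;
}
```

Calling WeaveByCount twice with the same arguments always leaves the same candidates, so runs on a loaded and an idle machine stay comparable.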
member
Activity: 84
Merit: 10
I think he is using Chemisist's 2nd release that he posted about on the last page
But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.
That's because less time is spent weaving with the threads fighting each other, and more false-positives are counted by the primespersec value.
member
Activity: 99
Merit: 10
Alright, so I just updated my version (currently on github) so that each thread has an independently evolving weave timing parameter. To compare mine with Sunny's most recent update, I used the testnet, where my version found 30 confirmed blocks in 10 minutes while the original code found 16 confirmed blocks. I feel that this is a legitimate comparison because no other nodes were mining on the testnet at the time (I know this because my client found every consecutive block in both cases). This comparison was performed on an IBM T61p laptop with a T9300 Core 2 Duo processor. The current difficulty on the testnet is 5.4426.

Going to test this with the 8 threads on my Core i7 next.

Mind linking to your github? The speed this thread is updating is a bit overwhelming. Thanks in advance.

Updated my profile website to link directly to it, just FYI, so you don't have to keep coming back here...

https://github.com/Chemisist/primecoin
member
Activity: 84
Merit: 10
Alright, so I just updated my version (currently on github) so that each thread has an independently evolving weave timing parameter. To compare mine with Sunny's most recent update, I used the testnet, where my version found 30 confirmed blocks in 10 minutes while the original code found 16 confirmed blocks. I feel that this is a legitimate comparison because no other nodes were mining on the testnet at the time (I know this because my client found every consecutive block in both cases). This comparison was performed on an IBM T61p laptop with a T9300 Core 2 Duo processor. The current difficulty on the testnet is 5.4426.
Why make it a weave timing parameter and not just a weave count parameter? I think that would be a better metric, as a change in CPU load means the timing parameter's results will change a lot.

Some of us on #eligius-prime were able, with Luke's help and others', to get it running. Now I'm just waiting to see if I can actually get a block.

[image]

Can you share your source code?  Did you modify Sunny's algorithm at all?
I think the biggest change in Luke's miner is that it moves the bnTwoInverse calculation out of Weave() and just pre-calculates it for all of the primes in GeneratePrimeTable(). I didn't get much more performance out of porting that change to primecoin but I didn't check too hard.
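
To illustrate the idea of hoisting that calculation, here is a minimal sketch. It assumes bnTwoInverse is the modular inverse of 2 for each sieve prime, as the name suggests, and it uses plain 32-bit integers instead of the bignum types in the real code. GeneratePrimes and PrecomputeTwoInverses are names invented for this example, not functions from Luke's miner or from primecoin.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simple Sieve of Eratosthenes; returns all primes below `limit`.
std::vector<std::uint32_t> GeneratePrimes(std::uint32_t limit)
{
    std::vector<bool> composite(limit, false);
    std::vector<std::uint32_t> primes;
    for (std::uint32_t n = 2; n < limit; ++n) {
        if (composite[n]) continue;
        primes.push_back(n);
        // 64-bit index avoids overflow when n * n exceeds 32 bits.
        for (std::uint64_t m = static_cast<std::uint64_t>(n) * n; m < limit; m += n)
            composite[m] = true;
    }
    return primes;
}

// Builds a table parallel to `primes` holding the inverse of 2 modulo each prime.
// For an odd prime p, (p + 1) / 2 works because 2 * (p + 1) / 2 = p + 1 == 1 (mod p).
// 2 has no inverse modulo 2, so that slot is stored as 0.
std::vector<std::uint32_t> PrecomputeTwoInverses(const std::vector<std::uint32_t>& primes)
{
    std::vector<std::uint32_t> inverses;
    inverses.reserve(primes.size());
    for (std::uint32_t p : primes) {
        assert(p >= 2);
        inverses.push_back(p == 2 ? 0 : (p + 1) / 2);
    }
    return inverses;
}
```

The table is built once next to the prime table, so the weaving loop only does a lookup per prime instead of recomputing the inverse on every pass.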
legendary
Activity: 1843
Merit: 1338
XXXVII Fnord is toast without bread
I could use some fine tuning for the AMD FX series, I have seen others asking for a specific release for this chip but have not seen any that are available.

Otherwise for now I will keep testing the overthreading, I think for PPS rate the 10 threads per core (for me genproclimit=80) has been best so far.


16:30:45

{
"blocks" : 24598,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 80,
"primespersec" : 2492,
"pooledtx" : 0,
"testnet" : false
}
sr. member
Activity: 359
Merit: 250
setgenerate true 80     or......1048576

Thinking out of the box, congrats.

Now waiting for Win binaries of Chemisist newest version with improved thread handling.

But it slows down the computer much more than it was on -1. I hope that doesn't increase the possibility of orphans.
member
Activity: 84
Merit: 10
Agreed. I noticed earlier if you cap off the sieve weaving time to almost nothing, you can easily get absurdly high PPS values but you won't actually earn blocks faster. There's a trade-off that needs to be analyzed closer.

The high pps number is due to the very low hard cap on the time set to check the actual sieve that has been produced (it's set to 10 ms in the current master branch on github, line 372 in prime.cpp).  So with the very short weaving time of whatever you decide to set, the sieve has a very large number of prime candidates, most of which satisfy the following check:
Code:
if(TargetGetLength(nProbablePrimeChainLength) >= 1)
     nPrimesHit++;

but many of which are not actually primes. Anyway, I'm currently testing my code against Sunny's on the testnet (with the large thread count issue potentially fixed, fingers crossed) to see which can find more blocks in 10 minutes on my T9300 laptop. Results to come shortly.
Isn't nProbablePrimeChainLength always zero if N+1 and N-1 both fail the FermatProbablePrimalityTest in ProbableCunninghamChainTest? Or does that get a ton of false positives when the sieve isn't weaved much? (I imagine I probably just answered my own question.)

Anyway, good luck with finding the sweet spot in the trade-off!
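
A toy model of the trade-off, for what it's worth. This is a plain integer sieve with trial division standing in for the real primality test, so it says nothing about the Fermat test or about Cunningham chains, and WeaveStats, IsPrime and WeaveAndCount are hypothetical names. It only shows the general effect that a shallow weave leaves far more candidates to test while the number of actual primes in the range stays the same.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct WeaveStats {
    std::size_t survivors;   // sieve entries left unmarked after weaving
    std::size_t truePrimes;  // survivors that are actually prime
};

// Trial-division primality check, adequate for a small toy range.
bool IsPrime(std::size_t n)
{
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Weaves the given primes through [2, limit) and reports how many entries
// survive and how many of those survivors are really prime.
WeaveStats WeaveAndCount(std::size_t limit, const std::vector<std::size_t>& weavePrimes)
{
    assert(limit >= 2);
    std::vector<bool> composite(limit, false);
    for (std::size_t p : weavePrimes) {
        assert(p >= 2);
        for (std::size_t n = 2 * p; n < limit; n += p)
            composite[n] = true;
    }
    WeaveStats stats{0, 0};
    for (std::size_t n = 2; n < limit; ++n) {
        if (composite[n]) continue;
        ++stats.survivors;
        if (IsPrime(n)) ++stats.truePrimes;
    }
    return stats;
}
```

Weaving only {2} over a range leaves many more survivors than weaving {2, 3, 5, 7}, but truePrimes is identical in both cases, so the extra survivors are pure testing overhead.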
sr. member
Activity: 246
Merit: 250
My spoon is too big!
Alright, so I just updated my version (currently on github) so that each thread has an independently evolving weave timing parameter. To compare mine with Sunny's most recent update, I used the testnet, where my version found 30 confirmed blocks in 10 minutes while the original code found 16 confirmed blocks. I feel that this is a legitimate comparison because no other nodes were mining on the testnet at the time (I know this because my client found every consecutive block in both cases). This comparison was performed on an IBM T61p laptop with a T9300 Core 2 Duo processor. The current difficulty on the testnet is 5.4426.

Going to test this with the 8 threads on my Core i7 next.

Mind linking to your github? The speed this thread is updating is a bit overwhelming. Thanks in advance.
newbie
Activity: 48
Merit: 0
Updated Windows build using the new Chemisist source. Tuned for Intel Sandy Bridge and Ivy Bridge but compatible with other architectures.

https://www.dropbox.com/s/4k0xmuajxf5i4ly/primecoin0712v3-avx.zip

I'm seeing lower PPS than my v2 builds but I think that weaving will be better overall.
legendary
Activity: 1974
Merit: 1077
^ Will code for Bitcoins
setgenerate true 80     or......1048576

Thinking out of the box, congrats.

Now waiting for Win binaries of Chemisist newest version with improved thread handling.
member
Activity: 99
Merit: 10
I think he is using Chemisist's 2nd release that he posted about on the last page
But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.

That's not possible since the pps is based on the actual number of prime candidates processed. This is correlated with block finding but apparently not equivalent to it.
legendary
Activity: 1843
Merit: 1338
XXXVII Fnord is toast without bread
I think he is using Chemisist's 2nd release that he posted about on the last page
But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.

I used the same process with YAC and was told I was wasting the hash by splitting into "micro-threads" making it more difficult to solve. Or the person that said this didn't want me to use the tactic and they used it for themselves.

see post here.

https://bitcointalksearch.org/topic/annyac-yacoin-yet-another-altcoin-start-is-now-196196
legendary
Activity: 1843
Merit: 1338
XXXVII Fnord is toast without bread

16:00:58

getmininginfo


16:00:58

{
"blocks" : 24436,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 80,
"primespersec" : 2359,
"pooledtx" : 0,
"testnet" : false
}


AMD FX 8120

Perhaps PPS isn't the actual goal after all, since I think this measurement is largely misunderstood.

That's the conclusion that I've reached also.  I'm comparing the actual number of blocks generated over 10 minutes (though I should probably do it for longer) on the testnet between production Primecoin code and what I'm working on.

Though I should point out that with each change in genproclimit I saw better results up to 10x actual cores.

15:43:56
{
"blocks" : 24336,
"currentblocksize" : 18956,
"currentblocktx" : 1,
"errors" : "",
"generate" : true,
"genproclimit" : 320,
"primespersec" : 2196,
"pooledtx" : 1,
"testnet" : false
}


Mine is always -1 so far. How did you manage to increase that number?

setgenerate true 80     or......1048576
sr. member
Activity: 476
Merit: 250
I think he is using Chemisist's 2nd release that he posted about on the last page
But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.
member
Activity: 99
Merit: 10
Alright, testing with the Core i7-950: 81/86 blocks found on testnet in 10 minutes with the current Sunny King code, and 97/97 blocks found in 10 minutes with my code, with 8 threads running. Testing this concept of overthreading now.
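
Since the runs are only 10 minutes each, a quick sanity check on how much of the difference could be noise may be useful. The sketch below assumes block finds in each window behave like independent Poisson counts over equal time, which is an assumption on my part and not something established in this thread, and BlockCountZScore is a made-up helper name. It computes the usual normal-approximation score (a - b) / sqrt(a + b).

```cpp
#include <cassert>
#include <cmath>

// Normal-approximation z-score for two block counts measured over equal
// time windows, assuming block finds are independent (Poisson-like).
// Larger absolute values mean the difference is less likely to be noise.
double BlockCountZScore(unsigned blocksA, unsigned blocksB)
{
    assert(blocksA + blocksB > 0);
    const double a = static_cast<double>(blocksA);
    const double b = static_cast<double>(blocksB);
    return (a - b) / std::sqrt(a + b);
}
```

Under that assumption, 30 vs 16 blocks scores about 2.06 and 97 vs 81 scores about 1.20, which supports the earlier remark that the comparison should probably run for longer.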
sr. member
Activity: 359
Merit: 250

16:00:58

getmininginfo


16:00:58

{
"blocks" : 24436,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 80,
"primespersec" : 2359,
"pooledtx" : 0,
"testnet" : false
}


AMD FX 8120

Perhaps PPS isn't the actual goal after all, since I think this measurement is largely misunderstood.

That's the conclusion that I've reached also.  I'm comparing the actual number of blocks generated over 10 minutes (though I should probably do it for longer) on the testnet between production Primecoin code and what I'm working on.

Though I should point out that with each change in genproclimit I saw better results up to 10x actual cores.

15:43:56
{
"blocks" : 24336,
"currentblocksize" : 18956,
"currentblocktx" : 1,
"errors" : "",
"generate" : true,
"genproclimit" : 320,
"primespersec" : 2196,
"pooledtx" : 1,
"testnet" : false
}


Mine is always -1 so far. How did you manage to increase that number?
legendary
Activity: 1843
Merit: 1338
XXXVII Fnord is toast without bread


Ohh... that's clever.  The setgenerate true -1 command just sets it to the number of processors.  Is this the same effect that you get from setting thread-concurrency to something like 3-4 times the number of stream processors on AMD cards?  I'm thinking yes...


I am not finding any more blocks; I have not found any yet at these "higher" settings.
sr. member
Activity: 791
Merit: 271
This is personal

16:00:58

getmininginfo


16:00:58

{
"blocks" : 24436,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 80,
"primespersec" : 2359,
"pooledtx" : 0,
"testnet" : false
}


AMD FX 8120

Perhaps PPS isn't the actual goal after all, since I think this measurement is largely misunderstood.

I'd love to know how you are getting a PPS number so high on an FX 8120.

I think he is using Chemisist's 2nd release that he posted about on the last page
member
Activity: 99
Merit: 10
Ohh... that's clever.  The setgenerate true -1 command just sets it to the number of processors.  Is this the same effect that you get from setting thread-concurrency to something like 3-4 times the number of stream processors on AMD cards?  I'm thinking yes...

(edit: on second thought, I shouldn't have clicked "quote" on that long post... )
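
For reference, the -1 behaviour described above could be sketched like this. ResolveMinerThreads is a hypothetical helper, not the function in the Primecoin source, and the fallback to 1 when the core count cannot be detected is my own assumption.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: maps a requested genproclimit to a thread count.
// A negative value means "use the detected number of processors";
// any positive value is taken as-is, which is what allows overthreading.
unsigned ResolveMinerThreads(int genProcLimit, unsigned detectedCores)
{
    assert(genProcLimit != 0);  // zero threads is not handled in this sketch
    if (genProcLimit < 0)
        return detectedCores > 0 ? detectedCores : 1;  // assumed fallback
    return static_cast<unsigned>(genProcLimit);
}

// Convenience overload using the standard library's core count, which
// may return 0 when the value cannot be determined.
unsigned ResolveMinerThreads(int genProcLimit)
{
    return ResolveMinerThreads(genProcLimit, std::thread::hardware_concurrency());
}
```

With this mapping, -1 on an 8-core machine gives 8 threads, while an explicit 80 gives 80 threads sharing the same 8 cores.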
legendary
Activity: 1843
Merit: 1338
XXXVII Fnord is toast without bread

I'd love to know how you are getting a PPS number so high on an FX 8120.

Exactly my point.