[XPM] Primecoin Built-in Miner Sieve Performance Issue - page 8.

anonppcoin

newbie

Activity: 48

Merit: 0

Quote from: AgentME on July 12, 2013, 06:36:11 PM

Why make it a weave timing parameter and not just a weave count parameter? I think that would be a better metric, as a change in CPU load means the timing parameter's results will change a lot.

Agreed.
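
To make the count-versus-timing suggestion concrete, here is a minimal sketch of a weave bounded by a fixed number of primes instead of a wall-clock budget. It is not Primecoin's actual Weave(); the function name WeaveByCount and its parameters are hypothetical, and it uses a plain sieve over small integers rather than the miner's candidate multipliers. The point is only that a count limit gives the same sieve on a busy machine as on an idle one, and that weaving with fewer primes leaves more candidates uncrossed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: cross off multiples of the first `weaveCount` primes
// in a sieve covering [0, sieveSize). Returns how many candidates in
// [2, sieveSize) are still uncrossed afterwards.
std::size_t WeaveByCount(std::size_t sieveSize,
                         const std::vector<unsigned>& primes,
                         std::size_t weaveCount)
{
    assert(weaveCount <= primes.size());
    std::vector<bool> composite(sieveSize, false);

    // The loop bound is a count, so the result does not depend on CPU load.
    for (std::size_t i = 0; i < weaveCount; ++i) {
        const std::size_t p = primes[i];
        assert(p >= 2);
        // Start at 2p so the prime itself stays a candidate.
        for (std::size_t m = 2 * p; m < sieveSize; m += p)
            composite[m] = true;
    }

    std::size_t survivors = 0;
    for (std::size_t n = 2; n < sieveSize; ++n)
        if (!composite[n]) ++survivors;
    return survivors;
}
```

With a sieve of size 100 and the primes {2, 3, 5, 7}, weaving 0, 1, 2 and 4 primes leaves 98, 50, 34 and 25 candidates respectively. Only the last figure equals the number of actual primes below 100, which is the trade-off being discussed: less weaving means more candidates left to test.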

AgentME

member

Activity: 84

Merit: 10

Quote from: LazyOtto on July 12, 2013, 06:21:22 PM

Quote from: UNOE on July 12, 2013, 06:18:09 PM

I think he's using Chemisist's 2nd release that he posted about on the last page

But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.

That's because less time is spent weaving with the threads fighting each other, and more false-positives are counted by the primespersec value.

Chemisist

member

Activity: 99

Merit: 10

Quote from: redphlegm on July 12, 2013, 06:30:06 PM

Quote from: Chemisist on July 12, 2013, 05:35:25 PM

Alright, I just updated my version (currently on github) so that each thread has an independently evolving weave timing parameter. To compare mine with Sunny's most recent update, I used the testnet, where my version found 30 confirmed blocks in 10 minutes while the original code found 16 confirmed blocks. I feel that this is a legitimate comparison because no other nodes were mining on the testnet at the time (I know this because my client found every consecutive block in both cases). This comparison was performed on an IBM T61p laptop with a T9300 Core 2 Duo processor. The current difficulty on the testnet is 5.4426.

Going to test this with the 8 threads on my Core i7 next.

Mind linking to your github? The speed this thread is updating is a bit overwhelming. Thanks in advance.

I updated my profile website to link directly to it, just FYI, so you don't have to keep coming back here...

https://github.com/Chemisist/primecoin

AgentME

member

Activity: 84

Merit: 10

Quote from: Chemisist on July 12, 2013, 05:35:25 PM

Alright, I just updated my version (currently on github) so that each thread has an independently evolving weave timing parameter. To compare mine with Sunny's most recent update, I used the testnet, where my version found 30 confirmed blocks in 10 minutes while the original code found 16 confirmed blocks. I feel that this is a legitimate comparison because no other nodes were mining on the testnet at the time (I know this because my client found every consecutive block in both cases). This comparison was performed on an IBM T61p laptop with a T9300 Core 2 Duo processor. The current difficulty on the testnet is 5.4426.

Why make it a weave timing parameter and not just a weave count parameter? I think that would be a better metric, as a change in CPU load means the timing parameter's results will change a lot.

Quote from: Chemisist on July 12, 2013, 05:43:08 PM

Quote from: gateway on July 12, 2013, 05:38:54 PM

Some of us on #eligius-prime were able, with help from Luke and others, to get it running. Now I'm just waiting to see if I can actually get a block...

[image]

Can you share your source code? Did you modify Sunny's algorithm at all?

I think the biggest change in Luke's miner is that it moves the bnTwoInverse calculation out of Weave() and just pre-calculates it for all of the primes in GeneratePrimeTable(). I didn't get much more performance out of porting that change to primecoin but I didn't check too hard.
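
For readers who want to see what that kind of precalculation looks like, here is a small sketch that builds a prime table and stores the modular inverse of 2 next to each prime, so a later weave loop can look it up instead of recomputing it. This is an illustration under assumptions, not Luke's code or Primecoin's GeneratePrimeTable(): the struct PrimeEntry and the function GeneratePrimeTableWithInverses are hypothetical names, and it uses 32-bit integers where the real miner works with big numbers. It relies on the fact that for an odd prime p, the inverse of 2 modulo p is (p + 1) / 2.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical table entry: a prime plus the precomputed inverse of 2 mod that prime.
struct PrimeEntry {
    std::uint32_t prime;
    std::uint32_t twoInverse; // (2 * twoInverse) % prime == 1; 0 for prime == 2 (no inverse)
};

// Sieve of Eratosthenes up to `limit`, filling in the inverse of 2 as each
// prime is discovered so it is computed once instead of on every weave.
std::vector<PrimeEntry> GeneratePrimeTableWithInverses(std::uint32_t limit)
{
    assert(limit >= 2 && limit < 0xFFFFFFFFu);
    std::vector<bool> composite(static_cast<std::size_t>(limit) + 1, false);
    std::vector<PrimeEntry> table;

    for (std::uint32_t n = 2; n <= limit; ++n) {
        if (composite[n]) continue;
        // For odd p, 2 * ((p + 1) / 2) = p + 1, which is 1 modulo p.
        const std::uint32_t inv = (n == 2) ? 0 : (n + 1) / 2;
        table.push_back({n, inv});
        // 64-bit index avoids overflow when n * n exceeds 32 bits.
        for (std::uint64_t m = static_cast<std::uint64_t>(n) * n; m <= limit; m += n)
            composite[m] = true;
    }
    return table;
}
```

Calling GeneratePrimeTableWithInverses(30) yields the ten primes up to 29; the entry for 3 carries the inverse 2 and the entry for 29 carries 15. Whether this saves measurable time depends on how expensive the inverse is relative to the rest of the weave, which matches the observation above that the gain was small.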

PoolMinor

legendary

Activity: 1844

Merit: 1338

XXXVII Fnord is toast without bread

I could use some fine tuning for the AMD FX series. I have seen others asking for a specific release for this chip but have not seen any that are available.

Otherwise, for now I will keep testing the overthreading. I think for PPS rate, 10 threads per core (for me, genproclimit=80) has been best so far.

16:30:45

{
"blocks" : 24598,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 80,
"primespersec" : 2492,
"pooledtx" : 0,
"testnet" : false
}

altsay

sr. member

Activity: 359

Merit: 250

Quote from: itod on July 12, 2013, 06:27:37 PM

Quote from: PoolMinor on July 12, 2013, 06:23:45 PM

setgenerate true 80 or... 1048576 :o

Thinking out of the box, congrats.

Now waiting for Win binaries of Chemisist's newest version with improved thread handling.

But it slows down the computer much more than it was on -1. I hope that doesn't increase the possibility of orphans.

AgentME

member

Activity: 84

Merit: 10

Quote from: Chemisist on July 12, 2013, 05:13:36 PM

Quote from: AgentME on July 12, 2013, 04:50:03 PM

Agreed. I noticed earlier that if you cap the sieve weaving time at almost nothing, you can easily get absurdly high PPS values, but you won't actually earn blocks faster. There's a trade-off that needs to be analyzed more closely.

The high pps number is due to the very low hard cap on the time set to check the actual sieve that has been produced (it's set to 10 ms in the current master branch on github, line 372 in prime.cpp). So with the very short weaving time of whatever you decide to set, the sieve has a very large number of prime candidates, most of which satisfy the following check:

Code:

if(TargetGetLength(nProbablePrimeChainLength) >= 1)
     nPrimesHit++;

but many of which are not actually primes. Anyway, I'm currently testing my code against Sunny's on the testnet (with the large thread count issue potentially fixed, fingers crossed) to see which can find more blocks in 10 minutes on my T9300 laptop. Results to come shortly

Isn't nProbablePrimeChainLength always zero if N+1 and N-1 both fail the FermatProbablePrimalityTest in ProbableCunninghamChainTest? Or does that get a ton of false positives when the sieve isn't woven much? (I imagine I probably just answered my own question.)

Anyway, good luck with finding the sweet spot in the trade-off!
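
As background for the question about false positives, here is a minimal base-2 Fermat probable-prime test. It is a sketch only: the names ModPow and FermatBase2 are hypothetical, it works on 32-bit integers rather than the big numbers the miner uses, and it is not the client's FermatProbablePrimalityTest. It shows that the test checks whether 2^(n-1) mod n equals 1, which every odd prime passes and which a small number of composites also pass.

```cpp
#include <cassert>
#include <cstdint>

// Computes (base^exp) mod m by square-and-multiply.
// m is limited to 32 bits so every intermediate product fits in 64 bits.
std::uint64_t ModPow(std::uint64_t base, std::uint64_t exp, std::uint64_t m)
{
    assert(m > 0 && m <= 0xFFFFFFFFull);
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

// Base-2 Fermat test: true means "probable prime", not "proven prime".
bool FermatBase2(std::uint32_t n)
{
    if (n < 3 || n % 2 == 0) return n == 2; // handle 0, 1, 2 and even numbers
    return ModPow(2, n - 1, n) == 1;
}
```

For example, FermatBase2(97) is true (97 is prime) and FermatBase2(91) is false (91 = 7 * 13), while FermatBase2(341) is true even though 341 = 11 * 31, so the test can be fooled by a composite, though such cases are uncommon.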

redphlegm

sr. member

Activity: 246

Merit: 250

My spoon is too big!

Quote from: Chemisist on July 12, 2013, 05:35:25 PM

Alright, I just updated my version (currently on github) so that each thread has an independently evolving weave timing parameter. To compare mine with Sunny's most recent update, I used the testnet, where my version found 30 confirmed blocks in 10 minutes while the original code found 16 confirmed blocks. I feel that this is a legitimate comparison because no other nodes were mining on the testnet at the time (I know this because my client found every consecutive block in both cases). This comparison was performed on an IBM T61p laptop with a T9300 Core 2 Duo processor. The current difficulty on the testnet is 5.4426.

Going to test this with the 8 threads on my Core i7 next.

Mind linking to your github? The speed this thread is updating is a bit overwhelming. Thanks in advance.

anonppcoin

newbie

Activity: 48

Merit: 0

Updated Windows build using the new Chemisist source. Tuned for Intel Sandy Bridge and Ivy Bridge but compatible with other architectures.

https://www.dropbox.com/s/4k0xmuajxf5i4ly/primecoin0712v3-avx.zip

I'm seeing lower PPS than my v2 builds but I think that weaving will be better overall.

itod

legendary

Activity: 1988

Merit: 1077

Honey badger just does not care

Quote from: PoolMinor on July 12, 2013, 06:23:45 PM

setgenerate true 80 or... 1048576 :o

Thinking out of the box, congrats.

Now waiting for Win binaries of Chemisist's newest version with improved thread handling.

Chemisist

member

Activity: 99

Merit: 10

Quote from: LazyOtto on July 12, 2013, 06:21:22 PM

Quote from: UNOE on July 12, 2013, 06:18:09 PM

I think he's using Chemisist's 2nd release that he posted about on the last page

But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.

That's not possible, since the pps is based on the actual number of prime candidates processed. This is correlated with block finding but apparently not equivalent to it.

PoolMinor

legendary

Activity: 1844

Merit: 1338

XXXVII Fnord is toast without bread

Quote from: LazyOtto on July 12, 2013, 06:21:22 PM

Quote from: UNOE on July 12, 2013, 06:18:09 PM

I think he's using Chemisist's 2nd release that he posted about on the last page

But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.

I used the same process with YAC and was told I was wasting the hash by splitting into "micro-threads" making it more difficult to solve. Or the person that said this didn't want me to use the tactic and they used it for themselves.

see post here.

https://bitcointalksearch.org/topic/annyac-yacoin-yet-another-altcoin-start-is-now-196196

PoolMinor

legendary

Activity: 1844

Merit: 1338

XXXVII Fnord is toast without bread

Quote from: altsay on July 12, 2013, 06:19:41 PM

Quote from: PoolMinor on July 12, 2013, 06:14:26 PM

Quote from: Chemisist on July 12, 2013, 06:11:33 PM

Quote from: PoolMinor on July 12, 2013, 06:08:37 PM

16:00:58

getmininginfo

16:00:58

{
"blocks" : 24436,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 80,
"primespersec" : 2359,
"pooledtx" : 0,
"testnet" : false
}

AMD FX 8120

Perhaps PPS isn't the actual goal after all, since I think this measurement is largely misunderstood.

That's the conclusion that I've reached also. I'm comparing the actual number of blocks generated over 10 minutes (though I should probably do it for longer) on the testnet between production Primecoin code and what I'm working on.

Though I should point out that with each change in genproclimit I saw better results up to 10x actual cores.

15:43:56
{
"blocks" : 24336,
"currentblocksize" : 18956,
"currentblocktx" : 1,
"errors" : "",
"generate" : true,
"genproclimit" : 320,
"primespersec" : 2196,
"pooledtx" : 1,
"testnet" : false
}

Mine is always -1 so far. How did you manage to increase that number?

setgenerate true 80 or... 1048576 :o

LazyOtto

sr. member

Activity: 476

Merit: 250

Quote from: UNOE on July 12, 2013, 06:18:09 PM

I think he's using Chemisist's 2nd release that he posted about on the last page

But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.

Chemisist

member

Activity: 99

Merit: 10

Alright, testing with the Core i7-950: 81/86 blocks found on testnet in 10 minutes with the current Sunny King code, and 97/97 blocks found in 10 minutes with my code, with 8 threads running. Testing this concept of overthreading now.
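
To put the two testnet comparisons on the same scale, the arithmetic can be written as a one-line ratio. The function name RelativeBlockRate is hypothetical, and using the first figure of each pair (81 and 97) as the block count is an assumption, since the posts do not spell out what the second figure means.

```cpp
#include <cassert>

// Hypothetical helper: ratio of blocks found by the modified miner to blocks
// found by the baseline miner over the same time window.
double RelativeBlockRate(unsigned modifiedBlocks, unsigned baselineBlocks)
{
    assert(baselineBlocks > 0); // a baseline of zero blocks gives no ratio
    return static_cast<double>(modifiedBlocks) / baselineBlocks;
}
```

RelativeBlockRate(30, 16) gives 1.875 for the T9300 laptop run, and RelativeBlockRate(97, 81) gives roughly 1.2 for the i7-950 run. Both come from single 10-minute windows, so the ratios are rough indications rather than measurements.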

altsay

sr. member

Activity: 359

Merit: 250

Quote from: PoolMinor on July 12, 2013, 06:14:26 PM

Quote from: Chemisist on July 12, 2013, 06:11:33 PM

Quote from: PoolMinor on July 12, 2013, 06:08:37 PM

16:00:58

getmininginfo

16:00:58

{
"blocks" : 24436,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 80,
"primespersec" : 2359,
"pooledtx" : 0,
"testnet" : false
}

AMD FX 8120

Perhaps PPS isn't the actual goal after all, since I think this measurement is largely misunderstood.

That's the conclusion that I've reached also. I'm comparing the actual number of blocks generated over 10 minutes (though I should probably do it for longer) on the testnet between production Primecoin code and what I'm working on.

Though I should point out that with each change in genproclimit I saw better results up to 10x actual cores.

15:43:56
{
"blocks" : 24336,
"currentblocksize" : 18956,
"currentblocktx" : 1,
"errors" : "",
"generate" : true,
"genproclimit" : 320,
"primespersec" : 2196,
"pooledtx" : 1,
"testnet" : false
}

Mine is always -1 so far. How did you manage to increase that number?

PoolMinor

legendary

Activity: 1844

Merit: 1338

XXXVII Fnord is toast without bread

Quote from: Chemisist on July 12, 2013, 06:16:41 PM

Ohh... that's clever. The setgenerate true -1 command just sets it to the number of processors. Is this the same effect that you get from setting thread-concurrency to something like 3-4 times the number of stream processors on AMD cards? I'm thinking yes...

I am not finding any more blocks; I have not found any yet at these "higher" settings. :'(

UNOE

sr. member

Activity: 791

Merit: 273

This is personal

Quote from: fabrizziop on July 12, 2013, 06:13:40 PM

Quote from: PoolMinor on July 12, 2013, 06:08:37 PM

16:00:58

getmininginfo

16:00:58

{
"blocks" : 24436,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 80,
"primespersec" : 2359,
"pooledtx" : 0,
"testnet" : false
}

AMD FX 8120

Perhaps PPS isn't the actual goal after all, since I think this measurement is largely misunderstood.

I'd love to know how you are getting such a high PPS number on an FX 8120.

I think he's using Chemisist's 2nd release that he posted about on the last page

Chemisist

member

Activity: 99

Merit: 10

Ohh... that's clever. The setgenerate true -1 command just sets it to the number of processors. Is this the same effect that you get from setting thread-concurrency to something like 3-4 times the number of stream processors on AMD cards? I'm thinking yes...

(edit: on second thought, I shouldn't have clicked "quote" on that long post... )
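
The behaviour described here (a negative limit means "use the number of processors", a positive limit is taken as given) can be sketched as below. This is an assumption-labelled illustration, not the client's code: ResolveThreadCount is a hypothetical name, and a limit of 0 is left outside the sketch. The only library call used is std::thread::hardware_concurrency(), which may return 0 when the count cannot be determined.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: turn a genproclimit-style setting into a thread count.
// A negative value means "one thread per detected core"; a positive value is
// used as given, even if it exceeds the core count (overthreading).
unsigned ResolveThreadCount(int genProcLimit, unsigned detectedCores)
{
    assert(genProcLimit != 0); // 0 is not handled in this sketch
    if (genProcLimit < 0)
        return detectedCores > 0 ? detectedCores : 1; // fall back to 1 if unknown
    return static_cast<unsigned>(genProcLimit);
}

// Convenience overload that asks the standard library for the core count.
unsigned ResolveThreadCount(int genProcLimit)
{
    return ResolveThreadCount(genProcLimit, std::thread::hardware_concurrency());
}
```

On an 8-core machine, ResolveThreadCount(-1, 8) gives 8 threads, while ResolveThreadCount(80, 8) gives 80, which is the 10-threads-per-core setting reported earlier in the thread.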

PoolMinor

legendary

Activity: 1844

Merit: 1338

XXXVII Fnord is toast without bread

Quote from: fabrizziop on July 12, 2013, 06:13:40 PM

I'd love to know how you are getting such a high PPS number on an FX 8120.

Exactly my point.

Topic: [XPM] Primecoin Built-in Miner Sieve Performance Issue - page 8. (Read 69182 times)