
Topic: [XPM] Primecoin Built-in Miner Sieve Performance Issue - page 7. (Read 69150 times)

full member
Activity: 244
Merit: 101
Should I try using more threads than 8? It seems my 3770K won't go higher than 1700 pps, which is nice considering I was originally getting 400 when we started. I can't seem to get the 2 or 3k other people are showing from their 3770Ks using the Ivy-only build on Win7. I tried to compile on Ubuntu through my VM, but it seems I'm failing or using the wrong distro.
newbie
Activity: 54
Merit: 0
This is my AMD Phenom II 710 X3 Unleashed to 4 cores.

Looks right in line; my AMD Phenom II X4 920 is sitting around 1250 primes/sec.

I also run on linux, so I thought I'd share my little bash startup script in case others can use it:
Code:
#!/bin/sh
# Start the daemon, keep a live readout on screen, and stop it when you quit.
cd [INSERT PATH TO PRIMECOIND HERE]
./primecoind --daemon
# Refresh balance and mining stats every 2 seconds until you press Ctrl+C
watch './primecoind getbalance ; ./primecoind getmininginfo'
# Once watch exits, kill the daemon
kill -9 $(pidof primecoind)

This'll give you a little readout to watch your balance and miner info; when you quit (Ctrl+C), it will then kill the primecoind process for you.
legendary
Activity: 1843
Merit: 1338
XXXVII Fnord is toast without bread
This is my AMD Phenom II 710 X3 Unleashed to 4 cores.

Code:
13:48:22
"blocks" : 23683,
"generate" : true,
"genproclimit" : 3,
"primespersec" : 439,

16:39:46
"blocks" : 24634,
"generate" : true,
"genproclimit" : 3,
"primespersec" : 409,

16:39:58
setgenerate true 30

16:40:40
getmininginfo

16:40:40
{
"blocks" : 24639,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 30,
"primespersec" : 624,
"pooledtx" : 0,
"testnet" : false
}

16:40:55
getmininginfo

16:40:55
{
"blocks" : 24641,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 30,
"primespersec" : 624,
"pooledtx" : 0,
"testnet" : false
}

16:41:34
getprimespersec

16:41:34
903

17:20:25
getprimespersec

17:20:25
1043

17:21:35
getprimespersec

17:21:35
1073

legendary
Activity: 1078
Merit: 1002
Bitcoin is new, makes sense to hodl.
Although I get more pps from Chemisist's build, I have yet to find a block since switching from the 1.1; it's been like 8 hours on 18 cores...
member
Activity: 99
Merit: 10
I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.

I haven't thought about doing it this way tbh.



I'm getting over 1600 PPS with the new version! Are they for real or what? I just compiled with -O2 -march=native.

Running mine versus the original on testnet shows that I mine 30 blocks versus 16 with the original client in 10 minutes on a Core 2 Duo T9300. Running on an i7-950 on testnet generates 97 blocks with mine and 81 with the original.
hero member
Activity: 506
Merit: 500
I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.

I haven't thought about doing it this way tbh.



I'm getting over 1600 PPS with the new version! Are they for real or what? I just compiled with -O2 -march=native.

https://www.dropbox.com/s/vx9wnzfws4zttg8/primecoin-chemisist-mod-v2-o2-amd.rar

It should run on most recent CPUs.
hero member
Activity: 616
Merit: 500
Updated Windows build using the new Chemisist source. Tuned for Intel Sandy and Ivy Bridge but compatible with other architectures.

https://www.dropbox.com/s/4k0xmuajxf5i4ly/primecoin0712v3-avx.zip

I'm seeing lower PPS than my v2 builds but I think that weaving will be better overall.

How do I use it? Overwrite the installed files, or use it from the downloaded folder itself?
member
Activity: 99
Merit: 10
I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.

I haven't thought about doing it this way tbh.

member
Activity: 84
Merit: 10
I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.
sr. member
Activity: 246
Merit: 250
My spoon is too big!
My latest Windows builds. From Chemisist source:

Tuned for Sandy and Ivy Intel Core processors (AVX), O3:

https://www.dropbox.com/s/18bgecwqzsmwsh2/primecoin0712v2-avx.zip


Ivy Bridge ONLY build:

https://www.dropbox.com/s/f7fu0u0yk4i09il/primecoin0712v2-ivyonly.zip

XPM: AR2BpBnitqXudN67Ncuc9FfYVT8u9jNe7a

Would your ivy bridge build be best for haswell?

The Ivy Bridge build will work well on Haswell. It doesn't have every instruction set available on Haswell, but most. I am probably done compiling for the night (yay, Friday!) but maybe another kind soul will build you a core-avx2 optimized daemon.

How about my outdated Nehalem?
newbie
Activity: 48
Merit: 0
My latest Windows builds. From Chemisist source:

Tuned for Sandy and Ivy Intel Core processors (AVX), O3:

https://www.dropbox.com/s/18bgecwqzsmwsh2/primecoin0712v2-avx.zip


Ivy Bridge ONLY build:

https://www.dropbox.com/s/f7fu0u0yk4i09il/primecoin0712v2-ivyonly.zip

XPM: AR2BpBnitqXudN67Ncuc9FfYVT8u9jNe7a

Would your ivy bridge build be best for haswell?

The Ivy Bridge build will work well on Haswell. It doesn't have every instruction set available on Haswell, but most. I am probably done compiling for the night (yay, Friday!) but maybe another kind soul will build you a core-avx2 optimized daemon.
member
Activity: 99
Merit: 10
The rationale for the weave timing parameter instead of the weave count parameter is because the weave timing parameter requires only a call to "GetTimeMicros()" whereas determining the weave count parameter is far more intensive to calculate.  To calculate it requires looping through all three arrays to find the values that are still false:

from prime.h:

Code:
   unsigned int GetCandidateCount()
    {
        unsigned int nCandidates = 0;
        for (unsigned int nMultiplier = 0; nMultiplier < nSieveSize; nMultiplier++)
        {
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
                nCandidates++;
        }
        return nCandidates;
    }

The vfComposite arrays all start out as "0", and when a value is found that is not a prime compatible with Sunny's algorithm, that value gets set to 1. All vfComposite arrays are nMaxSieveSize in length:

Code:
static const unsigned int nMaxSieveSize = 1000000u;

So calculating a weave count parameter requires 3 million boolean tests, plus the cost of the loop itself, plus an increment every time the if statement is true.
No, I meant only a counter of how many times the Weave() function is called, not related to GetCandidateCount().

I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
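
For anyone trying to picture that restructuring, here's a rough sketch of the idea (this is not Chemisist's actual code; the function and helper names are made up): move the loop over the prime table inside the sieve object, so the per-call overhead is paid once per sieve rather than once per prime.

Code:
// Illustrative only -- WeaveAll() and MarkCompositesForPrime() are hypothetical names,
// not functions from the real source. CSieveOfEratosthenes, vPrimes and GetTimeMicros()
// are from the existing primecoin code.
bool CSieveOfEratosthenes::WeaveAll(int64 nMicrosAllowed)
{
    int64 nStart = GetTimeMicros();
    for (unsigned int nPrimeSeq = 0; nPrimeSeq < vPrimes.size(); nPrimeSeq++)
    {
        MarkCompositesForPrime(nPrimeSeq);   // one sieving pass for vPrimes[nPrimeSeq]
        if (GetTimeMicros() - nStart > nMicrosAllowed)
            return false;                    // weave time budget used up
    }
    return true;                             // sieve fully woven
}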
member
Activity: 87
Merit: 10
My latest Windows builds. From Chemisist source:

Tuned for Sandy and Ivy Intel Core processors (AVX), O3:

https://www.dropbox.com/s/18bgecwqzsmwsh2/primecoin0712v2-avx.zip


Ivy Bridge ONLY build:

https://www.dropbox.com/s/f7fu0u0yk4i09il/primecoin0712v2-ivyonly.zip

XPM: AR2BpBnitqXudN67Ncuc9FfYVT8u9jNe7a

Would your ivy bridge build be best for haswell?
member
Activity: 84
Merit: 10
The rationale for the weave timing parameter instead of the weave count parameter is because the weave timing parameter requires only a call to "GetTimeMicros()" whereas determining the weave count parameter is far more intensive to calculate.  To calculate it requires looping through all three arrays to find the values that are still false:

from prime.h:

Code:
   unsigned int GetCandidateCount()
    {
        unsigned int nCandidates = 0;
        for (unsigned int nMultiplier = 0; nMultiplier < nSieveSize; nMultiplier++)
        {
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
                nCandidates++;
        }
        return nCandidates;
    }

The vfComposite arrays all start out as "0", and when a value is found that is not a prime compatible with Sunny's algorithm, that value gets set to 1. All vfComposite arrays are nMaxSieveSize in length:

Code:
static const unsigned int nMaxSieveSize = 1000000u;

So calculating a weave count parameter requires 3 million boolean tests, plus the cost of the loop itself, plus an increment every time the if statement is true.
No, I meant only a counter of how many times the Weave() function is called, not related to GetCandidateCount().
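
To make that suggestion concrete, a minimal sketch (hypothetical variable name, and assuming Weave() keeps its current behaviour of returning false once the sieve is finished): count completed Weave() passes and tune on that number instead of elapsed microseconds, so the metric doesn't move around with CPU load.

Code:
// Hypothetical sketch, not actual code from either client.
unsigned int nWeaveCount = 0;
while (psieve->Weave())      // assumes Weave() returns false once the sieve is done
    nWeaveCount++;
// nWeaveCount would then take the place of the elapsed-time figure
// as the per-thread adaptive parameter.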
member
Activity: 99
Merit: 10
I think he's using Chemisist's 2nd release that he posted about on the last page.
But he is actually pointing out something more interesting.

With proclimit == number-of-cores, CPU utilization will be 100%.

With proclimit > number-of-cores, the same amount of CPU is being used, but the reported pps is higher.
That's because, with the threads fighting each other, less time is spent weaving and more false positives get counted into the primespersec value.

So you're saying that leaving the setgenerate value at its default of -1 is the best way to observe the real pps?

I think Chemisist was checking the solved block rate on testnet over a 10-minute period. Have those results from the overthreading been tallied?

I just ran one, and 40 threads on 8 cores gave me 61/62 confirmations over 10 minutes. There might be a maximum somewhere between 8 and 40, but I don't have the time right now to figure it out. Some friends just arrived, so I have to make an exit for the evening, unfortunately. I'll check in later tonight (maybe) or definitely tomorrow.
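
In case anyone else wants to run the same kind of tally, here's a quick-and-dirty script (only meaningful if, as in Chemisist's runs, you're the only miner on testnet, so every new block is yours):

Code:
#!/bin/sh
# Count how far the testnet chain advances over a 10-minute mining window.
START=$(./primecoind -testnet getblockcount)
sleep 600
END=$(./primecoind -testnet getblockcount)
echo "Blocks found in 10 minutes: $((END - START))"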
legendary
Activity: 1792
Merit: 1008
/dev/null
Some of us on #eligius-prime were able, with Luke's and others' help, to get it running. Now I'm just waiting to see if I can actually get a block.
try testnet for tests!  Cheesy
./primecoind stop
./primecoind -testnet
I mined some blocks on -testnet in a few minutes:
http://pastebin.com/GN1fafrm
member
Activity: 99
Merit: 10
Alright, so I just updated my version (currently on GitHub) so that each thread has an independent, evolving weave timing parameter. To compare mine with Sunny's most recent update, I used the testnet, where my version found 30 confirmed blocks in 10 minutes while the original code found 16 confirmed blocks. I feel this is a legitimate comparison because there were no other nodes on the testnet currently mining (I know this because my client found every consecutive block in both cases). This comparison was performed on a T61p IBM laptop with a T9300 Core 2 Duo processor. The current difficulty on the testnet is 5.4426.
Why make it a weave timing parameter and not just a weave count parameter? I think that would be a better metric, as a change in CPU load means the timing parameter's results will change a lot.

Some of us on #eligius-prime were able, with Luke's and others' help, to get it running. Now I'm just waiting to see if I can actually get a block.

[image]

Can you share your source code?  Did you modify Sunny's algorithm at all?
I think the biggest change in Luke's miner is that it moves the bnTwoInverse calculation out of Weave() and just pre-calculates it for all of the primes in GeneratePrimeTable(). I didn't get much more performance out of porting that change to primecoin, but I didn't check too hard.

Thanks for the update on Luke's code.

The rationale for the weave timing parameter instead of the weave count parameter is because the weave timing parameter requires only a call to "GetTimeMicros()" whereas determining the weave count parameter is far more intensive to calculate.  To calculate it requires looping through all three arrays to find the values that are still false:

from prime.h:

Code:
   unsigned int GetCandidateCount()
    {
        unsigned int nCandidates = 0;
        for (unsigned int nMultiplier = 0; nMultiplier < nSieveSize; nMultiplier++)
        {
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
                nCandidates++;
        }
        return nCandidates;
    }

The vfComposite arrays all start out as "0", and when a value is found that is not a prime compatible with Sunny's algorithm, that value gets set to 1. All vfComposite arrays are nMaxSieveSize in length:

Code:
static const unsigned int nMaxSieveSize = 1000000u;

So calculating a weave count parameter requires 3 million boolean tests, plus the cost of the loop itself, plus an increment every time the if statement is true.
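
On the bnTwoInverse point quoted above: a rough sketch of what pre-calculating it could look like (hypothetical names; this is not Luke's or Sunny's actual code). For an odd prime p, the inverse of 2 mod p is just (p + 1) / 2, so it can be tabulated once alongside the prime table instead of being recomputed inside Weave():

Code:
#include <vector>
extern std::vector<unsigned int> vPrimes;   // filled by GeneratePrimeTable() in prime.cpp

// Illustrative only: vTwoInverses is a hypothetical cache parallel to vPrimes.
std::vector<unsigned int> vTwoInverses;

void PrecomputeTwoInverses()
{
    vTwoInverses.clear();
    vTwoInverses.reserve(vPrimes.size());
    for (unsigned int i = 0; i < vPrimes.size(); i++)
    {
        unsigned int p = vPrimes[i];
        // 2 * ((p + 1) / 2) = p + 1 ≡ 1 (mod p) for odd p; p = 2 has no inverse of 2.
        vTwoInverses.push_back(p > 2 ? (p + 1) / 2 : 0);
    }
}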
sr. member
Activity: 476
Merit: 250
So you're saying that leaving the setgenerate value at its default of -1 is the best way to observe the real pps?
yes
sr. member
Activity: 246
Merit: 250
My spoon is too big!
I think he's using Chemisist's 2nd release that he posted about on the last page.
But he is actually pointing out something more interesting.

With proclimit == number-of-cores, CPU utilization will be 100%.

With proclimit > number-of-cores, the same amount of CPU is being used, but the reported pps is higher.
That's because, with the threads fighting each other, less time is spent weaving and more false positives get counted into the primespersec value.

So you're saying that leaving the setgenerate value at its default of -1 is the best way to observe the real pps?

I think Chemisist was checking the solved block rate on testnet over a 10-minute period. Have those results from the overthreading been tallied?
sr. member
Activity: 359
Merit: 250
I think he's using Chemisist's 2nd release that he posted about on the last page.
But he is actually pointing out something more interesting.

With proclimit == number-of-cores, CPU utilization will be 100%.

With proclimit > number-of-cores, the same amount of CPU is being used, but the reported pps is higher.
That's because, with the threads fighting each other, less time is spent weaving and more false positives get counted into the primespersec value.

So you're saying that leaving the setgenerate value at its default of -1 is the best way to observe the real pps?