
Topic: [XPM] Primecoin Built-in Miner Sieve Performance Issue - page 7. (Read 69150 times)

full member
Activity: 244
Merit: 101
Should I try using more threads than 8? It seems my 3770K won't go higher than 1700 pps, which is nice considering I was originally getting 400 when we started. I can't seem to get the 2 or 3k other people are showing from their 3770Ks using the Ivy-only build on Win7. I tried to compile on Ubuntu through my VM, but it seems I'm failing or using the wrong distro.
newbie
Activity: 54
Merit: 0
This is my AMD Phenom II 710 X3 Unleashed to 4 cores.

Looks right in line; my AMD Phenom II X4 920 is sitting around 1250 primes/sec.

I also run on linux, so I thought I'd share my little bash startup script in case others can use it:
Code:
#!/bin/sh
# Start the daemon, keep a live readout on screen, and stop it when you quit.
cd [INSERT PATH TO PRIMECOIND HERE]
./primecoind --daemon
# Refresh balance and mining stats every 2 seconds until you press Ctrl+C
watch './primecoind getbalance ; ./primecoind getmininginfo'
# Once watch exits, kill the daemon
kill -9 $(pidof primecoind)

This'll give you a little readout to watch your balance and miner info; when you quit (Ctrl+C), it will then kill the primecoind process for you.
legendary
Activity: 1843
Merit: 1338
XXXVII Fnord is toast without bread
This is my AMD Phenom II 710 X3 Unleashed to 4 cores.

Code:
13:48:22
"blocks" : 23683,
"generate" : true,
"genproclimit" : 3,
"primespersec" : 439,

16:39:46
"blocks" : 24634,
"generate" : true,
"genproclimit" : 3,
"primespersec" : 409,

16:39:58
setgenerate true 30

16:40:40
getmininginfo

16:40:40
{
"blocks" : 24639,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 30,
"primespersec" : 624,
"pooledtx" : 0,
"testnet" : false
}

16:40:55
getmininginfo

16:40:55
{
"blocks" : 24641,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 30,
"primespersec" : 624,
"pooledtx" : 0,
"testnet" : false
}

16:41:34
getprimespersec

16:41:34
903

17:20:25
getprimespersec

17:20:25
1043

17:21:35
getprimespersec

17:21:35
1073

legendary
Activity: 1078
Merit: 1002
Bitcoin is new, makes sense to hodl.
Although I get more pps from Chemisist's build, I have yet to find a block since switching from the 1.1; it's been like 8 hours on 18 cores...
member
Activity: 99
Merit: 10
I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.

I haven't thought about doing it this way tbh.



I'm getting over 1600 PPS with the new version! Are they for real or what? I just compiled with -O2 -march=native.

Running mine versus the original on testnet shows that I mine 30 blocks versus 16 with the original client in 10 minutes on a Core 2 Duo T9300. Running on an i7-950 on testnet generates 97 blocks with mine and 81 with the original.
hero member
Activity: 506
Merit: 500
I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.

I haven't thought about doing it this way tbh.



I'm getting over 1600 PPS with the new version! Are they for real or what? I just compiled with -O2 -march=native.

https://www.dropbox.com/s/vx9wnzfws4zttg8/primecoin-chemisist-mod-v2-o2-amd.rar

It should run on most recent CPUs.
hero member
Activity: 616
Merit: 500
Updated Windows build using the new Chemisist source. Tuned for Intel Sandy and Ivy Bridge but compatible with other architectures.

https://www.dropbox.com/s/4k0xmuajxf5i4ly/primecoin0712v3-avx.zip

I'm seeing lower PPS than my v2 builds but I think that weaving will be better overall.

How do I use it? Overwrite the installed files, or use it from the downloaded folder itself?
member
Activity: 99
Merit: 10
I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.

I haven't thought about doing it this way tbh.

member
Activity: 84
Merit: 10
I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.
sr. member
Activity: 246
Merit: 250
My spoon is too big!
My latest Windows builds. From Chemisist source:

Tuned for Sandy and Ivy Intel Core processors (AVX), O3:

https://www.dropbox.com/s/18bgecwqzsmwsh2/primecoin0712v2-avx.zip


Ivy Bridge ONLY build:

https://www.dropbox.com/s/f7fu0u0yk4i09il/primecoin0712v2-ivyonly.zip

XPM: AR2BpBnitqXudN67Ncuc9FfYVT8u9jNe7a

Would your ivy bridge build be best for haswell?

The Ivy Bridge build will work well on Haswell. It doesn't have every instruction set available on Haswell, but most. I am probably done compiling for the night (yay, Friday!) but maybe another kind soul will build you a core-avx2 optimized daemon.

How about my outdated Nehalem?
newbie
Activity: 48
Merit: 0
My latest Windows builds. From Chemisist source:

Tuned for Sandy and Ivy Intel Core processors (AVX), O3:

https://www.dropbox.com/s/18bgecwqzsmwsh2/primecoin0712v2-avx.zip


Ivy Bridge ONLY build:

https://www.dropbox.com/s/f7fu0u0yk4i09il/primecoin0712v2-ivyonly.zip

XPM: AR2BpBnitqXudN67Ncuc9FfYVT8u9jNe7a

Would your ivy bridge build be best for haswell?

The Ivy Bridge build will work well on Haswell. It doesn't have every instruction set available on Haswell, but most. I am probably done compiling for the night (yay, Friday!) but maybe another kind soul will build you a core-avx2 optimized daemon.
member
Activity: 99
Merit: 10
The rationale for the weave timing parameter instead of the weave count parameter is because the weave timing parameter requires only a call to "GetTimeMicros()" whereas determining the weave count parameter is far more intensive to calculate.  To calculate it requires looping through all three arrays to find the values that are still false:

from prime.h:

Code:
   unsigned int GetCandidateCount()
    {
        unsigned int nCandidates = 0;
        for (unsigned int nMultiplier = 0; nMultiplier < nSieveSize; nMultiplier++)
        {
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
                nCandidates++;
        }
        return nCandidates;
    }

The vfComposite arrays all start out as "0", and when a value is found that is not a prime compatible with Sunny's algorithm, that value gets set to 1. All vfComposite arrays are nMaxSieveSize in length:

Code:
static const unsigned int nMaxSieveSize = 1000000u;

So calculating a weave count parameter requires 3 million boolean tests, plus the cost of the loop itself, plus an increment every time the if statement is true.
No, I meant only a counter of how many times the Weave() function is called, not related to GetCandidateCount().

I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
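
For anyone trying to picture that restructuring, here's a rough sketch of the idea (this is not Chemisist's actual code; the function and helper names are made up): move the loop over the prime table inside the sieve object, so the per-call overhead is paid once per sieve rather than once per prime.

Code:
// Illustrative only -- WeaveAll() and MarkCompositesForPrime() are hypothetical names,
// not functions from the real source. CSieveOfEratosthenes, vPrimes and GetTimeMicros()
// are from the existing primecoin code.
bool CSieveOfEratosthenes::WeaveAll(int64 nMicrosAllowed)
{
    int64 nStart = GetTimeMicros();
    for (unsigned int nPrimeSeq = 0; nPrimeSeq < vPrimes.size(); nPrimeSeq++)
    {
        MarkCompositesForPrime(nPrimeSeq);   // one sieving pass for vPrimes[nPrimeSeq]
        if (GetTimeMicros() - nStart > nMicrosAllowed)
            return false;                    // weave time budget used up
    }
    return true;                             // sieve fully woven
}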
member
Activity: 87
Merit: 10
My latest Windows builds. From Chemisist source:

Tuned for Sandy and Ivy Intel Core processors (AVX), O3:

https://www.dropbox.com/s/18bgecwqzsmwsh2/primecoin0712v2-avx.zip


Ivy Bridge ONLY build:

https://www.dropbox.com/s/f7fu0u0yk4i09il/primecoin0712v2-ivyonly.zip

XPM: AR2BpBnitqXudN67Ncuc9FfYVT8u9jNe7a

Would your ivy bridge build be best for haswell?
member
Activity: 84
Merit: 10
The rationale for the weave timing parameter instead of the weave count parameter is because the weave timing parameter requires only a call to "GetTimeMicros()" whereas determining the weave count parameter is far more intensive to calculate.  To calculate it requires looping through all three arrays to find the values that are still false:

from prime.h:

Code:
   unsigned int GetCandidateCount()
    {
        unsigned int nCandidates = 0;
        for (unsigned int nMultiplier = 0; nMultiplier < nSieveSize; nMultiplier++)
        {
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
                nCandidates++;
        }
        return nCandidates;
    }

The vfComposite arrays all start out as "0", and when a value is found that is not a prime compatible with Sunny's algorithm, that value gets set to 1. All vfComposite arrays are nMaxSieveSize in length:

Code:
static const unsigned int nMaxSieveSize = 1000000u;

So calculating a weave count parameter requires 3 million boolean tests, plus the cost of the loop itself, plus an increment every time the if statement is true.
No, I meant only a counter of how many times the Weave() function is called, not related to GetCandidateCount().
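
To make that suggestion concrete, a minimal sketch (hypothetical variable name, and assuming Weave() keeps its current behaviour of returning false once the sieve is finished): count completed Weave() passes and tune on that number instead of elapsed microseconds, so the metric doesn't move around with CPU load.

Code:
// Hypothetical sketch, not actual code from either client.
unsigned int nWeaveCount = 0;
while (psieve->Weave())      // assumes Weave() returns false once the sieve is done
    nWeaveCount++;
// nWeaveCount would then take the place of the elapsed-time figure
// as the per-thread adaptive parameter.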
member
Activity: 99
Merit: 10
I think he's using Chemisist's 2nd release that he posted about on the last page.
But he is actually pointing out something more interesting.

With proclimit == number-of-cores, CPU utilization will be 100%.

With proclimit > number-of-cores, the same amount of CPU is being used, but the reported pps is higher.
That's because, with the threads fighting each other, less time is spent weaving and more false positives get counted into the primespersec value.

So you're saying that leaving the setgenerate value at its default of -1 is the best way to observe the real pps?

I think Chemisist was checking the solved block rate on testnet over a 10-minute period. Have those results from the overthreading been tallied?

I just ran one, and 40 threads on 8 cores gave me 61/62 confirmations over 10 minutes. There might be a maximum somewhere between 8 and 40, but I don't have the time right now to figure it out. Some friends just arrived, so I have to make an exit for the evening, unfortunately. I'll check in later tonight (maybe) or definitely tomorrow.
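
In case anyone else wants to run the same kind of tally, here's a quick-and-dirty script (only meaningful if, as in Chemisist's runs, you're the only miner on testnet, so every new block is yours):

Code:
#!/bin/sh
# Count how far the testnet chain advances over a 10-minute mining window.
START=$(./primecoind -testnet getblockcount)
sleep 600
END=$(./primecoind -testnet getblockcount)
echo "Blocks found in 10 minutes: $((END - START))"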
legendary
Activity: 1792
Merit: 1008
/dev/null
Some of us on #eligius-prime were able, with Luke's and others' help, to get it running. Now I'm just waiting to see if I can actually get a block.
try testnet for tests!  Cheesy
./primecoind stop
./primecoind -testnet
I mined some blocks on -testnet in a few minutes:
http://pastebin.com/GN1fafrm
member
Activity: 99
Merit: 10
Alright, so I just updated my version (currently on GitHub) so that each thread has an independent, evolving weave timing parameter. To compare mine with Sunny's most recent update, I used the testnet, where my version found 30 confirmed blocks in 10 minutes while the original code found 16 confirmed blocks. I feel this is a legitimate comparison because there were no other nodes on the testnet currently mining (I know this because my client found every consecutive block in both cases). This comparison was performed on a T61p IBM laptop with a T9300 Core 2 Duo processor. The current difficulty on the testnet is 5.4426.
Why make it a weave timing parameter and not just a weave count parameter? I think that would be a better metric, as a change in CPU load means the timing parameter's results will change a lot.

Some of us on #eligius-prime were able, with Luke's and others' help, to get it running. Now I'm just waiting to see if I can actually get a block.

[image]

Can you share your source code?  Did you modify Sunny's algorithm at all?
I think the biggest change in Luke's miner is that it moves the bnTwoInverse calculation out of Weave() and just pre-calculates it for all of the primes in GeneratePrimeTable(). I didn't get much more performance out of porting that change to primecoin, but I didn't check too hard.

Thanks for the update on Luke's code.

The rationale for the weave timing parameter instead of the weave count parameter is because the weave timing parameter requires only a call to "GetTimeMicros()" whereas determining the weave count parameter is far more intensive to calculate.  To calculate it requires looping through all three arrays to find the values that are still false:

from prime.h:

Code:
   unsigned int GetCandidateCount()
    {
        unsigned int nCandidates = 0;
        for (unsigned int nMultiplier = 0; nMultiplier < nSieveSize; nMultiplier++)
        {
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
                nCandidates++;
        }
        return nCandidates;
    }

The vfComposite arrays all start out as "0", and when a value is found that is not a prime compatible with Sunny's algorithm, that value gets set to 1. All vfComposite arrays are nMaxSieveSize in length:

Code:
static const unsigned int nMaxSieveSize = 1000000u;

So calculating a weave count parameter requires 3 million boolean tests, plus the cost of the loop itself, plus an increment every time the if statement is true.
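
On the bnTwoInverse point quoted above: a rough sketch of what pre-calculating it could look like (hypothetical names; this is not Luke's or Sunny's actual code). For an odd prime p, the inverse of 2 mod p is just (p + 1) / 2, so it can be tabulated once alongside the prime table instead of being recomputed inside Weave():

Code:
#include <vector>
extern std::vector<unsigned int> vPrimes;   // filled by GeneratePrimeTable() in prime.cpp

// Illustrative only: vTwoInverses is a hypothetical cache parallel to vPrimes.
std::vector<unsigned int> vTwoInverses;

void PrecomputeTwoInverses()
{
    vTwoInverses.clear();
    vTwoInverses.reserve(vPrimes.size());
    for (unsigned int i = 0; i < vPrimes.size(); i++)
    {
        unsigned int p = vPrimes[i];
        // 2 * ((p + 1) / 2) = p + 1 ≡ 1 (mod p) for odd p; p = 2 has no inverse of 2.
        vTwoInverses.push_back(p > 2 ? (p + 1) / 2 : 0);
    }
}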
sr. member
Activity: 476
Merit: 250
So you're saying that leaving the setgenerate value at its default of -1 is the best way to observe the real pps?
yes
sr. member
Activity: 246
Merit: 250
My spoon is too big!
I think he's using Chemisist's 2nd release that he posted about on the last page.
But he is actually pointing out something more interesting.

With proclimit == number-of-cores, CPU utilization will be 100%.

With proclimit > number-of-cores, the same amount of CPU is being used, but the reported pps is higher.
That's because, with the threads fighting each other, less time is spent weaving and more false positives get counted into the primespersec value.

So you're saying that leaving the setgenerate value at its default of -1 is the best way to observe the real pps?

I think Chemisist was checking the solved block rate on testnet over a 10-minute period. Have those results from the overthreading been tallied?
sr. member
Activity: 359
Merit: 250
I think he's using Chemisist's 2nd release that he posted about on the last page.
But he is actually pointing out something more interesting.

With proclimit == number-of-cores, CPU utilization will be 100%.

With proclimit > number-of-cores, the same amount of CPU is being used, but the reported pps is higher.
That's because, with the threads fighting each other, less time is spent weaving and more false positives get counted into the primespersec value.

So you're saying that leaving the setgenerate value at its default of -1 is the best way to observe the real pps?