Author

Topic: [XPM] [ANN] Primecoin Release - First Scientific Computing Cryptocurrency - page 140. (Read 688877 times)

full member
Activity: 158
Merit: 100

did you measure the speedup of every change? the first rule of optimization is: measure! otherwise you may be making it worse without knowing...
Hi gatra,

with those few blocks/day and primes/s being quite imprecise...
I left optimizations in when they didn't appear to cause an immediately visible (~10min) deterioration in primes/s.

just a few comments:
those bool arrays are probably no much better than the vectors. Each bool takes 1 byte of RAM. If you want to make that faster you have use an array of ints that is 1/32 of the size, and encode each bool in a single bit of the 32 bits. That will make arrays shorter, speeding up mallocs and making more bools fit in the cache.

Also, probably changing the call to size() by a variable, and splitting the loop look like not worth it. However the upper and lower_bound are definitly better
Yeah, I'm aware of the bool issue, and I thought let's give it a try in the malloc vs. memory tradeoff. It actually seems to have been the single optimization that brought a rather direct improvement to primes/sec (apart from Sunny Kings change).
Still, take a look at a profiler, those mallocs are by far the most dominant in the whole program.
I'm thinking of merging the three arrays into one and saying e.g. 1 bit means vfCompositeCunningham1, another one means vfCompositeCunningham2 etc.
since they are usually addressed one after another at the same index, this optimizes cache usage, and something like
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
can be reduced to
if (newarray[nMultiplier]!=0)

@tacotime: I did a

    CSieveOfEratosthenes(unsigned int nBits, uint256 hashBlockHeader, CBigNum& bnFixedMultiplier):
        vfCompositeCunningham1{0},
        vfCompositeCunningham2{0},
        vfCompositeBiTwin{0}
    {

instead - the compiler gives a strange warning about me having to supply a switch but this behaviour being the default anyway (Huh) but all bools appear to be false, so it appears to be working (I did a loop in the initializer for a while that was checking whether alls are false and if not would have output a warning, and as I saw no warning after ~30mins I then removed the check and am assuming that all is fine...
prime.h:96:2: Warnung: erweiterte Initialisierungsliste nur mit -std=c++11 oder -std=gnu++11 verfügbar [standardmäÃig aktiviert]
(Roughly translates to : Enhanced initialization list only when using -std=c++11 or -std=gnu++11 available (activated by default)
But when I supply either, I get errors in other modules...
legendary
Activity: 1484
Merit: 1005
did you measure the speedup of every change? the first rule of optimization is: measure! otherwise you may be making it worse without knowing...
just a few comments:
those bool arrays are probably no much better than the vectors. Each bool takes 1 byte of RAM. If you want to make that faster you have use an array of ints that is 1/32 of the size, and encode each bool in a single bit of the 32 bits. That will make arrays shorter, speeding up mallocs and making more bools fit in the cache.

Also, probably changing the call to size() by a variable, and splitting the loop look like not worth it. However the upper and lower_bound are definitly better

Agreed
full member
Activity: 244
Merit: 101
So I just updated to the most recent version of the miner from Sunny King, and now my ten minute rolling average pps is ~600 pps (custom calculation script) on 8 threads. Woo!

Now to see if that correlates to finding more blocks.

Everyone update and stop bitching about people "stealing blocks", lol.



Edit: I forgot to mention that my previous average about 42 pps. Additionally, I compiled the update will the default options on OSX, running on a quad core 2.2 GHz Intel core i7 on macbook pro.
Sounds good!

How can a noob on windows 7 take Suny's github code and update their miner?

+1
legendary
Activity: 1484
Merit: 1005
-prime.h: Removed nSieveSize and replaced all occurrences with nMaxSieveSize. Removed nSieveSize from the initializer as it's become obsolete.

-prime.h: Changed
<     std::vector vfCompositeCunningham1;
<     std::vector vfCompositeCunningham2;
<     std::vector vfCompositeBiTwin;
---
>     bool vfCompositeCunningham1[1000000];
>     bool vfCompositeCunningham2[1000000];
>     bool vfCompositeBiTwin[1000000];
Thus, we should save on quite a few mallocs.

Don't forget to initialize the array by adding the following below in place of the other initialization

Code:
        for (int i = 0; i<1000000; ++i){
            vfCompositeCunningham1[i] = false;
            vfCompositeCunningham2[i] = false;
            vfCompositeBiTwin[i] = false;
        }
sr. member
Activity: 392
Merit: 250
Thanks Mike!
I did some change but i don't understand some of them, like those three :



-did the headerhash mod outlined earlier

-after creating vPrimes array in prime.cpp, store vPrimes.size() in a static variably and replace all calls to it with the variable

- in prime.cpp, CSieveOfEratosthenes::Weave() :
Split the for loop into two for loops, one going up to nChainLength, the other one going up to 2*nChainLength
Thus, you can safe the first first if+for clause in the first loop and leave away the if, and leave the if+for completely away in the second loop.


 Somebody can explain how to make those changes?
sr. member
Activity: 826
Merit: 250
CryptoTalk.Org - Get Paid for every Post!
Sorry for my earlier error on Market cap.  The newest exchange rates and coin total (168,186.31) yields a cap of around 68k which would be 15th just behind CNC.  The hourly mining rate is equivalent to 6 BTC (assuming 1 minute blocks and 20.41 coins per block as the last few have been) which wold be $500 at current exchange rates, still 3rd amongst all coins.
full member
Activity: 158
Merit: 100
Difficult to say, as Sunny King's change brings a huge boost to primes/sec measurement, and since we already know that this is not a real measurement....
I found three blocks in 1,5h whereas I only had 2 blocks in the 24h before that, but since we're talking luck this may well just be a statistical anomaly.

With all mentioned optimizations in place I now see
2013-07-10 19:54:30 primemeter   1229628 prime/h   9519403 test/h
2013-07-10 19:56:30 primemeter   1200380 prime/h   9385149 test/h
2013-07-10 19:58:31 primemeter   1059123 prime/h   8281702 test/h
2013-07-10 20:00:32 primemeter   1132707 prime/h   8806710 test/h

Whereas before I had
2013-07-09 16:01:06 primemeter    191133 prime/h   1230638 test/h
2013-07-09 16:03:07 primemeter    207187 prime/h   1333814 test/h
2013-07-09 16:05:07 primemeter    197670 prime/h   1302659 test/h
2013-07-09 16:07:08 primemeter    226844 prime/h   1538940 test/h
2013-07-09 16:08:09 primemeter    241089 prime/h   1653641 test/h
2013-07-09 16:10:09 primemeter    246644 prime/h   1574452 test/h

So if this helps, feel free to donate ;-)
XPM   ALLbe86QswwTRvx1LTVnKaVXCz6mcA6YUR
BTC   135oF6hF9uUbzq9x1vTuaoqKPab2j513f2

I'm thinking the biggest improvements can be done by
a) optimizing the algorithm itself, possible according to TacoTime's post
b) minimizing malloc()s
and I guess there's still lots of potential.
If I had more time I'd try looking into using smaller sieves that fit into CPU cache but doing more sieves (so that the range sieved remains the same)
hero member
Activity: 552
Merit: 500
whats a decent per sec rate? I'm seeing on my machine that im also working on.. "primespersec" : 76.. windows 7 64bit.. blah blah blah
sr. member
Activity: 476
Merit: 250
If you are running the original binary released by Sunny, then you have very little chance of finding a block in less than 15 seconds IMO
What does that mean?
hero member
Activity: 583
Merit: 505
CTO @ Flixxo, Riecoin dev
member
Activity: 70
Merit: 10
Thanks for sharing Mike270 Smiley

We are now onto the 4th day of mining (74 hours since launch)...

Some rough observations:

  • Day 1 block spacing was in the range 40 - 110 seconds | difficulty 7 - 7.2
  • Day 2 block spacing was in the range 10 - 40 seconds | difficulty 7.2 - 7.33
  • Day 3 block spacing was in the range 15 - 25 seconds | difficulty 7.4 - 7.6

If you are running the original binary released by Sunny, then you have very little chance of finding a block in less than 15 seconds IMO
full member
Activity: 213
Merit: 100
full member
Activity: 158
Merit: 100
Ok, so I'm probably hurting my income, but sharing some hints for optimizations:
A quick'n'dirty profiling run showed that we spend very much time in memory allocations, so I am trying to eliminate some of them.

-did the headerhash mod outlined earlier

-did the time limit on Weave() from Sunny King

-removed some printf's , especially
printf("MineProbablePrimeChain() : new sieve (%u/%u) ready in %uus\n", psieve->GetCandidateCount(), nMaxSieveSize, (unsigned int) (GetTimeMicros() - nStart));
from prime.cpp since it's the only call to the rather expensive GetCandidateCount

-after creating vPrimes array in prime.cpp, store vPrimes.size() in a static variably and replace all calls to it with the variable

-in prime.cpp, PrimeTableGetNextPrime and PrimeTableGetPreviousPrime: Use upper_bound/lower_bound instead of linear search:
// Get next prime number of p
bool PrimeTableGetNextPrime(unsigned int& p)
{
    std::vector::iterator foundelement=upper_bound(vPrimes.begin(), vPrimes.end(),p);
    if (foundelement!=vPrimes.end()){
        p=*foundelement;
        return true;
    }
    return false;
}

// Get previous prime number of p
bool PrimeTableGetPreviousPrime(unsigned int& p)
{
    std::vector::iterator foundelement=lower_bound(vPrimes.begin(), vPrimes.end(),p);
    if (foundelement!=vPrimes.end()){
        p=*--foundelement;
        return true;
    }
    return false;
}

- in prime.cpp, CSieveOfEratosthenes::Weave() :
Split the for loop into two for loops, one going up to nChainLength, the other one going up to 2*nChainLength
Thus, you can safe the first first if+for clause in the first loop and leave away the if, and leave the if+for completely away in the second loop.
Also,
if (((nBiTwinSeq & 1u) == 0))
...
if (((nBiTwinSeq & 1u) == 1u))

can be substituted by

if (((nBiTwinSeq & 1u) == 0))
....
else
.....


-prime.h: Removed nSieveSize and replaced all occurrences with nMaxSieveSize. Removed nSieveSize from the initializer as it's become obsolete.

-prime.h: Changed
<     std::vector vfCompositeCunningham1;
<     std::vector vfCompositeCunningham2;
<     std::vector vfCompositeBiTwin;
---
>     bool vfCompositeCunningham1[1000000];
>     bool vfCompositeCunningham2[1000000];
>     bool vfCompositeBiTwin[1000000];
Thus, we should save on quite a few mallocs.


full member
Activity: 238
Merit: 100
I am not.  Looking at the most current codebase, there is definitely more to be done...
full member
Activity: 213
Merit: 100


So seriously, are people sharing optimization tips somewhere?  I've made a few myself, but I'm sure other people have different ones.   If you want to trade optimizations, just PM me.

Possibly, but if you are referring to my post, the optimizations (what ever they were) were made by the original dev Sunny King, and are available in the git hub repo (linked in my post above).
full member
Activity: 238
Merit: 100


So seriously, are people sharing optimization tips somewhere?  I've made a few myself, but I'm sure other people have different ones.   If you want to trade optimizations, just PM me.
full member
Activity: 213
Merit: 100
So I just updated to the most recent version of the miner from Sunny King, and now my ten minute rolling average pps is ~600 pps (custom calculation script) on 8 threads. Woo!

Now to see if that correlates to finding more blocks.

Everyone update and stop bitching about people "stealing blocks", lol.



Edit: I forgot to mention that my previous average about 42 pps. Additionally, I compiled the update will the default options on OSX, running on a quad core 2.2 GHz Intel core i7 on macbook pro.

Did you do this without reducing the maxseivesize?  'cause I think that skews the pps number.  ...but there are definitely places all over for optimization...

I did not make any changes to the code.

To update: check the source out from github here: https://github.com/primecoin/primecoin

I'm not a windows guy, so I can't help you with compiling on windows, sorry.
legendary
Activity: 1792
Merit: 1000
So I just updated to the most recent version of the miner from Sunny King, and now my ten minute rolling average pps is ~600 pps (custom calculation script) on 8 threads. Woo!

Now to see if that correlates to finding more blocks.

Everyone update and stop bitching about people "stealing blocks", lol.



Edit: I forgot to mention that my previous average about 42 pps. Additionally, I compiled the update will the default options on OSX, running on a quad core 2.2 GHz Intel core i7 on macbook pro.
Sounds good!

How can a noob on windows 7 take Suny's github code and update their miner?
sr. member
Activity: 464
Merit: 252
Everyone update and stop bitching about people "stealing blocks", lol.
How to update? Most of the miners are not programmers to get into code.
Jump to: