Topic: [XPM] Primecoin Built-in Miner Sieve Performance Issue - page 16. (Read 69150 times)

member
Activity: 99
Merit: 10
Sweet :) testing now

...I think I removed all the extra printf statements I used to test it out, but let me know if your ~/.primecoin/debug.log file explodes in size...
legendary
Activity: 1064
Merit: 1000
I have added a Primecoin test net node at:

84.200.84.74

This node is compiled from the latest code at the official git repo.

hero member
Activity: 602
Merit: 500
Sweet :) testing now
member
Activity: 99
Merit: 10
Alright, it should be up in the pull requests on https://github.com/primecoin/primecoin and is also available from https://github.com/Chemisist/primecoin if you're itching to have a look.  Compiled binaries will be available at https://www.dropbox.com/sh/1y7mwwc4asgfqfb/Eo2sKCsZor as I make and upload them.


(This was originally just going to be a message to Sunny, but I’ll post it pretty much as is because I’m feeling lazy and I need to get back to work on my thesis!)
[msg_to_sunny]

Hey Sunny King-
In studying your algorithm, I’ve learned quite a bit about prime numbers and modular arithmetic, thanks!  Unfortunately, I’ve got to focus more on my thesis!  I’ll be spending a little less time thinking about how to parallelize this complex algorithm, but I’ve collected my current thoughts on parallelization and I was hoping that you could answer some of these questions.  Also, do you know if anyone else is working on a GPU implementation of your algorithm?

Also, while trying to figure out what the heck a modular multiplicative inverse is, I came across multiple people demonstrating that the modulus and division operators are very slow (10+ clock cycles) on nVidia GPUs (and I assume ATI as well).  Is it possible to weave the required sieve without the modulus operator?
Timing your Weave algorithm, it seems that about 10-15% of the time is spent in the outer loop, with the remaining 85-90% spent in the inner loop.  I’m stuck on how to parallelize this inner loop.  Ideally, I would be able to launch threads corresponding to all values of nBiTwinSeq less than nChainLength, but I now realize that the value of bnFixedInverse is path dependent, so the loop cannot be parallelized in the direction I initially thought.
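
On the modulus question, here is a minimal sketch of one way it could work. It is not the code in prime.cpp, and it is in Python for brevity rather than the client's C++. It assumes candidates of the first-kind Cunningham form k*H*2^i - 1 with an even fixed multiplier H. The modular inverse is computed once per prime with the extended Euclidean algorithm, and after that the inner loop is a plain stride of p with no modulus or division per sieve entry. The names mod_inverse, weave_first_kind and fixed_multiplier are made up for illustration.

```python
def mod_inverse(a, p):
    """Inverse of a modulo p via the extended Euclidean algorithm."""
    old_r, r = a % p, p
    old_s, s = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("a has no inverse modulo p")
    return old_s % p


def weave_first_kind(fixed_multiplier, primes, sieve_size, chain_length):
    """Flag every k for which some k*H*2^i - 1 (i < chain_length) is
    divisible by one of the sieving primes. Assumes H is even."""
    composite = bytearray(sieve_size)  # one byte per k, 0 = still a candidate
    for p in primes:
        if p == 2 or fixed_multiplier % p == 0:
            continue  # such p never divides k*H*2^i - 1 when H is even
        inv = mod_inverse(fixed_multiplier, p)  # H^-1 mod p, once per prime
        two_inv = (p + 1) // 2                  # 2^-1 mod p for odd p
        for _ in range(chain_length):
            # k*H*2^i == 1 (mod p) exactly when k == inv (mod p),
            # so the hot loop only needs additions of p.
            for k in range(inv, sieve_size, p):
                composite[k] = 1
            inv = inv * two_inv % p  # move to the next chain position
    return composite
```

The only modulus left per chain position is the single `inv * two_inv % p` update, so its cost is amortized over roughly sieve_size / p strided writes.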

My current idea for parallelizing the Weave algorithm is a two-stage process that decomposes the nested loop into sequential loops.  The values calculated in the outer loop, namely bnFixedInverse and bnTwoInverse, can be calculated and stored in memory to be used in the inner loop.  The rationale for doing this is that the calculations of these values for the 10^6 prime numbers stored in the vPrimes array are all independent of each other and therefore perfect for parallelization.  Having calculated and stored all those values, each execution of the nested loop can (theoretically) also be performed on an individual thread, and since the algorithm is strictly switching bits from 0 to 1, I don’t think the race conditions that are usually so problematic in parallelization will be relevant.
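
The two-stage idea can be sketched roughly as follows, again in Python and under the same assumption of first-kind candidates k*H*2^i - 1 with even H. Stage 1 builds an independent table entry per prime (here using Fermat's little theorem for the inverse, which is a swapped-in technique and not necessarily what the client does). Stage 2 hands each entry to a worker that only ever writes 1 into the sieve. The function names and the thread pool are illustrative assumptions, not Primecoin code.

```python
from concurrent.futures import ThreadPoolExecutor


def precompute_inverses(fixed_multiplier, primes):
    """Stage 1: one (p, H^-1 mod p, 2^-1 mod p) entry per usable prime.
    Every entry depends only on its own p, so this maps cleanly to threads."""
    table = []
    for p in primes:
        if p == 2 or fixed_multiplier % p == 0:
            continue  # never divides k*H*2^i - 1 when H is even
        fixed_inv = pow(fixed_multiplier, p - 2, p)  # Fermat inverse, p prime
        table.append((p, fixed_inv, (p + 1) // 2))
    return table


def mark_prime(composite, entry, chain_length):
    """Stage 2 work item: strided marking for a single prime."""
    p, inv, two_inv = entry
    for _ in range(chain_length):
        for k in range(inv, len(composite), p):
            composite[k] = 1  # idempotent write, order does not matter
        inv = inv * two_inv % p


def weave_two_stage(fixed_multiplier, primes, sieve_size, chain_length, workers=4):
    table = precompute_inverses(fixed_multiplier, primes)
    composite = bytearray(sieve_size)  # one byte per flag
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(lambda e: mark_prime(composite, e, chain_length), table))
    return composite
```

One caveat on the race-condition point: with one byte per flag, concurrent writes of 1 are harmless, but if the sieve is packed one bit per flag, two threads doing a read-modify-write on the same word can lose an update, so a packed sieve would need an atomic OR (or per-thread sieves merged afterwards).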

Thinking about the relative time it takes to completely create the sieve (2-5 seconds depending on hardware) and the time it takes to test all primes found (approx. 5-20 ms, depending on the number of primes), it seems like the best implementation of parallelized code is going to have a single “master” thread testing the sieves woven in parallel by GPUs or additional CPU cores.  If the GPU can weave a sieve in about as long as it takes to test the primes for chain properties, then the program should be structured to pair off physical GPUs with individual cores to achieve maximum performance.  Assuming that weaving the complete sieve is slower than testing it (which it will probably always be), then creating and testing the incomplete sieve in m and n ms, respectively, will be most efficient when m = n.  This essentially maximizes the time spent testing primes instead of spending the overwhelming majority of time weaving the “perfect” sieve.  It is interesting to note that the sieve is woven to the point of only needing to test ~2000 primes after it has gone through less than 1% of the total vPrimes prime array.

The implementation of an evolving weave timer is primarily to force the sieve creation time to be equivalent to the sieve testing time.  This shows a pretty large performance boost in pps.  
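
As a rough illustration of what such an evolving timer could look like (a sketch, not the code in the pull request): after each round, nudge the weave time budget toward the measured test time so the two meet at m = n. The function names, the gain and the clamp limits are assumptions, and the simulation uses a toy cost model in which test time falls as cost / budget.

```python
def next_weave_budget(budget_ms, weave_ms, test_ms, gain=0.5, lo=1.0, hi=5000.0):
    """Return the weave budget for the next round.

    If weaving took longer than testing, shrink the budget; if testing
    took longer, grow it. The result is clamped to [lo, hi] milliseconds.
    """
    adjusted = budget_ms + gain * (test_ms - weave_ms)
    return max(lo, min(hi, adjusted))


def simulate(cost, budget_ms, rounds):
    """Toy model (an assumption): weaving uses the whole budget, and the
    test time is cost / budget because more weaving leaves fewer candidates."""
    for _ in range(rounds):
        weave_ms = budget_ms
        test_ms = cost / budget_ms
        budget_ms = next_weave_budget(budget_ms, weave_ms, test_ms)
    return budget_ms


if __name__ == "__main__":
    # With cost = 400 the balance point in this toy model is 20 ms.
    print(simulate(400.0, 100.0, 20))
```

In a real miner the two times would come from wall-clock measurements around the weave and test phases, and the weave loop would check the budget between primes and stop early once it is exceeded.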

A compiled binary (64-bit Linux, built with the -O3 flag) is available at https://www.dropbox.com/sh/1y7mwwc4asgfqfb/Eo2sKCsZor

These are of course just the thoughts of a total hack :D, so let me know what you think!

Eric

[/msg_to_sunny]

(donations welcome :) )
member
Activity: 182
Merit: 10
Is there an updated version for the FX-8350? I'm still running on 8 with a pps of 1200-1400.

Would this work for an A6-3400M mobile processor? I'm just using a universal client for it.
hero member
Activity: 602
Merit: 500
Testnet code found 10 of the last 14 blocks with the i7 and something like 5 or 6 with the Core 2 Duo before that.  Updating code for a pull request and I'll post links to my compiled executables for Core 2 Duo, Core 2 Quad and i7 bloomfield and i5 sandy bridge after that.

Great work, will pull request be public right away?

(I've never used github before so I have no idea... :( Learning as I go!)

If it's not available and you're interested, send me a PM with your email and I'll send you my prime.cpp and prime.h files.  I'll also compose a post with what I changed.


It should show in the pull request section of the coin.
I'll donate on my first working block on main net ;)
member
Activity: 99
Merit: 10
Testnet code found 10 of the last 14 blocks with the i7 and something like 5 or 6 with the Core 2 Duo before that.  Updating code for a pull request and I'll post links to my compiled executables for Core 2 Duo, Core 2 Quad and i7 bloomfield and i5 sandy bridge after that.

Great work, will pull request be public right away?

(I've never used github before so I have no idea... :( Learning as I go!)

If it's not available and you're interested, send me a PM with your email and I'll send you my prime.cpp and prime.h files.  I'll also compose a post with what I changed.

hero member
Activity: 602
Merit: 500
Testnet code found 10 of the last 14 blocks with the i7 and something like 5 or 6 with the Core 2 Duo before that.  Updating code for a pull request and I'll post links to my compiled executables for Core 2 Duo, Core 2 Quad and i7 bloomfield and i5 sandy bridge after that.

Great work, will pull request be public right away?
member
Activity: 84
Merit: 10
Is there an updated version for the FX-8350? I'm still running on 8 with a pps of 1200-1400.
member
Activity: 99
Merit: 10
Testnet code found 10 of the last 14 blocks with the i7 and something like 5 or 6 with the Core 2 Duo before that.  Updating code for a pull request and I'll post links to my compiled executables for Core 2 Duo, Core 2 Quad and i7 bloomfield and i5 sandy bridge after that.
legendary
Activity: 1064
Merit: 1000
Code:
alert.cpp:6:53: fatal error: boost/algorithm/string/classification.hpp: No such file or directory
compilation terminated.
make: *** [obj/alert.o] Error 1

When I try to compile the primecoind i keep getting this error. Ubuntu 12.04.2 LTS

Thanks

Either you're missing boost or your INCLUDEPATHS in makefile.unix are wrong.

I don't recall having to install boost though on a vanilla image of 12.04 server (I will look into it later).

Thanks. How do I install boost?


With 12.04, the default repo version of boost (1.46) is too low. You will need to install version 1.48.

sudo apt-get install libboost1.48-all-dev

Should get you what you need.  :)
member
Activity: 99
Merit: 10
Yeah I just got a message from Sunny informing me of this.  Testing now.
sr. member
Activity: 280
Merit: 250
Anyone know if the primecoin testnet is down?  If it's not, any chance of getting some testnet nodes posted?

Testnet is working for me at least, only 1 connection though.
full member
Activity: 208
Merit: 100
Thanks. How do I install boost?
sr. member
Activity: 246
Merit: 250
My spoon is too big!
Any chance for a nehalem-based i7?
member
Activity: 99
Merit: 10
Anyone know if the primecoin testnet is down?  If it's not, any chance of getting some testnet nodes posted?

The reason why I'm asking is because I've modified the original algorithm and it's currently running on my core i7 desktop at 1840(40) pps whereas the latest build runs at 1030(60) pps.  I'd like to share this with everyone, but I need to make sure that it will actually find blocks and I need access to the testnet to make sure it's actually working...

Oh, and the same compiled code runs on my Core 2 Duo laptop (T9300) at 400-600 pps

As soon as I get access to testnet or whenever this code finds a block on the real network I'll send a pull request to the github repository

primemeter results:

latest build compiled with -O3 flag
2013-07-12 14:18:14 primemeter   2011519 prime/h  13901785 test/h
2013-07-12 14:20:14 primemeter   3930894 prime/h  28700282 test/h
2013-07-12 14:22:14 primemeter   4302157 prime/h  32085310 test/h
2013-07-12 14:24:14 primemeter   3775965 prime/h  27785845 test/h
2013-07-12 14:26:14 primemeter   4119051 prime/h  30723224 test/h
2013-07-12 14:28:14 primemeter   4371463 prime/h  32873249 test/h
2013-07-12 14:30:14 primemeter   3816751 prime/h  28601906 test/h
2013-07-12 14:32:14 primemeter   3756402 prime/h  27542967 test/h
2013-07-12 14:34:14 primemeter   3184962 prime/h  23768423 test/h

my build compiled with -O3 flag
2013-07-12 13:31:48 primemeter   7189469 prime/h  54319461 test/h
2013-07-12 13:33:49 primemeter   6830908 prime/h  51281614 test/h
2013-07-12 13:35:49 primemeter   7052465 prime/h  53836478 test/h
2013-07-12 13:37:49 primemeter   6434811 prime/h  48536697 test/h
2013-07-12 13:39:50 primemeter   6266447 prime/h  47435825 test/h
2013-07-12 13:41:50 primemeter   6827085 prime/h  51539144 test/h
2013-07-12 13:43:50 primemeter   7355815 prime/h  55993030 test/h
2013-07-12 13:45:50 primemeter   7438305 prime/h  55507409 test/h
2013-07-12 13:47:50 primemeter   6506020 prime/h  48719032 test/h
2013-07-12 13:49:51 primemeter   6227709 prime/h  47225135 test/h
2013-07-12 13:51:51 primemeter   6079465 prime/h  45219277 test/h
2013-07-12 13:53:51 primemeter   7642553 prime/h  58080712 test/h
2013-07-12 13:55:51 primemeter   5616266 prime/h  42594130 test/h
2013-07-12 13:57:51 primemeter   6168969 prime/h  46249065 test/h
2013-07-12 13:58:52 primemeter   6354848 prime/h  47451299 test/h
2013-07-12 14:00:52 primemeter   6036007 prime/h  45036515 test/h
2013-07-12 14:02:52 primemeter   6836223 prime/h  51643364 test/h
2013-07-12 14:04:52 primemeter   6249085 prime/h  46962094 test/h
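
To compare the two runs as single numbers rather than by eye, the primemeter lines can be averaged. A small sketch, assuming the log format shown above; mean_rates and the variable names are made up for illustration, and only the first two lines of each run are pasted in to keep it short.

```python
import re

# Matches e.g. "primemeter   2011519 prime/h  13901785 test/h"
LINE = re.compile(r"primemeter\s+(\d+) prime/h\s+(\d+) test/h")


def mean_rates(log_text):
    """Return (mean prime/h, mean test/h) over all primemeter lines."""
    rows = [(int(m.group(1)), int(m.group(2))) for m in LINE.finditer(log_text)]
    if not rows:
        raise ValueError("no primemeter lines found")
    n = len(rows)
    return sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n


latest_build = """
2013-07-12 14:18:14 primemeter   2011519 prime/h  13901785 test/h
2013-07-12 14:20:14 primemeter   3930894 prime/h  28700282 test/h
"""

my_build = """
2013-07-12 13:31:48 primemeter   7189469 prime/h  54319461 test/h
2013-07-12 13:33:49 primemeter   6830908 prime/h  51281614 test/h
"""

if __name__ == "__main__":
    base_prime, _ = mean_rates(latest_build)
    mine_prime, _ = mean_rates(my_build)
    print("prime/h ratio (my build / latest build):", mine_prime / base_prime)
```

Paste the full blocks from debug.log into the two strings to average over the whole run. The first sample after a restart tends to be low (see the 14:18:14 line), so it may be worth dropping it.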


legendary
Activity: 1862
Merit: 1011
Reverse engineer from time to time
I don't think there's much more that can be squeezed out of the client without rewriting the mining code. Theoretically we already have a more efficient miner in bfgminer, just no API in the client to interact with it.
Yeah, getwork is not suited for it.
sr. member
Activity: 280
Merit: 250
I don't think there's much more that can be squeezed out of the client without rewriting the mining code. Theoretically we already have a more efficient miner in bfgminer, just no API in the client to interact with it.
hero member
Activity: 756
Merit: 501
That's weird, maybe you have a 32 bit Windows? Anyways, the anonppc build should be good to go, since he confirmed it being on the official codebase.
newbie
Activity: 35
Merit: 0
Thanks for the Core 2 build, there are still people stuck with these things :)

Unfortunately I'm getting ~40% lower PPS than anonppcoin's build
Try this x86-x64 one https://bitcointalksearch.org/topic/m.2710575
Both have about the same performance on my Core2Duo and should have the same codebase.

Found that one earlier but get an immediate crash on open for some reason