Alright, the code should be up in the pull requests on
https://github.com/primecoin/primecoin and is also available from
https://github.com/Chemisist/primecoin if you're itching to have a look. Compiled binaries will be available at
https://www.dropbox.com/sh/1y7mwwc4asgfqfb/Eo2sKCsZor as I make and upload them.
(This was originally just going to be a message to Sunny, but I’ll post it pretty much as is because I’m feeling lazy and I need to get back to work on my thesis!)
[msg_to_sunny]
Hey Sunny King-
In studying your algorithm, I’ve learned quite a bit about prime numbers and modular arithmetic, thanks! Unfortunately, I now have to focus more on my thesis, so I’ll be spending a little less time thinking about how to parallelize this complex algorithm. I’ve collected my current thoughts on parallelization below, and I was hoping you could answer some of these questions. Also, do you know if anyone else is working on a GPU implementation of your algorithm?
Also, while trying to figure out what the heck a multiplicative modular inverse is, I came across multiple people demonstrating that the modulus and division operators are very slow (10+ clock cycles) on nVidia GPUs (and I assume ATI as well). Is it possible to weave the required sieve without the modulus operator?
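For what it's worth, here's a minimal C++ sketch of the two operations in question (function names are mine, not the Primecoin source): the multiplicative modular inverse via the extended Euclidean algorithm, and marking multiples of a prime with plain strided addition, which needs no `%` or division in the loop body at all:

```cpp
#include <cassert>
#include <cstdint>

// Extended Euclid: returns x with (a * x) % p == 1, for prime p and 0 < a < p.
// This is the "multiplicative modular inverse" the weave needs per prime.
int64_t ModInverse(int64_t a, int64_t p) {
    int64_t r0 = p, r1 = a;   // remainder sequence
    int64_t t0 = 0, t1 = 1;   // Bezout coefficient sequence
    while (r1 != 0) {
        int64_t q = r0 / r1;
        int64_t r = r0 - q * r1; r0 = r1; r1 = r;
        int64_t t = t0 - q * t1; t0 = t1; t1 = t;
    }
    return t0 < 0 ? t0 + p : t0;  // normalize into [0, p)
}

// Marking multiples of p needs no modulus at all: start at the first offset
// in range and step by p. Only computing the starting offset needs a mod.
void MarkMultiples(uint8_t* sieve, unsigned nSieveSize, unsigned nStart, unsigned p) {
    for (unsigned i = nStart; i < nSieveSize; i += p)
        sieve[i] = 1;  // flag composite candidate
}
```

So the expensive `%` can be confined to once-per-prime setup, with the hot inner loop reduced to adds and compares.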
Timing your Weave algorithm, it seems that about 10-15% of the time is spent in the outer loop, with the remaining 85-90% spent in the inner loop. I’m stuck on how to parallelize this inner loop. Ideally, I would launch threads corresponding to all values of nBiTwinSeq less than nChainLength, but I now realize that the value of bnFixedInverse is path dependent and cannot be parallelized in the direction I initially thought.
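If I’m reading the inner loop right, the path dependence is just the recurrence fixedInverse ← fixedInverse · twoInverse (mod p), applied once per chain position, and a recurrence like that can be unrolled: the value at position k is fixedInverse0 · twoInverse^k (mod p), so every position could be computed independently from precomputed powers. A hypothetical sketch of that idea (my names, not the actual Weave code; assumes p fits in 32 bits so the 64-bit products don’t overflow):

```cpp
#include <cstdint>
#include <vector>

// The sequential update fixedInverse = fixedInverse * twoInverse % p makes
// iteration k depend on k-1. But the closed form
//   fixedInverse_k = fixedInverse0 * twoInverse^k (mod p)
// lets each k be computed on its own thread from a table of powers.
std::vector<uint64_t> InversesPerPosition(uint64_t fixedInverse0,
                                          uint64_t twoInverse,
                                          uint64_t p, unsigned nPositions) {
    std::vector<uint64_t> pow(nPositions, 1);     // twoInverse^k mod p
    for (unsigned k = 1; k < nPositions; ++k)
        pow[k] = pow[k - 1] * twoInverse % p;     // cheap sequential prepass
    std::vector<uint64_t> inv(nPositions);
    for (unsigned k = 0; k < nPositions; ++k)     // each k independent: GPU-friendly
        inv[k] = fixedInverse0 * pow[k] % p;
    return inv;
}
```

The power table is still sequential, but it is tiny (nChainLength entries) next to the per-prime work it unlocks.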
My current thought on how to parallelize the Weave algorithm is a two-stage process that decomposes the nested loop into sequential loops. The values calculated in the outer loop, namely bnFixedInverse and bnTwoInverse, can be calculated and stored in memory to be used in the inner loop. The rationale is that calculating these values for the 10^6 prime numbers stored in the vPrimes array is independent for each prime and therefore perfect for parallelization. Having calculated and stored all those values, each execution of the nested loop can (theoretically) be performed on an individual thread as well, and since the algorithm only ever switches bits from 0 to 1, I don’t think the race conditions that are usually so problematic in parallelization will be relevant.
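One caveat on the race-condition point: 0-to-1 flips are only harmless if each write touches its own byte or word. If the sieve is a packed bitfield, a plain read-OR-write from two threads hitting the same word can drop a neighbor’s bit. A sketch of the safe version using an atomic OR (hypothetical helper, not from the patch):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Stage-2 sketch: many threads flag composites in a shared packed sieve.
// fetch_or makes the read-modify-write on the shared word atomic, so
// concurrent 0->1 flips in the same 32-bit word can never lose each other.
void SetComposite(std::vector<std::atomic<uint32_t>>& sieveWords, unsigned nIndex) {
    sieveWords[nIndex / 32].fetch_or(1u << (nIndex % 32),
                                     std::memory_order_relaxed);
}
```

Using one byte per candidate instead of packed bits avoids the atomics entirely, at 8x the memory.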
Thinking about the relative time it takes to completely create the sieve (2-5 seconds depending on hardware) versus the time it takes to test all primes found (approx. 5-20 ms, depending on the number of primes), it seems the best parallelized implementation will have a single “master” thread testing the sieves woven in parallel by GPUs or additional CPU cores. If the GPU can weave a sieve in about the time it takes to test the primes for chain properties, then the program should pair off physical GPUs with individual cores to achieve maximum performance. Assuming that weaving the complete sieve is slower than the testing (which it probably always will be), creating and testing the incomplete sieve in m and n ms, respectively, will be most efficient when m = n. This essentially maximizes the time spent testing primes instead of spending the overwhelming majority of the time weaving the “perfect” sieve. Interestingly, the sieve is woven to the point of only needing to test ~2000 primes after it has gone through less than 1% of the total vPrimes array.
The evolving weave timer is there primarily to force the sieve-creation time to match the sieve-testing time, and it shows a pretty large performance boost in pps.
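For concreteness, here is one way such a timer could evolve; this is just my sketch of the m = n idea with an assumed smoothing factor, not the actual patch code:

```cpp
#include <algorithm>

// Hypothetical "evolving weave timer": nudge the weave budget (m, in ms)
// toward the measured test time (n, in ms) after every round, so the two
// converge on m == n instead of over-weaving a near-perfect sieve.
double NextWeaveBudgetMs(double currentBudgetMs, double measuredTestMs) {
    const double gain = 0.25;  // assumed smoothing factor: 0 = frozen, 1 = jump
    double next = currentBudgetMs + gain * (measuredTestMs - currentBudgetMs);
    return std::max(1.0, next);  // never starve the weave entirely
}
```

The smoothing keeps a single unusually fast or slow test round from whipsawing the budget.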
A compiled binary (64-bit Linux, built with -O3) is available at
https://www.dropbox.com/sh/1y7mwwc4asgfqfb/Eo2sKCsZor
These are of course just the thoughts of a total hack, but let me know what you think!
Eric
[/msg_to_sunny]
(donations welcome)