- Generate the polynomial up to a certain size. I use 200560490130 (or the next primorial) as my base primorial and store a vector of all 48923875 entries.
- Sieve *this* out up to the huge primorial in advance (sketched below).
- Do your operations relative to the huge primorial.
But, as warned - the simple bitvector is still working better for me.
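In sketch form, the lifting step from the list above looks something like this (illustrative only, not the real code; the sextuplet offsets 0, 4, 6, 10, 12, 16, the brute-force divisibility test, and the helper name liftResidues here are just for the example):

#include <cstdint>
#include <vector>

// Sextuplet pattern: k, k+4, k+6, k+10, k+12, k+16.
static const int OFFSETS[6] = {0, 4, 6, 10, 12, 16};

// Take the residues stored modulo the base primorial and keep only the
// translates r + i*basePrimorial whose whole chain also avoids the extra
// primes that divide the huge primorial but not the base one.
std::vector<uint64_t> liftResidues(const std::vector<uint64_t>& baseResidues,
                                   uint64_t basePrimorial,
                                   uint64_t hugePrimorial,
                                   const std::vector<uint64_t>& extraPrimes)
{
    std::vector<uint64_t> lifted;
    for (uint64_t start = 0; start < hugePrimorial; start += basePrimorial) {
        for (uint64_t r : baseResidues) {
            const uint64_t k = start + r;
            bool ok = true;
            for (uint64_t p : extraPrimes) {
                for (int o : OFFSETS)
                    if ((k + o) % p == 0) { ok = false; break; }
                if (!ok) break;
            }
            if (ok) lifted.push_back(k);  // chain survives: keep as a residue mod hugePrimorial
        }
    }
    return lifted;
}

The point is that this is done once in advance, and everything afterwards works relative to the huge primorial using the surviving entries.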
Cool, that is what I am doing, but looking at your numbers I also pre-sieve the possible p6 chains, reducing my candidate count by ~128 times:
const uint64_t primorial = 7420738134810;
const uint32_t sexcount = 14243984;
Then I run a second scan inline to catch the next two dozen or so primes (which lets me avoid gmp and use simple 64-bit math) before I hit the expensive code. The general idea was to get a list of candidates which could be fed into something else (GPU was the thought).
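In rough outline the inline scan is something like this (a sketch only, not my actual code; the exact prime list is illustrative):

#include <cstdint>

// Sextuplet pattern: k, k+4, k+6, k+10, k+12, k+16.
static const int OFFSETS[6] = {0, 4, 6, 10, 12, 16};

// The next two dozen primes after 37 (the 7420738134810 = 37# wheel already
// guarantees the surviving chains have no factors <= 37).
static const uint64_t SCAN_PRIMES[] = {
     41,  43,  47,  53,  59,  61,  67,  71,  73,  79,  83,  89,
     97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151
};

// True if the chain starting at k survives trial division by the scan primes.
// Plain 64-bit arithmetic, no gmp needed at this stage; k is assumed to be far
// larger than 151, so divisibility really does mean composite.
bool passesInlineScan(uint64_t k) {
    for (uint64_t p : SCAN_PRIMES)
        for (int o : OFFSETS)
            if ((k + o) % p == 0)
                return false;
    return true;
}

Candidates that pass go onto the list that would be fed to the expensive code (or the GPU).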
It is much faster than the reference but it is reaching the limit of how fast I can push it.
I have probably made a horrendous error in my algorithm... but coding again was fun...
Regards,
My implementation is a little different from both implementations mentioned above. In fact, the overhead is much less than that of jh00's implementation. It is almost identical to Kim Walisch's primesieve implementation, with a few minor exceptions.
Please see Kim Walisch's description of wheel factorization if you would like to know exactly what I am doing:
http://primesieve.org/
@bsunau7 - mine does the same. I kill any location that fails to produce a six-set. I wonder which of us has a bug? *grin* I'll check my sieving code again. As one way to start comparing, the polynomials for the first few primorials are:
Generator at Pn7 (210)
97
Generator at Pn11 (2310)
97 937 1147 1357 2197
Generator at Pn13 (30030)
97 1357 2407 3457 4717 5557 5767 6817 7867 8077 8287 10177 10597 11647 12907 13747 13957 15007 16057 16267 17107 18367 19417 19837 21727 21937 22147 23197 24247 24457 25297 26557 27607 28657 29917
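If it helps the comparison, a throwaway brute-force enumerator along these lines (a quick sketch, not my sieve) should reproduce the tables above for the small primorials:

#include <cstdint>
#include <cstdio>

// Sextuplet pattern: k, k+4, k+6, k+10, k+12, k+16.
static const int OFFSETS[6] = {0, 4, 6, 10, 12, 16};

// Print every odd k in [1, primorial) whose whole chain avoids the given primes.
void printGenerators(uint64_t primorial, const uint64_t* primes, int nPrimes) {
    for (uint64_t k = 1; k < primorial; k += 2) {
        bool ok = true;
        for (int i = 0; i < nPrimes && ok; ++i)
            for (int o : OFFSETS)
                if ((k + o) % primes[i] == 0) { ok = false; break; }
        if (ok) std::printf("%llu ", (unsigned long long)k);
    }
    std::printf("\n");
}

int main() {
    const uint64_t p[] = {2, 3, 5, 7, 11, 13};
    printGenerators(210,   p, 4);  // expect: 97
    printGenerators(2310,  p, 5);  // expect: 97 937 1147 1357 2197
    printGenerators(30030, p, 6);  // expect the 35 values listed above
    return 0;
}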
@Supercomputing - Did you figure out a way to combine wheel factorization with storing a dense bitvector div 2310 (or div 210)? Or do you just allow a large bitvector and handle it through segmentation? I liked the way the jh implementation saved a lot of sieve space that way; a straightforward prime sieve only achieves a 3-4x less dense packing.
Well, think of a primorial as a wheel with no pre-sieving. For example, 43# guarantees that k, k+4, k+6, k+10, k+12, and k+16 will have no divisors less than or equal to 43. Therefore, the bigger the primorial, the more efficiently the sieve will run. Each bit in your sieve array already represents a potential chain k, and the trick is to segment the sieve so that you do not keep eliminating the same false chains over and over again within your sieve array.
Only a single static table of 32-bit integers (primes interleaved with prime inverses) is needed to coalesce memory access.
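In sketch form, the per-segment elimination using that kind of table looks roughly like this (simplified and illustrative only; the real table layout may differ):

#include <cstdint>
#include <vector>

// Sextuplet pattern: k, k+4, k+6, k+10, k+12, k+16.
static const int OFFSETS[6] = {0, 4, 6, 10, 12, 16};

// One segment of candidates k_m = base + m*primorial, m = 0 .. alive.size()-1.
// `table` interleaves each sieving prime p with inv = (primorial mod p)^-1 mod p,
// so one contiguous read fetches both values.  A flag is cleared as soon as any
// chain member at k_m picks up a factor p.  Candidates are assumed to be far
// larger than every sieving prime.
void sieveSegment(uint64_t base,
                  const std::vector<uint32_t>& table,  // p0, inv0, p1, inv1, ...
                  std::vector<bool>& alive)            // one flag per candidate chain
{
    const uint64_t segSize = alive.size();
    for (std::size_t i = 0; i + 1 < table.size(); i += 2) {
        const uint64_t p   = table[i];
        const uint64_t inv = table[i + 1];
        for (int o : OFFSETS) {
            // Solve base + o + m*primorial == 0 (mod p) for the first index m.
            const uint64_t rem = (base + static_cast<uint64_t>(o)) % p;
            uint64_t m = ((p - rem) % p) * inv % p;
            for (; m < segSize; m += p)
                alive[m] = false;  // chain at k_m is divisible by p
        }
    }
}

The next segment simply advances base by segSize*primorial and reuses the same table.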