Is your miner publicly available?
I can now say with absolute certainty that if I had had my current solo miner implementation on launch day, I would have solved the first 576 blocks in under 15 minutes using only 20 Intel E5-2697 v2 CPUs. Was the launch really all that fair? I guess we will never know.
The implementations from gatra, dga, and jh00 would be just as fast if they used slightly larger primorials and AVX2 instructions for multiple-precision arithmetic. So a few weeks from now, any one miner's performance advantage over the others should be negligible.
I will begin work on an Nvidia GPU (sm_20 - sm_35) version this weekend. I also plan on releasing that version in binary format with a small developer's fee if the performance gain over a high-end CPU is significant. Sorry, I do not know how to program AMD GPUs for HPC work involving multiple-precision arithmetic (Karatsuba multiplication + Montgomery reduction).
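For anyone following along who has not met Montgomery reduction, here is a minimal Python sketch of the idea, used to do the modular exponentiation behind a Fermat test. It is only an illustration under my own assumptions: the function names (montgomery_setup, redc, mont_pow) and the bits parameter are made up for this example, it leans on Python big integers rather than hand-written limb arithmetic, and it says nothing about how any of the miners in this thread are actually written.

```python
# Illustrative sketch only: Montgomery reduction with Python big integers.
# All names here are hypothetical; this is not anyone's miner code.

def montgomery_setup(n, bits):
    """Return (r, n_prime) with r = 2**bits and n * n_prime == -1 (mod r).

    Requires n odd and n < r.
    """
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r  # modular inverse needs Python 3.8+
    return r, n_prime


def redc(t, n, bits, n_prime):
    """Return t * r^-1 mod n for 0 <= t < n * r, using only shifts and masks."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> bits          # exact division by r
    return u - n if u >= n else u


def mont_pow(base, exp, n, bits):
    """Compute base**exp mod n with square-and-multiply in Montgomery form."""
    r, n_prime = montgomery_setup(n, bits)
    x = (base * r) % n               # base in Montgomery form
    acc = r % n                      # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = redc(acc * acc, n, bits, n_prime)
        if bit == "1":
            acc = redc(acc * x, n, bits, n_prime)
    return redc(acc, n, bits, n_prime)  # convert back to a normal residue


def fermat_base2(n, bits):
    """Base-2 Fermat test: True for primes (and for rare pseudoprimes)."""
    return mont_pow(2, n - 1, n, bits) == 1
```

The point of the trick is that redc replaces a division by n with a multiplication and a shift, which is why it pairs well with a fast multiplier such as Karatsuba when the operands are many limbs wide.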
Credit where due - I really wanted to make an alternate, pre-computation and memory-heavy approach to this work, but I've given up. It's tough to beat straightforward sieving, and you (@supercomputing) were right - I ended up just going to a big primorial with a highly optimized sieve. I haven't rewritten any of the routines currently handled by GMP (and my GMP is probably compiled horribly), but my pool miner is now somewhere in that ballpark. I've solved 14 or so blocks in 24 hours using 14 machines, which is in about the same range you mentioned for your miner. Block 17920 is an example, if you're curious to extract my primorial.
I'd still love to find a more satisfying way to crack this nut, though. But for now, the sieves and fast prime tests have it.
Gatra, thank you for creating Riecoin. It's terribly fun, has one of the more intellectually interesting proof-of-work cores (as does XPM, in fairness), and I hope it starts producing record prime tuples one of these days. I'm planning on proposing high-speed RIC sieving cores as a class project for our parallel computing class next week. Should be fun!
-Dave
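As a rough picture of the "big primorial plus sieve, then fast prime tests" approach discussed above, here is a toy Python sketch. It assumes the Riecoin target is a prime sextuplet of the form n + (0, 4, 6, 10, 12, 16), uses the tiny primorial 210 with starting residue 97, and sieves with only a handful of primes. Every name and parameter here (PRIMORIAL, BASE, SIEVE_PRIMES, find_sextuplets) is chosen for illustration; real miners use much larger primorials, far more sieving primes, and candidates near the block target, and this is not Dave's or anyone else's actual code.

```python
# Illustrative sketch only: toy primorial sieve for prime sextuplets.
# Names and parameters are hypothetical, picked for a small demo.

OFFSETS = (0, 4, 6, 10, 12, 16)
PRIMORIAL = 2 * 3 * 5 * 7   # 210; real miners use far larger primorials
BASE = 97                   # sextuplets other than 7 start at n = 97 (mod 210)
SIEVE_PRIMES = (11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47)


def is_prime(m):
    """Miller-Rabin with bases 2, 3, 5, 7 (deterministic for m < 3.2e9)."""
    if m < 2:
        return False
    for p in (2, 3, 5, 7):
        if m % p == 0:
            return m == p
    d, s = m - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in (2, 3, 5, 7):
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = x * x % m
            if x == m - 1:
                break
        else:
            return False
    return True


def find_sextuplets(num_candidates):
    """Return each n = BASE + k * PRIMORIAL, 0 <= k < num_candidates,
    for which all six numbers n + offset are prime."""
    alive = [True] * num_candidates
    for p in SIEVE_PRIMES:
        inv = pow(PRIMORIAL, -1, p)  # modular inverse, Python 3.8+
        for d in OFFSETS:
            # Smallest k for which BASE + k * PRIMORIAL + d is divisible by p.
            first = (-(BASE + d) * inv) % p
            for k in range(first, num_candidates, p):
                alive[k] = False
    found = []
    for k, ok in enumerate(alive):
        if ok:
            n = BASE + k * PRIMORIAL
            if all(is_prime(n + d) for d in OFFSETS):
                found.append(n)
    return found


print(find_sextuplets(250))
```

The sieve throws out most candidates with cheap array writes, so the expensive primality tests only run on the few survivors. A larger primorial removes more bad candidates before sieving even starts, which is the tuning knob mentioned earlier in the thread.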
Congratulations! I see that you've released your YPOOL miner, and I believe that the RIC network difficulty will be at its highest this week.
Last weekend, I fully implemented my miner in x86-64 assembly code in order to have a fair comparison with my GPU implementation. On the same hardware, with 20 Intel E5-2697 v2 CPUs total, I can now solve 24-32 blocks per day (at 1300-bit difficulty) when taking the orphaned blocks into account. I currently do not have any server hardware that supports the AVX2 instructions, which would have resulted in a 30 percent increase in performance.
Congratulations again - I will be back to this hobby in late April or early May.
Sorry for the typos, I am on my smart phone as usual.