Hi Christian,
Whilst I applaud you for once again doing some amazing work to get this working, it's actually going to threaten one of the foundations of this coin: the CPU-only mining channel. Now that you have made this breakthrough, would you be willing to help the developers modify the algorithm to make it less viable to have GPUs assist mining on the CPU channel? At the same time, you could use the work you have done so far to assist in building the GPU miner for the GPU channel?
This high-end mining rig uses 6-7 times the power of a single Core i7 in a power-efficient mainboard. So overall the efficiency gain isn't really that great. It heats my entire living room. Some of the optimizations we found are also applicable to the CPU miner.
The computation of the base remainder needs to be taken out of the for (j...) loop, and an appropriate offset can be added during the sieve generation instead. The Fermat check for position p1 is redundantly done in the Supercomputing miner. Minimize overall run time by balancing sieve efficiency vs. the time spent in the Fermat test (adjust the number of primes going into the sieve accordingly). During sieve generation, segment the sieve into parts that fit into the CPU's L1 cache (for the smallest primes in particular). Go fix this first before complaining about our GPU superiority :-)
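To make the first and last of those points concrete, here is a rough Python sketch. It is not the code from any actual miner, just an illustration on a plain interval base + i rather than the real chain candidates. The names small_primes, segmented_survivors and fermat_probable_prime, and the 32 KiB segment size, are made up for this example, and it assumes base is larger than every sieving prime. The big-integer remainder is taken once per prime, after that only a small offset is carried from segment to segment, and only the survivors go on to the Fermat test.

```python
def small_primes(limit):
    """Sieving primes below `limit` (plain sieve of Eratosthenes, limit >= 2)."""
    flags = bytearray([1]) * limit
    flags[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            flags[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [p for p in range(limit) if flags[p]]


def segmented_survivors(base, size, primes, segment=32 * 1024):
    """Indices i in [0, size) where base + i has no factor in `primes`.

    Assumes base is larger than every sieving prime (hypothetical setup).
    """
    # One big-integer remainder per prime, computed once, outside all loops:
    # next_hit[k] is the first index i with (base + i) % primes[k] == 0.
    next_hit = [(-base) % p for p in primes]
    survivors = []
    for start in range(0, size, segment):
        length = min(segment, size - start)
        seg = bytearray(length)  # 0 = still a candidate, 1 = known composite
        for k, p in enumerate(primes):
            i = next_hit[k] - start  # small offset into this segment, no modulo
            while i < length:
                seg[i] = 1
                i += p
            next_hit[k] = start + i  # carry the offset over to the next segment
        survivors.extend(start + i for i in range(length) if not seg[i])
    return survivors


def fermat_probable_prime(n):
    """Base-2 Fermat test: True means n is a probable prime."""
    return pow(2, n - 1, n) == 1


if __name__ == "__main__":
    base = 10 ** 6
    primes = small_primes(100)
    candidates = segmented_survivors(base, 1000, primes)
    hits = [base + i for i in candidates if fermat_probable_prime(base + i)]
    print(len(candidates), "survivors,", len(hits), "pass the Fermat test")
```

Raising the limit passed to small_primes makes the sieve slower but hands fewer candidates to the Fermat test, which is the balance mentioned above. The right segment size depends on the CPU's L1 data cache, so treat 32 KiB as a placeholder.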
When I port this algorithm to my Xeon Phi card, it might even run circles around my GPUs... Who knows.
You'd have to abandon the concept of primality searching to make the CPU channel ineligible for GPU assist. There are things in computer science where GPUs are really bad. Do the research before advertising a CPU-only mining opportunity. :-)
We're not going to release this miner anytime soon and we're not taking this to GPU farms or AWS. The overall impact on this coin will be minimal. We might take our skills to other prime-based coins (Riecoin etc.) because these actually have a bit of trade value.