As far as I understand it, the algorithm is split into two parts: building a sieve, then trying each candidate stored in the sieve with three different tests.
Each part has a variable execution time, determined by the number of prime candidates found.
Recent modifications of the code (including Chemisist's proposal) put a timeout on these parts, so that the execution time invested in a possible solution is capped.
By capping these parts, there is a chance of aborting the testing of a valid solution.
Thus a complex trade-off arises: invest more time in the current candidate, or jump to the next candidate after a given period of time.
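The capped-phase idea can be sketched as a generic time-budget loop. The names and budgets below are hypothetical placeholders, not the actual values or functions from prime.cpp:

```cpp
// Sketch of the trade-off described above: each phase (sieving, testing)
// gets a capped time budget; when the budget runs out we move on to the
// next candidate, possibly discarding a valid one.
#include <chrono>
#include <functional>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::milliseconds;

// Run `step` repeatedly until it reports success or the budget expires.
// Returns true on success, false if the phase was aborted by the timeout.
inline bool RunCapped(const std::function<bool()>& step, Ms budget) {
    const auto start = Clock::now();
    while (Clock::now() - start < budget)
        if (step()) return true;
    return false;  // budget exhausted: jump to the next candidate
}
```

Tuning the `budget` values changes how often work-in-progress is thrown away, which is exactly the trade-off discussed above.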
Chemisist suggests an adaptive approach (I only skimmed through your code, so I may have misunderstood), which is very interesting.
However, IMHO, it only marginally improves the chances of finding a solution (read: a block), especially as difficulty rises.
Why? Because the distribution of prime factors is very, very difficult to predict.
Playing with the timeout values can multiply or divide your PPS by 10, but it does not improve your chances of finding a solution.
Anyway, here is my advice: do not focus on the PPS, it is not a reliable measure of performance.
Thumbs up to Sunny King (and his team) for designing this coin. The proof-of-work proposed is brilliant and very interesting to play with.
The high PPS number is due to the very low hard cap on the time allotted to checking the sieve once it has been produced (set to 10 ms in the current master branch on GitHub, line 372 in prime.cpp). With whatever very short weaving time you decide to set, the sieve contains a very large number of prime candidates, most of which satisfy the following check:
nPrimesHit++;
but many of which are not actually prime. Anyway, I'm currently testing my code against Sunny's on the testnet (with the large-thread-count issue potentially fixed, fingers crossed) to see which can find more blocks in 10 minutes on my T9300 laptop. Results to come shortly.
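To illustrate why counting sieve survivors overstates the number of real primes, here is a toy sketch (hypothetical code, not the actual prime.cpp sieve): sieving with only a few small primes leaves composites such as 121 = 11 × 11 among the survivors, yet each survivor would still bump a counter like nPrimesHit.

```cpp
// Count numbers in [2, n) that survive trial division by `smallPrimes`.
// With a short sieve (few primes), composites slip through and inflate
// the counter, analogous to nPrimesHit counting non-primes.
#include <vector>

inline int CountSurvivors(int n, const std::vector<int>& smallPrimes) {
    int nPrimesHit = 0;  // same idea as the counter in prime.cpp
    for (int x = 2; x < n; ++x) {
        bool survives = true;
        for (int p : smallPrimes)
            if (x != p && x % p == 0) { survives = false; break; }
        if (survives) ++nPrimesHit;
    }
    return nPrimesHit;
}
```

For example, sieving [2, 130) with only {2, 3, 5, 7} yields 32 survivors, although there are only 31 primes below 130: the composite 121 survives because 11 was never woven into the sieve.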