I am using this solution candidate skipping approach in my custom solvers for long years.
The idea is that it should not be looked at given key range as hexadecimal representation of number and instead it should be seen as set of '0..9' and 'a..f' characters, numbers and letters, a 'string'.
Considering the fact that BTC puzzle keys were generated using Deterministic wallet and not by human mind, not intentionally formed, it is very unlikely that private keys, hexadecimal strings of longer range puzzle keys will be numbers only ('0..9') or letters only ('a..f').
Such as 0x20124579147074121, or 0x3FCAFDCBEDCABEFDA in case of puzzle #66, no way.
In 17 characters long key range, that is randomly generated, it is so unlikely that you can put your hand in the fire for the fact that there are numbers as well as letters in range string of correct solution leading to private key of puzzle #66 BTC address.
So why to waste hardware device resources to compute something that will never lead to desired solution? There is no point for that at all.
Those private key candidates can be effectively skipped without any hashing computation, thus resulting in solution finding speed-up.
How much speed-up depends on amount of ruled out potential key candidates that will very likely never lead to desired solution and doing it by far more parameters than just simple numbers only ('0..9') or letters only ('a..f').
Choosing correct prediction is the most essential in the way that will minimize the risk of skipping the right solution.
Such as guessing that not only there will be at least some letters ('a..f'), but also how much of those, will there be at least one letter, two letters, three letters?
Will be some letters repeated? Or next to each other, such as 'FF' or 'AA', or in-between 'letter-number-letter', 'letter-letter-number-number' somewhere in private key hexadecimal string, etc.
Same for numbers.
Ruling out key candidates like that results in hardware computational speed-ups that are tens, hundreds or even thousands multiplies of the original speed of the device, but naturally depending on parameters greatly increase risk of skipping the right solution.
This is of course primitive approach just for sake of example.
I went further with it and made approach tunable by far more advanced parameters, probability and combination of prediction systems that resulted in significant speed-up per hardware device, but is keeping it in relatively safe area for risk of skipping the right solution.
Safety and approach being tunable to reach right balance between risk and speed-up.
It is definitely far more fun than plain brute-force of one-by-one scanhashing and much more effective when keeping within safe parameters.
Btw. I effectively implemented the same method to cryptocurrency mining in various mining algorithms where my mining kernels by fine-tuning probability using prediction systems approach algorithmically manipulate chances, move chances of finding nonce to miner side, skipping non-interesting nonce candidates without any hashing at all giving miner the house edge, fine-tuneable to balance risk and the right solution.
That results in nonce being found more often in contrast to old-fashioned one-by-one scanhash brute-force mining, thus overall faster miner, how faster depending on internal fine-tuning.
Bruteforce mining one-by-one scanhash is sooooooo boring, 'prediction mining' and skipping nonce candidates without hashing is fully according to my taste.
But well that is another story.