Tbh, if I were trying to crack the puzzle, I would fix the OpenCL bug and implement the various optimizations from the different forks of BitCrack. Then I would modify the code so that it decrements the key instead of incrementing it.
I mean... there are a lot of people scanning the whole 2^63 to 2^64 range, and they have already scanned about 10-20% of it, starting from the beginning. So scanning that part again makes no sense. But I have not seen a modification of BitCrack that implements decrementing the keys. So you would have a better chance of finding the solution if you start from the middle and increment the key, or start from the end and decrement the key.
Picking random ranges inside the big range, as some people have suggested, gives imho no actual benefit. On the contrary: the key could be in any subrange. And if you do it randomly, you would have to store the information about which ranges you have already scanned in your status file (--continue FILENAME). This file would just consume more of your VRAM and, depending on the number of subranges, could reduce your speed.
But if you decrement the key, you could still use the status file logic, because it would only need to store where you stopped last time. You could pick up from there very quickly and continue decrementing the key. That matters especially because you mentioned you like to pause the search now and then for gaming and watching TV.
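To make the resume idea concrete, here is a rough Python sketch of a top-down scan whose whole state is a single number. This is not BitCrack code: scan_descending, check_key, resume_from, chunk and max_chunks are all names I made up for illustration, and check_key stands in for whatever actually tests a private key against the target address.

```python
def scan_descending(start, end, check_key, resume_from=None, chunk=1024, max_chunks=None):
    """Scan the inclusive range [start, end] from the top down.

    Returns (found_key, next_key). found_key is None if nothing was found.
    next_key is the only state needed to resume: the highest key not yet tested.
    All names here are hypothetical; this is not how BitCrack is implemented.
    """
    next_key = end if resume_from is None else resume_from
    chunks_done = 0
    while next_key >= start:
        # Stop early to simulate pausing the search (e.g. for gaming).
        if max_chunks is not None and chunks_done >= max_chunks:
            break
        low = max(start, next_key - chunk + 1)
        # Walk the chunk downwards: next_key, next_key - 1, ..., low.
        for key in range(next_key, low - 1, -1):
            if check_key(key):
                return key, key
        # A real tool would write this single value to its status file here.
        next_key = low - 1
        chunks_done += 1
    return None, next_key


if __name__ == "__main__":
    target = 120  # toy "secret" key, for demonstration only
    is_hit = lambda k: k == target

    # First session: scan two chunks from the top of [100, 199], then pause.
    found, checkpoint = scan_descending(100, 199, is_hit, chunk=10, max_chunks=2)
    print(found, checkpoint)  # None 179

    # Second session: resume from the stored checkpoint and keep going down.
    found, checkpoint = scan_descending(100, 199, is_hit, resume_from=checkpoint, chunk=10)
    print(found, checkpoint)  # 120 120
```

With random subranges, the state would instead be a growing list of already-scanned intervals, which is the bookkeeping I would want to avoid.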
------------------
Regarding Code Optimizations:
I spent the whole night trying to understand how I can get more performance.
I have a Vega 56 and a Quadro P620 in my PC.
Under Linux my Vega 56 gets about 45 MKeys/s (with the buggy clBitCrack) and my P620 gets about 26 MKeys/s with cuBitCrack.
Under Windows I get about 54 MKeys/s with cuBitCrack, and clBitCrack did not really run...
So... I don't know why I get poor performance under Linux. I am not interested in investigating this further. So Windows it is.
On the other hand, it seems that NVIDIA also suggests that device LTO (link-time optimization) improves performance:
https://developer.nvidia.com/blog/improving-gpu-app-performance-with-cuda-11-2-device-lto/
I tried a lot to get it compiled under Linux, but I kept getting compilation errors, so I guess I would need to investigate this further. But as I said, I will use Windows... so maybe I can improve it there.