I don't think that they can be effectively searched in parallel. You have to divide each pubkey and check it against the babysteps. So not only do you need to make a very expensive global memory lookup (a GPU has slow global memory and super fast local memory), you also have to load each key.
So if you search multiple keys, you effectively reduce the performance by that factor. 10 keys in parallel means 10 times slower. 2800 keys means 2800 times slower.
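To make the scaling argument concrete, here is a rough sketch of baby-step giant-step with several targets. It is a toy, not the program's code: it uses the multiplicative group modulo a small prime as a stand-in for secp256k1, so "dividing" a pubkey becomes multiplying by a modular inverse. The names `P`, `G`, `build_baby_steps` and `search_keys` are made up for illustration.

```python
from math import isqrt

P = 1_000_003  # small prime modulus (toy stand-in for the secp256k1 group)
G = 2          # base element (assumed for this sketch)

def build_baby_steps(m):
    """Map G^j mod P -> j for j in [0, m). Built once, shared by all targets."""
    table = {}
    value = 1
    for j in range(m):
        table.setdefault(value, j)
        value = value * G % P
    return table

def search_keys(targets, range_size):
    """Find an exponent k < range_size for each target = G^k mod P.

    Returns (found, lookups), where lookups counts baby-step table probes.
    """
    m = isqrt(range_size) + 1
    baby = build_baby_steps(m)
    giant = pow(G, -m, P)  # G^(-m) mod P, needs Python 3.8+
    found = {}
    lookups = 0
    for target in targets:        # every target repeats the giant steps
        gamma = target
        for i in range(m):
            lookups += 1          # one table probe per giant step per target
            j = baby.get(gamma)
            if j is not None:
                found[target] = i * m + j
                break
            gamma = gamma * giant % P  # "divide" the target by G^m
    return found, lookups

if __name__ == "__main__":
    secrets = [123_456, 654_321, 42]
    targets = [pow(G, k, P) for k in secrets]
    found, lookups = search_keys(targets, 1_000_000)
    for t in targets:
        print(t, found[t], pow(G, found[t], P) == t)
    print("table probes:", lookups)
```

The baby-step table is built once, but the giant-step loop and its table probes are repeated for every target, so the probe count grows with each key you add. On a GPU those probes are the global memory lookups mentioned above.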
I am sure there will be a limit, but it is probably in the millions. For similar programs, it usually caps out at around 30 million addresses, pubkeys, xpoints...
a.a.
Have you run this program yet? It's just that some of your answers make it seem like you have not run it at all.
The program does not check keys in parallel. It runs the range with one pubkey and, once finished, moves to the next, until the last pubkey has been checked for that specific range.
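Roughly, the order of work looks like the sketch below. This is only an illustration of the loop order, not the program's actual code; `work_order` and its arguments are hypothetical, and it assumes the program moves on to another range after the last pubkey.

```python
def work_order(ranges, pubkeys):
    """Yield (key_range, pubkey) pairs in the order the work is done:
    every pubkey is checked against one range before the next range starts."""
    for key_range in ranges:
        for pubkey in pubkeys:
            # One full search of key_range for this single pubkey goes here.
            yield key_range, pubkey

if __name__ == "__main__":
    for key_range, pubkey in work_order([(1, 100), (101, 200)], ["pubkey_A", "pubkey_B"]):
        print(key_range, pubkey)
```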
So checking multiple xpoints in parallel is only possible with KeyHunt-CUDA.