Good afternoon. I'm confident you only need to compute 2^23 coordinates if you use the BSGS algorithm and precompute 2^41 points.
The setup I propose is as follows:
You precompute the points from k = 1 to k = 2^41 with a step of 1 (the baby steps, computed sequentially) and store the last 64 bits of every x coordinate in a database.
Because of the symmetry of the secp256k1 curve about the x axis, you can set the giant step to 2*2^41 = 2^42: every point +k is equivalent to -k.
If the unknown key K is near 2^66 (worst case), you first subtract the point 2^65*G, so your range shrinks to 0 to 2^65.
Then, starting from the unknown point (x,y), you subtract the point 2^42*G sequentially, n times, checking the x coordinate at each step against the x coordinates in the database.
Because of symmetry, k*G = (x, y) and -k*G = (x, -y) have the same x coordinate, so computing the positive x's is equivalent to computing the negative ones.
If you make 2^23 jumps of distance 2^42 each, you cover a total distance of 2^65, and within at most 2^23 jumps you are guaranteed to land on an entry in the database.
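The whole scheme can be sketched end to end with toy parameters (2^12 baby steps instead of 2^41, a range of 2^25 instead of 2^65). This is my own minimal pure-Python sketch, not the author's implementation; `K_secret` is a made-up example key and the curve helpers are a plain textbook secp256k1 implementation:

```python
# secp256k1 parameters (standard constants)
p = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(P, Q):  # affine point addition; None = point at infinity
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def ec_mul(k, P):  # double-and-add scalar multiplication
    R = None
    while k:
        if k & 1: R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

M = 1 << 12                       # baby steps (2^41 in the real setup)
RANGE = 1 << 25                   # key range  (2^65 in the real setup)
MASK = (1 << 64) - 1              # keep only the low 64 bits of x

# Baby steps: store the low 64 bits of x(j*G) for j = 1..M
baby, P = {}, None
for j in range(1, M + 1):
    P = ec_add(P, G)
    baby[P[0] & MASK] = j

# Thanks to symmetry, the giant step is 2*M
S = ec_mul(2 * M, G)
neg_S = (S[0], (p - S[1]) % p)

K_secret = 9_999_999              # made-up unknown key for the demo
T = ec_mul(K_secret, G)           # the public key we are given

# Walk backwards from T in jumps of 2*M, checking the table each step
found, cur = None, T
for i in range(RANGE // (2 * M) + 1):
    if cur is None:               # landed exactly on the identity
        found = i * 2 * M
        break
    j = baby.get(cur[0] & MASK)
    if j is not None:             # K - i*2M = +/- j (up to a rare collision)
        for cand in (i * 2 * M + j, i * 2 * M - j):
            if cand > 0 and ec_mul(cand, G) == T:
                found = cand
                break
        if found is not None:
            break
    cur = ec_add(cur, neg_S)

assert found == K_secret
```

Scaling M to 2^41 gives exactly the database and 2^23 jump bound described above; truncating x to 64 bits causes occasional false hits, which the candidate-verification step filters out.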
I was able to discard the first 2^65 keys of puzzle 130 with my own Python implementation of the BSGS algorithm, on CPU only, using a 2 TB database.
I see it as a reverse BSGS, because you go backwards from the unknown K toward k = 0 in big jumps.
I understand Shanks's algorithm takes at most 2*sqrt(N) operations when symmetry is not used.
If you use symmetry, every point you calculate counts as two, so the total number of operations should be at most 2*sqrt(N/2) = sqrt(2)*sqrt(N) approx. 1.4*sqrt(N) operations.
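As a quick numerical sanity check of that operation count, taking N = 2^65 as an example:

```python
import math

N = 2.0 ** 65
classic  = 2 * math.sqrt(N)       # plain Shanks: 2*sqrt(N)
with_sym = 2 * math.sqrt(N / 2)   # with symmetry: 2*sqrt(N/2)

# 2*sqrt(N/2) equals sqrt(2)*sqrt(N), i.e. roughly 1.414*sqrt(N)
assert abs(with_sym - math.sqrt(2) * math.sqrt(N)) < 1e-6 * with_sym
# the symmetry trick saves a factor of 1/sqrt(2) over plain Shanks
assert abs(with_sym / classic - 1 / math.sqrt(2)) < 1e-12
```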
Instead of computing 2^32 points for the database and 2^32 giant steps, you trade more storage and more work up front for fewer operations later.
My very slow implementation, in Python on 1 CPU core, manages about 200,000 coordinates/sec; with my 12 cores that is 2.4 million keys/second.
I believe I could solve it in under 4 seconds after the public key is revealed, if I manage to store a 16 TB database.
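Those two figures check out, assuming 8 bytes per stored x fragment and reading "16 TB" as 16 TiB:

```python
entries = 1 << 41                     # one 64-bit x fragment per baby step
size_bytes = entries * 8              # 2^44 bytes
assert size_bytes == 16 * (1 << 40)   # exactly 16 TiB of storage

jumps = 1 << 23                       # worst-case number of giant steps
rate = 2_400_000                      # keys/second on 12 cores
assert jumps / rate < 4               # about 3.5 seconds
```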