Maybe you can look at it from a different angle than reusing the concept behind mining and the ASICs/FPGAs built for it.
It does not have to be 256-bit integers; that is only needed if you are trying to reach the public point that belongs to a known 256-bit private key.
You will visit a lot of intermediate points on your way to bit 256, and every one of them corresponds to a valid bitcoin address.
It's possible that you visited some point/address at, for example, bit 122 that had 100 BTC in it, but you will never know if you don't check for it and just skip it and discard the data.
But make sure you understand that, if you are not trying to solve a known 256-bit key, the nearest public point is not 256 bits away but just one bit away, since it is K+1, i.e. Q+1G.
So that is 1 bit and not 256 bits.
Make sure you understand this so that you can use it to optimize your ideas. When it comes to collisions, it can also be modified from SECP256K1 to SECP1K1, so realize that it's not always 256 or an average of 128 point calculations. There is a difference between talking about a specific point and talking about any point.
A specific point will need to go through the entire loop of 256 bits, but any point is just any point onto which you can add any number of G's.
For example, if the (first) last bit of your private key is a 1, then this rules out the entire key space below that number, and it takes only one step to get into that range, not 256 or 128.
Sorry, but I'm not sure I fully understand what you said about bits (1 vs 256).
What I understand about the kangaroo algorithm is that it forces you to perform a group addition at every jump of every kangaroo in the herd (wild or tame).
In affine coordinates you have to calculate the next jump with this function, whatever the range size, where (x1, y1) is the current point and (xG, yG) is the jump point picked by the walk:
X(i+1) = X(i) + deterministic_random_walk[G, 2G, 4G, ...]
m = (y1 - yG) * inverse_mod(x1 - xG, P)
xQ = pow(m, 2, P) - x1 - xG
yQ = y1 + m * (xQ - x1)
Q = (xQ % P, -yQ % P)
Knowing that P is about 2^256, you cannot reduce the size of the integers used in this function
below 256 bits, because the coordinates of the resulting point Q will be (more or less randomly) somewhere in
(x,y) = ([1 -- 2^256],[1 -- 2^256])
and you need all of this information to compute the next point (whatever number of DP bits is used).
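To make the point about operand size concrete, here is a minimal Python sketch of that affine addition on secp256k1. The function names add_affine and double_affine are my own, and it assumes Python 3.8+ for the modular inverse via pow(a, -1, P). It is only an illustration of the arithmetic, not how a real kangaroo implementation or a hardware design would do it (those usually batch the inversions).

```python
# secp256k1 field prime and generator coordinates (standard curve parameters)
P = 2**256 - 2**32 - 977
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
G = (GX, GY)


def add_affine(p1, p2):
    """Add two affine points whose x coordinates differ (the usual jump case)."""
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2:
        raise ValueError("same x: use doubling, or the result is infinity")
    # One modular inversion per addition: this is the expensive step.
    m = (y1 - y2) * pow((x1 - x2) % P, -1, P) % P
    x3 = (m * m - x1 - x2) % P
    y3 = (m * (x1 - x3) - y1) % P
    return (x3, y3)


def double_affine(p):
    """Double an affine point (only needed here to build 2G, 4G, ...)."""
    x, y = p
    m = 3 * x * x * pow((2 * y) % P, -1, P) % P
    x3 = (m * m - 2 * x) % P
    y3 = (m * (x - x3) - y) % P
    return (x3, y3)


def on_curve(p):
    """Check y^2 = x^3 + 7 (mod P)."""
    x, y = p
    return (y * y - x * x * x - 7) % P == 0


G2 = double_affine(G)
G3 = add_affine(G2, G)
print(on_curve(G3), G3[0].bit_length(), G3[1].bit_length())
```

Even for a point as small as 3G, both coordinates are spread over the whole field, which is why the adder has to be full width regardless of the search range.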
It is this costly addition function that I want to implement in my hypothetical ASIC or FPGA chips (parallelized).
Storing the X coordinates (masked with DP) of every jump (calculated by the dedicated devices) in a hash table could be handled by a centralized computer with a lot of RAM.
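The central part could look roughly like the sketch below. Everything in it is hypothetical (the names is_distinguished and record_dp, the DP_BITS value, the table layout), and it assumes each device reports the x coordinate together with the distance travelled and which herd the kangaroo belongs to. A point counts as distinguished when the low DP_BITS bits of x are zero, and a tame/wild collision on the same x gives the key offset as the difference of the two distances (ignoring the sign ambiguity from points that share x but have opposite y).

```python
DP_BITS = 16                      # hypothetical setting: low 16 bits of x must be zero
DP_MASK = (1 << DP_BITS) - 1


def is_distinguished(x):
    """A point is stored only if the masked bits of its x coordinate are zero."""
    return (x & DP_MASK) == 0


def record_dp(table, x, distance, is_tame):
    """Store a distinguished point; return the key offset on a tame/wild collision.

    table maps x -> (distance, is_tame). Returns None when nothing is solved yet.
    """
    if not is_distinguished(x):
        return None
    seen = table.get(x)
    if seen is None:
        table[x] = (distance, is_tame)
        return None
    seen_distance, seen_is_tame = seen
    if seen_is_tame == is_tame:
        return None               # same herd: a wasted collision, nothing learned
    tame_d = distance if is_tame else seen_distance
    wild_d = seen_distance if is_tame else distance
    return tame_d - wild_d        # tame = t*G, wild = (k + w)*G  =>  k = t - w


table = {}
x = 0xABCDEF << DP_BITS           # made-up x value that passes the DP test
print(record_dp(table, x, 1000, True))    # first sighting, stored
print(record_dp(table, x, 400, False))    # wild hits the same point
```

Only the distinguished points cross the link to the central machine, so the RAM and bandwidth needed go down by a factor of about 2^DP_BITS compared with storing every jump.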