f you use a 24bit hashtable and 2^30 keys for baby step:
You will have (in average) a list of 2^(30-24) = 64 items per hash.
Each items of this list contains the generator index and an additional part of the key.
typedef struct {
uint32_t b0; // LSB key (24 bit) + HSB Offset (8 bit)
uint32_t p; // LSB Offset
} ENTRY;
The hash is 3 byte of MSB of the key.
b0 is 3 byte of LSB of the key.
So we store 48 bits of the key.
p (offset) is the "generator index" corresponding to the key p.G
Then to access it , you compute the hash (one simple & operation) and you have direct access to this small list.
You can sort each list according to the part of key and you will need 6 operations (in average) to get the generator index.
I did like this in HashTable::Get().
Then as I don't store the full X value of p.G, i have some false collision, than i need to check by a computePublicKey.
A false collision happens every 2^(24+24 - 30) = 2^18 giant steps, that means that I need to perform an extra EC multiplication every 262144 giant steps.
According to the memory you have, you may need to adjust the HASH size, and the number of bit of the key stored in the entry in order to have a minimum of false collision.