Now I get your idea.
You should search for cycles of 250,000 keys to cover the entire database with a very low margin of error, but you can explore with fewer keys: 150, 100, 50.
150, 100, 50. Thousands, right?
Here is what I understand:
Your ultra-lightweight database is more like a reference of patterns and their offsets to the next pattern, right?
So each time we need to query whether some random key X is in our database, we need to calculate some 50K to 200K keys next to our target and look for those patterns.
If some patterns match our database (pattern and offset against the other patterns in our database), we can easily calculate the value of that key X.
As far as I can see, to keep this database of distances and patterns small, you are paying with more processing to find those matches.
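If that reading is correct, the query loop would look roughly like the toy sketch below. Everything in it is an assumption on my part: the key-to-fingerprint map is just SHA-256 of an integer (a stand-in for walking sequential public keys from the target by adding G), and the pattern rule is "fingerprint ends in PATTERN_BITS zero bits"; your real rule and DB layout are surely different, this only illustrates the walk-and-match idea.

import hashlib

PATTERN_BITS = 12          # rarity of a "pattern" (assumed rule)

def fingerprint(k: int) -> int:
    # Stand-in for the public key of private key k; in the real scheme you
    # would add G to the target public key instead of hashing integers.
    return int.from_bytes(hashlib.sha256(k.to_bytes(32, "big")).digest(), "big")

def is_pattern(k: int) -> bool:
    return fingerprint(k) % (1 << PATTERN_BITS) == 0

def build_db(start: int, count: int) -> dict:
    """Map (truncated pattern fingerprint, offset to next pattern) -> absolute key."""
    hits = [k for k in range(start, start + count) if is_pattern(k)]
    return {(fingerprint(a) >> 224, b - a): a for a, b in zip(hits, hits[1:])}

def query(db: dict, x: int, max_walk: int = 200_000):
    """Walk keys x, x+1, ... and try to locate x via a stored (pattern, gap) pair."""
    hits = [x + i for i in range(max_walk) if is_pattern(x + i)]
    for a, b in zip(hits, hits[1:]):
        anchor = db.get((fingerprint(a) >> 224, b - a))
        if anchor is not None:
            return anchor - (a - x)    # recovered value of x
    return None

db = build_db(1, 2_000_000)            # the "indexed" part of the key space
print(query(db, 1_234_567))            # -> 1234567 if a known pattern pair is within the walk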
If you do 4 cycles per second, you will have an effective rate of 12,000,000,000 * 4 = 48,000,000,000 keys/s. By increasing the database, you will get even more speed.
You need a point of comparison, and you also need to calculate at what point your approach becomes better than existing methods.
For example, in the test that I sent you using my approach (bloom filter + BSGS), the speed was near 16 Petakeys/s (15,904,076,907,774,986 keys/s) using 8 GB of RAM.
But the current version of keyhunt did 26 Petakeys/s with 4 GB of RAM; if I used 8 GB like in the other test, it would do some 50 Petakeys/s.
And my program is slow compared against kangaroo (kTimesG's CPU version), which can solve keys under 80 bits in about 1 minute.
So I think we need to calculate how many keys you need in your database just to become competitive with the existing tools.
The first thing you need to do is measure how much time a CPU or GPU takes to calculate the 200K keys sequentially and search for the patterns.
Then calculate how many keys the database needs in order to reach some Exakeys/s, Zettakeys/s, or Yottakeys/s equivalent.
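A rough way to take that first measurement is sketched below in pure Python (the same language as the calculation further down). The affine point addition is correct but completely unoptimized, so the number it prints is only an upper bound on the per-cycle cost; real tools in C/asm or on GPU will be several orders of magnitude faster.

import time

# secp256k1 field prime and generator point
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(p1, p2):
    # Affine secp256k1 addition (falls back to doubling when p1 == p2).
    (x1, y1), (x2, y2) = p1, p2
    if p1 == p2:
        lam = (3 * x1 * x1) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

N_KEYS = 200_000
t0 = time.time()
cur = G
for _ in range(N_KEYS):
    cur = point_add(cur, G)            # next sequential public key
cycle_time = time.time() - t0
print(f"{N_KEYS} sequential keys in {cycle_time:.2f} s "
      f"({N_KEYS / cycle_time:,.0f} keys/s)")

# Using the same model as the calculation below (rate = DB elements / query time),
# the DB size needed for a 1 Exakey/s equivalent at this query time:
print(f"elements for a 1 Exakey/s equivalent: {int(1e18 * cycle_time):,}")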
One good thing about your approach is that the database can grow on demand: you only need to keep calculating keys and searching for patterns, and once you find one, you just append it to the DB.
To be honest, I think it would be difficult to beat kangaroo, but with a big enough DB it may be possible.
Let's do some calculations:
(2**80) / 60 seconds = 20148763660243819578436
Suppose you take 1 second to determine whether the pattern is in our database or not.
That would mean you need a database with at least 20148763660243819578436 elements:
>>> 2**75 <= 20148763660243819578436
False
>>> 2**74 <= 20148763660243819578436
True
So it's something between 2^74 and 2^75. Those are just rounded numbers; since each element covers a whole cycle of keys, the real database might be nearer 2^70 or even lower.
But one thing comes to my mind: we can't solve a 2^65 range with pure brute force alone.
How do you plan to build a database of around 2^60 items to compete against kangaroo?
If we had enough computing power to build a 2^60 or 2^70 database, we might have a chance to try something against puzzle 135.
Well, those are just round numbers that come to my mind at this moment.
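The same back-of-the-envelope calculation can be wrapped in a small helper so it is easy to re-run with different assumptions; rate = elements / query time is the model used above, and keys_per_element (how many keys one stored pattern effectively covers) is a knob I added.

def required_db_elements(range_bits: int, solve_seconds: float,
                         query_seconds: float, keys_per_element: int = 1) -> int:
    # Effective rate needed to solve a 2**range_bits range in solve_seconds,
    # then the number of stored elements needed if each query takes query_seconds.
    target_rate = 2**range_bits / solve_seconds
    return int(target_rate * query_seconds / keys_per_element)

print(required_db_elements(80, 60, 1))            # ~2.0e22, between 2^74 and 2^75
print(required_db_elements(80, 60, 1, 250_000))   # if one element covers a 250K-key cycle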
Yes, by 50, 100, 150 I'm talking about thousands.
Focusing on your tests with your approach:
original bsgs: 1,073,741,824 keys stored
new bsgs: 146,028,888,064 keys stored
Dividing 146,028,888,064 by 128 gives 1,140,850,688.
I think you would have to be able to process more keys in the same time frame to equal its efficiency, and that is why it becomes slower: all the extra work that the 128 additional steps involve.
In this DB, those 146,028,888,064 keys fit in less than 20 MB.
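Restating those numbers quickly to make the trade-off explicit (the 1-byte-per-key figure is only a naive point of reference for the size comparison):

original_stored = 1_073_741_824        # keys stored in the original bsgs test
new_stored      = 146_028_888_064      # keys stored in the new bsgs test

print(f"{new_stored / 128:,.0f}")                       # 1,140,850,688 (the division above)
print(f"{new_stored / original_stored:.0f}x")           # 136x more keys than the original
print(f"{new_stored / 2**20:,.0f} MB at 1 byte/key")    # 139,264 MB, versus less than 20 MB here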
In my database with 120 million elements, the sequence rarely exceeds 200,000 keys between patterns. This suggests that 200,000 can be taken as an average maximum for key cycles, and the way patterns are packed allows storing large numbers of keys, or markers for them, very cheaply, which is efficient in terms of storage. Unlike methods with fixed computing power like Kangaroo, we can keep moving forward by adding more keys to the database and gain a significant advantage in terms of scalability.
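A quick way to check that kind of gap statistic for any pattern rule is sketched below; the rule here is the same zero-bits stand-in as in the earlier sketch, since the real rule isn't described in these posts, so only the structure of the check is meant to carry over.

import hashlib, statistics

def is_pattern(k: int, bits: int = 12) -> bool:
    h = hashlib.sha256(k.to_bytes(32, "big")).digest()
    return int.from_bytes(h, "big") % (1 << bits) == 0

positions = [k for k in range(1, 1_000_000) if is_pattern(k)]     # keys where a pattern fires
gaps = [b - a for a, b in zip(positions, positions[1:])]
print("elements:", len(positions))
print("mean gap:", statistics.mean(gaps))
print("max  gap:", max(gaps))
print("gaps over 200,000:", sum(g > 200_000 for g in gaps))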
We can also create an additional mini-database of subtractions and additions from your target, have each of those pubkeys calculate a certain number of keys and patterns, and join them to the main database.
Example:
pub-a = target - 13636775
pub-b = target + 1368754
If the pattern is found starting from pub-a, you add 13636775 and you get the target private key.
If the pattern is found starting from pub-b, you subtract 1368754 and you get the target private key.
You would then have multiple reference points along the range, which improves the search system.
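The bookkeeping for that mini-database is simple; here is a sketch with the two offsets from the example (N is the secp256k1 group order, and reducing mod N is my addition, since private keys live in [1, N-1]):

N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

# label -> offset that was added to the target key to create this pubkey
shifted_targets = {
    "pub-a": -13636775,    # pub-a = target - 13636775
    "pub-b": +1368754,     # pub-b = target + 1368754
}

def recover_target(solved_label: str, solved_priv: int) -> int:
    # If the walk from this shifted pubkey hit a known pattern, undo its offset.
    return (solved_priv - shifted_targets[solved_label]) % N

# e.g. if the DB match gave us the private key behind pub-a:
# target_priv = recover_target("pub-a", solved_priv)   # = solved_priv + 13636775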
Logic tells me that, to achieve a favorable average, this database should be able to store something like 200k times the number of keys that your current database handles (I'm not entirely sure of the exact factor). In terms of space that is possible with this DB format, because storing 200k times more is not a significant increase compared to your current DB. On the other hand, in theory there would come a point where the cost of computing the 200k keys per query becomes profitable, because you would only have to get past the break-even point in effort.
Now the complex question is: what is the limit on how many patterns to store? I know that with Python it is approximately 130 KB for 120 million keys, held in a little more than 3 thousand elements. What algorithm can this be integrated into? With a filter, how many lookups per second can you do on average? Can it be part of BSGS?
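One way to start answering the storage question is to turn that figure into a per-element cost and scale it; the 130 KB / 120 million keys / ~3,000 elements numbers come from this post, and the linear scaling below is purely my assumption.

db_bytes     = 130 * 1024
keys_covered = 120_000_000
elements     = 3_000

bytes_per_element = db_bytes / elements          # ~44 bytes per stored pattern
keys_per_element  = keys_covered / elements      # ~40,000 keys per element

for target_keys in (2**40, 2**50, 2**60):
    est_elements = target_keys / keys_per_element
    est_bytes    = est_elements * bytes_per_element
    print(f"2^{target_keys.bit_length() - 1}: ~{est_elements:,.0f} elements, "
          f"~{est_bytes / 2**30:,.1f} GiB")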
I would have to investigate what the most efficient way would be, because it is linked to how big the database is, and so is your speed.
In the end this database is about scalability, and I think the barrier of added effort can be broken and real improvements can be seen.