Setup...
- Create a database to hold your bitcoin addresses and other data: tableX with columnY, columnZ, columnZa, etc.
- Index columnY and columnZ.
- Use a language whose arrays are limited only by available memory (C++, etc.).
- Store all your millions of bitcoin addresses in columnY.
- Create an array (A) in memory to hold all your stored addresses (10 million, etc.).
- Read your database of bitcoin addresses and hash each address. Use something fast and small, like FNV-1a 64-bit (fnv1a64), to reduce the number of bytes per entry. It doesn't have to be a cryptographic hash. A 64-bit hash has 2^64 possible values, so collisions are rare for millions of addresses, but by the birthday bound they become likely at around 2^32 (about 4 billion) entries; if you're going to store that many addresses, use a wider hash.
- For each address, store its hash in the array (ideally the hash takes fewer bytes than the bitcoin address itself). Also store the hash back in your database in columnZ, if it doesn't already exist.
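A minimal sketch of the setup steps in C++, assuming the addresses from columnY have already been read into a `std::vector<std::string>` (the database read is left out). The function names `fnv1a64` and `build_array_a` are hypothetical. Sorting array (A) is an addition not in the steps above; it lets each later lookup be a binary search.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV 64-bit prime
    }
    return h;
}

// Build array (A): one 8-byte hash per stored address, sorted and deduplicated.
// `addresses` is a stand-in (assumption) for the rows read from columnY.
std::vector<std::uint64_t> build_array_a(const std::vector<std::string>& addresses) {
    std::vector<std::uint64_t> a;
    a.reserve(addresses.size());
    for (const auto& addr : addresses) a.push_back(fnv1a64(addr));
    std::sort(a.begin(), a.end());
    a.erase(std::unique(a.begin(), a.end()), a.end());  // drop duplicate hashes
    return a;
}
```

Each entry is 8 bytes regardless of address length, which is the point of hashing before loading into memory.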
Process the blockchain...
- Look in the database to see which block was processed last, or start at block 0 on the first run.
- For each block, read in all the addresses, hash them, and store the hashes in another in-memory array (B).
- Search array (A) for the values in (B).
- If you have a match, append the block number to columnZa (block0:block22:block88842:etc).
- Store the last block searched in the database.
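A rough sketch of the per-block search in C++, assuming array (A) was sorted once after loading so each value from (B) can be found with a binary search. `find_matches` and `append_block` are hypothetical names, and the database reads and writes (last processed block, columnZa update) are left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Return the hashes from one block's array (B) that also appear in array (A).
// `sorted_a` must be sorted; each lookup is O(log |A|).
std::vector<std::uint64_t> find_matches(const std::vector<std::uint64_t>& sorted_a,
                                        const std::vector<std::uint64_t>& b) {
    std::vector<std::uint64_t> hits;
    for (std::uint64_t h : b) {
        if (std::binary_search(sorted_a.begin(), sorted_a.end(), h)) hits.push_back(h);
    }
    // An address can appear more than once in a block; keep each hit once.
    std::sort(hits.begin(), hits.end());
    hits.erase(std::unique(hits.begin(), hits.end()), hits.end());
    return hits;
}

// Append a block number to an address's columnZa value ("block0:block22:...").
// Since two addresses can share a 64-bit hash, a hit should still be checked
// against the full address in columnY before recording it.
std::string append_block(const std::string& column_za, std::uint64_t block) {
    std::string entry = "block" + std::to_string(block);
    return column_za.empty() ? entry : column_za + ":" + entry;
}
```

Binary-searching each value of (B) keeps the cost per block proportional to the block's address count, rather than scanning all of (A) every time.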
I'd use an OO language with 64-bit libraries.
Your DB is going to be doing more reads than writes, so optimize accordingly.
The array is going to take up a lot of memory once you have millions of records, so go for 128GB or more.
Don't use a virtual server; use a physical server where you can tune your CPU configuration.
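For a rough sense of scale for array (A) alone, assuming one 8-byte (64-bit) hash per address and 10 million addresses:

$$
\text{size}(A) = n \times 8\ \text{bytes} = 10^{7} \times 8\ \text{bytes} = 8 \times 10^{7}\ \text{bytes} \approx 80\ \text{MB}
$$

So the hash array itself is modest; the rest of the memory budget would go to the per-block arrays, the database, and whatever else you keep in memory.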
That's a lot of really good information. Thanks for that.