Hi all,
I have tried a couple of clients and libraries, but none of them lets me smoothly generate the list of all non-zero balances from the blockchain.
Some fail on the latest blocks, which is probably related to a format change, and some just throw a 'segmentation fault' error without providing any additional information.
For instance, the blockparser tool only gets me the transactions up to block 488466. Does anyone know a solution that works out of the box?
Regards
Very good question. I think the first step for all bitcoin 'hackers' is having the databases.
Snort is a good start, it's bundled with brainflayer, but as you already know, going from the genesis block up to N is easy until around block 400k, where most parsers break.
Lots of people wrote their code in 2013, so my first piece of 'advice' is to work backwards. Anybody can start at zero and bomb at 400,000+, all the SHIT on github can do that, so ...
1.) get your full node running with txindex=1 in bitcoin.conf; this makes the node index every transaction, so it can look up any of them and decode it into script over RPC
2.) run your parser backwards from 530,000 (or wherever the tip is now) and decrement ... do the hard part first, it only gets easier
3.) addresses are the easy part, but you don't really want them, because the good stuff is the hashed public keys and r-values, which can actually tell you something
4.) python is probably best because it's easy to hack, and if you're going to parse the entire btc blockchain from 5xxxxx down to 1, you're going to be doing a lot of hacking
5.) don't bother with those big databases, they all bomb out on the 200GB of required data. Keep many small stores instead: I mainly just have .txt and csv files, and use bloom filters as my database front-end, so my queries have practically zero latency. Filling the bloom filter is a one-time job and acquiring the data is what takes time, but once you have the data, making a decision takes no time at all
6.) sadly it's probably going to be RPC with python all the way down, and then you work in JSON. The script is a pain in the ass because it's not documented anywhere except in the C++ Bitcoin Core source, and they break it on every version, so if you're running RPC you at least have a chance of reading all the data down to block one
7.) you say you want addresses, but once you have them you'll find them more than useless: most addresses hold no bitcoins or have never been used
8.) you may want to look at snort and brainflayer; they include the 'pristine' list and the list of all addresses with a balance, up to about 2015. That's a good place to start, so you have something to look at now. 'Pristine' means coins mined from day one but never spent
9.) again, addresses in themselves are rather worthless. What I find useful is a running system that keeps a BLOOM FILTER fed with all addresses that have a balance; once a balance goes to zero, that address is cleared from the filter (a plain bloom filter cannot delete entries, so this takes a counting filter or a periodic rebuild). That way I have an instant way to tell you whether any address, 1 to N ( base 10 ), has a balance today. This is useful because in practice you're running on the memory pool, looking at addresses, and need to know how to update your bloom filter
10.) most useful is deriving private/public key pairs, say on the order of 10 million, and then checking them against your address bloom filter to see if an address is in use; then you know whether to run more software and go down that rabbit hole
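The bloom filter idea from points 5, 9 and 10 can be sketched roughly like this. It is a minimal counting variant, written only to illustrate the idea, not the actual setup described above: the class name `CountingBloom`, the default sizes and the SHA-256 based slot hashing are all assumptions. Counters are used instead of single bits because a plain bloom filter cannot forget an address once its balance drops to zero.

```python
import hashlib


class CountingBloom:
    """Counting Bloom filter sketch: one small counter per slot, so items can be removed."""

    def __init__(self, size=1 << 20, hashes=7):
        self.size = size                # number of slots (assumed default)
        self.hashes = hashes            # slot indexes derived per item (assumed default)
        self.counts = bytearray(size)   # one 8-bit counter per slot

    def _slots(self, item):
        # Derive `hashes` slot indexes by salting SHA-256 with the hash number.
        data = item.encode("utf-8")
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for s in self._slots(item):
            if self.counts[s] < 255:    # saturate instead of overflowing
                self.counts[s] += 1

    def remove(self, item):
        # Only meaningful for items that were added before.
        if item not in self:
            return False
        for s in self._slots(item):
            if 0 < self.counts[s] < 255:  # never decrement a saturated counter
                self.counts[s] -= 1
        return True

    def __contains__(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(self.counts[s] for s in self._slots(item))
```

You would `add()` an address when it first receives funds, `remove()` it when its balance hits zero, and test candidate addresses with `in`. A hit can be a false positive, so anything the filter flags still needs a check against the real data; a miss is always reliable.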
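I think I'm saying more than is needed; the parsing itself is just a few lines of PYTHON:
# no coin lib needed (the ones on github were mostly written 5+ years ago and none work very well); plain JSON-RPC will do
import requests

def rpc(method, *params):
    # Bitcoin Core's RPC listens on 8332 by default (8545 is Ethereum's port); user/password are whatever is in your bitcoin.conf
    r = requests.post("http://127.0.0.1:8332", auth=("rpcuser", "rpcpassword"),
                      json={"jsonrpc": "1.0", "id": "scan", "method": method, "params": list(params)})
    return r.json()["result"]

for height in range(rpc("getblockcount"), -1, -1):  # yes, we start with the most recent block and work back to the dinosaur age
    block = rpc("getblock", rpc("getblockhash", height), 2)  # verbosity 2 = transactions already decoded to JSON
    for tx in block["tx"]:
        for out in tx["vout"]:
            if out["value"] > 1:  # value is in BTC, 1 BTC = 100M satoshi; you said high value, right?
                spk = out["scriptPubKey"]
                print(spk.get("address") or spk.get("addresses"))  # or write to a txt/csv file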
Easy peasy, what's to say? Nothing to it ... the problem is that it takes a long time. I mean, 'getting your list' is the easy part.
Sure, you can read the raw blk000n.dat files from N to 1, read the raw script and get the data, but you will find a mess of spaghetti code that drives you insane. It's not 'python-like' to fuss with hexadecimal, and it's no fun in C/C++ either, given that the entire bitcoin legacy code from day one is a KLUDGE, a hack and a mess, and sucks real bad
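For anyone who wants to try the raw-file route anyway, here is a minimal sketch of just the outer framing of a blk file: each record is the 4-byte mainnet magic, a 4-byte little-endian length, then the block, whose first 80 bytes are the header. The function name `iter_raw_blocks` is made up for this example. It assumes the file contents are stored in the clear (I believe newer Bitcoin Core releases can XOR-obfuscate these files, in which case this will not work as-is), and it does not decode any transactions or script, which is where the real pain starts.

```python
import hashlib
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # network magic that precedes every block record


def iter_raw_blocks(data):
    """Yield (block_hash_hex, raw_block_bytes) for each record in one blk file's bytes."""
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != MAINNET_MAGIC:
            break  # zero padding or unknown data: stop scanning
        (size,) = struct.unpack_from("<I", data, pos + 4)  # little-endian block length
        raw = data[pos + 8:pos + 8 + size]
        if size < 80 or len(raw) < size:
            break  # truncated or nonsensical record
        header = raw[:80]
        # Block hash = double SHA-256 of the header, shown byte-reversed by convention.
        block_hash = hashlib.sha256(hashlib.sha256(header).digest()).digest()[::-1].hex()
        yield block_hash, raw
        pos += 8 + size
```

Typical use would be `iter_raw_blocks(open(path, "rb").read())`. Note that blocks sit in these files in the order the node received them, not in height order, so working 'N to 1' means collecting the hashes first and following the previous-block links yourself.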
Well, I make it sound sort of easy. Addresses and values come in JSON, but when you want R/S values, public keys and hashed keys, you must decode the asm/hex script yourself, which means you must read the C++ source, because it's the only place that documents the script 03/02/01/N/R,...
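To give a feel for what decoding R/S and public keys involves, here is a small sketch that takes the hex of one signature push (DER-encoded, with the trailing sighash byte, as it appears in a scriptSig or witness) and splits it into r, s and the sighash type, plus a helper that classifies a public key by its prefix byte (02/03 for compressed, 04 for uncompressed). The function names are invented for this example, and pulling the individual pushes out of the script is assumed to have happened already.

```python
def parse_der_signature(sig_hex):
    """Split a DER ECDSA signature plus sighash byte into (r, s, sighash_type)."""
    raw = bytes.fromhex(sig_hex)
    if len(raw) < 9 or raw[0] != 0x30:
        raise ValueError("not a DER sequence")
    if raw[1] + 3 != len(raw):  # 2 header bytes + body + 1 sighash byte
        raise ValueError("length byte does not match the data")
    pos = 2
    values = []
    for _ in range(2):  # two INTEGERs: r, then s
        if pos + 2 > len(raw) or raw[pos] != 0x02:
            raise ValueError("expected an INTEGER tag")
        n = raw[pos + 1]
        values.append(int.from_bytes(raw[pos + 2:pos + 2 + n], "big"))
        pos += 2 + n
    if pos != len(raw) - 1:
        raise ValueError("unexpected trailing data")
    return values[0], values[1], raw[-1]


def pubkey_kind(pub_hex):
    """Classify a serialized public key by length and prefix byte."""
    raw = bytes.fromhex(pub_hex)
    if len(raw) == 33 and raw[0] in (0x02, 0x03):
        return "compressed"
    if len(raw) == 65 and raw[0] == 0x04:
        return "uncompressed"
    return "unknown"
```

Collecting the r values this way is what lets you look for repeats across signatures; the sketch only does the byte-level splitting and does not verify anything.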
Besides, addresses in themselves are useless; most addresses you gather have no meaning. Here, I will give you some numbers:
I parse the blockchain from N to 1 (500k blocks), at about 4,000 transactions per block and about 2-3 addresses per tx, so that's 20 million addresses. Of those maybe 5-10% are interesting and less than 0.1% have a balance (remember, people are told not to use the same address twice)
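Since the original question was about non-zero balances, here is one way the backwards scan can produce them, sketched under assumptions: blocks arrive newest first as dicts in the shape `getblock <hash> 2` returns, and the function name `balances_backwards` is made up. Walking backwards, every input is seen before the output it spends, so it is enough to remember spent outpoints and skip those outputs when they turn up later in the scan.

```python
from collections import defaultdict


def balances_backwards(blocks_newest_first):
    """Return {address_or_script_hex: unspent satoshis} from blocks given newest first."""
    spent = set()                 # (txid, n) outpoints consumed by a later transaction
    balances = defaultdict(int)
    for block in blocks_newest_first:
        # Reverse the tx order too: a tx may spend an earlier tx of the same block.
        for tx in reversed(block["tx"]):
            for out in tx["vout"]:
                outpoint = (tx["txid"], out["n"])
                if outpoint in spent:
                    spent.discard(outpoint)   # already spent later on; free the memory
                    continue
                spk = out["scriptPubKey"]
                if spk.get("type") == "nulldata":
                    continue                  # OP_RETURN outputs are unspendable
                # Assumption: key by address when the node reports one, else by raw script hex.
                key = spk.get("address") or spk["hex"]
                balances[key] += round(out["value"] * 100_000_000)
            for vin in tx["vin"]:
                if "coinbase" not in vin:     # coinbase inputs spend nothing
                    spent.add((vin["txid"], vin["vout"]))
    return balances
```

On the full chain the `spent` set gets large before it starts shrinking, so expect to need plenty of RAM or to spill it to disk. Amounts are converted to integer satoshis to avoid float drift when summing.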
I have said it before and I will say it again: NONE of the CODE on github works. It's all shit and none of it is maintained, as kids like BUTERIN seem to get bored 1-2 years after they write their library and get 'famous', then move on and never look back. Some of these libs are just out-and-out read-only and unmaintainable. I have spent MONTHS trying to get ABE (and many other of these so-called bitcoin parser databases) to work, just to realize that it's hard-coded to a specific BTC fork, which means it is worthless if you want to parse 1-N
I wish I could say there was a fast way to parse. Even ABE, sure, it works great from 1 to 410,000, and then it stops and never works again, because the script is so convoluted and the code is so tightly wound around early BTC data structures that it's all hopeless. SEGWIT and all these new TX scripts seem to have broken all the shit libs
I find that most of the time I have to parse the script myself. That's why I have one parser for each task and don't bother with one-size-fits-all, because it's too much hacking (but it's easy: as shown above, writing your own parser to gather all high-value addresses on btc is only about 10 lines of python)
One parser to get ALL the addresses, another to get all public keys and their hashes, and another to get all the R&S values for ECDSA hacking ... plus other databases for other good stuff, like dates and times of unusual transactions. All these different databases go into different bloom filters, and in production all my main code just uses the bloom filters.
I might add that the bloom filters should be updated every 10 minutes, or actually sooner, because the early bird is always two steps ahead of consensus
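Updating ahead of the next block basically means watching the memory pool. A rough sketch of one polling pass is below; it assumes you pass in some `rpc(method, *params)` callable that talks to your node (a hypothetical helper, not part of any library), and the function name `new_mempool_addresses` is likewise made up. `getrawmempool` and `getrawtransaction` are real Bitcoin Core RPC calls.

```python
def new_mempool_addresses(rpc, seen_txids):
    """Return addresses paid by mempool transactions not seen on earlier passes.

    rpc:        callable(method, *params) -> decoded JSON result (hypothetical helper)
    seen_txids: set of txids already processed; updated in place
    """
    found = set()
    for txid in rpc("getrawmempool"):          # list of txids currently in the mempool
        if txid in seen_txids:
            continue
        seen_txids.add(txid)
        tx = rpc("getrawtransaction", txid, True)   # True = return decoded JSON
        for out in tx["vout"]:
            addr = out["scriptPubKey"].get("address")
            if addr:                           # some outputs have no address at all
                found.add(addr)
    return found
```

Call it in a loop every few seconds and feed the result into the address filter. A real version needs error handling, since a transaction can leave the mempool between the two calls, and it needs the spending side as well (the inputs) to know which addresses are being emptied.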