Something we must fight against if we want Bitcoin to retain its utility in the future. No one will use a money whose fungibility and privacy are as bad as a bitcoin that has lost those properties.
Of course, and I don't believe anything I have done would make things any worse. I am just trying to create a scalable framework based on Bitcoin that can also be privacy enhanced.
We do not store the blockchain in a database in Bitcoin Core.
OK, so storing the raw network data in a raw file isn't a database; my point was that almost every RPC call involves the DB, and I think it is fair to say that for most people the RPC path is the one that gets used.
txindex=1 is not very compatible with scalability; you end up with an ever-growing index. It makes some operations faster, but at a considerable cost.
In Bitcoin Core, reindex spends a fair amount of time rehashing things to check data integrity, because later the data is assumed to be valid. It could easily be made very fast, but at the cost of additional indexes. Right now the marginal cost of a full-node wallet that supports arbitrary key import is about 65GB and rapidly growing (pruned vs non-pruned, plus txindex).
Certainly txindex=1 adds overhead, but with the method of creating immutable bundle files, each containing hashtables to find everything internally and keeping all unspents to the same destination in a linked list, the overhead is contained within each bundle. I don't see why anything ever has to be verified again once it has been verified and written into a read-only file. For the paranoid, a hash of each bundle file could be computed and checked before use.
The tracking of the status of unspents is the only global data that needs to be updated, and each entry is written exactly once.
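A minimal sketch of the bundle layout being described (all structure and field names here are hypothetical, not iguana's actual ones):

    /* Hypothetical layout of one immutable bundle file covering 500 blocks. */
    #include <stdint.h>

    struct unspent_entry {
        uint8_t  txid[32];      /* transaction this output belongs to */
        uint16_t vout;          /* output index within that tx */
        uint64_t value;         /* amount in satoshis */
        uint32_t prev_unspent;  /* index of the previous unspent paying the
                                   same destination (0 if none): the linked
                                   list that yields all utxos per address */
    };

    struct bundle_header {
        uint8_t  bundle_hash[32];   /* hash of the file contents, checkable
                                       before use by the paranoid */
        uint32_t first_height;      /* first block covered by this bundle */
        uint32_t num_blocks;        /* e.g. 500 */
        uint32_t num_unspents;
        uint32_t hashtable_offset;  /* table mapping destination -> head of
                                       its unspent linked list */
    };

    /* The only global, mutable state: one record per unspent, written
       exactly once when that output is spent. */
    struct spend_status {
        uint32_t spent_in_bundle;   /* 0 while still unspent */
        uint32_t spent_in_txidx;    /* which tx in that bundle spent it */
    };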
Over time, an operation might need to scan all bundles, but I just got benchmarks on a 1.4GHz i5: 2.5 milliseconds to scan 1900 bundles (for a coin using 500-block bundles). So it is not very fast, but it would be fast enough for most use cases and comparable to RPC overhead, where things are serialized with system locks. And this comes with the ability to find all utxos for any address. Also note that the parallel design makes it ideal for multithreading, and speedups from multiple cores would be trivial to add.
If even more performance is needed, each speed-boosted address can just keep things cached.
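As a rough illustration of how the read-only bundles make that scan embarrassingly parallel (pthreads sketch; bundle_find_unspents() is a stand-in for the per-bundle hashtable lookup, not a real function):

    #include <pthread.h>
    #include <stdint.h>

    #define NUM_THREADS 4

    extern int num_bundles;
    /* stand-in: look up one address in one bundle's internal hashtable */
    extern int bundle_find_unspents(int bundle_idx, const uint8_t addr[20]);

    struct scan_job { const uint8_t *addr; int start, stride, found; };

    static void *scan_worker(void *arg) {
        struct scan_job *job = arg;
        /* bundles are immutable, so workers need no locks at all */
        for (int i = job->start; i < num_bundles; i += job->stride)
            job->found += bundle_find_unspents(i, job->addr);
        return NULL;
    }

    int scan_all_bundles(const uint8_t addr[20]) {
        pthread_t threads[NUM_THREADS];
        struct scan_job jobs[NUM_THREADS];
        int total = 0;
        for (int t = 0; t < NUM_THREADS; t++) {
            jobs[t] = (struct scan_job){ addr, t, NUM_THREADS, 0 };
            pthread_create(&threads[t], NULL, scan_worker, &jobs[t]);
        }
        for (int t = 0; t < NUM_THREADS; t++) {
            pthread_join(threads[t], NULL);
            total += jobs[t].found;
        }
        return total;
    }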
Without sigs, my data set for txindex=1 is about 25GB uncompressed and 15GB compressed. I am thinking of defaulting to not storing the sigs: once a spend is validated and the bundle is marked as sigs-and-inputs-verified, I don't see many cases where the sigs are needed anymore. Certainly not at the cost of doubling the total space.
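One way such a marking could look (purely illustrative; the flag names and layout are made up):

    #include <stdint.h>

    #define BUNDLE_SIGS_VERIFIED 0x01  /* inputs and sigs checked once */
    #define BUNDLE_SIGS_PRESENT  0x02  /* sig segment still on disk */

    struct bundle_siginfo {
        uint32_t flags;        /* VERIFIED without PRESENT = sigs dropped */
        uint32_t sigs_offset;  /* 0 when the sig data was never stored */
        uint32_t sigs_size;
    };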
If it's purely local, you could probably always number that way, so long as you had code to fix up the numbering during a reorg.
What number of blocks is beyond the common-sense reorg danger? I know that very long reorgs are theoretically possible, but from what I have seen, big reorgs are not so common. I think that by setting a delay of N blocks before creating the most recent bundle, the odds of having to regenerate that bundle would be very low. Since it takes about a minute to regenerate, it isn't a giant disaster if it has to, but it is best to default things so it almost never does.
Then by keeping the full-sized form of the data for the most recent N blocks, I think the advantage of compactness is achieved without any negatives [with the caveat that any reorg past the N blocks requires regenerating that bundle and everything past it], and these are all purely local optimizations.
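A sketch of that finalization rule (N is a local policy constant, not a consensus value; function names are hypothetical):

    #define BUNDLE_SIZE   500
    #define REORG_DELAY_N 100   /* hypothetical safety margin */

    extern void regenerate_bundles_from(int bundle_idx);

    /* Only write the newest bundle once its last block is buried N deep;
       the most recent N blocks stay in their full, mutable form. */
    int bundle_is_finalizable(int bundle_idx, int chain_tip_height) {
        int last_block = (bundle_idx + 1) * BUNDLE_SIZE - 1;
        return chain_tip_height - last_block >= REORG_DELAY_N;
    }

    /* Rare path: a reorg deeper than N invalidates the bundle containing
       the fork point and every bundle after it (about a minute each to
       rebuild, per the numbers above). */
    void on_reorg(int fork_height) {
        regenerate_bundles_from(fork_height / BUNDLE_SIZE);
    }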
I was wondering if it is possible to get a network services bit? It would indicate that a node is available for tx-based queries from other nodes. Since txindex=1 comes for free, I figure it would be a good thing to make available to other nodes, in addition to sharing hashes of the bundles. This would make things not purely local, but it would operate orthogonally to existing peers and just help bootstrap new nodes and lightweight nodes, and it would only affect other nodes that have this bit set. I saw the docs say to ask about such bits or to just start using one; I am not sure of the exact process to get a bit allocated.
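For context, service bits are just flags OR'd into the nServices field a node advertises in its version message. The first three definitions below are real protocol bits; the tx-query bit is purely hypothetical, with a made-up value (real assignments get coordinated on the mailing list / via a BIP):

    #include <stdint.h>

    #define NODE_NETWORK (1 << 0)  /* can serve the full block chain */
    #define NODE_GETUTXO (1 << 1)  /* BIP 64 utxo queries */
    #define NODE_BLOOM   (1 << 2)  /* BIP 111 bloom-filtered connections */

    /* hypothetical: advertises txindex-style tx queries and bundle hashes */
    #define NODE_TXQUERY (1 << 24)

    uint64_t advertised_services(void) {
        return NODE_NETWORK | NODE_TXQUERY;
    }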
There is an additional test harness built from BitcoinJ that simulates a Bitcoin network and puts the node through its paces. It's called blocktester. Look at the Travis CI configuration in Bitcoin Core to get an idea of the tests that are automatically run on it.
Fantastic! This will help a lot. Thank you.
Standard warnings about the near impossibility of writing new code that is consensus-compatible with other code apply: if you don't find some new surprising behavior in the network that we don't have tests for, you're almost certainly not testing hard enough to have a chance at it.
I know. Unfortunately I am used to doing the impossible, and external events forced me onto this path in the last months. I started the iguana iteration last November.
The first production iteration won't be a mining-enabled version; my target audience (the mass market) doesn't really have so many ASICs handy. So by following the network consensus, it avoids having to recreate all the intricate consensus rules for all the variations in the installed base.
I can recommend that everyone write their own Bitcoin Core equivalent. It sure is a good way to learn how it all works.
James