
Topic: Fast parallel block validation without UTXO-index (Read 4668 times)

staff
Activity: 4242
Merit: 8672
This is semantics.
It isn't-- because the speedups you conjecture already exist.

You propose that your design moves accessing the UTXO data out of the critical path for block validation, on the basis that you will have already validated the transactions and so don't need to access that data.

Bitcoin Core does not have UTXO accesses in the critical path in the same case and doesn't write in the critical path either. So it would not see the benefit you expect for this reason. (not that it wouldn't potentially have benefit, just not the benefit you are suggesting at least not for the reason you suggest)

Cheers
jr. member
Activity: 38
Merit: 18
I don't understand the specifics of the code, so can you enlighten me? What you're saying sounds like semantics: "It's after in separate batch steps" sounds like it is still done when a block validates,

It's not semantics. When validating blocks, the vast majority of the time the UTXO set is not used by Bitcoin Core, because the UTXOs are already in the cache (due to the mempool); nor are the results of the block's changes written out-- they're buffered until the node flushes-- and very often they never get written at all, because the newly created outputs are spent by a later block before the system flushes them.

OP suggests that the structure he proposes reduces the delay at the time a block is processed compared to Bitcoin Core's UTXO database, but it shouldn't-- at least not for that reason-- because the database is only used at block processing times in the exceptional case where a transaction shows up by surprise.

This is semantics. I am aware that Core uses a cache to make most UTXO reads and writes faster, but I would still call them UTXO reads/writes (or CoinView reads/writes?), and as I understand it, they still need to be handled sequentially.

Any reasonable database will attempt to use RAM for frequently used items, regardless of whether this is managed by the application, by a dedicated database component or by the OS.
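To make the batching concrete, here is a rough, hypothetical sketch of a write-back coin cache of the kind described above (a simplified model, not Core's actual CCoinsViewCache code; all names are illustrative). Block processing touches only the in-memory map; nothing reaches the database until a flush, and an output created and spent between two flushes is never written at all.

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct Coin { value: u64, spent: bool }

// Hypothetical write-back cache: `db` stands in for the on-disk store.
struct CoinCache {
    cache: HashMap<(String, u32), Coin>, // (txid, vout) -> coin
    db: HashMap<(String, u32), Coin>,
}

impl CoinCache {
    fn add(&mut self, txid: &str, vout: u32, value: u64) {
        self.cache.insert((txid.to_string(), vout), Coin { value, spent: false });
    }
    fn spend(&mut self, txid: &str, vout: u32) {
        let key = (txid.to_string(), vout);
        // Only the exceptional "surprise transaction" case reads the db.
        if !self.cache.contains_key(&key) {
            let coin = self.db.get(&key).expect("missing utxo").clone();
            self.cache.insert(key.clone(), coin);
        }
        self.cache.get_mut(&key).unwrap().spent = true;
    }
    fn flush(&mut self) {
        // One batched write; entries spent before the flush are simply dropped.
        for (k, c) in self.cache.drain() {
            if c.spent { self.db.remove(&k); } else { self.db.insert(k, c); }
        }
    }
}

fn main() {
    let mut c = CoinCache { cache: HashMap::new(), db: HashMap::new() };
    c.add("aa", 0, 50_000);
    c.spend("aa", 0); // spent from the cache; db untouched
    c.flush();
    assert!(c.db.is_empty()); // created and spent between flushes: never written
}
```

The point of the sketch is only the last assertion: the database sees neither the creation nor the spend of a short-lived output.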

staff
Activity: 4242
Merit: 8672
I don't understand the specifics of the code, so can you enlighten me? What you're saying sounds like semantics: "It's after in separate batch steps" sounds like it is still done when a block validates,

It's not semantics. When validating blocks, the vast majority of the time the UTXO set is not used by Bitcoin Core, because the UTXOs are already in the cache (due to the mempool); nor are the results of the block's changes written out-- they're buffered until the node flushes-- and very often they never get written at all, because the newly created outputs are spent by a later block before the system flushes them.

OP suggests that the structure he proposes reduces the delay at the time a block is processed compared to Bitcoin Core's UTXO database, but it shouldn't-- at least not for that reason-- because the database is only used at block processing times in the exceptional case where a transaction shows up by surprise.
legendary
Activity: 1120
Merit: 1003
I think what tomtomtom7 is trying to say is that the common approach (used not only by core) is that even though the node verifies the transactions before a block is announced, the actual moment when the UTXO database is updated (and the changes committed to it)  is during the block's validation.
That is what I understand him to be saying, but it isn't true for Bitcoin Core. For Core, the moment when the UTXO database is updated (and changes committed) is _not_ during block validation. It's done afterwards, in separate batch steps which apply big batches of changes at a time-- specifically to get interaction with the database mostly out of the critical path.

This is also why if you shut your node off uncleanly you'll see it going and reprocessing dozens of blocks-- their effects hadn't been applied to the database yet.

I don't understand the specifics of the code, so can you enlighten me? What you're saying sounds like semantics: "It's after in separate batch steps" sounds like it is still done when a block validates, just maybe not as part of a certain function. I mean, it has to be done before the next block anyway. Is the UTXO updated before the node officially validates the block?

It sounds like this new way validates the TXs as they happen, and this saves time later. I wish people could be a little more specific in a tech sub-forum - like with code snippets.
staff
Activity: 4242
Merit: 8672
I think what tomtomtom7 is trying to say is that the common approach (used not only by core) is that even though the node verifies the transactions before a block is announced, the actual moment when the UTXO database is updated (and the changes committed to it)  is during the block's validation.
That is what I understand him to be saying, but it isn't true for Bitcoin Core. For Core, the moment when the UTXO database is updated (and changes committed) is _not_ during block validation. It's done afterwards, in separate batch steps which apply big batches of changes at a time-- specifically to get interaction with the database mostly out of the critical path.

This is also why if you shut your node off uncleanly you'll see it going and reprocessing dozens of blocks-- their effects hadn't been applied to the database yet.
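A toy illustration of that recovery behaviour (a hypothetical model, not Core's code; the names are made up): the database tip records only the last flushed block, so after an unclean shutdown everything validated since the last flush must be re-applied.

```rust
// Hypothetical model: db_tip trails chain_tip between flushes.
struct Node { chain_tip: u64, db_tip: u64 }

impl Node {
    fn connect_block(&mut self, height: u64) {
        self.chain_tip = height; // validated; database not yet written
    }
    fn flush(&mut self) {
        self.db_tip = self.chain_tip; // one big batched write
    }
    // After a crash, blocks in (db_tip, chain_tip] must be reprocessed.
    fn blocks_to_reprocess(&self) -> u64 {
        self.chain_tip - self.db_tip
    }
}

fn main() {
    let mut n = Node { chain_tip: 0, db_tip: 0 };
    for h in 1..=30 { n.connect_block(h); }
    n.flush();
    for h in 31..=54 { n.connect_block(h); }
    // Unclean shutdown here: two dozen blocks get re-applied on restart.
    assert_eq!(n.blocks_to_reprocess(), 24);
}
```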
legendary
Activity: 2053
Merit: 1356
aka tonikt
I think what tomtomtom7 is trying to say is that the common approach (used not only by core) is that even though the node verifies the transactions before a block is announced, the actual moment when the UTXO database is updated (and the changes committed to it)  is during the block's validation.

Whilst his solution updates (appends to) the TXO database while verifying a transaction in the first place. Later, during the block's validation, it just logs which of the outputs from the TXO database have been spent by the block.
staff
Activity: 4242
Merit: 8672
This is in contrast to Core's model, which needs sequential reads and writes to the UTXO index, even if all scripts are already verified.
This is not how Bitcoin Core works. Normally no disk access at all happens when a block is verified-- neither reading nor writing-- except for transactions that have not yet been seen.
I am not talking about disk reads and writes, I am talking about UTXO index reads and writes.
There are no UTXO index reads or writes in the normal case (transaction recently seen) in Bitcoin Core.
jr. member
Activity: 38
Merit: 18
This is in contrast to Core's model, which needs sequential reads and writes to the UTXO index, even if all scripts are already verified.
This is not how Bitcoin Core works. Normally no disk access at all happens when a block is verified-- neither reading nor writing-- except for transactions that have not yet been seen.

I am not talking about disk reads and writes, I am talking about UTXO index reads and writes.
staff
Activity: 4242
Merit: 8672
This is in contrast to Core's model, which needs sequential reads and writes to the UTXO index, even if all scripts are already verified.
This is not how Bitcoin Core works. Normally no disk access at all happens when a block is verified-- neither reading nor writing-- except for transactions that have not yet been seen.
jr. member
Activity: 38
Merit: 18
So you basically extract all the transactions from the blocks and store them in separate files (transactions1+transactions2), giving each transaction a unique sequential "numeric index".

Then you have the index file (tx-index) where each different TXID points to the "numeric index" - that's actually your equivalent of the UTXO-index, except that it also includes spent transactions.

And then you have the spend-index (addressed by the "numeric index") which basically tells you whether a specific output of a specific transaction has been spent.

Is that correct?

Yes. Though you leave out the spend-tree.


First of all, you did not get rid of the UTXO-index.
You still have it.
You just extended it into a TXO-index, meaning that you index not only unspent transactions but also the spent ones.

Now, how is it possible that your code performs better than the LevelDB solution used by Core?

It has nothing to do with any "fast concurrent spend tree".
It has all to do with the (U)TXO index itself.
You use a hashmap as the index, which is much faster than the index used by Core's LevelDB engine.
But it also takes much more system memory.

I know, because I also use a hashmap-based UTXO index in my Gocoin software.
So you don't have to tell me that it's much faster.
But it comes at the cost of memory - you can't run this with e.g. 2GB of RAM.
From my personal experience I'd say you need at least 8GB.
And because you index both spent and unspent transactions, you need even more memory than the UTXO-index solution, which you are trying to beat.

The bottom line is: there's no way this can perform better than a simple hashmap-based UTXO-index.

The important part is to split base load transaction validation (when a transaction comes in) from peak load order validation (when a block comes in).

For the first, this design is in itself not faster, as it indeed uses an index which includes spent outputs (though it can be pruned to be even more similar to the UTXO set). This also matters much less, because if no block is coming in, neither Core nor Bitcrust needs a lot of resources.

For the latter, when a block comes in, the output scripts are not needed as they are already verified. The only things that need to be accessed are the transaction index, which only maps hashes to file offsets, and the spend tree and spend-index, which are very compact and concurrently accessible.

This is in contrast to Core's model, which needs sequential reads and writes to the UTXO index, even if all scripts are already verified.

Essentially Bitcrust simply ensures that the work and resources needed when a block comes in are minimized.
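A minimal sketch of what "compact and concurrently accessible" can look like (illustrative only; the names are not Bitcrust's actual API): a bit index backed by atomic words, so multiple validation threads can set and test bits without locks.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// One bit per transaction/output slot; lock-free set and test.
struct BitIndex { words: Vec<AtomicU64> }

impl BitIndex {
    fn new(bits: usize) -> Self {
        let n = (bits + 63) / 64;
        BitIndex { words: (0..n).map(|_| AtomicU64::new(0)).collect() }
    }
    fn set(&self, i: usize) {
        self.words[i / 64].fetch_or(1u64 << (i % 64), Ordering::Relaxed);
    }
    fn test(&self, i: usize) -> bool {
        self.words[i / 64].load(Ordering::Relaxed) & (1u64 << (i % 64)) != 0
    }
}

fn main() {
    let idx = BitIndex::new(1 << 20); // a million slots in ~128 KiB
    idx.set(123_456);
    assert!(idx.test(123_456));
    assert!(!idx.test(123_457));
}
```

Because each operation is a single atomic word access, many threads can share the structure concurrently, which is the property the post above is pointing at.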
legendary
Activity: 2053
Merit: 1356
aka tonikt
And because you index both spent and unspent transactions, you need even more memory than the UTXO-index solution, which you are trying to beat.

The bottom line is: there's no way this can perform better than a simple hashmap-based UTXO-index.

Let me add some numbers here.

At this moment (block 461402) a typical UTXO index has 20,438,271 records (transactions) - that's 20 million.
In the meantime, a TXO index used by your solution needs more than 210 million records.
https://blockchain.info/charts/n-transactions-total
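For scale, the ratio implied by those numbers (the 210 million figure is approximate, so the ratio is rough):

```rust
fn main() {
    let utxo_records = 20_438_271f64;  // UTXO index records at block 461402
    let txo_records = 210_000_000f64;  // total transactions ever, approximate
    let ratio = txo_records / utxo_records;
    // Roughly tenfold more records in a TXO index than a UTXO index.
    assert!(ratio > 10.0 && ratio < 10.5);
}
```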
legendary
Activity: 2053
Merit: 1356
aka tonikt
So you basically extract all the transactions from the blocks and store them in separate files (transactions1+transactions2), giving each transaction a unique sequential "numeric index".

Then you have the index file (tx-index) where each different TXID points to the "numeric index" - that's actually your equivalent of the UTXO-index, except that it also includes spent transactions.

And then you have the spend-index (addressed by the "numeric index") which basically tells you whether a specific output of a specific transaction has been spent.

Is that correct?

I guess that means 'yes'.

In which case I can give more accurate feedback.

First of all, you did not get rid of the UTXO-index.
You still have it.
You just extended it into a TXO-index, meaning that you index not only unspent transactions but also the spent ones.


Now, how is it possible that your code performs better than the LevelDB solution used by Core?

It has nothing to do with any "fast concurrent spend tree".
It has all to do with the (U)TXO index itself.
You use a hashmap as the index, which is much faster than the index used by Core's LevelDB engine.
But it also takes much more system memory.

I know, because I also use a hashmap-based UTXO index in my Gocoin software.
So you don't have to tell me that it's much faster.
But it comes at the cost of memory - you can't run this with e.g. 2GB of RAM.
From my personal experience I'd say you need at least 8GB.
And because you index both spent and unspent transactions, you need even more memory than the UTXO-index solution, which you are trying to beat.

The bottom line is: there's no way this can perform better than a simple hashmap-based UTXO-index.
legendary
Activity: 2053
Merit: 1356
aka tonikt
I think I get it. Smiley

So you basically extract all the transactions from the blocks and store them in separate files (transactions1+transactions2), giving each transaction a unique sequential "numeric index".

Then you have the index file (tx-index) where each different TXID points to the "numeric index" - that's actually your equivalent of the UTXO-index, except that it also includes spent transactions.

And then you have the spend-index (addressed by the "numeric index") which basically tells you whether a specific output of a specific transaction has been spent.

Is that correct?
legendary
Activity: 2053
Merit: 1356
aka tonikt
I essentially transform transactions and outputs to a "hash" with:

Code:
hash(tx) = tx.filepos / 16
hash(output) = tx.filepos / 16 + 1 + output-index
(guarded against exceptional overflows by a (currently empty) map of exceptions, for the case output-count + 1 > tx-size/16)

And use the "hash" as a direct index in the bit-index.

The spend-index represents the spent state a few blocks from the tip; a set bit at a location means the transaction exists there or an output is spent there. When a search of the spend tree reaches the position of the spend-index, a positive lookup for a transaction and a negative lookup for an output are enough.

The spend-index grows by 1 bit per 16 bytes of blockchain, or 1/128th of the blockchain size. Just like with the spend-tree, mostly the end is accessed, so the start can be kept on disk.

Code of the spend-index itself is pretty simple here: https://github.com/tomasvdw/bitcrust/blob/master/src/store/spend_index.rs

I honestly can't say that I understand half of what you've said.
But I'm trying.

Let's say you receive a new transaction and you have to verify it.
All you know is that it spends one output described as TXID-VOUT.

Correct me if I'm wrong: first you go through the last X blocks up the spend tree trying to find TXID there.
But say the TXID wasn't there - so then you have to go to the spend-index, right?

Now, how do you look for the TXID inside the spend-index?
How do you transform the TXID into the hash?
jr. member
Activity: 38
Merit: 18
Oh, ok. Sorry, I didn't get the spend-index idea at first.

Can you please say something more about how it actually works inside?


I essentially transform transactions and outputs to a "hash" with:

Code:
hash(tx) = tx.filepos / 16
hash(output) = tx.filepos / 16 + 1 + output-index
(guarded against exceptional overflows by a (currently empty) map of exceptions, for the case output-count + 1 > tx-size/16)

And use the "hash" as a direct index in the bit-index.

The spend-index represents the spent state a few blocks from the tip; a set bit at a location means the transaction exists there or an output is spent there. When a search of the spend tree reaches the position of the spend-index, a positive lookup for a transaction and a negative lookup for an output are enough.

The spend-index grows by 1 bit per 16 bytes of blockchain, or 1/128th of the blockchain size. Just like with the spend-tree, mostly the end is accessed, so the start can be kept on disk.

Code of the spend-index itself is pretty simple here: https://github.com/tomasvdw/bitcrust/blob/master/src/store/spend_index.rs
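The formulas in the quoted "Code:" block can be written out as plain functions. This is an illustrative sketch following the scheme described above, not Bitcrust's actual code; the function names are made up, and the overflow guard is modeled as a (currently empty) exceptions map.

```rust
use std::collections::HashMap;

// A transaction's slot is its file offset divided by 16; its outputs
// occupy the slots that follow it.
fn tx_slot(filepos: u64) -> u64 {
    filepos / 16
}

fn output_slot(filepos: u64, output_index: u64) -> u64 {
    filepos / 16 + 1 + output_index
}

// The guard from the quoted text: the scheme only overflows when
// output-count + 1 > tx-size/16, i.e. a transaction has more outputs
// than 16-byte chunks; those rare cases would go into an exceptions map.
fn fits_without_exception(tx_size: u64, output_count: u64) -> bool {
    output_count + 1 <= tx_size / 16
}

fn main() {
    let _exceptions: HashMap<u64, u64> = HashMap::new(); // currently empty
    assert_eq!(tx_slot(160), 10);          // tx at byte 160 -> slot 10
    assert_eq!(output_slot(160, 2), 13);   // its third output -> slot 13
    assert!(fits_without_exception(320, 5)); // 320/16 = 20 slots available
}
```

The resulting slot numbers are what get used as direct bit positions in the spend-index.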
jr. member
Activity: 38
Merit: 18
I see the UTXO approach actually as a left over from the architecture before Compact-Blocks/XThin.

If all transactions are pre-synchronized,

Point of fact: transactions being validated in advance has always been true and is completely orthogonal to compact blocks.  Bitcoin has also directly exploited this fact via signature caching (which your benchmarks disable, btw, because you use the undocumented blocks-only mode) since applying Satoshi's patch for it in May 2012, and further with the input cache since July 2012.

I've seen Peter R make the same argument a couple times and I've asked him why he presents it that way but AFAIR he has never responded to one of those messages.



I understand Core pre-validates signatures with or without Compact Blocks/XThin. I also understand the benches aren't comparing this, but only compare full block verification, which I explain on that very page.

But without Compact Blocks/XThin, optimizing for order validation is a lot less relevant because of the high bandwidth cost.
staff
Activity: 4242
Merit: 8672
I see the UTXO approach actually as a left over from the architecture before Compact-Blocks/XThin.

If all transactions are pre-synchronized,

Point of fact: transactions being validated in advance has always been true and is completely orthogonal to compact blocks.  Bitcoin has also directly exploited this fact via signature caching (which your benchmarks disable, btw, because you use the undocumented blocks-only mode) since applying Satoshi's patch for it in May 2012, and further with the input cache since July 2012.

I've seen Peter R make the same argument a couple times and I've asked him why he presents it that way but AFAIR he has never responded to one of those messages.

legendary
Activity: 2053
Merit: 1356
aka tonikt
Oh, ok. Sorry, I didn't get the spend-index idea at first.

Can you please say something more about how it actually works inside?

Quote
This is a very compact bit index with one bit for each transaction and one for each output
jr. member
Activity: 38
Merit: 18
Thanks for your feedback.

The UTXO index is there for a reason.
Personally I'd rather see some work on improving the index and the UTXO-db itself, rather than getting rid of it, which I think is not the right way to go.

I see the UTXO approach actually as a left over from the architecture before Compact-Blocks/XThin.

If all transactions are pre-synchronized and we can distinguish base load script validation (when a tx comes in) from peak load order validation (when a block comes in), the UTXO-index doesn't make all that much sense.

This is because for script validation, when output scripts are needed, the distinction between spent and unspent doesn't really exist, as an output can be spent on one branch and not on another. Using a transactionally maintained UTXO index there only complicates things (with reorgs).

And for order validation, the output scripts are not needed at all.

Looking at the "performance results" one must remember that Core (and the LevelDB solution) uses a limited amount of system memory, while this solution seems to be using all of it.

I don't really think this is the case. Bitcrust uses memory-mapped pages which show up as lots of virtual memory, but are swapped by the OS as needed. If you instead use traditional disk I/O, the data is cached by the OS as needed.

Either way, MRU pages are kept in memory.

Also, when Core gets to verify a transaction that is trying to spend a non-existing input, it won't consume that many resources. This is a potential DoS problem.

It is never necessary to scan the entire chain, as after a few blocks the search reaches the spend-index and performs a direct lookup. There may still be potential DoS problems, but not with spending a non-existing output.
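The lookup order described can be sketched like this (hypothetical stand-ins: plain sets instead of the real spend tree and bit index): only the few most recent blocks are scanned before a single direct spend-index lookup, so an unknown output fails quickly rather than triggering a chain scan.

```rust
use std::collections::HashSet;

// Returns whether `slot` (a transaction/output position) is marked spent:
// scan the few blocks past the spend-index position, then do one direct
// lookup. No full-chain scan in either case.
fn is_spent(recent_blocks: &[HashSet<u64>], spend_index: &HashSet<u64>, slot: u64) -> bool {
    if recent_blocks.iter().any(|b| b.contains(&slot)) {
        return true; // found in the spend tree near the tip
    }
    spend_index.contains(&slot) // single direct lookup
}

fn main() {
    let recent = vec![HashSet::from([7u64]), HashSet::new()];
    let index: HashSet<u64> = HashSet::from([3u64]);
    assert!(is_spent(&recent, &index, 7));   // found in a recent block
    assert!(is_spent(&recent, &index, 3));   // found by direct lookup
    assert!(!is_spent(&recent, &index, 99)); // cheaply rejected
}
```

The cost is bounded by the number of recent blocks plus one lookup, which is the basis of the no-DoS claim above.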
legendary
Activity: 2053
Merit: 1356
aka tonikt
As much as I like out of the box approaches to topics that interest me, to be honest I don't think this one can improve the overall speed of "common operations when validating transactions and blocks".

The UTXO index is there for a reason.
Personally I'd rather see some work on improving the index and the UTXO-db itself, rather than getting rid of it, which I think is not the right way to go.

Looking at the "performance results" one must remember that Core (and the LevelDB solution) uses a limited amount of system memory, while this solution seems to be using all of it. Also, when Core gets to verify a transaction that is trying to spend a non-existing input, it won't consume that many resources. This is a potential DoS problem.

Sorry, I don't mean to be a mean person - it's just my cold professional opinion that it won't work well for you, for an actual real bitcoin node.
I wish you all the best with your project, @tomtomtom7, but I don't think this is a proper approach to optimise the performance.