Pages:
Author

Topic: An optimal engine for UTXO db - page 2. (Read 3171 times)

legendary
Activity: 1708
Merit: 1049
March 07, 2016, 02:17:48 PM
#22
There are some intel chips with 128MB L4 cache (shared cpu/gpu). Obviously the further "away" it is from the cores, the slower it will be, but still you'd expect it to be at least twice as fast as the ram.
legendary
Activity: 1176
Merit: 1134
March 06, 2016, 03:22:06 PM
#21
Looks pretty interesting. I was looking for a lightweight DB to use for storing the privkeys for the wallet.

Thanks!

from the custom RAM data structures I have achieve millions of operations per second, but it all depends on the specifics as to what performance level is possible. Getting the entire dataset to fit inside the CPU's Lx cache is what gets things going really, really fast. Once things need to "swap" out to the DRAM, it slows down sharply. And if it has to hit the HDD... even if it is SSD, it is an order of magnitude or more, slower.

But for non-performance critical things like tracking wallet privkeys, these performance issues are not really an issue. (unless you are satoshi and have thousands and thousands of privkeys)

James
legendary
Activity: 1260
Merit: 1019
legendary
Activity: 1176
Merit: 1134
March 05, 2016, 11:53:28 PM
#19
legendary
Activity: 2128
Merit: 1074
March 05, 2016, 08:35:51 PM
#18
You're mixing up things. Probably because you don't really understand what the UTXO database actually does inside the bitcoin software.

Spreed of accessing the records is important for obvious reasons.
The traditional reason is that you want to verify new transactions/blocks quickly and efficiently.
And the new reasons, like hardware wallets or any kind of stealth payments.

The current approach from the satoshi's client, where the wallet is glued to the node and keeps track of its addresses, updating their balance as new blocks appear - IMHO this solution has proven to be unreliable. Plus is totally unfeasible for private offline wallets.

An efficient UTXO database that can quickly browse through all the records is definitely a must for the next generation of bitcoin node software.
I am just not aware of any existing engines that would be ready to replace the currently used leveldb.
I think one eventually needs to be (and will be) created explicitly for Bitcoin. Just like sipa's secp256k1 lib was created explicitly for Bitcoin, as the originally used openssl's implementation wasn't good enough to handle a crucial part of the software.
I had this discussion about 3 years ago with etotheipi, the ex-CEO of ex-Armory:

New blockchain management in Armory

https://bitcointalksearch.org/topic/new-blockchain-management-in-armory-144015

Now we are sure that Armory was a total loss as a business: they delivered promised software but achieved exactly zero sales, not even a consulting contract for the ex-employees.

I'm not going to repeat the arguments from 3 years ago, interested readers could study the Armory trainwreck.

AFAIK Armory sometime in 2015 also understood that LevelDB is not really suitable and attempted to change the underlying engine to LightningDB:

https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database

Notwithstanding the above I agree with you that distributed cyptocurrencies both pose unique requirements and provide unique opportunities for optimization. In particular the traditional database ACID is an overkill for UTxO, something much simpler will work due to extraordinary replication that is fundamental to the concept of distributed cryptocurrencies.

I always like to collect the losers who deny the need for transactional consistency is critical. GLBSE and ASICMINER were the two most well known occurrences of double payment. Couple of days ago BitBet joined the club, this time blaming it on miners. I'm going to quote the whole Qntra article for my future reference here:
A Miner Problem

Posted on March 2, 2016  by  Mircea Popescu   
  
As announced in #bitcoin-assets, BitBet was attacked earlier today through a transaction withholding mechanism. The attack unfolded as follows :

1. A BitBet bet (Jeb Bush will be Republicans' 2016 Presidential Nominee) was closed and resolved on 21 February. This created a list of winners with a reasonable expectation to be paid their winnings.

2. A first transaction was broadcast, to satisfy the claims of the winners, spending some inputs, and offering a fee of 0. This transaction was, as you'd expect, neither mined nor included in the mempools of most Bitcoin nodes.

3. A second transaction was broadcast, spending the same inputs as A1, including a fee of 0.0001, call it A2. For a ~1.5kb transaction, this fee is on the low side, so it perhaps could be argued that it not being mined would be expected. Nevertheless, transaction A2 was also not included in the mempools of most Bitcoin nodes.

4. As neither transaction A1 or A2 were mined after 54 (48) hours, a further transaction was broadcast, spending the same inputs as A1 and A2, and including a fee of 0.000175, call it A3. By any measure, a fee in excess of 10 satoshi per byte should be sufficient to have transactions mined. Nevertheless, contrary to expectation, transaction A3 was not included in either a block or the mempools of most Bitcoin nodes.

5. After a further 48 hours, a fourth transaction was broadcast, spending the same inputs as A1, A2 and A3, and including a fee of 0.00022, call it A4. Just like the previous three, transaction A4 was not either included in a block or advertised by most Bitcoin nodes.

6. After a further 16 hours, transaction B was broadcast, that included the same outputs as transactions A1-A4, but different inputs. Transaction B, like transaction A4, included a fee of 0.00022. Transaction B was advertised by most Bitcoin nodes immediately thereafter, and was included in a block within half hour of being broadcast.

7. Two hours after B was included in a block, transaction A1 was re-broadcast by an unknown third party. Twenty minutes later, the 0 fee, week old, not-advertised-by-anyone-ever transaction was included in a block.

On the basis of these events, the following allegations can readily be supported :

• That the notion of "a majority of Bitcoin nodes" is void of content, most of the relay network being under the control of the same entity and supporting functionality not contemplated by the Bitcoin protocol (such as selective mothballing of specific transactions).(1) This specifically means that "someone"(2) has the ability to nuke transactions out of the network irrespective of whether they are validly signed or not, directly replicating functionality already available to fiat governments in fiat payment processors such as Visa or Paypal.

• That a cartel of Bitcoin miners is deliberately and systematically withholding blocks for an interval of about 20 minutes to a half hour, so as to ensure themselves a (significant) advantage over any would-be competitors.(3)

• That neither above item makes sense or could long survive without the other, meaning that necessarily the culprit is one and the same. Note for the record that an entity controlling 51% of the network (as is by a large margin the case with the Antpool/F2pool cartel) can safely withhold blocks for an indefinite period – if a competitor publishes a longer chain all the cartel has to do is keep mining on its own, eventually they'll prevail.(4)

Note that there are no alternative theories that more parsimoniously explain the chain of observed phenomena, and consequently the validity of the above allegations is a matter of objective truth. One may disagree on the strength of subjective feeling, or put forth his own interpretations, but these – much as in the case of alternative explanations for biological evolution – suffer from a fundamental logical weakness.

Update March 2, 2016 18:49 UTC:  In a further development on this story, Bitbet has now announced a moratorium on payments until the issue is resolved. (N. ed.)

1. You should remember that the Bitcoin relay network has for many years been in very poor shape – there were as little as 60 nodes active last year around this time. Since then the situation has only deteriorated – the jury is still out as to whether BlueMatt's doohicky actually helped or hindered while it was functional, but there's no argument that the network degraded YoY.

2. Who, according to your own preference, may be the NSA or the Antpool/F2pool cartel.

3. The A1 transaction wasn't broadcast and then included in a block – at the time it was broadcast, the block containing it had already been mined – it simply hadn't yet been shared with the rest of the plebs is all.

4. 51% means that out of a day's 144 blocks, one has 74, and consequently gains a 3 block advantage over a competitor chain every single day..It is true that the cartel does not generally wish to advertise its presence (yet), and so to date they've avoided major reorgs. Bear in mind however that it is also the case that the remainder minority is not united but fragmented – so they don't actually need to resort to major reorgs.
Edit: formatting fixes
legendary
Activity: 1176
Merit: 1134
March 05, 2016, 05:51:52 PM
#17
I think it's very important to be able to browse through all the records in a shortest possible time.

I disagree, the major requirements

verify utxo spend
verify utxo amount
remove used txo
add new utxos from block
reorganize revert utxo
yes, this was my thinking with my design. most of the operations above are very fast.
adding utxo from new block does require updating all the various datasets, but all in all not so much time

the reorg revert utxo is the only slow one, but I delay finalizing the bundle until 10 blocks past, so the odds for reorg to invalidate bundle are pretty small. If it happens, several minutes to regenerate the bundle file.

the realtime bundle should be small enough so any reorg can simply recalculate from the max reorg depth
sr. member
Activity: 689
Merit: 269
March 05, 2016, 05:38:09 PM
#16
I think it's very important to be able to browse through all the records in a shortest possible time.

I disagree, the major requirements

verify utxo spend
verify utxo amount
remove used txo
add new utxos from block
reorganize revert utxo
legendary
Activity: 1176
Merit: 1134
March 05, 2016, 04:56:02 PM
#15
Once all the balances for all the addresses are verified at all the bundle boundaries, then "all" that is left is the realtime bundle being validated. The brute force way would just be to generate a bundle without all the blocks, as each new block come in. I guess it wont be so bad as both the bloom filter and the open hashtable was designed to be able to be incrementally added to.
Although I'm not quite sure what you mean by "brute force", indeed the way to go is to keep just a blockchain secured snapshot ot the utxo db.
Its the only way to go.
Everybody knows that, but nobody does anything about it Smiley
I have sha256 hashes of all the internal datasets, so it is possible to quickly find out where any mismatch is between two datasets (though it is contaminated with endian dependence at this time). How to know who put the utxo hash into the blockchain? If there is a way to do this without it being vulnerable to attacker posting his version, then it would be great to have

by bruteforce, I mean iterating through each block, tx, address and compare results to a known good data source.

Once it confirms at a specific bundle boundary, then it is all wrapped into a read-only filesystem, which by their nature are pretty tamper proof and also adds a standard lz compression/decompression layer transparently

This means the only things that would ever need to be recalculated are the most recent bundle (max 2000 blocks) and even that is possible to be reduced with partial bundles saved as the realtime bundle grows. It is a tradeoff between startup calculation time and runtime backups

James
legendary
Activity: 2058
Merit: 1416
aka tonikt
March 05, 2016, 04:46:40 PM
#14
Once all the balances for all the addresses are verified at all the bundle boundaries, then "all" that is left is the realtime bundle being validated. The brute force way would just be to generate a bundle without all the blocks, as each new block come in. I guess it wont be so bad as both the bloom filter and the open hashtable was designed to be able to be incrementally added to.
Although I'm not quite sure what you mean by "brute force" among other things, indeed the way to go is to keep just a blockchain secured snapshot of the utxo db at a certain block.
Its the only way to go.
Everybody knows that, but nobody does anything about it Smiley
legendary
Activity: 1176
Merit: 1134
March 05, 2016, 04:40:16 PM
#13
sounds like a good stuff.
I'll give it a better look once I get sober Smiley

so can you refer to some specific code that implements this?
github.com/jl777/SuperNET

in the SuperNET/iguana dir the iguana_*.c files have this, mostly in iguana_ramchain.c. No comments to speak of and I like to pack as much into each line as possible, so it is 3x to 5x denser than most the C code out there.

I will properly document everything once it is all finalized.

As you might imagine, it is quite tricky to get all the bits in the right place, so validating the dataset is my current priority. I added ability for it to act as a lossless codec for the rawtransactions, which would allow a brute force comparison with the reference bitcoind.

Once that is confirmed, then the next step is to validate the block explorer level data, but I am not familiar enough with any to generate the reference data. Hopefully someone around here is.

Once all the balances for all the addresses are verified at all the bundle boundaries, then "all" that is left is the realtime bundle being validated. The brute force way would just be to generate a bundle without all the blocks, as each new block come in. I guess it wont be so bad as both the bloom filter and the open hashtable was designed to be able to be incrementally added to.

James
legendary
Activity: 2058
Merit: 1416
aka tonikt
March 05, 2016, 04:28:08 PM
#12
sounds like a good stuff.
I'll give it a better look once I get sober Smiley

so can you refer to some specific code that implements this?
legendary
Activity: 1176
Merit: 1134
March 05, 2016, 04:25:01 PM
#11
Quote
Not sure what sort of things "like to do anything on each single one of them" you would want to do across all UTXO. To my thinking, other that total balance, the UTXO is not spanning across all addresses, so if you can extract all the relevant tx per address efficiently, that is sufficient.
ok, so let me ask you this way.

you have your utxo database, or so you call it.
you get a new transaction and you have to verify it.
how much does it take?

i hope you know that you need to fetch each tx's input from the db, verify its consistency and then put the changed records back to the db.

FETCH:
For each vin, the txid/vout is mapped to the corresponding unspentind
I have a bitwise hashtable that is at most 50% full, so an openhash table works pretty well. I think the worst case is about 20 collisions, but each one is just a matter to increment to the next item.

once the txid is found, the vout tells us where the unspent is, as each txid has the firstvout as part of its data. at most 200, but on average scanning backwards finds the txid quicker than N/2 bundles for obvious reasons.

Using the listunspent as a comparison, that has to also traverse a linked list and it would need to do that for each bundle, so the updating utxo would be an order magnitude less work. Dont have benchmarks yet, but probably in the 100 microsecond range

VERIFY:
Once the unspentind is retrieved, the utxo bitmap is checked (instant)

PUT:
assuming all is well, the utxo bitmap bit is set to spent (instant)

On startup the utxo bitmap does need to be regenerated and updated from the last bundle boundary. I plan to make utxo dataset saves more frequently than 2000 block boundaries, maybe every 10, so only 10 blocks worth needs to be replayed to get to a current utxo set

James
legendary
Activity: 2058
Merit: 1416
aka tonikt
March 05, 2016, 03:59:40 PM
#10
Quote
Not sure what sort of things "like to do anything on each single one of them" you would want to do across all UTXO. To my thinking, other that total balance, the UTXO is not spanning across all addresses, so if you can extract all the relevant tx per address efficiently, that is sufficient.
ok, so let me ask you this way.

you have your utxo database, or so you call it.
you get a new transaction and you have to verify it.
how much does it take?

i hope you know that you need to fetch each tx's input from the db, verify its consistency and then put the changed records back to the db.
legendary
Activity: 1176
Merit: 1134
March 05, 2016, 03:43:29 PM
#9
In its raw form with all the data scattered across the 200 bundles, a simplistic search for a specific address took 2.5 milliseconds on a 1.4Ghz i5. Not a parallel process, but serial. Using faster CPU and 8 cores, I think times of 100 nanoseconds could do a complete search and sum for a specific address.
Yes, but I'm not interested in checking balance on addresses. I just want to know how fast it is to go trough each of the UTXO db records - like to do anything on each single one of them.

As for the rest, the memory mapping, storing it on disk and stuff, I think you're on to something.
But first I just want to know how fast your utxo db works comparing to mine. Smiley

it took 2.5 milliseconds to do a listunspents for an address with thousands of tx on my laptop. It could have been any address (once I get all the bugs fixed)

If you wanted to scan all addresses, the totally unoptimized method would be taking a long time as it is designed for parallel searching specific address, but once the optimized dataset is there, then a rich list could be calculated in a few seconds.

Not sure what sort of things "like to do anything on each single one of them" you would want to do across all UTXO. To my thinking, other that total balance, the UTXO is not spanning across all addresses, so if you can extract all the relevant tx per address efficiently, that is sufficient.

Maybe I missed something that needs to rescan all utxo? With my setup importprivkey is not needed as all addresses are basically already there. If the millisecond time is too slow, then it can of course be dereferenced into 6 bytes per utxo.

All utxo are in a linked list inside each bundle and a hash table allows to find the end point directly. Additionally a bloom filter is added to each bundle to quickly determine if there is a need to search within the bundle.

If you just want all utxo to scan through, just iterate through the utxo bitmap, extract the utxo data and put it all into a single memory mapped file. But not sure it is worth spending a GB to do this as all the unspent data is already in memory mapped files, so 6 bytes per unspent is enough to make a list for each address's utxo. Then with a direct lookup of that address, you find all the utxo

struct iguana_unspent // 28 bytes
 {
    uint64_t value;
    uint32_t txidind,pkind,prevunspentind,scriptoffset;
    uint16_t hdrsi:14,type:4,vout:14;
 } __attribute__((packed));

What I am saying is that there is no need for DB. All operations can be done essentially in RAM via memory mapped files, with a few exceptions for realtime address balances, which need's r/w memory

James
legendary
Activity: 2058
Merit: 1416
aka tonikt
March 05, 2016, 03:29:17 PM
#8
In its raw form with all the data scattered across the 200 bundles, a simplistic search for a specific address took 2.5 milliseconds on a 1.4Ghz i5. Not a parallel process, but serial. Using faster CPU and 8 cores, I think times of 100 nanoseconds could do a complete search and sum for a specific address.
Yes, but I'm not interested in checking balance on addresses. I just want to know how fast it is to go trough each of the UTXO db records - like to do anything on each single one of them.

As for the rest, the memory mapping, storing it on disk and stuff, I think you're on to something.
But first I just want to know how fast your utxo db works comparing to mine. Smiley
legendary
Activity: 1176
Merit: 1134
March 05, 2016, 03:23:20 PM
#7
for utxo, my approach with iguana creates a parallel set of datasets in read only memory mapped files. Arguably, this is the ultimate DB as it simply cant get any faster than this.

All things needed for block explorer level queries are precalculated and put into the read only files.

By having each bundle of 2000 blocks being mostly independent of all other blocks, this not only allows for parallel syncing of the blockchain but also using multiple cores for all queries regarading tx history.

I also calculate all address balances for each bundle, so by adding all the values across all the bundles you end up with the current balance as of the most recent bundle. Special case handling of the realtime bundle gets us up to date block explorer data for all blocks, tx, addresses (even multisig)

I dont quite have everything done, but the basic structure is showing very good results. I also remove as much redundancy as possible during the creation of the read-only files so instead of taking more space than the raw blockchain, it ends up taking about half the space. About half of that is the signatures, which can be purged for non-relaying nodes after it has been validated.

The format of the utxo vector would just be the location of the unspent that is being spent and the location of the spend is implicit in the order of the vins.

[u0, u1, u2, u3, ...] is a list of 6 byte locators for the unspent that the corresponding vin in the bundle is spending, where each ui contains the bundle id and the unspentind. Each unspentind is simply the order it appears in the bundle (starting at 1).

the above is not directly usable, but it is invariant and can be created once and put into the read-only file(system). using mksquashfs reduces the size of the index tables by almost 50%.

Given such a list for all bundles, then using a parallel multicore process, they can all be traversed and update a bitmap to indicate which outputs are spent. So a 0 bit for this means it is unspent. Maybe 1 bit of space per utxo is good enough, as that is seemingly as good as it gets, but it is 1 bit per vout, so more like 30MB size. However, I think this can be compressed pretty good with most of the early vouts already being spent (ignoring the satoshi ones). I havent had a chance to get an up to date utxo bitmap, but I hope to get one in the next week.

At the 30MB size or smaller, the entire utxo bitmap will fit into CPU L cache and provide 10x faster performance than RAM. RAM is actually quite slow compared to things that can be done totally inside the CPU.

James
I'm sorry I don't have time to dig into your code, though I'm sure its a good stuff.
So how do you actually store the db in memory, in some simple words?

I use hashmaps and its pretty fast, but also very primitive.
Plus it takes loads of memory and ages to load from the disk.
And writing it into disk - separate problem.

I'm sure there must be so much space to improve it.

Can you say how many seconds you need to browse through all the current utxo records and calc, lets say, the sum of all the values.
With my engine I cannot get below 4 seconds anymore.

UTXO db is over 35 million records now.
It really needs a better shit than leveldb
In its raw form with all the data scattered across the 200 bundles, a simplistic search for a specific address took 2.5 milliseconds on a 1.4Ghz i5. Not a parallel process, but serial. Using faster CPU and 8 cores, I think times of 100 nanoseconds could do a complete search and sum for a specific address.

But if the usecase is to keep all the totals updated, then after each bundle is completed, the thing to do is update the utxo as of that bundle boundary. That is a one time calculation of adding the spends starting from the previous bundle boundary.

What I am saying is to create write once data set into memory mapped files. So it is stored in the file. memory mapping it allows the CPU direct access to it.

The onetime update would iterate through the bundle's spends and update the utxo bitmap, but it would also update the total balances for each address. So at the end of each bundle, there would be the up to date total of all spends for each address.

Within each bundle is also a set of balances for all addresses that occur in that bundle. Again, with a onetime update, all the address balances can be updated with the changes from that bundle. This is not a mainstream case, so I dont do this automatically in iguana, but it wouldnt add too much time to the overall processing assuming these total balances are updated at the end of each bundle boundary.

It would be vin number of additions that occur in each bundle to get the total spends for addresses in that bundle and then one addition for each address to the global balances. Doing it this way, to get an up to date balance would be a matter of indexing into the global balance list and then adjusting it by the realtime bundle's vins and vouts.

In one implementation I updated all the balances as new tx came in, so it took no time to get a total summary. Checking that total by summing all the accounts took a few milliseconds, but sorting it into a rich list did end up taking a few seconds. It used up HDD space for storing all the balances, 16 bytes per address, so it does add up the space required if all addresses are needing to be updated. but a CPU can do tens of millions of adds quite quickly through RAM

James
legendary
Activity: 2058
Merit: 1416
aka tonikt
March 05, 2016, 02:47:23 PM
#6
for utxo, my approach with iguana creates a parallel set of datasets in read only memory mapped files. Arguably, this is the ultimate DB as it simply cant get any faster than this.

All things needed for block explorer level queries are precalculated and put into the read only files.

By having each bundle of 2000 blocks being mostly independent of all other blocks, this not only allows for parallel syncing of the blockchain but also using multiple cores for all queries regarading tx history.

I also calculate all address balances for each bundle, so by adding all the values across all the bundles you end up with the current balance as of the most recent bundle. Special case handling of the realtime bundle gets us up to date block explorer data for all blocks, tx, addresses (even multisig)

I dont quite have everything done, but the basic structure is showing very good results. I also remove as much redundancy as possible during the creation of the read-only files so instead of taking more space than the raw blockchain, it ends up taking about half the space. About half of that is the signatures, which can be purged for non-relaying nodes after it has been validated.

The format of the utxo vector would just be the location of the unspent that is being spent and the location of the spend is implicit in the order of the vins.

[u0, u1, u2, u3, ...] is a list of 6 byte locators for the unspent that the corresponding vin in the bundle is spending, where each ui contains the bundle id and the unspentind. Each unspentind is simply the order it appears in the bundle (starting at 1).

the above is not directly usable, but it is invariant and can be created once and put into the read-only file(system). using mksquashfs reduces the size of the index tables by almost 50%.

Given such a list for all bundles, then using a parallel multicore process, they can all be traversed and update a bitmap to indicate which outputs are spent. So a 0 bit for this means it is unspent. Maybe 1 bit of space per utxo is good enough, as that is seemingly as good as it gets, but it is 1 bit per vout, so more like 30MB size. However, I think this can be compressed pretty good with most of the early vouts already being spent (ignoring the satoshi ones). I havent had a chance to get an up to date utxo bitmap, but I hope to get one in the next week.

At the 30MB size or smaller, the entire utxo bitmap will fit into CPU L cache and provide 10x faster performance than RAM. RAM is actually quite slow compared to things that can be done totally inside the CPU.

James
I'm sorry I don't have time to dig into your code, though I'm sure its a good stuff.
So how do you actually store the db in memory, in some simple words?

I use hashmaps and its pretty fast, but also very primitive.
Plus it takes loads of memory and ages to load from the disk.
And writing it into disk - separate problem.

I'm sure there must be so much space to improve it.

Can you say how many seconds you need to browse through all the current utxo records and calc, lets say, the sum of all the values.
With my engine I cannot get below 4 seconds anymore.

UTXO db is over 35 million records now.
It really needs a better shit than leveldb
legendary
Activity: 1176
Merit: 1134
March 05, 2016, 11:12:57 AM
#5
It really is a pointless question.

I think it's very important to be able to browse through all the records in a shortest possible time.

For whom is the speed so important?

For the people who treat finances responsibly the transactional integrity will be most important specification, immediately followed by capability to integrate with existing financial and accounting middleware. Speed would be secondary or even tertiary, especially since the UTxO set is widely replicated and six confirmations take one hour on average.


All users value speed. Otherwise we would still be using the slowest CPU that will eventually complete the tasks.

While speed and reliability can be a tradeoff decision, it is actually possible to get more reliability and speed at the same time. For example, if you are needing many subsystems to all work vs one, you can get a more reliable result that is based on just one thing, vs one that requires many parts.

Of course it is most likely that any such speedy and reliable system is more complicated to write, but once written it can be validated

James
legendary
Activity: 1176
Merit: 1134
March 05, 2016, 10:56:58 AM
#4
for utxo, my approach with iguana creates a parallel set of datasets in read only memory mapped files. Arguably, this is the ultimate DB as it simply cant get any faster than this.

All things needed for block explorer level queries are precalculated and put into the read only files.

By having each bundle of 2000 blocks being mostly independent of all other blocks, this not only allows for parallel syncing of the blockchain but also using multiple cores for all queries regarading tx history.

I also calculate all address balances for each bundle, so by adding all the values across all the bundles you end up with the current balance as of the most recent bundle. Special case handling of the realtime bundle gets us up to date block explorer data for all blocks, tx, addresses (even multisig)

I dont quite have everything done, but the basic structure is showing very good results. I also remove as much redundancy as possible during the creation of the read-only files so instead of taking more space than the raw blockchain, it ends up taking about half the space. About half of that is the signatures, which can be purged for non-relaying nodes after it has been validated.

The format of the utxo vector would just be the location of the unspent that is being spent and the location of the spend is implicit in the order of the vins.

[u0, u1, u2, u3, ...] is a list of 6 byte locators for the unspent that the corresponding vin in the bundle is spending, where each ui contains the bundle id and the unspentind. Each unspentind is simply the order it appears in the bundle (starting at 1).

the above is not directly usable, but it is invariant and can be created once and put into the read-only file(system). using mksquashfs reduces the size of the index tables by almost 50%.

Given such a list for all bundles, then using a parallel multicore process, they can all be traversed and update a bitmap to indicate which outputs are spent. So a 0 bit for this means it is unspent. Maybe 1 bit of space per utxo is good enough, as that is seemingly as good as it gets, but it is 1 bit per vout, so more like 30MB size. However, I think this can be compressed pretty good with most of the early vouts already being spent (ignoring the satoshi ones). I havent had a chance to get an up to date utxo bitmap, but I hope to get one in the next week.

At the 30MB size or smaller, the entire utxo bitmap will fit into CPU L cache and provide 10x faster performance than RAM. RAM is actually quite slow compared to things that can be done totally inside the CPU.

James

legendary
Activity: 2058
Merit: 1416
aka tonikt
March 05, 2016, 03:35:27 AM
#3
It really is a pointless question.

I think it's very important to be able to browse through all the records in a shortest possible time.

For whom is the speed so important?

For the people who treat finances responsibly the transactional integrity will be most important specification, immediately followed by capability to integrate with existing financial and accounting middleware. Speed would be secondary or even tertiary, especially since the UTxO set is widely replicated and six confirmations take one hour on average.



You're mixing up things. Probably because you don't really understand what the UTXO database actually does inside the bitcoin software.

Spreed of accessing the records is important for obvious reasons.
The traditional reason is that you want to verify new transactions/blocks quickly and efficiently.
And the new reasons, like hardware wallets or any kind of stealth payments.

The current approach from the satoshi's client, where the wallet is glued to the node and keeps track of its addresses, updating their balance as new blocks appear - IMHO this solution has proven to be unreliable. Plus is totally unfeasible for private offline wallets.

An efficient UTXO database that can quickly browse through all the records is definitely a must for the next generation of bitcoin node software.
I am just not aware of any existing engines that would be ready to replace the currently used leveldb.
I think one eventually needs to be (and will be) created explicitly for Bitcoin. Just like sipa's secp256k1 lib was created explicitly for Bitcoin, as the originally used openssl's implementation wasn't good enough to handle a crucial part of the software.
Pages:
Jump to: