
Topic: UTXO set size does not increase storage costs, you just suck at computer science

legendary
Activity: 2128
Merit: 1073
What is going on here with this "in-RAM database requirement"?

DoS attack: somebody sends you lots of fake transactions with non-existent inputs. To dismiss such a transaction you need to confirm that the input is indeed non-existent, which requires one or more read operations.

If you store the UTXO set on hard disk drives, it would be pretty much trivial to DoS you, because one HDD can do something like 100 random reads per second.

This is why people think that there should be a limit on the UTXO set, for example via the introduction of a minimal viable transaction output value. (Say, if a TXO's value is 1 satoshi it might be treated as spam.)

Please understand me correctly: this is just what I've seen in discussions here; it appears to be a prevalent opinion.

Of course, it is possible to design the system in such a way that UTXO set size won't be a problem and won't affect costs; that is what I'm talking about here...
Thanks for the explanation. I think such a DoS would/should trigger the "misbehaving peer" defense that is already implemented in Satoshi's client.

I see that the majority of developers still considers block verification a sort of race that happens after each block gets broadcast. I consider it an implementation artefact of the proof-of-concept code. From both information-theoretic and game-theoretic points of view, the block should only refer to transactions that were already broadcast and are normally verified at leisure during the inter-block period.

I'm not really familiar with game-theoretic methods of system analysis. I'm way more comfortable with hierarchical control systems methodology, where the nodes intend to cooperate honestly but have limited information about the state of the whole system, or face various resource limitations in internal processing.

I'll be watching the evolution of this problem with great interest. Maybe someone has some game-theoretic attack hidden up their sleeve. Or it may turn out that this is just a problem in industrial and organizational psychology, where hard-scientific reasoning has to take a back seat.
legendary
Activity: 2128
Merit: 1073
There is: while your competitors are downloading your new block they are wasting their time; they aren't working on extending the best chain. Provided your block is still downloaded and verified by >50% of the hashing power before the next block is downloaded and verified, your block will be extended by the majority of the hashing power, and thus you benefit from the reduced competition from the miners who couldn't download and validate your block fast enough. This also means miners can offer a discount on fees, provided you promise not to submit your transactions to other miners (trivially verified if you keep track of orphans).
I still don't get it.

"My block" consists of (1) header (2) list of hashes of positively verified transactions (3) coinbase transaction. "My block" is the shortest, so it actually has a propagation advantage over the "legacy blocks". "My nodes" don't hide the transactions, they undertake to distribute them as widely as possible before the "my block" gets broadcast. All the nodes operating through "my protocol" have plenty of time to verify the early broadcast transactions and just need to store in the RAM the hashes of positively verified transactions.

The "legacy block" format allowed hiding the blocks from the p2p network by not broadcasting them and disclosing them late at the end of the block. Hence the "block verification" overhead came from and allowed to play the deception games.

The way I see it, "my block" is beneficial over the "legacy block" both in the information-theoretic sense (it nearly halves the network load) and in the game-theoretic sense.
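
To put rough numbers on the "nearly halves" claim (the transaction and hash sizes below are my own illustrative assumptions, not measurements): with "legacy blocks" every transaction crosses the wire twice, once when broadcast loose and once again inside the block; with "my block" the second copy shrinks to a 32-byte hash.

Code:
# Per-transaction network load, legacy vs hash-only blocks (assumed sizes).
AVG_TX_SIZE = 250   # bytes, illustrative average transaction
HASH_SIZE = 32      # bytes per entry in "my block"

legacy_load = AVG_TX_SIZE + AVG_TX_SIZE  # broadcast loose + repeated in block
my_load     = AVG_TX_SIZE + HASH_SIZE    # broadcast loose + hash in block

print(f"legacy: {legacy_load} B/tx, 'my block': {my_load} B/tx "
      f"-> {my_load / legacy_load:.0%} of the legacy load")  # ~56%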

BTW, "my block" isn't really mine. This optimization was openly discussed after the publication of Microsoft's "red balloons" paper.

The only game-theoretic advantage that I see for the "legacy block" is that, by being "legacy", the code doesn't need to change, and the people who memorized the source code of Satoshi's client have an advantage over developers new to Bitcoin. If a hard fork is allowed, that advantage is nullified. What else am I missing?
legendary
Activity: 1120
Merit: 1152
Or maybe there is some sort of game-theoretic advantage to broadcasting the transactions as late as possible? I'm not aware of any, but I haven't put much thought into the game-theoretic aspects of this concept of hiding the transactions from the other miners. To me it seems counterproductive globally.

There is: while your competitors are downloading your new block they are wasting their time; they aren't working on extending the best chain. Provided your block is still downloaded and verified by >50% of the hashing power before the next block is downloaded and verified, your block will be extended by the majority of the hashing power, and thus you benefit from the reduced competition from the miners who couldn't download and validate your block fast enough. This also means miners can offer a discount on fees, provided you promise not to submit your transactions to other miners (trivially verified if you keep track of orphans).
legendary
Activity: 1022
Merit: 1033
Which branch of computer science postulates that? Transactions arrive nearly randomly during the inter-block period, so the mean time available to verify a transaction is about 5 minutes.

What is going on here with this "in-RAM database requirement"?

DoS attack: somebody sends you lots of fake transactions with non-existent inputs. To dismiss such a transaction you need to confirm that the input is indeed non-existent, which requires one or more read operations.

If you store the UTXO set on hard disk drives, it would be pretty much trivial to DoS you, because one HDD can do something like 100 random reads per second.
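
As a back-of-envelope sketch of that asymmetry (all figures are my own illustrative assumptions):

Code:
# Rough cost of the fake-input DoS described above; all numbers are assumptions.
HDD_RANDOM_READS_PER_SEC = 100   # ballpark for one spinning disk
READS_PER_FAKE_TX = 1            # at least one lookup just to reject a bogus input
FAKE_TX_SIZE = 250               # bytes, a small plausible transaction

tx_rate_to_saturate = HDD_RANDOM_READS_PER_SEC / READS_PER_FAKE_TX  # 100 tx/s
attacker_bandwidth = tx_rate_to_saturate * FAKE_TX_SIZE             # ~25 kB/s

print(f"~{tx_rate_to_saturate:.0f} fake tx/s, i.e. only ~{attacker_bandwidth / 1000:.0f} kB/s "
      f"of attacker bandwidth, keeps the disk 100% busy")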

This is why people think that there should be a limit on the UTXO set, for example via the introduction of a minimal viable transaction output value. (Say, if a TXO's value is 1 satoshi it might be treated as spam.)

Please understand me correctly: this is just what I've seen in discussions here; it appears to be a prevalent opinion.

Of course, it is possible to design the system in such a way that UTXO set size won't be a problem and won't affect costs; that is what I'm talking about here...
legendary
Activity: 2128
Merit: 1073
However, after that you need to keep UTXO set in RAM. RAM is expensive.
Which branch of computer science postulates that? Transactions arrive nearly randomly during the inter-block period, so the mean time available to verify a transaction is about 5 minutes.

What is going on here with this "in-RAM database requirement"?

Edit: I mean the only new transaction in the block is the coinbase. Every other transaction should already have been broadcast on the p2p network.

Are you trying to preserve some artefact from Satoshi's implementation? From the computer-science point of view, the array of transactions appended to the block is pure overhead and redundant.

Or maybe there is some sort of game-theoretic advantage to broadcasting the transactions as late as possible? I'm not aware of any, but I haven't put much thought into the game-theoretic aspects of this concept of hiding the transactions from the other miners. To me it seems counterproductive globally.
legendary
Activity: 2940
Merit: 1090
All of this makes me wonder just whose computer science abilities he thinks "suck". I don't know for sure, but I'd hazard a guess he's misrepresenting their point.

There was some tongue-in-cheek going on, y'know.

Similarly, when I agreed "those guys" do suck at computer science, really that was just a left-handed way of saying "good going, old chap", not really anything about "those guys" at all.

-MarkM-
staff
Activity: 4284
Merit: 8808
you just don't store the UTXO set locally.
[...]
But, of course, it's also possible to use a UTXO set referenced by blocks instead of computing your own.

I'm not sure that "SPV UTXO security" is the right term, because what I'm talking about is applicable to miners, but SPV is used in the context of client systems, no?
SPV security means trusting the longest chain instead of validating for yourself. Being a "client" or not isn't relevant.

But indeed, I assumed you meant not bothering to validate for yourself rather than validating and then pruning, though only in the context of a hash-committed UTXO set, so that you can securely query. I would have just called that "pruning", and there is nothing wrong with that. But I also don't know that there is any great benefit in it, as it substantially increases bandwidth usage. Tradeoffs are fine though.

SPV UTXO security means that the reward for someone who can produce N blocks isn't just reversing the transactions they can make and replacing them; it's on the order of the value of the whole Bitcoin economy, past and into the future.
This attack is possible right now, with N equal to the current blockchain height.
No, in fact, it isn't, as nodes won't reorganize back infinitely far. (This has its own problems: a longer fork that branches off earlier than nodes are willing to reorganize to makes the system divergent, as newly initialized nodes will end up on the 'wrong' chain.)

they could freely inflate the supply, etc.
No, it is possible to verify the supply having only the UTXO set.
To autonomously verify it you must query the entire set. If you're assuming a self-verified pruned UTXO set, that's fine; if you're assuming SPV-security UTXO... not so much.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
You can always verify the data in the DHT against the full blockchain, and you only need to do it once

So you're saying that having to store exabytes of data isn't a problem; all you need is exabytes of network bandwidth in order to install the client.

Seriously?

OP has missed the forest for the trees by interpreting "storage costs" too literally.  Turning storage costs into network costs doesn't make them go away.  But it does tempt people to change the security model by skipping that bothersome scanning-exabytes-worth-of-data phase and just trusting the DHT root hash that their friendly neighborhood multinational bank says is safe to use.

All of this makes me wonder just whose computer science abilities he thinks "suck". I don't know for sure, but I'd hazard a guess he's misrepresenting their point.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
i.e. you still scan the whole blockchain, you just don't store the UTXO set locally.

You have the Bitcoin security model wrong.

How did all these exabytes of data get into the DHT?  If you didn't put them there yourself (or at least scan all of them), you're trusting whoever put them there that they constitute a well-formed blockchain.
legendary
Activity: 1792
Merit: 1111
i.e. you still scan the whole blockchain, you just don't store the UTXO set locally.

You have the Bitcoin security model wrong.

How did all these exabytes of data get into the DHT?  If you didn't put them there yourself, you're trusting whoever put them there that they constitute a well-formed blockchain.


You can always verify the data in the DHT against the full blockchain, and you only need to do it once
legendary
Activity: 1022
Merit: 1033
What you're describing is SPV UTXO security, and it's also been brought up many times before. Perhaps you should save your allegations of unimaginativeness for ideas that are actually original. Tongue

Again, the main topic of my post was the bandwidth/storage trade-off, which is independent of the security model you use. Actually, I've outlined a very conservative approach which changes nothing about the security model, i.e. you still scan the whole blockchain, you just don't store the UTXO set locally.

But, of course, it's also possible to use a UTXO set referenced by blocks instead of computing your own.

I'm not sure that "SPV UTXO security" is the right term, because what I'm talking about is applicable to miners, but SPV is used in the context of client systems, no?

SPV UTXO security means that the reward for someone who can produce N blocks isn't just reversing the transactions they can make and replacing them; it's on the order of the value of the whole Bitcoin economy, past and into the future.


This attack is possible right now, with N equal to the current blockchain height.

What changes if we make N less than the current blockchain height?

They could steal anyone's coins,

The same thing happens if you mine starting from the genesis block.

If you do a history-rewriting attack of limited depth, you can steal some people's coins.

they could freely inflate the supply, etc.

No, it is possible to verify the supply having only the UTXO set.
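
For the record, a minimal sketch of that check; the subsidy schedule is Bitcoin's real one, while utxo_values is a hypothetical stand-in for the values of all outputs in the set:

Code:
def max_supply_at(height):
    """Max subsidy (satoshis) issuable up to `height`:
    50 BTC per block, halving every 210000 blocks."""
    total, subsidy, h = 0, 50 * 100_000_000, 0
    while h <= height and subsidy > 0:
        total += min(210_000, height - h + 1) * subsidy
        h += 210_000
        subsidy //= 2
    return total

def supply_is_sane(utxo_values, height):
    # The sum may fall below the schedule (miners sometimes claim less than
    # the full reward), but any excess proves inflation.
    return sum(utxo_values) <= max_supply_at(height)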

You have the Bitcoin security model wrong:

I think you've got it wrong: a history-rewriting attack is fundamentally no different from a UTXO-manipulation attack in terms of severity.

If you have enough computational power, and checkpointing is not an obstacle, you can carry out either attack, with the same consequences.

If you don't have enough computational power, you can't do either. If centralized checkpoints are present in clients, you can't do either.
staff
Activity: 4284
Merit: 8808
For all practical intents and purposes it is enough to have N latest blocks + UTXO set.
What you're describing is SPV UTXO security, and it's also been brought up many times before. Perhaps you should save your allegations of unimaginativeness for ideas that are actually original. Tongue

SPV UTXO security means that the reward for someone who can produce N blocks isn't just reversing the transactions they can make and replacing them; it's on the order of the value of the whole Bitcoin economy, past and into the future. They could steal anyone's coins, they could freely inflate the supply, etc.

You have the Bitcoin security model wrong: Bitcoin verifies everything because it doesn't trust, except in the very limited fashion that it must because we can't do ordering autonomously. Abandon that and you lose the incentives that make mining secure enough to even think that you could abandon validation.

Perhaps, ultimately, the scaling means that it's a superior security model, but if you're going to change the security model, you could hypothesize even weaker models which are even less costly to operate.
legendary
Activity: 1022
Merit: 1033
There are challenges before that, though, like spam and who can change which entries in the database and so on, so it might be best to do just the UTXO set first and see how that works out before risking one's precious blockchain.

Actually, keeping historic blockchain data isn't even necessary in practice.

Let's assume that the UTXO set is referenced from blocks in such a way that a block which refers to a wrong UTXO set is invalid. This means that it costs a lot of money to make a block with a fake UTXO set, basically...

For all practical intents and purposes it is enough to have the N latest blocks + the UTXO set.

A new node which joins the network can grab the UTXO set from a year-old block and see whether subsequent blocks agree with it.

In the hypothetical scenario where the UTXO set is fake, forged by an attacker, the attacker is so powerful that he can create a year's worth of blocks... He clearly dominates the Bitcoin network. He could just as well re-create all blocks from the start, so it doesn't matter whether we have one year's worth of full data or the full blockchain.

But if such an attack were under way, it would mean that Bitcoin is worthless. Thus, if it is not worthless, there is no attack, and so it is safe to grab the UTXO set and work from there.
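
A sketch of that bootstrap, under the stated assumption that blocks commit to the UTXO set; the utxo_commitment field and the two helper functions are hypothetical, not part of the deployed protocol:

Code:
import hashlib

def bootstrap_from_snapshot(snapshot, year_old_block, later_blocks):
    """Join the network from a year-old UTXO snapshot instead of full history."""
    # 1. The snapshot must hash to the commitment buried a year deep.
    if hashlib.sha256(snapshot).digest() != year_old_block.utxo_commitment:
        raise ValueError("snapshot does not match the committed UTXO hash")
    # 2. Replay the subsequent year of blocks; each must be valid and must
    #    agree with the evolving UTXO set.
    utxo_set = deserialize_utxo_set(snapshot)   # hypothetical helper
    for block in later_blocks:
        apply_and_validate(block, utxo_set)     # hypothetical helper
    return utxo_set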

I am not sure offhand why thin clients weren't already thinking DHT

It offers practically no advantages at this point, but is considerably harder to implement.

(Or maybe phones just aren't good at being DHTs, maybe they aren't "always on" or have limited bandwidth.)

Yes, it is likely impractical to make phones DHT nodes, although phones will be able to query the DHT.

It could work for desktop clients, though. A lot of people think that running Bitcoin-Qt is the right thing to do; I doubt they will mind running a DHT node for themselves and for their mobile brethren.
legendary
Activity: 1022
Merit: 1033
It's just a protocol which requires users to keep all data required to verify their transactions.

When the user is offline, where then can that data be found, or is it even needed at such times at all?

It is like a certificate of authenticity; it is needed only when one sends a transaction. Basically the sender transmits a message: "Here's my transaction: xxx, and here are certificates for all coins involved: yyy".

The person you're sending coins to will need these "certificates" so he can prove the authenticity of the coins he received, too.

And miners need them to decide whether to include the transaction in a block. They will then repackage and optimize the "certificates" of all coins involved into a certificate for the block, so other miners can check the whole block without asking anything from the person who sent the transaction.
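
A sketch of what such a message might look like; the names and the Merkle-branch "certificate" format are hypothetical, chosen only to make the idea concrete:

Code:
from dataclasses import dataclass

@dataclass
class CoinCertificate:
    outpoint: bytes       # txid + output index of the coin being spent
    merkle_branch: list   # hashes linking that coin to the committed UTXO root

@dataclass
class CertifiedTransaction:
    tx: bytes             # the raw transaction ("xxx" above)
    certificates: list    # one CoinCertificate per input ("yyy" above)

# A miner verifies each certificate against the UTXO root it already trusts,
# so it can accept the transaction with zero local UTXO lookups.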

There is one caveat, though: people need to update their certificates periodically, otherwise they become stale... In that case one might need the help of a third party to update them. Or a DHT with historic data.

In any case, I'm not saying that this protocol is actually worth implementing. But the mere fact that it can exist shows that there is no good reason to limit UTXO set growth.

Also, nothing I mentioned requires state-of-the-art tech... It is literally something a mildly talented undergrad CS student could implement.
legendary
Activity: 2940
Merit: 1090
It's just a protocol which requires users to keep all data required to verify their transactions.

When the user is offline, where then can that data be found, or is it even needed at such times at all?

-MarkM-
legendary
Activity: 1372
Merit: 1002
legendary
Activity: 1022
Merit: 1033
In simple words: a distributed and shared UTXO dataset, right?

So we could also have a distributed and shared full blockchain?

Yes, that's where I'm heading; but in this post I've simply outlined a protocol which would allow one to take the bandwidth/storage trade-off without sacrificing latency too much.

It doesn't require a distributed UTXO/blockchain dataset; I've simply assumed the existence of such a dataset as a better introduction.

It's just a protocol which requires users to keep all data required to verify their transactions.
legendary
Activity: 2940
Merit: 1090
In simple words: a distributed and shared UTXO dataset, right?

So we could also have a distributed and shared full blockchain?

Distributed data is one thing, reliable distributed data is a whole additional layer.

When "everyone" ("enough" people) have the blockchain, anything that "goes missing" from the DHT can be replenished.

But in principle you could do RAID-type stuff to try to make it reliable.

There are challenges before that, though, like spam and who can change which entries in the database and so on, so it might be best to do just the UTXO set first and see how that works out before risking one's precious blockchain.

I am not sure offhand why thin clients weren't already thinking DHT, maybe because one usually has the folks using it be the providers of it, whereas thin clients seem to have been associated in people's minds with centralised servers. (Maybe those people had service offerings in mind as a business they wanted to create a market for or something?)

(Or maybe phones just aren't good at being DHTs, maybe they aren't "always on" or have limited bandwidth.)

-MarkM-
legendary
Activity: 1792
Merit: 1111
In simple words: a distributed and shared UTXO dataset, right?

So we could also have a distributed and shared full blockchain?

Haha, sorry, I didn't mean to offend anyone. I'm just somewhat disappointed that people consider only the simplest solutions and do not want to see the bigger picture.

Let's start with an example... Suppose I am a miner in a kinda exotic situation: I have unlimited computational and networking resources, and enough temporary storage (RAM) to verify the blockchain, but local permanent storage is incredibly scarce. If I want to store something, though, I can store it externally over the network, say in a DHT like Freenet, Tahoe-LAFS or something like that.

So I want to download and scan the blockchain up to block N just once; after that I need to be able to verify the validity of any transaction. Is that possible?

Of course. As I go through the blockchain I will build an index and store it in an external DHT. I only need to keep the hash of the latest block locally; that's 32 bytes.

Having this hash, I can retrieve things from the DHT until I find the transaction outputs I need to verify in the index I've built. (I do not care whether the DHT is secure: if something was tampered with, the hashes won't match.)

This is kinda obvious: if secure distributed file systems can exist, then I can simply store data in such a file system instead of a local file system.
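
A minimal sketch of that "don't trust the DHT" property, with dht as a hypothetical key-value store (a plain dict works for the demonstration):

Code:
import hashlib

def dht_put(dht, data):
    """Store a node under its own hash; the 32-byte key is the only handle needed."""
    key = hashlib.sha256(data).digest()
    dht[key] = data
    return key

def dht_get(dht, key):
    """Fetch and verify: tampered or corrupted data simply won't hash to `key`."""
    data = dht[key]
    if hashlib.sha256(data).digest() != key:
        raise ValueError("DHT returned tampered data")
    return data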

But... how much would it cost me to verify a transaction? Well, tree-like data structures generally have look-up costs on the scale of log2 N, where N is the number of elements. In the worst case each individual satoshi is an individual UTXO, so we have 21000000*100000000 = 2100000000000000 ~= 2^51 UTXOs. Thus I need something like 51 lookups to find a UTXO in a binary search tree. Or just 9 lookups if I have a 64-ary tree.
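
(A quick check of those figures:)

Code:
import math

utxos = 21_000_000 * 100_000_000        # every satoshi as its own UTXO
print(math.log2(utxos))                 # ~50.9 -> 51 levels in a binary tree
print(math.ceil(math.log(utxos, 64)))   # 9 levels in a 64-ary tree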

But people can argue that 9 lookups per UTXO is a lot... Network latency yada yada. How about zero?

That's possible. Suppose the person who sends a transaction knows how I store the index in the DHT; it isn't a secret. To make sure that I'll include his transaction in a block, he will fetch all the data I need from the DHT himself, and will send me a message with his transaction and all the information I need.

I don't need to look up anything in the DHT myself; I only need the data contained in the message. And this is secure: if the data was tampered with, the hashes won't match.
 
So, basically, the number of transactions I can include in a block is limited by my computational and network abilities, but storage capacity/cost is irrelevant.

But what about blocks mined by others? Oh...

Well, it is possible to do 9 DHT lookups per UTXO mentioned in a block. The number of outputs is limited, and I can do the lookups in parallel, so it isn't such a big problem. But still...

You know, miners are friendly guys; how about they all use the same DHT and then include confirmation information together with the block they have just mined?

So I receive a new block plus supplementary information, which is all that is needed to confirm that the block is valid.

In the end, it is possible to do all of one's mining with only 32 bytes of permanent secure storage. It requires somewhat more bandwidth, though. But the extra bandwidth costs are approximately proportional to block size, so maybe not a big problem...

E.g. I either need 128 GB of RAM, an array of hard drives, and a 100 Mbit/s pipe; or I need 1 GB of RAM, no hard drives at all, and a 1 Gbit/s pipe. Which is cheaper?

So what I'm talking about is a storage/bandwidth trade-off. Using less storage might increase latency, but possibly in such a way that it won't be critical.

Next time I will teach you how to implement a blockchain-based cryptocurrency in such a way that new miners can start mining right away without downloading the whole blockchain. Stay tuned...

legendary
Activity: 1400
Merit: 1013
I'm just somewhat disappointed that people consider only the simplest solutions and do not want to see the bigger picture.
What you're seeing is ex post facto reasoning.

Some people have an unstated goal that requires Bitcoin not to scale. For whatever reason they don't want to talk about their actual reason, so they invent reasons to justify their conclusion. When the invented reasons are debunked they don't change their position as they would if those were the real reasons, but instead just invent new reasons.