Pages:
Author

Topic: Max block size should also consider the size of UTXO set (Read 3311 times)

legendary
Activity: 1792
Merit: 1111
I created a draft BIP relating to etx and eblock messages.

If a protocol version number was used, then a new block message name is not strictly required.

The updates to the reference client may not be that difficult.  Transactions could be stored internally in the new format and translated when sending/receiving from the network.

To support legacy blocks, the entire blockchain would have to be reindexed.

Some further thoughts for comment...

Given a delta uxto size (whether composite with sigops or not), is it better to:

credit or debit the required fee during tx preparation
  or
aggregate delta utxo size at a block level and scale an "allowed" block size which can be mined for the given tx set in it
  or
both?

I think setting a required fee as a network rule is the wrong approach.  Either have the UTXO limits as a separate rule or folded into size.  That way there is some kind of market price effect.

I think it's a good idea and should have an new thread for that.
legendary
Activity: 1232
Merit: 1094
I created a draft BIP relating to etx and eblock messages.

If a protocol version number was used, then a new block message name is not strictly required.

The updates to the reference client may not be that difficult.  Transactions could be stored internally in the new format and translated when sending/receiving from the network.

To support legacy blocks, the entire blockchain would have to be reindexed.

Some further thoughts for comment...

Given a delta uxto size (whether composite with sigops or not), is it better to:

credit or debit the required fee during tx preparation
  or
aggregate delta utxo size at a block level and scale an "allowed" block size which can be mined for the given tx set in it
  or
both?

I think setting a required fee as a network rule is the wrong approach.  Either have the UTXO limits as a separate rule or folded into size.  That way there is some kind of market price effect.
legendary
Activity: 1078
Merit: 1006
100 satoshis -> ISO code
Some further thoughts for comment...

Given a delta uxto size (whether composite with sigops or not), is it better to:

credit or debit the required fee during tx preparation
  or
aggregate delta utxo size at a block level and scale an "allowed" block size which can be mined for the given tx set in it
  or
both?

Considering how the utxo set is expanding so fast, would it be wise to look at changing the fundamental basis for free tx, from days-destroyed to negative delta uxto size? This could be implemented much sooner than changes associated directly with the block limit.
legendary
Activity: 1232
Merit: 1094
Do you document your formula somewhere?

It was on the mailing list.

Quote from: gmaxwell
An example would be tx_size = MAX( real_size >> 1,  real_size + 4*utxo_created_size - 3*utxo_consumed_size).   The reason for the MAX is so that a block which cleaned a bunch of big UTXO could not break software by being super large, the utxo_consumed basically lets you credit your fees by cleaning the utxo set; but since you get less credit than you cost the pressure should be downward but not hugely so. The 1/2, 4, 3 I regard as parameters which I don't have very strong opinions on which could be set based on observations in the network today (e.g. adjusted so that a normal cleaning transaction can hit the minimum size).
legendary
Activity: 1792
Merit: 1111

My formula works with bytes all around, though there is some scaling because signatures tend to be much bigger than outputs and limits because I'm trying to avoid some edge cases like miners bloating the utxo set with cheaply spendable outputs in order to store-and-forward their blocksize.

Do you document your formula somewhere?
staff
Activity: 4284
Merit: 8808
block size => bandwidth and storage cost
utxo size => RAM cost
No the UTXO isn't in ram, it's just more storage; it's more costly because every verifier must have it online where as the majority of nodes can forget all the rest once its burred; and certainly doesn't need to access it. I'd suggest thinking about it as storage which is necessarily online for the UTXO and storage which is mostly offline/nearline for the history, once you're past the most recent couple hundred blocks or so.
Quote
sigops => CPU cost
However, my impression is that the CPU cost is negligible comparing with bandwidth, storage and RAM and the sigop limit is there just as an anti-DOS mechanism
It's not exactly negligible; especially when one considers the cost of catching up-- there its the dominating cost currently for many people due to sig operations syncing the chain only runs at about 11Mbit/sec or so on a 3.2GHz quad core... (Well, libsecp256k1 will make this considerably faster-- but still slower than high speed consumer broadband that is widely available)  It only doesn't impact block propagation because virtually all signatures are cached from the initial txn relay now.

I don't really think it needs to be adjusted often (or hopefully ever); in that it can be set pretty conservatively and the most important point is getting a situation where "I will pay less fees if I do bar, so if I'm otherwise neutral I should choose to do so."-- partially this is because you're right that it's mostly an anti-dos mechanism.
legendary
Activity: 1792
Merit: 1111
So your idea is to replace the MAX_BLOCK_SIZE with a single composite score of block size, delta utxo size, and something else?
Yes (something else would be the sigops limit).

I think the relationship is:

block size => bandwidth and storage cost
utxo size => RAM cost
sigops => CPU cost

Then the formula has to be reviewed regularly (i.e. having hardfork regularly), depending on each component's development in the real world.

However, my impression is that the CPU cost is negligible comparing with bandwidth, storage and RAM and the sigop limit is there just as an anti-DOS mechanism
staff
Activity: 4284
Merit: 8808
So your idea is to replace the MAX_BLOCK_SIZE with a single composite score of block size, delta utxo size, and something else?
Yes (something else would be the sigops limit; though sigops actually have a kind of quadratic cost, due to the cost of rehashing the transaction and all its inputs; but all this only need to be very roughly correct in order to get the behavioral incentives (don't use frivolous sigops!) right).
legendary
Activity: 1792
Merit: 1111
since unspendable outputs should not be entered into the UTXO set.
Unless P == NP, unspendability is undecidable in the general case... someone can always write a script whos spendability can't be decided without insanely large work, if they want. (e.g. push zeros, OP_SHA1 OP_EQUALVERIFY ... does the all zeros sha1 have a preimage?)

So once you're depending on their cooperation, defining a single known unspendable kind seems reasonable. Someone who wants to bloat the utxo set with an unspendable output always can; unfortunately.

Using that system, the entire UTXO set would be 16 bytes per entry for the key and value.
Thats not much below the current system. Value and outpoint index are compressed and just take a couple bytes, the scriptpubkey is template compressed and takes 20 bytes for most scriptpubkeys. The version, height, coinbase flag, and txid is shared among all outputs with the same id. Smiley

I wrote this long time ago when I knew Bitcoin not very well. But I always think this has to be done if we increase the max block size.

I would like to rewrite it as follow:

We will formally define "utxoSize", which is

txid (32 bytes) + txoIndex (varInt) + scriptPubKey length (varInt) + scriptPubKey (? bytes) + value (8 byte) + coinbase (1 byte) + coinbase block height (0 or 4 bytes)

This is much larger than the encoding we currently use.

Quote
Code:
utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs)
With utxoDiff, we may have a soft-fork rule of one of the followings:
  • 1. Put a hardcoded cap to utxoDiff for each block. Therefore, we will be sure that the UTXO set won't grow too fast; or
  • 2. If a block has a small (or even negative) utxoDiff, a higher MAX_BLOCK_SIZE is allowed
That will encourage miners to accept txs with small or negative utxoDiff, which will keep the UTXO set small.

Having multiple limits makes the linear programming problem of deciding which transactions to include much harder (both computationally and algorithmically); it also means that a wallet cannot precisely set a fees per byte based on the content of the transaction without knowing the structure of all the other transactions in the mempool.

Did you see my proposal where I replace size with a single augmented 'cost' that is a weighed sum of the relevant costly thing we'd hope to limit?  (this wouldn't prevent having worse case limits in any of the dimensions too; if really needed).


So your idea is to replace the MAX_BLOCK_SIZE with a single composite score of block size, delta utxo size, and something else?
legendary
Activity: 1232
Merit: 1094
Unless P == NP, unspendability is undecidable in the general case

There needs to be agreement on what counts as definitely unspendable for UTXO set commitments. 

Unless the script starts with OP_RETURN, it is entered into the UTXO set would be fine.
staff
Activity: 4284
Merit: 8808
since unspendable outputs should not be entered into the UTXO set.
Unless P == NP, unspendability is undecidable in the general case... someone can always write a script whos spendability can't be decided without insanely large work, if they want. (e.g. push zeros, OP_SHA1 OP_EQUALVERIFY ... does the all zeros sha1 have a preimage?)

So once you're depending on their cooperation, defining a single known unspendable kind seems reasonable. Someone who wants to bloat the utxo set with an unspendable output always can; unfortunately.

Using that system, the entire UTXO set would be 16 bytes per entry for the key and value.
Thats not much below the current system. Value and outpoint index are compressed and just take a couple bytes, the scriptpubkey is template compressed and takes 20 bytes for most scriptpubkeys. The version, height, coinbase flag, and txid is shared among all outputs with the same id. Smiley

I wrote this long time ago when I knew Bitcoin not very well. But I always think this has to be done if we increase the max block size.

I would like to rewrite it as follow:

We will formally define "utxoSize", which is

txid (32 bytes) + txoIndex (varInt) + scriptPubKey length (varInt) + scriptPubKey (? bytes) + value (8 byte) + coinbase (1 byte) + coinbase block height (0 or 4 bytes)

This is much larger than the encoding we currently use.

Quote
Code:
utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs)
With utxoDiff, we may have a soft-fork rule of one of the followings:
  • 1. Put a hardcoded cap to utxoDiff for each block. Therefore, we will be sure that the UTXO set won't grow too fast; or
  • 2. If a block has a small (or even negative) utxoDiff, a higher MAX_BLOCK_SIZE is allowed
That will encourage miners to accept txs with small or negative utxoDiff, which will keep the UTXO set small.

Having multiple limits makes the linear programming problem of deciding which transactions to include much harder (both computationally and algorithmically); it also means that a wallet cannot precisely set a fees per byte based on the content of the transaction without knowing the structure of all the other transactions in the mempool.

Did you see my proposal where I replace size with a single augmented 'cost' that is a weighed sum of the relevant costly thing we'd hope to limit?  (this wouldn't prevent having worse case limits in any of the dimensions too; if really needed).


legendary
Activity: 1792
Merit: 1111

Personally, I would like to keep hope that the disabled opcodes will be re-enabled eventually Smiley.

There is no point to "re-enable" a code because that's a hardfork. If we are going to re-introduce codes like OP_CAT, the most likely way is to softfork with P2SHv2 or OP_EVALv2
legendary
Activity: 1232
Merit: 1094
I'm not sure how accurate https://en.bitcoin.it/wiki/Script is. In my understanding, the following codes will fail the script, even if they occur in an unexecuted OP_IF branch

RETURN, CAT, SUBSTR, LEFT, RIGHT, INVERT, AND, OR, XOR, 2MUL, 2DIV, MUL, DIV, MOD, LSHIFT, RSHIFT, VERIF, VERNOTIF, 0xba-0xff

Looking at the source code.

The sequence is

fExec is set (true means next opcode is executed)

opcode fails to parse (only fails for double sized opcodes) -> FAIL

last push to stack to large -> FAIL

check number of non-push opcodes > maximum -> FAILS

OP_ -> FAIL

if (fExec || opcode is an IF-type opcode)
   // No curly brackets unfortunately
   case
   OP_VERIFY:  FAIL if not true
   OP_RETURN:  FAIL
   (and all the normal opcodes)

The disabled opcodes are hard FAILS no matter what happens with OP_IF.  OP_RETURN only fails if executed.

Easy rule would be that a script is unspendable if it contains a disabled opcode or OP_RETURN before the first OP_IF.  It is much easier to just check if the first opcode is OP_RETURN.

Personally, I would like to keep hope that the disabled opcodes will be re-enabled eventually Smiley.
legendary
Activity: 1792
Merit: 1111
All UTXOs are considered to be potentially spendable, unless proven otherwise

I think having an exact definition of unspendable is ideal.  This is necessary for UTXO set commitments, since unspendable outputs should not be entered into the UTXO set.

At the moment, CScript.IsUnspendable() is true unless the script starts with OP_RETURN.

I think the disabled opcodes are OK unless they are part of an executed branch?

Are scriptPubKeys always accepted as long as they encode a byte array?  Is there a check that it is valid script when decoding?

The easiest is just to use OP_RETURN at the start to mark unspendable.  The next easiest would be if any of the disabled opcodes (or OP_RETURN) occur outside an OP_IF OP_ENDIF portion of the script.

I'm not sure how accurate https://en.bitcoin.it/wiki/Script is. In my understanding, the following codes will fail the script, even if they occur in an unexecuted OP_IF branch

RETURN, CAT, SUBSTR, LEFT, RIGHT, INVERT, AND, OR, XOR, 2MUL, 2DIV, MUL, DIV, MOD, LSHIFT, RSHIFT, VERIF, VERNOTIF, 0xba-0xff
legendary
Activity: 1232
Merit: 1094
All UTXOs are considered to be potentially spendable, unless proven otherwise

I think having an exact definition of unspendable is ideal.  This is necessary for UTXO set commitments, since unspendable outputs should not be entered into the UTXO set.

At the moment, CScript.IsUnspendable() is true unless the script starts with OP_RETURN.

I think the disabled opcodes are OK unless they are part of an executed branch?

Are scriptPubKeys always accepted as long as they encode a byte array?  Is there a check that it is valid script when decoding?

The easiest is just to use OP_RETURN at the start to mark unspendable.  The next easiest would be if any of the disabled opcodes (or OP_RETURN) occur outside an OP_IF OP_ENDIF portion of the script.
legendary
Activity: 1792
Merit: 1111
I wrote this long time ago when I knew Bitcoin not very well. But I always think this has to be done if we increase the max block size.

I would like to rewrite it as follow:

We will formally define "utxoSize", which is

txid (32 bytes) + txoIndex (varInt) + scriptPubKey length (varInt) + scriptPubKey (? bytes) + value (8 byte) + coinbase (1 byte) + coinbase block height (0 or 4 bytes)

txid is the txid of the utxo
txoIndex is the index of the utxo in the tx, recorded as varInt
scriptPubKey length is the length of the scriptPubKey of the utxo, a varInt
scriptPubKey is the scriptPubKey of the utxo
value is the utxo value in Satoshis
coinbase is an indicator to show whether the utxo is a coinbase reward
if it is a coinbase reward, 4 more bytes is needed to record the block height

I think these are the minimal amount of data needed to store utxo. the utxo set needs to show whether an utxo is from coinbase due to the 100-block rule.

However, if the utxo contains ANY one of the invalid OP_CODE, e.g. OP_RETURN, OP_VERIF, OP_CAT which guarantees the script to fail, the utxoSize is set to zero.

For each block, we need to calculate the net change in UTXO size, which is

Code:
utxoDiff = (total utxoSize of new UTXOs) - (total utxoSize of spent UTXOs)

With utxoDiff, we may have a soft-fork rule of one of the followings:

  • 1. Put a hardcoded cap to utxoDiff for each block. Therefore, we will be sure that the UTXO set won't grow too fast; or
  • 2. If a block has a small (or even negative) utxoDiff, a higher MAX_BLOCK_SIZE is allowed

That will encourage miners to accept txs with small or negative utxoDiff, which will keep the UTXO set small.

If we can make the size of utxo entry a constant (as suggest by TierNolan) for any type of utxo, this could be simplified to

Code:
utxoDiff = (# of new potentially spendable UTXOs) - (# of spent UTXOs)
where
(# of new potentially spendable UTXOs) = (# of new UTXOs) - (# of new UTXOs with invalid OP_CODE)

All UTXOs are considered to be potentially spendable, unless proven otherwise
legendary
Activity: 1232
Merit: 1094
You don't need any extra info for P2PKH or P2SH because you can reconstruct the scriptPubKey with scriptSig.

Yes, of course, sorry.
legendary
Activity: 1792
Merit: 1111
Quote
Also, the etx message won't need to provide scriptPubKey if it is a standard one like P2PK, P2PKH or P2SH. Just use a byte to indicate the type is enough.

It still needs to provide the info to generate the scriptPubKey (address or serialized scriptPubKey).

You don't need any extra info for P2PKH or P2SH because you can reconstruct the scriptPubKey with scriptSig.

(For P2PK and BIP11 multi-sig, however, the etx has to provide the scriptPubKey)
legendary
Activity: 1232
Merit: 1094
ok, but I don't understand the meaning of your: Hash(key_salt | txid | n) maps to Hash(value_salt | OutPoint | height | ... )

Why don't just use an index of Hash(salt|txid|n|height|scriptPubKey|value)? The etx message will provide all these information (expect the salt which is a secret of the node).

I was thinking you need to be able to look it up from the database, so need (txid | n) to be a key.  However, you are right a hash set is sufficient, if the etx has all the required additional info.

This drops things to 8 bytes per UTXO output for serialization and slightly more due to hashset overhead for RAM.

There could be an eblock too.  This allows nearly complete block verification without needing any info outside the eblock.  The only check is that the UTXOs all actually exist.

I think this would help with the consensus library.  You pass it an eblock message and it returns a list of UTXOs that must exist for the block to be valid and also a list of new UTXO hashes.

This means that (almost) all the complexity of block verification (script and crypt) is handled internally in the library.  The only thing that the outside client needs to be is manage the UTXO digests.

It might be better to use hash(salt | hash()), so that the salting can be done outside the consensus library.  The consensus library would use 32 byte digests.

Quote
Also, the etx message won't need to provide scriptPubKey if it is a standard one like P2PK, P2PKH or P2SH. Just use a byte to indicate the type is enough.

It still needs to provide the info to generate the scriptPubKey (address or serialized scriptPubKey).
legendary
Activity: 1792
Merit: 1111
How about the scriptPubKey and value of the utxo? These are not part of the spending tx.

I meant that the info isn't strictly needed in the UTXO database (in general).  An extended transaction format could include the scriptPubKey and the value (and also the height, for coinbase spends).

This moves the cost of storing that info from RAM in every node to whoever wants to spend the transaction.  P2SH outputs already effectively do this (well not the value), but there is no reason that all transactions couldn't support it.  

During the transitiion, tx messages could be converted into etx messages by full nodes.  Eventually, tx messages could be depreciated.  Wallet software would have to manage the info for any coins held in its wallet.

ok, but I don't understand the meaning of your: Hash(key_salt | txid | n) maps to Hash(value_salt | OutPoint | height | ... )

Why don't just use an index of Hash(salt|txid|n|height|scriptPubKey|value)? The etx message will provide all these information (expect the salt which is a secret of the node).

Also, the etx message won't need to provide scriptPubKey if it is a standard one like P2PK, P2PKH or P2SH. Just use a byte to indicate the type is enough.
Pages:
Jump to: