Author

Topic: Extending Transactions to include UTXO info (Read 1088 times)

legendary
Activity: 1232
Merit: 1094
I am thinking of splitting the BIP into two parts.  The first part would be the updated network protocol.  This should be less controversial.

Network Protocol

The can be implemented without a soft fork of any kind.  It is purely a protocol change and so should be less controversial.

  • Automatic reindex of undo info (run once)
  • Add new service bit EXTENDED_TX
  • Update transaction serialization
  • Add upgrade method to add info to transactions received from legacy clients
  • Check/Upgrade raw transactions received over rpc
  • Update the add to memory pool code to only allow new type transactions
  • Update the code that reads blocks from disk to use undo info to add extra information

I think the service bit is a waste of a bit.  Eventually, most nodes would upgrade.  A UTXO-lite node could just disconnect from legacy clients when making outbound connections.

Soft Fork

The canonical digest could be defined as Hash(canonical_salt | new OutPoint serialization).

New nodes would use block version 4, with the normal thresholds for rule activation.

If 750 of the previous 1000 blocks are version 4 or higher, the rule is active for all blocks.  Otherwise, the canonical salt is reset to zero.

If the rule is active and the canonical salt is set to zero, then the canonical salt is set to the hash of the block being connected to the chain and a collision test is performed.  If that causes a collision (after UTXO set updates for the current block), the canonical salt is reset to zero.

Once a non-colliding canonical salt is found, it is locked and subsequent blocks which cause a collision are rejected.

If 950 of the previous 1000 blocks are version 4 or higher, version 3 and lower blocks are rejected.

I need to check how long iterating through the UTXO set would take to check for collisions.  If this uses significant CPU/disk resources, then a DOS attack can be launched by activating and deactivating the rule.

If it is greater than around a minute, then a more efficient way to test a new canonical salt would be required or hysteresis added to the super-majority function.

With 20 million entries and 8 bytes per entry, that gives 160MB of RAM to create the set.  If a hash set is used, it would be closer to 200-250MB of RAM.  This is probably OK for a once off burst.

It is also 20 million hashes.  CPUs can handle around 1Mhash per second.  That gives 20 seconds just for the hashing.

[Edit]

I wonder if the soft fork is really worth it.  The collision avoidance is likely not worth it, since local salting is still required to protect against finding a collision with the CanonicalDigests. 

[Edit2]

UTXO commitments should probably still use the full 32 bytes, since they can't be protected by a local salt. 
legendary
Activity: 1232
Merit: 1094
I was thinking of adding a UTXO_RAW option.  This would be for historical blocks only.  All new blocks would be stored in the new format.  This would mean that nodes don't need to reprocess their historical data.

Miners would have to fully update though.

Now that I think about it, does the "undo" information contain the UTXO set update information?  That would allow old nodes to generate blocks with the additional information.

[Edit]
Looks like the undo data is for all tx inputs in the block.  Was this data generated for historical blocks for nodes that updated?

[Edit2]

Assuming that is the case, it is probably easier to just keep the current disk serialization.  When loading a block from disk, both block and undo data could be loaded at the same time to create the new blocks.

[Edit3]

It looks like the undo information doesn't contain the height or coinbase status of the previous transaction, except when spending the final UTXO in the transaction.  Storing this information for coinbase transactions only would be a relatively small increase in the database size.

It means that all non-coinbase transactions would have to have the same minimum height, i.e. zero.  This means that a colliding transaction couldn't simply wait until the next block.

Alternatively, the height could be stored for all UTXOs.
legendary
Activity: 1232
Merit: 1094
After thinking more carefully, the updated proposal with softfork bears a huge risk when the length of the UTXO digest is not enough.

Consider an attacker can mine the following message:

Code:
36bytes nonce|2100000000000000|attacker's pk_script|any height in the past

As long as the result matches ANY existing digest in the UTXO set, the attacker will generate 21M bitcoin for himself as there is no other way for UTXO-lite nodes to detect it. This is a birthday attack and is much easier than mining for a particular digest. Whether the real UTXO has 1 satoshi or 1M bitcoin is irrelevant. If there is still any legacy nodes, there will be a hard-fork.

I think the best solution is to combine the soft fork with the salt system.  I like the way the soft fork gives guaranteed uniqueness, rather than saying that a collision is unlikely.

Each node computes two digests.

CanonicalDigest = Hash256(UTXO info)
SaltedDigest = Hash256(node_specific_salt | Canonical)

Internally the node uses LocalDigest = SaltedDigest[N-1:0] | CanonicalDigest[7:0] as its digest.  The consensus rule is that no two UTXOs can have the same CanonicalDigest.  This guarantees that there are no collisions for the LocalDigest.

N could be picked as a node policy.  A value of 4 seems reasonable.  Mining pools might select a larger value.  Changing the number would require a database re-index.

The UTXO set would be stored as a map of

CanonicalDigest to SaltedDigest

That would allow CanonicalDigest collisions to be detected.

Mining pools would have to be careful that their node_specific_salt is not leaked.  That is a risk only to the mining pool.  Unless the pool has more than 50% (and only uses 1 full node), then the other miners will reject that miner's blocks. 

Even if an attacker knew the salts for pools representing greater than 50% of the network, generating a transaction that is accepted by all would be very difficult.  If 4 pools had more than 50% of the power between them and they use 4 byte SaltedDigests, then two transactions need to give the same result for

CanonicalDigest[7:0] | SaltedDigest1[3:0] | SaltedDigest2[3:0] | SaltedDigest3[3:0] | SaltedDigest[3:0]

That gives a 24 byte search space, which is better protection than RIPEMD160 and it assumes that all 4 pools have leaked their node_specific_salt and only used 4 byte SaltedDigests. 

I definitely think the canonical digest should be 8 bytes in this case.  With the soft fork rule, UTXO collisions are dealt with.  8 bytes means a 2^64 UTXO space.  That gives 134,217,728 TB.  I think if the UTXO space ballooned to that size, then the network has bigger problems.  The current client couldn't handle it, so it would mean a "hard" client modification anyway (or lots of "soft" client modifications that are each compatible with the previous version).
legendary
Activity: 1792
Merit: 1111
After thinking more carefully, the updated proposal with softfork bears a huge risk when the length of the UTXO digest is not enough.

Consider an attacker can mine the following message:

Code:
36bytes nonce|2100000000000000|attacker's pk_script|any height in the past

As long as the result matches ANY existing digest in the UTXO set, the attacker will generate 21M bitcoin for himself as there is no other way for UTXO-lite nodes to detect it. This is a birthday attack and is much easier than mining for a particular digest. Whether the real UTXO has 1 satoshi or 1M bitcoin is irrelevant. If there is still any legacy nodes, there will be a hard-fork.

Therefore, the digest has to be long enough, with 20 or even 32 bytes. But even with 32 bytes I think it might not be as secure as the current model, as I can't see a practical way to generate 21M bitcoin even if I am able to find a SHA256D collision under the current model.

Interestingly, your original proposal with node-specific-salt does not have this problem as it is not a consensus rule and the salt ensures that a collision may affect only a few node
legendary
Activity: 1232
Merit: 1094
I updated the BIP again to address concerns raised.
legendary
Activity: 1232
Merit: 1094
I like this idea but it implicitly contradicts with BIP30 (https://github.com/bitcoin/bips/blob/master/bip-0030.mediawiki) and has a remote risk of hardfork, in theory.

As UTXO-lite nodes will not store the index of not-fully-spent txid, they are unable to check whether the txid of new tx is same as any existing not-fully-spent tx. If a collision happens, legacy nodes and UTXO-lite nodes will disagree with the validity of the new tx and lead to hardfork.

Interesting.  Two transactions cannot have the same txid unless they build off the same coinbase before the height field was added.

Quote
Of course, as we already have BIP34, a collision like this means a successful preimage attack against SHA256D. Therefore, I'm not sure if this should be considered as really risky.

If a pre-image attack is successful, then Bitcoin has bigger problems.  

Is it safe other than that?

There are 2 pairs which violate the rule:

91812: http://blockexplorer.com/block/00000000000af0aed4792b1acee3d966af36cf5def14935db8de83d6f9306f2f
91842: http://blockexplorer.com/block/00000000000a4d0a398161ffc163c503763b1f4360639393e0e4c8e300e0caec

91722: http://blockexplorer.com/block/00000000000271a2dc26e7667f8419f2e15416dc6955e5a6c6cdf3f2574dd08e
91880: http://blockexplorer.com/block/00000000000743f190a18c5577a3c2d2a1f610ae9601ac046a38084ccb7cd721

The coinbase from 91812 and 91722 hasn't been spent yet, so they can't be now.  Therefore 91880 is the only one that can be spent.

A special rule excluding the coinbase for blocks 91812 and 91722 from the UTXO set would cover this situation?  A check similar to the BIP30 check could be included to make sure the block hashes match.

Quote
Instead of 8 bytes as you suggested, use 10 or 12 bytes. Similar to BIP34, a new consensus rule will require that new tx must not have an UTXO-lite identifier same as an existing unspent UTXO. If there is a collision, miners will just include the tx to the next block, which will change the tx's UTXO-lite identifier. Also, no node_specific_salt is needed and all UTXO-lite nodes will share the same UTXO-lite index. We may even commit the hash of the UTXO-lite set to the blockchain.

Yeah, I was thinking forward to UTXO set commitments.

Do you think new messages for block and tx are the way to go.  The alternative is to just use the protocol version to decide what fields should be included.

Another question is if nodes should re-index the block files.  Old blocks would have to be updated.  This means a step where the blk???.dat files are all updated.

The soft fork would be just to enforce the non-collision property.
legendary
Activity: 1792
Merit: 1111
I created a draft BIP for extended transactions.

Transactions just point to the transaction outputs that they are spending.  No additional information included.  This means that the UTXO database of every node has to store the information about the UTXO (value, scriptPubKey and height).

An extended transaction is the same as a normal transaction, except that this information is included.  It is purely for information purposes.  The original transaction format is used for signing and the txid.

The advantage is that only the wallet which owns the coin needs to actually store the information about the UTXO.  If they want to spend the coin(s), they can create an extended transaction message and broadcast it.

Each UTXO entry could be 8 bytes instead of the current average of 36 bytes at the moment.

Full nodes already contain the entire UTXO set.  If they receive a standard transaction, they can add the extra information to make the extended transaction.

Future nodes, which use the reduced UTXO set size, wouldn't be able to process legacy transactions.  This would give 3 types of node.  

  • Legacy nodes
  • Border nodes
  • UTXO-lite nodes

UTXO-lite nodes would only make outgoing connections from legacy nodes, since they can't process their transactions.

I am undecided if new message names are the way to go.  If each peer is required to either use blocks/txs or eblocks/etxs, then it is a waste making them different.

I like this idea but it implicitly contradicts with BIP30 (https://github.com/bitcoin/bips/blob/master/bip-0030.mediawiki) and has a remote risk of hardfork, in theory.

As UTXO-lite nodes will not store the index of not-fully-spent txid, they are unable to check whether the txid of new tx is same as any existing not-fully-spent tx. If a collision happens, legacy nodes and UTXO-lite nodes will disagree with the validity of the new tx and lead to hardfork.

Of course, as we already have BIP34, a collision like this means a successful preimage attack against SHA256D. Therefore, I'm not sure if this should be considered as really risky.

And if we could ignore the theoretical risk, I'd suggest to formally enact your BIP as a soft-fork.

Instead of 8 bytes as you suggested, use 10 or 12 bytes. Similar to BIP34, a new consensus rule will require that new tx must not have an UTXO-lite identifier same as an existing unspent UTXO. If there is a collision, miners will just include the tx to the next block, which will change the tx's UTXO-lite identifier. Also, no node_specific_salt is needed and all UTXO-lite nodes will share the same UTXO-lite index. We may even commit the hash of the UTXO-lite set to the blockchain.
legendary
Activity: 1232
Merit: 1094
I created a draft BIP for extended transactions.

Transactions just point to the transaction outputs that they are spending.  No additional information included.  This means that the UTXO database of every node has to store the information about the UTXO (value, scriptPubKey and height).

An extended transaction is the same as a normal transaction, except that this information is included.  It is purely for information purposes.  The original transaction format is used for signing and the txid.

The advantage is that only the wallet which owns the coin needs to actually store the information about the UTXO.  If they want to spend the coin(s), they can create an extended transaction message and broadcast it.

Each UTXO entry could be 8 bytes instead of the current average of 36 bytes at the moment.

Full nodes already contain the entire UTXO set.  If they receive a standard transaction, they can add the extra information to make the extended transaction.

Future nodes, which use the reduced UTXO set size, wouldn't be able to process legacy transactions.  This would give 3 types of node. 

  • Legacy nodes
  • Border nodes
  • UTXO-lite nodes

UTXO-lite nodes would only make outgoing connections from legacy nodes, since they can't process their transactions.

I am undecided if new message names are the way to go.  If each peer is required to either use blocks/txs or eblocks/etxs, then it is a waste making them different.
Jump to: