Author

Topic: Transaction pruning details (Read 3950 times)

hero member
Activity: 588
Merit: 500
June 03, 2011, 01:43:22 PM
#19
If pruning ever happens, I see there's still a need for a value-added service -- providing versioning history of bitcoin block chain.
Some one may need to prove to others certain transaction(s) happened a few years later.

Certain sites will always keep the entire transaction history, like Block Explorer, miners, etc. So I don't think that will be an issue.
legendary
Activity: 1441
Merit: 1000
Live and enjoy experiments
June 02, 2011, 09:25:17 AM
#18
If pruning ever happens, I see there's still a need for a value-added service -- providing versioning history of bitcoin block chain.
Some one may need to prove to others certain transaction(s) happened a few years later.
administrator
Activity: 5222
Merit: 13032
May 31, 2011, 08:41:42 AM
#17
Wait, how deeply must an output be buried before it can be pruned?

Very deep: you must know for sure that the transaction will never be needed for checking again (due to a reorg).

I'd say at least 1000 blocks -- probably more.
ene
newbie
Activity: 42
Merit: 0
May 31, 2011, 06:31:31 AM
#16
How will this be checked - will all the clients running on everybody's machines do the pruning and check each other.

Why would they need to check anything? The only possible problem is a bug in the code, but then this would affect everybody equally.

The rest of your post makes no sense either.
newbie
Activity: 7
Merit: 0
May 31, 2011, 04:19:03 AM
#15
Code:
Tx0
  -> 50 to A [0]

Tx1
  <- import 50 from Tx0:0
  -> 1 to B [0]
  -> 49 to A [1]

Tx2
  <- import 1 from Tx1:0
  -> 1 to C [0]

Tx3
  <- import 1 from Tx2:0
  -> 1 to D [0]

I agree that when C would send 1 bitcoin to D in transaction 3, transaction 2 could be pruned.

I have to correct myself here, Tx2 cannot be pruned before Tx1 is pruned.
hero member
Activity: 588
Merit: 500
May 31, 2011, 04:14:17 AM
#14
Wait, how deeply must an output be buried before it can be pruned?
member
Activity: 126
Merit: 10
May 31, 2011, 03:17:37 AM
#13
The pruner will have a lot of power manually manipulating the blockchain bookkeeping!! How will this be checked - will all the clients running on everybody's machines do the pruning and check each other.

For that matter all previous transactions up a few branch levels in the spending tree can be ommitted if the majority of the clients agree on this.

An arbitrary number of three branch levels and 6 confirmations x 3 = 18 confirmations (3 hours) should be enough.  This will allow plenty of scalibiliy.  Backup copies of every 3 hours pruning can be kept for back auditing the network's integrity.  This may allow for a lighter client and a heavier client with slower pruning intervals.
newbie
Activity: 7
Merit: 0
May 31, 2011, 02:42:21 AM
#12
Thanks a lot for the very detailed explanation. I think I completely get it.

Storage size magnitude will not be in the order of total past transactions, but in the order of total (non-empty) accounts (or addresses).

One final question, a bit off-topic: Would it be necessary to treat coinbase transactions differently? Couldn't they just be pruned like normal transactions?

 
legendary
Activity: 1526
Merit: 1134
May 31, 2011, 02:23:07 AM
#11
It's really not possible to think about pruning without talking about inputs/outputs. You're getting yourself confused because you're thinking in terms of transactions. Here's how your example would look:

Code:
Tx0
  ->50 to A  [0]

Now A sends 1 bitcoin to B.

Code:
Tx0
  -> 50 to A [0]

Tx1
  <- import 50 from Tx0:0
  -> 1 to B [1]
  -> 49 to A [2]

At this point the only output of Tx0 has been used up, so it can be deleted once buried deeply enough. B sends 1 coin to C.



Code:
Tx0
  -> 50 to A [0]

Tx1
  <- import 50 from Tx0:0
  -> 1 to B [0]
  -> 49 to A [1]

Tx2
  <- import 1 from Tx1:0
  -> 1 to C [0]

At this point output 0 of Tx1 has been used but output 1 has not been, thus Tx1 cannot be pruned. At some point, output 1 of Tx1 will also get used up and it can then also be pruned entirely.

As theymos points out, it's probably possible with some clever programming to delete individual outputs and leave only a "was spent" bitfield in place.

Transactions that reference non-existant transactions in their inputs are considered "orphan" and go pending. They don't get processed again until their dependencies are satisfied. If Tx1 was fully pruned, but you had a copy of it, broadcasting it again wouldn't help because eventually you'd have to go all the way back to coinbase transactions which could just be treated specially (ie having a "coinbase was spent" flag in the pruned data structures). So perhaps 80 bytes per block is too optimistic, but it's likely possible to get close to that figure.

At any rate, Bitcoin has bigger scalability problems than storage. It doesn't worry me much.
newbie
Activity: 7
Merit: 0
May 31, 2011, 02:06:41 AM
#10
My point is to try and find out how efficiently previous transactions can be pruned.

Transaction 0 can be pruned, but it's information (and stored data) is effectively moved to transaction 1.

Transaction 1 also contains the data of A transferring a coin to B. I expected this data to be prunable after transaction 2. But transaction 1 can only be pruned when A has spent its remaining 49 bitcoins, which might take a long time.

I agree that when C would send 1 bitcoin to D in transaction 3, transaction 2 could be pruned.

Or, if A would send 1 bitcoin to D in transaction 3, transaction 3 would take transaction 1 as referenced transaction, send 1 bitcoin to D and 48 back to A. After this, transaction 1 could then also be pruned. This need not take so long.

Could you confirm I am thinking correctly here?
administrator
Activity: 5222
Merit: 13032
May 31, 2011, 01:25:38 AM
#9
Data has decreased, since you would otherwise store both transaction 0 and transaction 1. Transaction 1 can also be forgotten once both of its outputs are spent.

Miners could probably also forget individual spent outputs.
newbie
Activity: 7
Merit: 0
May 31, 2011, 01:07:28 AM
#8
It is possible. If someone rebroadcasts the same transaction, it will be rejected because you don't have its previous transaction.

Let me restate my example to be more in accordance with the specification.

Say at some point A, B and C own 0 bitcoins.

In transaction 0, 50 bitcoins are sent to A (A just found a block).

A sends 1 bitcoin to B in transaction 1. I read in the forum (http://forum.bitcoin.org/index.php?topic=8689.msg125994#msg125994) that the current client will 'use up' all of the money from transaction 0 and send the change to a new address (or back to A I presume). I think that is why you say the previous transaction doesn't exist?

Then B sends 1 bitcoin to C in transaction 2. Now transaction 0 can be safely forgotten, but the amount of data to be stored hasn't really decreased, since the complete transaction 1 (containing 2 tx_outs) is to be stored. Am I right here?
administrator
Activity: 5222
Merit: 13032
May 30, 2011, 04:15:24 PM
#7
It is possible. If someone rebroadcasts the same transaction, it will be rejected because you don't have its previous transaction.
newbie
Activity: 7
Merit: 0
May 30, 2011, 04:07:35 PM
#6
No, transaction inputs are specified in terms of txhash:output_index pairs, not addresses. You don't import value from an address, you literally import it from a particular transaction.

The pruning process isn't implemented today, but when it is there are a few possibilities. One is that the tx won't be completely deleted, the hash of it will still stick around in the database. It's 32 bytes instead of several hundred, so it's still a big win. Another possibility is that some nodes will keep the whole chain and if a tx input isn't known, pruning nodes can contact archival nodes and ask them for the tx. If they don't have it, it's an orphan transaction. If they do, it means the connected tx was pruned and thus it's a double spent.

You're saying that the option that the tx won't be completely deleted is just one of the possibilities. What I am very interested in, is whether there is a possibility that previous transactions can be completely deleted, as is claimed in Nakamoto's paper.

After studying the protocol, for now my conclusion is that this is not the case. Referencing a former transaction instead of an account unfortunately doesn't make a difference. Could anyone please invalidate my conclusion?

Honestly, I'm skeptical pruning will ever really be needed. Assuming continued growth, disk storage is so cheap that the cost of keeping the old parts of the chain around will be dwarfed by the costs of keeping up with the head.

Whether pruning is necessary or not is another question, but I'm glad you brought it up. Assuming only the transaction hash sticks around like you say and 2000 transactions per second as mentioned on the scalability page, this means a storage need of 32 * 2000 * 3600 =5529600000  bytes = 5.5 Gb per day. Not to mention that this must be stored in such a way that it is searchable quickly, which will likely increase storage needs.

All and all, quite a lot more than the 'best cast' of 80 bytes per block (or 1.16 MB / day) that is mentioned on the scalability page. I hope you understand why I'm so damned curious about this.

legendary
Activity: 1526
Merit: 1134
May 30, 2011, 01:11:06 PM
#5
No, transaction inputs are specified in terms of txhash:output_index pairs, not addresses. You don't import value from an address, you literally import it from a particular transaction.

The pruning process isn't implemented today, but when it is there are a few possibilities. One is that the tx won't be completely deleted, the hash of it will still stick around in the database. It's 32 bytes instead of several hundred, so it's still a big win. Another possibility is that some nodes will keep the whole chain and if a tx input isn't known, pruning nodes can contact archival nodes and ask them for the tx. If they don't have it, it's an orphan transaction. If they do, it means the connected tx was pruned and thus it's a double spent.

Honestly, I'm skeptical pruning will ever really be needed. Assuming continued growth, disk storage is so cheap that the cost of keeping the old parts of the chain around will be dwarfed by the costs of keeping up with the head.
newbie
Activity: 7
Merit: 0
May 30, 2011, 12:32:23 PM
#4
The first one can be forgotten (assuming it only has one output). It will never again be needed for verifying other transactions, since it can't be redeemed again.

Thank you, in yours and Mike's answer I read the same, transaction 1 can be forgotten. But, what happens then if B keeps a copy of the transaction and after a while resends it to the network? My guess is that, if the original transaction 1 was pruned, another 1 bitcoin is transferred from A to B?
administrator
Activity: 5222
Merit: 13032
May 30, 2011, 12:23:51 PM
#3
Forget for a while about transaction fees, but can one of these transactions be pruned (after being confirmed by enough blocks in the longest chain)?

The first one can be forgotten (assuming it only has one output). It will never again be needed for verifying other transactions, since it can't be redeemed again.
legendary
Activity: 1526
Merit: 1134
May 30, 2011, 12:22:47 PM
#2
You can prune a transaction once all of its outputs are spent and it has been buried under "enough" blocks to make a chain re-org very unlikely.
newbie
Activity: 7
Merit: 0
May 30, 2011, 12:01:38 PM
#1
Having studied Bitcoin in quite some depth (the 'scalibility page', Nakamoto's paper and some forum questions), I cannot find an answer to the question:

How is it that old transactions can be pruned from the ledger that the blockchain is.

Simple example:

Say at some point A owns 10 bitcoins, B and C own 0 bitcoins.

A transfers 1 bitcoin to B (transaction 1), after which B transfers 1 bitcoin to C (transaction 2).

Forget for a while about transaction fees, but can one of these transactions be pruned (after being confirmed by enough blocks in the longest chain)?

If not, can someone please give me a typical example in which a transaction can be pruned?
Jump to: