I have been trying to figure out exactly what information exists in each transaction and each block. I understand the merkle trees and the chain of block headers, and how everything is linked through hashes. What I don't understand is what is actually contained in a TxIn and OutPoint object. Does each transaction reference a previous TxOut? Does that TxOut have to have enough BTC to cover the amount of the TxIn? If so, doesn't it get complicated to track partial balances in all previous TxOuts that eventually have to be accumulated if the user wants to empty those accounts?
Perhaps I'm missing something. There are three things I really want to know:
(1) Are the transactions chained to one another such that you can simply follow the chain backwards to get all the information you need to know about a particular address, given the last transaction involving that address?
My current understanding of how this works, derived from
https://en.bitcoin.it/wiki/Transactions,
https://en.bitcoin.it/wiki/Script and looking at transactions on blockexplorer, is as follows. It's probably easier to think of Bitcoins being stored at transaction outputs, each of which has a value and an associated Bitcoin address (a hash of a public key). One can get at those Bitcoins by specifying the full public key for the address and a signature for the transaction. Since this signature can only be created using the private key for that address, only the "owner" of that address can get at those Bitcoins. Each transaction then says "take *all* the Bitcoins that were at TxOut n_1 of Tx t_1, n_2 of t_2, ..., [here are all the right public keys and signatures] and create TxOut's holding b_1 BTC at address a_1, b_2 BTC at address a_2, ...".
There are never "partial balances" stored at a given transaction output. This is why when you send a small number of BTC to someone, the transaction actually has a TxIn with a lot of BTC, and two TxOut's, one with the BTC for the recipient of the send, and one with the "change" sent back to you (at a different address, which never shows up in the client GUI). See, for example, this transaction:
http://blockexplorer.com/tx/56faebec0694f42c201b0afbd2327dc823b8298b3aa4bb3313bab3e2fe026f44. On the other hand, there may be many unclaimed TxOut's in the block chain belonging to the same address, so it's not true that given the last transaction for a particular address, you can reconstruct the total number of BTC that can be sent from that address.
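To make the "whole outputs plus change" idea concrete, here is a minimal sketch in Python. It is not the real wire format: the class fields, the build_payment and balance helpers, and the example addresses are all made up for illustration, amounts are in the smallest unit, and fees are ignored.

```python
from dataclasses import dataclass

COIN = 100_000_000  # smallest units per BTC


@dataclass(frozen=True)
class OutPoint:
    tx_id: str  # hash of the transaction that created the output
    index: int  # position of that output within the transaction


@dataclass(frozen=True)
class TxOut:
    value: int    # amount in smallest units
    address: str  # hash of the recipient's public key


def build_payment(unspent, pay_to, amount, change_address):
    """Consume whole unspent outputs until they cover `amount`.

    `unspent` maps OutPoint -> TxOut for outputs we can sign for.
    Returns (inputs, outputs); any excess goes to a change output.
    """
    inputs, total = [], 0
    for outpoint, txout in unspent.items():
        inputs.append(outpoint)
        total += txout.value
        if total >= amount:
            break
    if total < amount:
        raise ValueError("insufficient funds")
    outputs = [TxOut(amount, pay_to)]
    if total > amount:
        outputs.append(TxOut(total - amount, change_address))
    return inputs, outputs


def balance(unspent, address):
    """A balance is the sum of *all* unspent outputs for an address."""
    return sum(o.value for o in unspent.values() if o.address == address)


unspent = {
    OutPoint("t1", 0): TxOut(50 * COIN, "addr_me"),
    OutPoint("t2", 1): TxOut(3 * COIN, "addr_me"),
}
inputs, outputs = build_payment(unspent, "addr_friend", 1 * COIN, "addr_change")
```

Here the 1 BTC payment consumes the entire 50 BTC output and creates a 49 BTC change output, while the second output at the same address stays untouched. That is why the balance has to be summed over every unspent output rather than read off the last transaction.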
(2) Is the specification telling me that I will need to hold the blockchain if I want to construct a transaction? What is the minimum amount of information I would need to hold on my phone, for my lite phone-client to be able to construct a valid transaction? I was planning to keep only the block headers, balance of the account, and the private key. I don't want to store any block information. Will I have to keep previous transaction data?
You will need to know a transaction with unclaimed TxOut's to the address that you're sending from. You don't need to store the whole blockchain, though at present, you will have to have downloaded and analyzed the blockchain starting from the time when the sender's address first had any Bitcoins. For example, BitcoinJ (the Java library) keeps a list of addresses that belong to you. Whenever it receives a new block at the tip of the main chain, it scans it for transactions with any of your addresses at TxIn's or TxOut's, and keeps a local copy of those transactions. It ignores everything else in the block. In the future, presumably there will be a way of either (a) asking a trusted service or a number of your peers for all recent transactions involving a given address [possibly through pattern matching to avoid revealing to these third parties that you're the address' owner], or (b) asking your peers to only forward the parts of blocks that are relevant to you. You can then verify that a transaction has been incorporated into the block chain if you know its merkle branch.
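As a rough illustration of the merkle branch check, the sketch below hashes a leaf up to the root using double SHA-256 and compares the result with the root stored in a block header. The function names are my own, the leaves are assumed to be transaction hashes as raw bytes, and real clients have byte-order details that are skipped here.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _next_level(level):
    # Pair up hashes; an odd-length level duplicates its last entry.
    if len(level) % 2:
        level = level + [level[-1]]
    pairs = range(0, len(level), 2)
    return level, [dsha256(level[i] + level[i + 1]) for i in pairs]


def merkle_root(leaves):
    """Root of a non-empty list of leaf hashes."""
    level = list(leaves)
    while len(level) > 1:
        _, level = _next_level(level)
    return level[0]


def merkle_branch(leaves, index):
    """Sibling hashes needed to prove that leaves[index] is in the tree."""
    branch, level = [], list(leaves)
    while len(level) > 1:
        padded, parents = _next_level(level)
        branch.append(padded[index ^ 1])  # sibling sits next to us in the pair
        level, index = parents, index // 2
    return branch


def verify_branch(leaf, branch, index, root):
    """Recompute the root from one leaf and its branch."""
    h = leaf
    for sibling in branch:
        # An odd index means we are the right-hand child at this level.
        h = dsha256(sibling + h) if index & 1 else dsha256(h + sibling)
        index //= 2
    return h == root


leaves = [dsha256(bytes([i])) for i in range(5)]
root = merkle_root(leaves)
proof = merkle_branch(leaves, 2)
```

A client holding only headers would need just the leaf, its index, and the handful of hashes in proof (about log2 of the number of transactions in the block) to confirm that the transaction is committed to by that header.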
(3) What information is contained in a transaction that prevents it from being repeated/re-broadcast by an attacker? Is it linked to a specific block? If that transaction is broadcast just as the next block is solved, does it have to be re-broadcast? I thought a transaction could be included in any future block, but then I don't know what would prevent someone you just paid from re-broadcasting and paying themselves again.
Any TxOut can only be redeemed by a TxIn of a single transaction. This rule gets enforced when the transaction is incorporated into the block by a miner. Nodes will refuse to accept blocks with transactions violating this rule, so, for instance, the blocks from a malicious miner that incorporate double spends from a single TxOut won't propagate through the network.
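The single-spend rule can be sketched as a set of unspent outpoints that a node updates as it checks a block. This is only a toy model: validate_block and the (tx_id, inputs, n_outputs) tuple layout are invented for the example, and signature and value checks are left out.

```python
def validate_block(block_txs, unspent):
    """Return the unspent set after applying a block, or raise ValueError.

    block_txs: list of (tx_id, inputs, n_outputs), where inputs is a list
               of (tx_id, index) outpoints being redeemed.
    unspent:   set of (tx_id, index) outpoints that are still spendable.
    """
    unspent = set(unspent)  # work on a copy so a rejected block changes nothing
    for tx_id, inputs, n_outputs in block_txs:
        for outpoint in inputs:
            if outpoint not in unspent:
                raise ValueError(f"{outpoint} is missing or already spent")
            unspent.remove(outpoint)
        # The transaction's own outputs become spendable.
        unspent.update((tx_id, i) for i in range(n_outputs))
    return unspent


start = {("t0", 0)}
after = validate_block([("t1", [("t0", 0)], 2)], start)
```

A block that contains two transactions redeeming ("t0", 0) raises ValueError, and so does a later block that tries to include "t1" a second time, because by then its input is no longer in the unspent set.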
You can broadcast a transaction as many times as you want, and it will only be incorporated once into the block chain. My understanding is that the client broadcasts its transactions to all of its connected peers, and rebroadcasts them every 30 minutes until it sees the transaction incorporated in a block.
Someone else can't take your transaction and pay themselves because the signatures at each TxIn cover the *entire* transaction, not just the TxOut that they're redeeming. Your so-called "friend" who you just paid would have to create a new transaction redeeming the TxOut from your address and sending it elsewhere, but he is unable to generate a valid signature for this new transaction since he doesn't have your private key.
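Why changing the outputs invalidates the signature can be seen from what gets signed: a hash over the whole transaction. The sketch below only computes such a hash. The JSON serialization and the signing_digest name are stand-ins I made up, the real client uses its own binary serialization, and the actual signing step is left out because it needs a library outside the standard one.

```python
import hashlib
import json


def signing_digest(inputs, outputs):
    """Hash of everything the signature commits to (simplified).

    inputs:  list of [tx_id, index] outpoints being redeemed
    outputs: list of [value, address] pairs being created
    """
    payload = json.dumps({"inputs": inputs, "outputs": outputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


inputs = [["t1", 0]]
original = signing_digest(inputs, [[100, "addr_friend"], [4900, "addr_change"]])
tampered = signing_digest(inputs, [[100, "addr_friend"], [4900, "addr_attacker"]])
```

The two digests differ, so a signature made over original says nothing about tampered. Redirecting any output means producing a fresh signature, which requires the private key.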
And on a side note, I haven't quite figured out how block "timestamping" works. In some places it looks like timestamps are 32-bit unsigned numbers, in some cases they are block numbers between 0 and 2015. I am uncertain how the unix-time timestamps could work when you have unsynchronized clocks across all nodes.
The block timestamps are all 32-bit UTC unix times that are "network-adjusted". That is, the client gets from its peers (in the version message) what time they think it is, and computes the offset from local time to each of the peer times. The median offset is thereafter used to convert local time to network time. This only synchronizes the clients roughly, but enough to get the difficulty calculation mostly right, which is all that timestamps are even used for. The way this works is that a block will be accepted by a node only if its timestamp is greater than the median timestamp of the preceding 11 blocks (so you can't add blocks with a timestamp that's too low to try to manipulate the difficulty on the next retarget) and if the timestamp is no further than 2 hours into the future (so you can't do the same with a timestamp that's too high). Retargets happen every 2016 blocks (about two weeks), which is where I guess your 0 and 2015 limits are coming from.
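The two timestamp rules are easy to write down. Here is a minimal sketch, assuming all times are unix seconds; the function names are mine, and the real client's handling of peer offsets has more special cases than a plain median.

```python
import statistics

TWO_HOURS = 2 * 60 * 60


def network_offset(local_time, peer_times):
    """Median difference between what the peers report and the local clock."""
    if not peer_times:
        return 0
    return int(statistics.median(t - local_time for t in peer_times))


def timestamp_ok(block_time, previous_times, network_time):
    """Apply both acceptance rules to a candidate block's timestamp.

    previous_times: timestamps of the preceding blocks, oldest first
    network_time:   local time plus the median peer offset
    """
    median_past = statistics.median(previous_times[-11:])
    return median_past < block_time <= network_time + TWO_HOURS


# Eleven earlier blocks spaced ten minutes apart; their median is 4000.
previous = list(range(1000, 1000 + 11 * 600, 600))
now = 10_000 + network_offset(10_000, [9_990, 10_005, 10_030])
```

With these numbers the peer offsets are -10, 5 and 30 seconds, so network time is the local time plus 5. A block stamped 4000 is rejected because it does not exceed the median, and anything up to two hours past network time is accepted.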
If I want to reference a specific block, do I only provide the header hash and the node searches the headers for it? Or is it okay to say "Block #122,245" ?
You can only get specific blocks by hash, not by height on the block chain. Blockexplorer allows you to search by height, and every client with an accurate blockchain can determine what block is at a particular height, but there's no way in the protocol to ask a peer for a block by height.
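Working out heights locally just means walking the previous-hash links from the genesis block. The sketch below assumes you already hold the main-chain headers as a dict from block hash to previous block hash, with no forks; index_by_height and the short hash strings are made up for the example.

```python
def index_by_height(headers, genesis_hash):
    """Return a list where position h holds the hash of the block at height h.

    headers: dict mapping block_hash -> prev_block_hash (main chain only)
    """
    # Invert the links so we can walk forward from the genesis block.
    child_of = {prev: block_hash for block_hash, prev in headers.items()}
    by_height, current = [genesis_hash], genesis_hash
    while current in child_of:
        current = child_of[current]
        by_height.append(current)
    return by_height


headers = {"g": None, "a": "g", "b": "a", "c": "b"}
chain = index_by_height(headers, "g")
```

After this, chain[2] gives the hash to request from a peer for "block #2", which is roughly the lookup a height-based explorer does for you.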
Thanks for your patience with my questions. I am anxious to start work on a new client, but the specification isn't clear enough for me.
-Eto
Hope that helped!