
Topic: [RFC] Requirements for headers-only client?

legendary
Activity: 1526
Merit: 1134
Yeah, the first-launch optimizations are definitely the initial goal.

One trick is to track the point at which you have to start downloading full blocks. A simple bug is to start up, create some keys in the wallet and then say "hmm, I have keys, guess I need to download full blocks to find out what's in them". Perhaps each key needs to be stored with the current block number, as read from the version handshake.
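A minimal sketch of that bookkeeping, assuming each key stores the chain height the peer reported in its version message when the key was created. `WalletKey`, `birth_height` and the helper functions are hypothetical names for illustration, not code from the Satoshi client. Blocks below the earliest key's birth height are fetched as headers only, since nothing in them can pay a key that did not exist yet:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class WalletKey:
    pubkey: bytes
    # Hypothetical field: chain height reported in the peer's version
    # message at the moment this key was generated.
    birth_height: int


def first_full_block_height(keys: Iterable[WalletKey]) -> Optional[int]:
    """Height from which full blocks are needed, or None if the wallet
    has no keys yet (headers alone are then sufficient)."""
    heights = [k.birth_height for k in keys]
    return min(heights) if heights else None


def needs_full_block(height: int, keys: Iterable[WalletKey]) -> bool:
    """A block older than every key cannot pay any of them, so its
    header is enough; anything at or after the oldest key needs the full block."""
    start = first_full_block_height(keys)
    return start is not None and height >= start
```

The bug described above is the equivalent of treating a fresh key's birth height as 0, which forces every full block to be downloaded. Taking the minimum over all keys means the oldest key decides how far back full blocks are needed.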
newbie
Activity: 8
Merit: 0
Quote
All this talk about neato-spiffy future hub-and-spoke supernodes-and-leaves architecture is great.

But I was kinda thinking of first solving the problem that anybody who just wants to download the client, get a few coins from a friend, and then spend them on something has to wait a very long time right now.  That doesn't require any re-architecting of the system or any new networking messages, and should make the "out of the box" bitcoin experience much better for lots of people.

And it is a step towards a future neato-spiffy Uber-efficient hyper++client.


Why would such a person need the entire blockchain? There are already some hardcoded "checkpoints" in the code; such a person could just download the headers from a recent checkpoint.
legendary
Activity: 1652
Merit: 2301
Chief Scientist
All this talk about neato-spiffy future hub-and-spoke supernodes-and-leaves architecture is great.

But I was kinda thinking of first solving the problem that anybody who just wants to download the client, get a few coins from a friend, and then spend them on something has to wait a very long time right now.  That doesn't require any re-architecting of the system or any new networking messages, and should make the "out of the box" bitcoin experience much better for lots of people.

And it is a step towards a future neato-spiffy Uber-efficient hyper++client.
legendary
Activity: 1526
Merit: 1134
Quote
After a few years of bitcoin growth, you'll be asking a smartphone to drink from a firehose (though the TX filtering suggestions in this thread do mitigate that).

Remote filtering means the requirements scale with the economic activity of the device owner rather than the system as a whole - not a problem in the foreseeable future.
legendary
Activity: 1708
Merit: 1010
Personally, I don't see the value of a headers-only smartphone client in places where most users have nearly continuous Internet access on their smartphones, because those who can't stand to let their app update can just use a smartphone client that remotely controls a full client at home, or their Mybitcoin account. The real value in a headers-only smartphone client is being able to spend coins from that smartphone in places where the sender has no (secure and/or reliable) Internet access at all (think blackout), by communicating directly with the vendor via some ad-hoc method (ad-hoc wifi, Bluetooth, near-field, Dash7, whatever).

Such headers-only clients don't necessarily need transaction discovery if, whenever the client is 'connected' to another client (full or light), transactions to the headers-only client are 'pushed' to it along with the blocks that support their inputs. Say you have a full client at home (or are at a Bitcoin bank branch) and want to move an amount from your home client (bank account) onto your phone's light client. You connect via some ad-hoc method (or simply over a wireless router in infrastructure mode), the full client 'pushes' the transaction and supporting blocks to the light client, and the light client can still verify that it's extremely unlikely the data is either false or corrupted. Then, when it's time to spend those coins, the light client can likewise 'push' the new transaction, the old transaction and the supporting blocks to the next client. If the next client is also light, it can verify that it's very unlikely to be defrauded by checking those blocks against the matching headers, and a full client can simply check the transaction against its full blockchain.

This does not require a 'trusted' node in any case, but it still requires that the 'pushing' client know at least one of your phone's account numbers, and in practice it requires a 'trusted' node, even if it's your own at home. Once the light client has spent all of the funds, it simply no longer needs those blocks and discards them. Such a client functionally does not participate in the p2p network at all, except to fetch the most recent headers, and does not forward any transactions or blocks except its own.
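The check a light client performs on a 'pushed' payment can be sketched as follows, assuming it keeps each known header's Merkle root indexed by block hash and receives the full list of txids for the supporting block. `PushedPayment` and `accept_pushed_payment` are hypothetical names; parsing raw transactions and matching the old transaction's outputs to the new one's inputs are left out. Note that this only shows the payment was included in a block on the client's header chain:

```python
import hashlib
from dataclasses import dataclass


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: list[bytes]) -> bytes:
    """Bitcoin-style Merkle root: hash pairs level by level,
    duplicating the last hash when a level has an odd count."""
    if not txids:
        raise ValueError("a block always contains at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


@dataclass
class PushedPayment:
    block_hash: bytes          # header the sender says the payment is in
    block_txids: list[bytes]   # every txid in that block, in block order
    txid: bytes                # the payment being handed over


def accept_pushed_payment(p: PushedPayment, merkle_roots_by_hash: dict[bytes, bytes]) -> bool:
    """The block must be on our header chain, its transactions must hash
    to that header's Merkle root, and the payment must be one of them."""
    expected_root = merkle_roots_by_hash.get(p.block_hash)
    if expected_root is None:
        return False  # unknown header: not on our chain, or we are behind
    if merkle_root(p.block_txids) != expected_root:
        return False  # block contents were altered or corrupted
    return p.txid in p.block_txids
```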
member
Activity: 98
Merit: 13
Quote
Still, latency matters. Assume an average block size of 10KB (not so different from today). If you don't use your wallet app for a day, that's 10KB x 6 blocks/hour x 24 hours = 1,440KB, or about 1.4MB of data to pull down, perhaps over a 3G link, perhaps from a heavily loaded node. Go on vacation for a week and now it's 10MB when you get back. People don't want to wait for apps to navel-gaze; they want to just open the app and have it all there waiting for them. So it's worth thinking about now, because by the time it floats to the top of the TODO list, gets designed, implemented and released, and people upgrade in sufficient numbers for the feature to be readily available, it might be about time for it to shine.

I tend to think that simple payment clients will hang off some well-known payment processors, who will guarantee instant transactions and assume some double-spending risk.

To me, direct block chain transaction verification does not seem well suited to mobile clients, even lightweight ones.

After a few years of bitcoin growth, you'll be asking a smartphone to drink from a firehose (though the TX filtering suggestions in this thread do mitigate that).

legendary
Activity: 1526
Merit: 1134
Yeah, current block sizes mean it's not a big deal today, though it'd be best not to give people yet another excuse to artificially limit the block size.

Still, latency matters. Assume an average block size of 10KB (not so different from today). If you don't use your wallet app for a day, that's 10KB x 6 blocks/hour x 24 hours = 1,440KB, or about 1.4MB of data to pull down, perhaps over a 3G link, perhaps from a heavily loaded node. Go on vacation for a week and now it's 10MB when you get back. People don't want to wait for apps to navel-gaze; they want to just open the app and have it all there waiting for them. So it's worth thinking about now, because by the time it floats to the top of the TODO list, gets designed, implemented and released, and people upgrade in sufficient numbers for the feature to be readily available, it might be about time for it to shine.
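The same arithmetic as a small helper (the name `catch_up_bytes` is hypothetical), assuming one block per ten minutes on average and decimal units (1KB = 1,000 bytes), which is how the 1.4MB and 10MB figures come out. For comparison, a block header is a fixed 80 bytes, so a headers-only client would fetch only 80 x 144 = 11,520 bytes for the same day:

```python
BLOCKS_PER_HOUR = 6  # one block every ~10 minutes on average
HEADER_BYTES = 80    # fixed size of a serialized block header


def catch_up_bytes(avg_block_bytes: int, hours_offline: int) -> int:
    """Approximate data a client must fetch after being offline."""
    return avg_block_bytes * BLOCKS_PER_HOUR * hours_offline


one_day_full = catch_up_bytes(10_000, 24)        # about 1.4MB of full blocks
one_week_full = catch_up_bytes(10_000, 24 * 7)   # about 10MB of full blocks
one_day_headers = catch_up_bytes(HEADER_BYTES, 24)
```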
hero member
Activity: 755
Merit: 515
Quote
To avoid the need to download full blocks, a simple pattern matching command can be added to the protocol. Strawman proposal: a new txmatch command is added.
I don't see much advantage to pushing the IsMine() checks of individual leaf nodes out to supernodes. Currently blocks are very small, and I don't see that changing too dramatically for quite some time. The load a thin client puts on the network (especially if it is just making ~8 outgoing connections and not relaying much, if anything) is very small. Eventually it might be necessary to implement some kind of remote IsMine(), but by then we will all be using justmoon's new stuff anyway (or something very similar).

Quote
I don't think it makes sense for these clients to relay stuff or take part in the network. It just means full nodes will find them via IRC, connect to them and then discover they can't provide the block chain so it was a waste of time. Relaying but not verifying adds latency without any security.
Good point, hadn't really thought it all the way through.
legendary
Activity: 1526
Merit: 1134
I don't think it makes sense for these clients to relay stuff or take part in the network. It just means full nodes will find them via IRC, connect to them and then discover they can't provide the block chain so it was a waste of time. Relaying but not verifying adds latency without any security.
legendary
Activity: 1652
Merit: 2301
Chief Scientist
Raw dump of the notes I got from Satoshi with the headersonly patch (which is in the git tree as the headersonly branch):
Quote
Here's my client-mode implementation so far.  Client-only mode only records block headers and doesn't use the tx index.  It can't generate, but it can still send and receive transactions.  It's not fully finished for use by end-users, but it doesn't matter because it's a complete no-op if fClient is not enabled.  It's important to get this in as documentation showing the cut-lines for client-only re-implementers.

If it looks fine to you, go ahead and commit it to SVN.  It should be completely innocuous and no-op.

With fClient=true, I've only tested the header-only initial download.

A little background.  CBlockIndex contains all the information of the block header, so to operate with headers only, I just maintain the CBlockIndex structure as usual.  The nFile/nBlockPos are null, since the full block is not recorded on disk.

The code to gracefully switch between client-mode on/off without deleting blk*.dat in between is not implemented yet.  It would mostly be a matter of having non-client LoadBlockIndex ignore block index entries with null block pos.  That would make it re-download those as full blocks.  Switching back to client-mode is no problem, it doesn't mind if the full blocks are there.

If the initial block download becomes too long, we'll want client mode as an option so new users can get running quickly.  With graceful switch-off of client mode, they can later turn off client mode and have it download the full blocks if they want to start generating.

My plan was to dive into what Satoshi wrote already, understand it, test it in fClient=true mode (sending/receiving/relaying transactions on testnet), and fix whatever is broken or unimplemented.

Then write code to switch from fClient=true to fClient=false (downloading full blocks, etc.), and then write code that does the toggle when generation is turned on for the first time or when getwork is called (I think those are the only times you need full blocks).

I haven't looked at or thought about the relaying code.  Simply relaying all transactions (without checking to see if they're valid) if fClient=true should work nicely.
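A rough Python model of the cut-line Satoshi describes, where a header-only index entry is recognized by its null file position, plus the toggle Gavin proposes. `BlockIndexEntry` only loosely mirrors `CBlockIndex` (`None` stands in for the null `nFile`/`nBlockPos`), and all helper names are hypothetical; this is not the patch's actual code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockIndexEntry:
    """Loose stand-in for CBlockIndex: header data plus where, if anywhere,
    the full block is stored on disk."""
    height: int
    block_hash: bytes
    n_file: Optional[int] = None       # None models a null nFile
    n_block_pos: Optional[int] = None  # None models a null nBlockPos

    def has_full_block(self) -> bool:
        return self.n_file is not None and self.n_block_pos is not None


def blocks_to_redownload(index: list[BlockIndexEntry]) -> list[bytes]:
    """Switching client mode off: header-only entries must be fetched
    as full blocks, oldest first."""
    ordered = sorted(index, key=lambda e: e.height)
    return [e.block_hash for e in ordered if not e.has_full_block()]


def should_leave_client_mode(f_client: bool, generate_enabled: bool,
                             getwork_called: bool) -> bool:
    """Toggle on the first use of generation or getwork, the two cases
    where full blocks are thought to be needed."""
    return f_client and (generate_enabled or getwork_called)
```

Switching back to client mode needs no counterpart here: as the notes say, client mode doesn't mind if the full blocks are already present.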
legendary
Activity: 1526
Merit: 1134
I'll try and answer as I've imagined it playing out - these are all just proposals. I have enough things on my plate with BitCoinJ and more to keep me busy for a while.

My plan was to make some changes to the official codebase to support serving lightweight clients, but not to actually make it lightweight itself. The main reason is that I don't have enough confidence to do serious surgery on the code without any unit tests to catch errors. Others who have made changes more often (like Gavin) might be able to do a better job. Satoshi sent us a partially complete patch before he left. I didn't look at how complete it is, but I think it only implements the initial first-launch download optimization, not everything.

OK, with that, here goes. Firstly, some terminology. I'm going to call these things SPV clients, for "simplified payment verification". Headers-only is kind of a mouthful, and "lightweight client" is too vague, as there are several other designs that could be described as lightweight, like an RPC frontend or Stefan's WebCoin API approach.

Quote
What are the engineering assumptions we will make, WRT headers-only clients?

Minimum hardware goal: smartphone (of two years ago) or better class of devices, all the way up to desktops. SPV for servers/merchant nodes might be interesting in a few years but for now, the additional security is probably worth running a full node. Non-goal: smartcards, J2ME candybar phones etc.

What that means concretely: memory usage under 16MB total, startup time under 500ms.

Protocol: a slightly extended (backwards-compatible) version of the existing P2P protocol. Connecting to newer nodes gives a more optimal experience but is otherwise not necessary.

Quote
Is the merkle tree or any other data besides block header required?

Currently SPV clients need to download full blocks from the moment their wallet is created (first key). The reason is that's the only way to discover new transactions. getheaders can be used as an optimization for new users. It's not even required to download the full chain. You can start from the last checkpoint and not go all the way back to the genesis block.
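Starting from a checkpoint still means the client must check that each downloaded header links to the one before it and meets the proof-of-work target it claims. A sketch, assuming headers arrive as raw 80-byte serializations and hashes are in internal (little-endian) byte order; the function names are hypothetical, and checking that each header's nBits follows the difficulty retarget rules (and the timestamp rules) is deliberately omitted:

```python
import hashlib
import struct


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def serialize_header(version: int, prev_hash: bytes, merkle_root: bytes,
                     timestamp: int, bits: int, nonce: int) -> bytes:
    """80-byte block header: version, prev hash, merkle root, time, nBits, nonce."""
    return struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


def target_from_bits(bits: int) -> int:
    """Decode the compact nBits encoding into a 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def verify_headers_from_checkpoint(checkpoint_hash: bytes, headers: list[bytes]) -> bool:
    """Each header must build on the previous one (starting at the
    checkpoint) and its hash must not exceed the target it states."""
    prev = checkpoint_hash
    for raw in headers:
        if len(raw) != 80:
            return False
        if raw[4:36] != prev:
            return False  # does not build on the previous header
        bits = struct.unpack_from("<I", raw, 72)[0]
        block_hash = dsha256(raw)
        if int.from_bytes(block_hash, "little") > target_from_bits(bits):
            return False  # insufficient proof of work
        prev = block_hash
    return True
```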

Quote
If just the block header, how will a client verify a received TX?  Surely at least the block's merkle tree is needed?

Sort of. The Merkle tree doesn't help you, because you don't know which nodes in the tree are yours. A Merkle branch, on the other hand ...

To avoid the need to download full blocks, a simple pattern matching command can be added to the protocol. Strawman proposal: a new txmatch command is added. The contents are:

  • a request id
  • a range of block numbers to search between
  • a regularly formatted transaction, except that the scriptSigs and scriptPubKeys can contain pattern matching opcodes as used in the solver today. Other fields are ignored. The SIGHASH flags might be useful for controlling how the inputs/outputs are matched - need to think about it more.

The full node then loads each block in the range and checks each transaction for a match. This could be quite intensive so some pushback message might be necessary. If the node is overloaded, the client can try another.
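A sketch of the strawman `txmatch` request and the full node's scan. To keep it short, the solver-style pattern-matching opcodes are replaced with an assumption: plain byte-string patterns that must occur in an output script (for example a 20-byte pubkey hash). `TxMatchRequest`, `Tx`, `Overloaded` and the `MAX_BLOCKS_PER_REQUEST` pushback limit are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class TxMatchRequest:
    request_id: int
    start_height: int
    end_height: int  # inclusive
    # Stand-in for the pattern transaction: byte strings that must appear
    # somewhere in an output script (e.g. a pubkey hash).
    output_patterns: list[bytes] = field(default_factory=list)


@dataclass
class Tx:
    txid: bytes
    output_scripts: list[bytes]


class Overloaded(Exception):
    """Pushback: the client should retry against another node."""


MAX_BLOCKS_PER_REQUEST = 2016  # hypothetical server-side limit


def scan_for_matches(req: TxMatchRequest, blocks: dict[int, list[Tx]]) -> list[tuple[int, Tx]]:
    """Load each block in the range and test every transaction's outputs."""
    if req.end_height - req.start_height + 1 > MAX_BLOCKS_PER_REQUEST:
        raise Overloaded(f"request {req.request_id} spans too many blocks")
    hits = []
    for height in range(req.start_height, req.end_height + 1):
        for tx in blocks.get(height, []):
            if any(p in script for p in req.output_patterns for script in tx.output_scripts):
                hits.append((height, tx))
    return hits
```

Raising `Overloaded` stands in for the pushback message; the client would catch it and send the same request to another node.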

The response is a "txmatches" message containing the same request id plus a list of txmatch structures:

  • block hash it appeared in
  • merkle branch linking the tx to the header
  • transaction data itself

The SPV client can now check the server is not lying to it by verifying the merkle branch against the block headers it has downloaded (possibly in parallel with this operation).
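Verifying a `txmatches` entry amounts to hashing the transaction up its Merkle branch and comparing the result with the Merkle root in a header the client already holds. One detail the strawman list leaves implicit is the transaction's position in the block, which decides whether each sibling hash goes on the left or the right; the original client's `CMerkleTx` stores it as `nIndex` next to `vMerkleBranch`, and this sketch adds a `tx_index` field for the same purpose. `TxMatch` and the helper names are hypothetical, and the txid stands in for the full transaction data:

```python
import hashlib
from dataclasses import dataclass


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_branch(txids: list[bytes], index: int) -> list[bytes]:
    """Full-node side: sibling hashes from the leaf up to (not including) the root."""
    branch, level = [], list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd level: duplicate the last hash
        branch.append(level[index ^ 1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return branch


def root_from_branch(txid: bytes, branch: list[bytes], index: int) -> bytes:
    """SPV side: fold the branch; an odd index means we are the right child."""
    h = txid
    for sibling in branch:
        h = dsha256(sibling + h) if index & 1 else dsha256(h + sibling)
        index //= 2
    return h


@dataclass
class TxMatch:
    block_hash: bytes
    merkle_branch: list[bytes]
    tx_index: int  # position in the block (added; see lead-in)
    txid: bytes    # stands in for the transaction data


def check_txmatches(matches: list[TxMatch], merkle_roots_by_hash: dict[bytes, bytes]) -> list[TxMatch]:
    """Keep only matches that provably sit in a block on our header chain."""
    ok = []
    for m in matches:
        root = merkle_roots_by_hash.get(m.block_hash)
        if root is not None and root_from_branch(m.txid, m.merkle_branch, m.tx_index) == root:
            ok.append(m)
    return ok
```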

Quote
How will a lightweight client find new TX's sent to it, when it only knows its own keypairs?  Either it must request full blocks w/ all TX's, or someone queries nodes for TXs/keys in a privacy-squelching manner?

Yes, indeed. A simple implementation forces you to choose between requesting everything (ick) or revealing your public keys to a trusted node. There are some other ways to do it:

  • Extend the pattern matching opcode language to allow for prefix matches. Choose a short prefix. Now you get some transactions that aren't relevant but also the ones that are. A better design might allow you to provide arbitrary Bloom filters over addresses/pubkeys.
  • Split up your request across many nodes so any individual node only sees a few.
  • Do something a bit like onion routing. Contact 10 nodes and request a public key from each. Now connect to 10 other nodes and send a txmatch message that includes the IP address of one of the first 10 nodes and a request ID (so requests can be multiplexed together). The pattern match tx is encrypted under that node's public key (you can assume there's a stable IP-to-pubkey mapping). Now it's hard for nodes to know which computers own which keys. They just get requests from random IPs.
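The first option can be sketched with the two filters it mentions: a short-prefix match and a Bloom filter over pubkey hashes. Both trade bandwidth for privacy, because the server also returns transactions that merely happen to match. The hash construction (SHA-256 with a per-function counter prefix), the parameters and the names `prefix_match`/`BloomFilter` are illustrative assumptions, not a wire format:

```python
import hashlib


def prefix_match(pubkey_hash: bytes, prefix: bytes) -> bool:
    """Server returns every output whose hash starts with the prefix,
    so it cannot tell which of the hits really belongs to the client."""
    return pubkey_hash.startswith(prefix)


class BloomFilter:
    """Plain Bloom filter over byte strings (illustrative only)."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive n_hashes independent bit positions from SHA-256.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "little") + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False: definitely not added. True: added, or a false positive,
        # and those false positives are what hide the client's real keys.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A larger filter gives fewer false positives, which saves bandwidth but hides the client's real keys less well.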

I haven't thought about the last one much. I think for now just using a bunch of trusted nodes would work OK, kind of like checking your mail. They can tie a few keys together but not all.

Quote
What are the ramifications of some nodes on the P2P network having only partial blocks?   Will we need to introduce some sort of seek-nodes-with-full-block-contents logic for lightweight clients to find supernodes?

SPV clients do not accept connections nor do they register with peer discovery mechanisms. They are not really participants in the P2P network at all. The P2P traffic is really just between the full backbone type nodes.

Quote
What are the ramifications of partial blocks on the JSON-RPC API, if any?

For now, just disabling JSON-RPC if you're not running a full node is probably fine. If you want to mine or be a merchant, requiring you to handle the block chain isn't too much to ask. In future it might be, but by that point there'll be a lot more manpower available ;)

Quote
How will old clients behave, when faced with partial blocks?  Surely we want them to keep working?

They won't request them so the transition is backwards compatible.

Quote
My initial thoughts turn towards pruning spent tx's, because, if you can do that (and the network survives), you can handle partial blocks.

I think the two are probably unrelated, though both are worth doing. Pruning is really about optimizing full node storage. Disk space is so cheap today that it's not a big deal IMHO. In future, if nodes start regularly running out of IOPS, people may be forced to host the block chain in RAM. At that point pruning would certainly become more interesting.
hero member
Activity: 755
Merit: 515
My not-fully-thought-through idea:
A supernode/leaf node system similar to gnutella:
supernodes continue to function with full headers and such; leafnodes continue to see full blocks as they pass over the network, but don't store anything more than the headers and the txes that belong to them (IMHO the merkle tree of txes is not required). Leafnodes won't bother checking a received transaction, but will simply keep an eye on the blockchain for confirms (and possibly not show 0/unconfirmed transactions in the GUI?).

In order to retain backward compatibility, leafnodes use a different address encoding scheme on IRC (i.e. adding a '-' in front of their nick, or similar). Also, leafnodes connect exclusively to supernodes, as the supernodes can check txes and blocks before passing them across the network to the leafnodes. In order to keep enough supernodes on the network, we implement something similar to gnutella: if your node is up x hours/day on average, has enough disk space, and has enough spare CPU time, it upgrades itself to a supernode (with an option to set this manually), downloads the full blockchain and starts verifying transactions.
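The self-promotion rule could look roughly like this. The post leaves "x hours/day" and "enough" open, so every threshold here is a placeholder assumption, and `NodeStats`/`should_be_supernode` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder thresholds: the proposal does not specify values.
MIN_UPTIME_HOURS_PER_DAY = 12.0
MIN_FREE_DISK_BYTES = 2 * 1024**3
MIN_SPARE_CPU_FRACTION = 0.25


@dataclass
class NodeStats:
    avg_uptime_hours_per_day: float
    free_disk_bytes: int
    spare_cpu_fraction: float
    manual_override: Optional[bool] = None  # True/False forces the role


def should_be_supernode(s: NodeStats) -> bool:
    """Promote when uptime, disk and spare CPU all clear their thresholds,
    unless the user has set the role manually."""
    if s.manual_override is not None:
        return s.manual_override
    return (s.avg_uptime_hours_per_day >= MIN_UPTIME_HOURS_PER_DAY
            and s.free_disk_bytes >= MIN_FREE_DISK_BYTES
            and s.spare_cpu_fraction >= MIN_SPARE_CPU_FRACTION)
```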

I'm sure this has problems, anyone care to poke some holes?
member
Activity: 98
Merit: 13
People keep talking about a headers-only client as a major step towards making bitcoin a realistic payment platform, enabling the network to scale further by separating nodes into super-nodes and lightweight-leaf clients.

Unless I'm missing a wiki page somewhere (cluebat cheerfully requested), a thread needed to be started gathering all the minor, concrete to-do items and baby steps needed to get the official bitcoin codebase to the headers-only goal line.

Questions...

What are the engineering assumptions we will make, WRT headers-only clients?

Is the merkle tree or any other data besides block header required?

If just the block header, how will a client verify a received TX?  Surely at least the block's merkle tree is needed?

How will a lightweight client find new TX's sent to it, when it only knows its own keypairs?  Either it must request full blocks w/ all TX's, or someone queries nodes for TXs/keys in a privacy-squelching manner?

What are the ramifications of some nodes on the P2P network having only partial blocks?   Will we need to introduce some sort of seek-nodes-with-full-block-contents logic for lightweight clients to find supernodes?

What are the ramifications of partial blocks on the JSON-RPC API, if any?

How will old clients behave, when faced with partial blocks?  Surely we want them to keep working?

What are some small, concrete, baby steps that the official client codebase can take, towards this goal?  My initial thoughts turn towards pruning spent tx's, because, if you can do that (and the network survives), you can handle partial blocks.

Concrete technical answers requested; this is about completing the version 1 bitcoin design, not about totally redesigning bitcoin. Gavin, Mike and others are full of talk about lightweight clients. Let's make it happen :)