Bitcoin Core 0.12 introduced a new blocksonly setting. When set to blocksonly a node behaves normally but sends and receives no loose transactions; instead it handles only complete blocks. There are many applications for nodes where only confirmed transactions are interesting, and a node which verifies and forwards blocks still contributes to network health-- less, perhaps, than one that relays transactions, but it also consumes far fewer resources to begin with. An additional downside is that such nodes don't get the latency advantages of signature caching, since every transaction they see is totally new to them-- so this isn't something miners should use.
How much less bandwidth does blocksonly use in practice? I recently measured this using two techniques: once by instrumenting a node to measure bandwidth used for blocks vs. all other traffic, and again by repeatedly running in both modes for a day at a time and monitoring the host's total network usage. Both methods gave effectively the same result.
How much is the savings? Blocksonly reduced the node's bandwidth usage by 88%.
A significant implication of this is that any scheme for bandwidth reduction which works by using already-relayed transactions to reduce the size of transmitted blocks can _AT MOST_ reduce the overall bandwidth usage by 12%-- and that is assuming the differential compression achieved an "infinity-fold" improvement. (An example of Amdahl's law.)
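To make that concrete as a back-of-the-envelope check (just rearranging the 88% figure above, not a new measurement):

```python
# Back-of-the-envelope: with blocksonly saving 88%, block data is ~12% of a
# normal node's bandwidth and transaction relay (txs + INVs) is ~88%.
block_share = 0.12
relay_share = 0.88

# Even a block-compression scheme that shrank block transfer to zero bytes
# could only remove the block share of total bandwidth:
max_overall_saving = block_share
print(f"best possible overall saving from block compression: {max_overall_saving:.0%}")  # 12%
```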
Why does relay use so much bandwidth? The fundamental reason is that relay in the Bitcoin protocol is linear in the number of transactions and the number of peers (i.e. bandwidth is a function of transactions * peers). The protocol uses an 'efficient' INV scheme--- but this scheme only reduces the bandwidth by a constant factor: for every transaction a 36-byte INV is sent to (or received from) _every_ peer, and the full transaction is only sent once. Since the INV is a significant fraction of the transaction size (e.g. 36 bytes vs. 300), every 10 or so peers add as much INV bandwidth as sending the full transaction again would.
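A rough way to see the scaling (the 300-byte average transaction size is just the ballpark figure used above):

```python
# Rough per-transaction relay cost under the current INV scheme: one 36-byte
# INV exchanged with every peer, plus the full transaction sent once.
INV_BYTES = 36
AVG_TX_BYTES = 300  # ballpark average transaction size

def relay_bytes_per_tx(peers: int) -> int:
    return INV_BYTES * peers + AVG_TX_BYTES

for peers in (8, 25, 50, 125):
    print(peers, relay_bytes_per_tx(peers))
# At ~8 peers the INVs already cost about as much as the transaction itself;
# at 125 peers they cost roughly 15 times as much.
```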
For the subset of nodes where blocksonly meets their needs it's available now and gives pretty much the greatest possible bandwidth savings. Full stop. For blocks the INV process is much more efficient because the ratio of INV size to a megabyte block is so small. But what can we do for all the other cases?
In the literature there are schemes for quite efficient gossip, but there seems to be a trade-off between efficiency and attack robustness (for example, you could arrange nodes in a minimum spanning tree rooted at some super node(s), which would be optimal for repeated bandwidth but insanely unreliable and vulnerable to attack). Even the simple INV scheme is vulnerable to attack: peers can delay transactions by INVing them but then only sending the data after a long delay. With ephemeral anonymous peers, attack resistance is very important to us-- but fortunately we don't need to achieve optimal efficiency.
One possible scheme I've been discussing (and working on) for a while is mempool reconciliation. My work here was originally motivated by the desire to avoid relay costs for transactions which are just going to be quickly replaced or fall below the relay threshold, but this can also be applied to drastically reduce relay costs.
The idea is this: at random intervals a node will ask one of its peers for a sketch of the top X MBytes of its mempool. In its request it could also signal that it's interested only in transactions whose ancestor feerate is over some minimum threshold, along with information about the size and feerate of its own mempool. The returned sketch is an IBLT (see this reddit post for an ELI5 explanation that I created for IBLT) of a size estimated by the sender based on the information sent by the requester. Unlike the efficient block transmission IBLT proposals, this IBLT only transmits transaction IDs-- which avoids the fragmentation overhead needed when sending whole transactions.
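For concreteness, the request might carry something along these lines; the field names and the sizing heuristic are purely illustrative, not a wire format or parameters I'm proposing:

```python
# Illustrative shape of a reconciliation request and a crude responder-side
# IBLT sizing heuristic. None of these names or numbers are a concrete proposal.
from dataclasses import dataclass

@dataclass
class SketchRequest:
    top_mbytes: float          # sketch only the top X MB of your mempool
    min_ancestor_feerate: int  # sat/kB; don't bother including anything below this
    my_mempool_bytes: int      # requester's own mempool size...
    my_min_feerate: int        # ...and its floor feerate, to help size the IBLT

def estimate_cells(req: SketchRequest, my_top_bytes: int,
                   expected_overlap: float = 0.95,
                   avg_tx_bytes: int = 300, safety: float = 1.4) -> int:
    """Guess the symmetric difference in transactions and oversize the IBLT a bit."""
    diff_txs = (req.my_mempool_bytes + my_top_bytes) * (1 - expected_overlap) / avg_tx_bytes
    return max(8, int(diff_txs * safety))
```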
The requester then attempts to reconstruct the IDs using the content of its own mempool. If it is unable to reconstruct all of the IDs, it requests another sketch from another random peer, using a different seed.
When the node has multiple IBLTs it can use the partial solutions from each of them to help solve the other ones.
As it learns new txids that it was previously unaware of, it fetches them via getdata. When it completes a reconstruction it can INV any missing transactions towards the peers that didn't have them.
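Here is a minimal toy version of the txid-only IBLT and the peeling decode, just to pin down the mechanics. The parameters (3 buckets per txid, 8-byte checksums, the cell count in the demo) are placeholders, not tuned values:

```python
# Toy txid-only IBLT for mempool reconciliation. Constants here are placeholders.
import hashlib
import os
from typing import List, Set, Tuple

K = 3  # buckets per txid; a constant here, though a per-txid degree drawn
       # from a distribution (discussed below) likely decodes better

def _h(data: bytes, salt: int) -> int:
    return int.from_bytes(hashlib.sha256(salt.to_bytes(4, "big") + data).digest()[:8], "big")

CHK_SALT = 0xFFFFFFFF  # separate salt for the per-txid checksum

class TxidIBLT:
    def __init__(self, n_cells: int, seed: int = 0):
        self.n, self.seed = n_cells, seed
        self.count = [0] * n_cells
        self.id_sum = [0] * n_cells   # XOR of 256-bit txids (as ints)
        self.chk_sum = [0] * n_cells  # XOR of per-txid checksums

    def _cells(self, txid: bytes) -> List[int]:
        return [_h(txid, self.seed * K + i) % self.n for i in range(K)]

    def _apply(self, txid: bytes, sign: int) -> None:
        t, c = int.from_bytes(txid, "big"), _h(txid, CHK_SALT)
        for i in self._cells(txid):
            self.count[i] += sign
            self.id_sum[i] ^= t
            self.chk_sum[i] ^= c

    def insert(self, txid: bytes) -> None:        # used by the sketch sender
        self._apply(txid, +1)

    def subtract(self, txid: bytes) -> None:      # requester folds in its own txids
        self._apply(txid, -1)

    def peel(self) -> Tuple[Set[bytes], Set[bytes]]:
        """Strip 'pure' cells repeatedly; returns (txids only they had, txids only we had)."""
        they_have, we_have = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.n):
                if self.count[i] not in (1, -1):
                    continue
                txid = self.id_sum[i].to_bytes(32, "big")
                if self.chk_sum[i] != _h(txid, CHK_SALT):
                    continue  # cell isn't actually pure
                (they_have if self.count[i] == 1 else we_have).add(txid)
                self._apply(txid, -1 if self.count[i] == 1 else +1)
                progress = True
        return they_have, we_have

# Demo of one reconciliation round with made-up txids:
shared = [os.urandom(32) for _ in range(100)]
theirs_only = [os.urandom(32) for _ in range(5)]
ours_only = [os.urandom(32) for _ in range(3)]

sketch = TxidIBLT(n_cells=64)          # sized for the expected difference, not the mempool
for t in shared + theirs_only:
    sketch.insert(t)                   # what the responder sends
for t in shared + ours_only:
    sketch.subtract(t)                 # requester removes what it already has
missing, extras = sketch.peel()
print(len(missing), len(extras))       # getdata the missing, INV back the extras
```

Note that the sketch size depends only on the symmetric difference between the two mempools, not on their total size-- that is where the bandwidth savings comes from.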
(To avoid IBLT batching delays slowing transaction propagation too much, two peers can be elected to receive INVs for every transaction immediately; similar to the existing trickle-for-privacy scheme).
Most of the academic literature on IBLTs is concerned with parameter selections that minimize the asymptotic overhead (size of IBLT vs. set difference) for perfect reconstruction of the set. These parameters, however, result in quite high probabilities of complete failure. In our case, being able to recover only some of a set is fine, both because mempools don't need to be completely consistent and because requests to other peers will help recover the missing entries.
A similar pattern happened (for the same reason) in the invention of fountain codes-- one can approximately think of an IBLT as the dual of a fountain code. To achieve better odds of reconstruction, the degree of nodes in the graph (isomorphic to the number of hash buckets in the IBLT) should be randomly selected from a special distribution instead of being a constant. The effect is that the mixture of node degrees reduces the probability that a decode gets stuck, because it is more likely that there will be some entries with the optimal density. For IBLT this can be done by using the hash of the set member to pick the number of buckets for that member, just as it is used to pick which buckets are used.
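As a toy illustration of the mechanics (the ideal soliton distribution from fountain codes is used here purely as a stand-in; as noted below, the right distribution for IBLTs isn't known):

```python
# Pick a per-txid bucket count ("degree") deterministically from the txid's
# hash, sampled from some distribution. The ideal soliton distribution is a
# stand-in borrowed from fountain codes, not a claim about what's optimal here.
import hashlib
from bisect import bisect_left

def soliton_cdf(max_degree: int):
    # Ideal soliton on {1..max_degree}: P(1) = 1/max_degree, P(d) = 1/(d*(d-1)).
    probs = [1.0 / max_degree] + [1.0 / (d * (d - 1)) for d in range(2, max_degree + 1)]
    cdf, acc = [], 0.0
    for p in probs:
        acc += p
        cdf.append(acc)
    return cdf  # sums to 1

CDF = soliton_cdf(max_degree=8)

def degree_for_txid(txid: bytes) -> int:
    """Both peers derive the same degree for the same txid, with no extra signaling."""
    u = int.from_bytes(hashlib.sha256(b"degree" + txid).digest()[:8], "big") / 2**64
    return min(1 + bisect_left(CDF, u), len(CDF))
```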
But right now I'm not aware of any research on what these optimal degree distributions would look like for IBLTs given assumptions around capacity ratios... so I think there is a fair amount of basic science that needs to be worked out in order to really nail down a good design. I've been trying to work on making more progress here for a couple of months, but the toxic environment makes the grind a little difficult. I thought I would checkpoint my thinking here in the hope that someone would have some more thoughts to contribute.
In any case, this scheme would avoid the quadratic-like behavior of relay bandwidth. It would let nodes trade off latency against relay overhead, and it would allow for a graceful way of handling the highest-priority transactions first: transactions a dozen or more blocks deep in the mempool are not going to get mined any time soon, so if they suffer higher relay latency from being reconciled less frequently, that is no great harm. I believe it could do so without substantial increases in vulnerability to attack. The same software infrastructure could also be used for bandwidth-minimized block transfer for nodes that do have mempools (otherwise blocksonly gives optimal bandwidth gains), though latency minimization is much better accomplished via techniques like Matt's efficient block relay protocol-- since the most important thing for minimizing latency is minimizing round trips.