
Topic: Protocol Buffers for Bitcoin (Read 13222 times)

full member
Activity: 150
Merit: 100
August 03, 2010, 02:16:20 PM
#19
Indeed, I would tend to trust a library many people use over something I myself wrote. Maybe I shall start working on a patch to include protocol buffers as a first-try mechanism (send a protobuf; if the remote end responds with confusion, send a standard packet instead).
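
A minimal sketch of that first-try idea, in Python. Every name here (`send_with_fallback`, the encoder callables, the `peer_send` callback and its boolean "understood" result) is hypothetical; how a real peer would signal confusion is not specified anywhere in this thread.

```python
from typing import Callable


def send_with_fallback(
    message: dict,
    peer_send: Callable[[bytes], bool],
    encode_protobuf: Callable[[dict], bytes],
    encode_legacy: Callable[[dict], bytes],
) -> str:
    """Try the new encoding first; fall back to the legacy packet.

    peer_send is a hypothetical callback that returns True when the
    remote end accepted the bytes and False when it was confused.
    """
    if peer_send(encode_protobuf(message)):
        return "protobuf"
    if peer_send(encode_legacy(message)):
        return "legacy"
    raise ConnectionError("peer understood neither encoding")


# Toy stand-ins so the sketch runs. They are not real encoders: the
# first byte simply marks which format was used.
def toy_protobuf(msg: dict) -> bytes:
    return b"P" + repr(sorted(msg.items())).encode()


def toy_legacy(msg: dict) -> bytes:
    return b"L" + repr(sorted(msg.items())).encode()


def old_peer(data: bytes) -> bool:
    # An old client only understands the legacy marker.
    return data.startswith(b"L")


def new_peer(data: bytes) -> bool:
    # An upgraded client understands both.
    return data.startswith(b"P") or data.startswith(b"L")


print(send_with_fallback({"version": 306}, old_peer, toy_protobuf, toy_legacy))
print(send_with_fallback({"version": 306}, new_peer, toy_protobuf, toy_legacy))
```

The cost of this approach is one wasted round trip per connection to an old client, which shrinks as more peers upgrade.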
member
Activity: 115
Merit: 10
August 03, 2010, 09:46:33 AM
#18

Indeed, but the version packet is probably the smallest packet of all the ones sent, so we'll gain more elsewhere. Also, keep an eye on the main point: the fact that protocol buffers are smaller is a nice aside to the fact that they're forwards compatible and make bitcoin portable between languages.

Debugging is also easier with non-custom formats.  Instead of being the only one using it, you have many other people on different projects looking for and fixing bugs.  You also often get tools for decoding/displaying the packet to make it easier to see if something is wrong.  IMHO the size of the packet is the least important reason.
full member
Activity: 150
Merit: 100
August 03, 2010, 05:20:23 AM
#17
The reason I didn't use protocol buffers or boost serialization is because they looked too complex to make absolutely airtight and secure.  Their code is too large to read and be sure that there's no way to form an input that would do something unexpected.
I hate to sound rude, but that sounds like the danger with the SCRIPT field in transactions. You're comfortable writing a whole evaluation language letting the blocks suggest operations to the client, but you're not comfortable using a library like protocol buffers?

I don't really know much about the scripts, but this sounds like a good point. Are the scripts Turing-complete? If not, how powerful are they? From what I've heard elsewhere, it sounds like a pushdown automaton.

Quote from: martin
Would you consider including an option to write the wallet file out in protocol buffer format instead of the custom format? That way the default can be the custom format which you trust more, and users can export their wallet to protobuf format if they want to move to a new client.
Why not use XML for that case? The size of the wallet file on disk isn't exactly a big concern when it comes to export, and XML compresses pretty well. Plus, it's completely human readable - it would help people to understand what is actually stored.

That's a good point, I guess you should use XML for the wallet file export.

In fact, surely you should use XML for the wallet file all the time? Then again, you need a serialisation library for XML, so maybe that answers that question :/
full member
Activity: 210
Merit: 105
August 02, 2010, 09:23:50 PM
#16
The reason I didn't use protocol buffers or boost serialization is because they looked too complex to make absolutely airtight and secure.  Their code is too large to read and be sure that there's no way to form an input that would do something unexpected.
I hate to sound rude, but that sounds like the danger with the SCRIPT field in transactions. You're comfortable writing a whole evaluation language letting the blocks suggest operations to the client, but you're not comfortable using a library like protocol buffers?

Quote from: martin
Would you consider including an option to write the wallet file out in protocol buffer format instead of the custom format? That way the default can be the custom format which you trust more, and users can export their wallet to protobuf format if they want to move to a new client.
Why not use XML for that case? The size of the wallet file on disk isn't exactly a big concern when it comes to export, and XML compresses pretty well. Plus, it's completely human readable - it would help people to understand what is actually stored.
full member
Activity: 150
Merit: 100
August 02, 2010, 06:05:35 PM
#15
To be honest, I'd be more confident using a library written by Google and used by thousands than I would a library written by myself. But that's just me ;)

The main advantage of protocol buffers was forwards compatibility (which you say we have; how is that supported currently?) and cross compatibility between languages, which is important for bitcoin to develop, in my opinion. How easy would it be to read the current structure from another language?

Would you consider including an option to write the wallet file out in protocol buffer format instead of the custom format? That way the default can be the custom format which you trust more, and users can export their wallet to protobuf format if they want to move to a new client.
founder
Activity: 364
Merit: 7553
August 02, 2010, 03:22:08 PM
#14
The reason I didn't use protocol buffers or boost serialization is because they looked too complex to make absolutely airtight and secure.  Their code is too large to read and be sure that there's no way to form an input that would do something unexpected.

I hate reinventing the wheel and only resorted to writing my own serialization routines reluctantly.  The serialization format we have is as dead simple and flat as possible.  There is no extra freedom in the way the input stream is formed.  At each point, the next field in the data structure is expected.  The only choices given are those that the receiver is expecting.  There is versioning so upgrades are possible.

CAddress is about the only object with significant reserved space in it.  (about 7 bytes for flags and 12 bytes for possible future IPv6 expansion)
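
To illustrate what a flat, fixed-order format looks like in practice, here is a Python sketch that reads one address record. The field widths (8-byte nServices, 12 reserved bytes, 4-byte ip, 2-byte port) come from the version packet summary quoted in post #3; the byte orders, the example values, and the name `parse_address` are assumptions for illustration and are not taken from the Bitcoin source.

```python
import struct
from typing import Dict, Tuple

# nServices (uint64), pchReserved (12 bytes), ip (4 bytes), port (2 bytes).
# "<" disables padding; little-endian for nServices is an assumption.
ADDRESS = struct.Struct("<Q12s4s2s")


def parse_address(buf: bytes, offset: int = 0) -> Tuple[Dict[str, object], int]:
    """Read one address at offset; return (fields, next offset)."""
    services, reserved, ip, port = ADDRESS.unpack_from(buf, offset)
    fields = {
        "nServices": services,
        "pchReserved": reserved,
        "ip": ".".join(str(b) for b in ip),
        # Assumption: the port is stored in network (big-endian) order.
        "port": int.from_bytes(port, "big"),
    }
    return fields, offset + ADDRESS.size


# Arbitrary example values, packed and then read back.
raw = ADDRESS.pack(1, bytes(12), bytes([127, 0, 0, 1]), (8333).to_bytes(2, "big"))
addr, end = parse_address(raw)
print(ADDRESS.size, addr["ip"], addr["port"])
```

There is nothing for the sender to choose here: every field has a fixed position and width, and `unpack_from` raises `struct.error` if the buffer is too short.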

The larger things we have like blocks and transactions can't be optimized much more for size.  The bulk of their data is hashes and keys and signatures, which are uncompressible.  The serialization overhead is very small, usually 1 byte for size fields.

On Gavin's idea about an existing P2P broadcast infrastructure, I doubt one exists.  There are few P2P systems that only need broadcast.  There are some libraries like Chord that try to provide a distributed hash table infrastructure, but that's a huge difficult problem that we don't need or want.  Those libraries are also much harder to install than ourselves.
full member
Activity: 150
Merit: 100
August 02, 2010, 06:37:33 AM
#13
Why do you consider it a breaking change? There's no reason you couldn't first try with the new protocol and then retry using the old bitcoin serialization technique.

That's a very good idea, and I would say it's the way to go with this.

Also, I think this is a change that should be made sooner rather than later, while the Bitcoin community is still small. It's already been a major blocker in making new clients, and delaying it is going to hamper bitcoin's adoption.

Since it's a non breaking change, it should be done as soon as possible in my opinion, for those very reasons.

The question remains: is anyone willing to help implement it? I'm an experienced programmer but unfortunately I have no C++ experience, so I'm going to need a little help if I try to do this myself ;)
newbie
Activity: 10
Merit: 0
August 01, 2010, 09:43:43 PM
#12
Why do you consider it a breaking change? There's no reason you couldn't first try with the new protocol and then retry using the old bitcoin serialization technique. Also, I think this is a change that should be made sooner rather than later, while the Bitcoin community is still small. It's already been a major blocker in making new clients, and delaying it is going to hamper bitcoin's adoption.
full member
Activity: 150
Merit: 100
July 31, 2010, 09:11:05 AM
#11
The "0x00" groups each represent one byte.

Oops :-[

breaking change

I think the best way to phase in protocol buffers would be to avoid breaking changes to start with. Instead, start with protocol buffers for the local files (like the wallet), which would gain us a little bit of size on disk, make the wallet file easier to read in other software, and give us some experience using protocol buffers. Then is the time to start phasing in protocol buffers for networking, in my opinion.

Does the current version of bitcoin have any handling for ignoring chunks of a packet? If so, phasing in protocol buffers could be as simple as writing the current packet AND writing the protocol buffer (as an ignored field for older clients), then once enough people have upgraded get rid of the old encoding.
full member
Activity: 210
Merit: 105
July 31, 2010, 08:45:23 AM
#10
The encoded protocol buffer is just 55 bytes, whereas the bitcoin version is 85 0x00 sets (each one representing 2 bytes, I assume). This means that my badly designed protocol buffer is over half the size of the hand built layout!
The "0x00" groups each represent one byte. The length of the standard version packet is 87 bytes plus 20 for the header. The header could be massively optimized as well:
Code:
message start "magic bytes" - 0xF9 0xBE 0xB4 0xD9
command - name of command, 0 padded to 12 bytes "version\0\0\0\0\0"
size - 4 byte int
checksum (absent for messages without data and version messages) - 4 bytes
Obviously using proto buffers here, while absolutely a breaking change, would save a fair bit of space, especially because the "I've created a transaction" packet has the name "tx" meaning that there's at least 10 bytes of overhead in every one of those packets.
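
For concreteness, a Python sketch of the header layout listed above, without the optional checksum. The little-endian byte order for the size field and the function name `build_header` are assumptions made for this example.

```python
import struct

# Message start "magic bytes" as listed above.
MAGIC = bytes([0xF9, 0xBE, 0xB4, 0xD9])


def build_header(command: str, payload_size: int) -> bytes:
    """Pack magic + 12-byte zero-padded command name + 4-byte size."""
    name = command.encode("ascii")
    if len(name) > 12:
        raise ValueError("command name longer than 12 bytes")
    # "12s" pads the name with zero bytes up to 12 bytes.
    # "<I" (little-endian size) is an assumption for this sketch.
    return MAGIC + struct.pack("<12sI", name, payload_size)


version_header = build_header("version", 87)
tx_header = build_header("tx", 0)
print(len(version_header))          # 4 + 12 + 4 bytes
print(tx_header[4:16])              # "tx" followed by ten zero bytes
```

The "tx" case shows the overhead mentioned above: ten of the twelve command bytes are padding.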
full member
Activity: 150
Merit: 100
July 30, 2010, 11:54:27 AM
#9
The encoded protocol buffer is just 55 bytes, whereas the bitcoin version is 85 0x00 sets (each one representing 2 bytes, I assume). This means that my badly designed protocol buffer is half the size of the hand built layout!

I realize that you are evangelizing for protocol buffers (and you seem to be doing a very good job of it too, I might add), but I will challenge that hand built data layouts are always bad.

Still, I hope this does give some food for thought and on a practical basis any improvement in the network protocol that shaves off a few bytes is always better.  This doesn't seem to sacrifice too much in terms of the overhead either.  More significantly, you are calling attention to an area of efficiency that needs to be addressed and is very helpful to the project.  Thank you for doing that.  I'm hoping to get caught up to where you are at now on this protocol business.

They're not always bad. However, if you put in so much effort that your hand built packet was smaller than a protocol buffer, then you're probably putting too much effort into a micro optimisation ;)

I'll be happy to help anyone catch up with the protocol buffers. If someone is willing to work with me I'd even work on a patch. Unfortunately I have very little C++ experience, so I can't do it alone.

I think using protocol buffers as the serialization format is a good idea, but I don't think just switching to protocol buffers "buys" enough to be worth the effort (at least not now, when transaction volume is low).

I would disagree. Protocol buffers are smaller, which is nice, but that's not their main advantage. They're forwards compatible, which is hugely important in a p2p network, and they can easily be used from many languages, which makes implementing new clients in new languages easier. In my opinion that is vital for bitcoin.

FYI, it is pointless to make a packet smaller than 60 bytes -- the minimum size of an Ethernet packet.  Packets are padded up to 60 bytes, if they are smaller.

Indeed, but the version packet is probably the smallest packet of all the ones sent, so we'll gain more elsewhere. Also, keep an eye on the main point: the fact that protocol buffers are smaller is a nice aside to the fact that they're forwards compatible and make bitcoin portable between languages.
legendary
Activity: 1596
Merit: 1100
July 30, 2010, 10:47:33 AM
#8
FYI, it is pointless to make a packet smaller than 60 bytes -- the minimum size of an Ethernet packet.  Packets are padded up to 60 bytes, if they are smaller.
legendary
Activity: 1652
Merit: 2316
Chief Scientist
July 30, 2010, 09:28:23 AM
#7
Speaking of the network...
... is there any really robust, generic, low-latency, open source p2p network "middleware" out there?

I think using protocol buffers as the serialization format is a good idea, but I don't think just switching to protocol buffers "buys" enough to be worth the effort (at least not now, when transaction volume is low).

I'd like to see some experimenting with running bitcoin on top of a different networking layer (and use protocol buffers, too).  Is there a p2p network that is designed to be extremely highly reliable and difficult to infiltrate or attack with malicious nodes?
full member
Activity: 224
Merit: 141
July 30, 2010, 09:13:27 AM
#6
The encoded protocol buffer is just 55 bytes, whereas the bitcoin version is 85 0x00 sets (each one representing 2 bytes, I assume). This means that my badly designed protocol buffer is half the size of the hand built layout!

I realize that you are evangelizing for protocol buffers (and you seem to be doing a very good job of it too, I might add), but I will challenge that hand built data layouts are always bad.

Still, I hope this does give some food for thought and on a practical basis any improvement in the network protocol that shaves off a few bytes is always better.  This doesn't seem to sacrifice too much in terms of the overhead either.  More significantly, you are calling attention to an area of efficiency that needs to be addressed and is very helpful to the project.  Thank you for doing that.  I'm hoping to get caught up to where you are at now on this protocol business.
sr. member
Activity: 308
Merit: 250
July 30, 2010, 07:30:23 AM
#5
Some people have been suggesting that protocol buffers might be larger than the custom written packet layout. I suspect that actually it would be *smaller* due to some of the clever encoding used in protocol buffers.
I agree that it could be smaller; not necessarily because of clever encoding, but because it would allow us to drop reserved bytes and the like.

Not only does it allow it to drop reserved fields, but it uses ZigZag encoding and some other tricks to keep integers and the like as absolutely small as possible.  So yea, it uses clever encoding. =P  It's also blazingly fast to process!
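
A short Python sketch of the two tricks mentioned here, the base-128 varint and the ZigZag mapping for signed integers, written from the published protobuf wire format. The function names are this example's own, and the sample values are the ones from the version test data posted in #4.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # final byte
            return bytes(out)


def zigzag64(value: int) -> int:
    """Map signed 64-bit ints to unsigned so that small magnitudes stay small."""
    return ((value << 1) ^ (value >> 63)) & 0xFFFFFFFFFFFFFFFF


# Encoded lengths for the values in the version test data.
for name, value in [
    ("Version", 306),
    ("nTime", 1280487684),
    ("nBestHeight", 71137),
    ("nLocalHostNonce", 2359069617775922941),
]:
    print(name, len(encode_varint(value)))
```

Small values such as the version number and block height take two or three bytes instead of a fixed four. A large random value like the nonce needs nine bytes as a varint, which is why a fixed-width type is the better choice for that field.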
full member
Activity: 150
Merit: 100
July 30, 2010, 06:37:59 AM
#4
I used the above protocol buffer (as I said before, it's probably not optimal) and data obtained via http://www.alloscomp.com/bitcoin/version.pys as test data.

Quote
Version: 306
nLocalServices: 1
nTime: 1280487684
addrYou: #.#.#.#:#### (nServices: 1)
addrMe: #.#.#.#:#### (nServices: 1)
nLocalHostNonce: 2359069617775922941
vSubStr: ""
nBestHeight: 71137

The encoded protocol buffer is just 55 bytes, whereas the bitcoin version is 85 0x00 sets (each one representing 2 bytes, I assume). This means that my badly designed protocol buffer is over half the size of the hand built layout!
full member
Activity: 150
Merit: 100
July 30, 2010, 02:29:55 AM
#3
Some people have been suggesting that protocol buffers might be larger than the custom written packet layout. I suspect that actually it would be *smaller* due to some of the clever encoding used in protocol buffers.
I agree that it could be smaller; not necessarily because of clever encoding, but because it would allow us to drop reserved bytes and the like.

That too, although the counter argument people always make to that is that we could do away with reserved bytes anyway. No matter how impractical that would be :/

To resolve this, I think a test is in order: I shall encode a wallet file/network packet using protocol buffers and compare its size with the packets in the current scheme. However, I have no idea what's in a packet: what data is stored in a packet, and in what format?
That would be the hard part, of course. If you want to test with the version packet (not really ideal, since it's only sent once per connection), I've decoded that fully:
https://bitcointalksearch.org/topic/m.6250

I was hoping for a transaction packet or something, but I'll give it a go with that for now. I could also test with the wallet file if anyone has decoded that?

Addendum:

OK, working from this summary of the version packet layout:

Quote
version
    * {0xf9,0xbe,0xb4,0xd9}
    * "version" (0x00 padded)
    * 4 byte message size
    * 4 byte checksum
    * 8 byte nLocalServices (always 1 if !fClient, no idea either what that means)
    * 8 byte timestamp (remember to use network byte order)
    * Remote address (the address this Node thinks he is):
          o nServices - uint64 (8b), still cryptic, don't know the meaning yet
          o pchReserved - (12b): some reserved space, apparently for later IPv6
          o ip - uint (4b)
          o port - unsigned short (2b)
    * Local address (the address this Node sees you under):
          o nServices - uint64 (8b), still cryptic, don't know the meaning yet
          o pchReserved - (12b): some reserved space, apparently for later IPv6
          o ip - uint (4b)
          o port - unsigned short (2b)
    * 8 byte nLocalHostNonce (needed for a handshake, if I'm not mistaken)
    * A subversion string ".0" in my case
    * nBestHeight - int (4b): appears to be the last block number

I created this protocol buffer definition:

Quote
message version
{
   message AddressInfo
   {
      required uint64 nServices;
      required fixed32 ip;
      required uint32 port;
   }

   required uint32 magic = 2045;         //0xf9 | 0xbe << 1 | 0xb4 << 2 | 0xd9 << 3
   required uint32 version;
   required int64 checksum;
   required uint64 timestamp;
   required uint64 nLocalServices;

   required AddressInfo Remote;      //the address this node thinks he is
   required AddressInfo Local;      //the address this node sees you under

   required fixed64 nLocalHostNonce;
   required string SubversionString;
   required uint32 nBestHeight;
}

Does that look correct? The only changes I've made are that the indented things in the bullet point list are nested message types, and I've completely dropped the 12 bytes of reserved IPv6 space (since that can easily be added in later, which is the point of protocol buffers). I should point out that I probably haven't picked the best encoding types for all these fields; that depends upon the values they're likely to store, so in practice the packet will probably be a little smaller than my tests indicate.
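
One detail in the definition is worth checking by hand: the comment on the magic field combines the four bytes with shifts of 1, 2 and 3 bits. A Python sketch comparing that expression with a byte-wise packing (shifts of 8, 16 and 24 bits), and the varint length each value would need. Little-endian byte order and the helper name `varint_len` are assumptions of this example.

```python
MAGIC_BYTES = bytes([0xF9, 0xBE, 0xB4, 0xD9])

# The expression from the comment: shifts of 1, 2 and 3 bits.
bit_shifted = 0xF9 | 0xBE << 1 | 0xB4 << 2 | 0xD9 << 3

# Byte-wise packing: shifts of 8, 16 and 24 bits
# (little-endian order is an assumption).
byte_packed = 0xF9 | 0xBE << 8 | 0xB4 << 16 | 0xD9 << 24


def varint_len(value: int) -> int:
    """Bytes needed to store a non-negative int as a base-128 varint."""
    return max(1, (value.bit_length() + 6) // 7)


print(bit_shifted, varint_len(bit_shifted))
print(hex(byte_packed), varint_len(byte_packed))
```

The bit-shifted expression does evaluate to 2045, but the four bytes cannot be recovered from it. The byte-wise value fills all 32 bits, so it would take five bytes as a varint `uint32` against four as a `fixed32`, which would add a few bytes to the measured size.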
full member
Activity: 210
Merit: 105
July 29, 2010, 08:04:06 PM
#2
Some people have been suggesting that protocol buffers might be larger than the custom written packet layout. I suspect that actually it would be *smaller* due to some of the clever encoding used in protocol buffers.
I agree that it could be smaller; not necessarily because of clever encoding, but because it would allow us to drop reserved bytes and the like.

To resolve this, I think a test is in order: I shall encode a wallet file/network packet using protocol buffers and compare its size with the packets in the current scheme. However, I have no idea what's in a packet: what data is stored in a packet, and in what format?
That would be the hard part, of course. If you want to test with the version packet (not really ideal, since it's only sent once per connection), I've decoded that fully:
https://bitcointalksearch.org/topic/m.6250
full member
Activity: 150
Merit: 100
July 29, 2010, 06:29:31 PM
#1
There has been a discussion going on elsewhere about using protocol buffers for bitcoin. To summarise the advantages:

-> Small encoding
-> Very fast
-> Implementations in loads of languages (so writing new clients becomes a lot simpler)
-> Forwards compatible (indeed, this is most of the point of protocol buffers)
-> Extremely simple to use in code

So initially I would suggest storing the wallet file using protocol buffers, this isn't a breaking change and immediately makes the wallet file easier for other programs to parse. Eventually I would hope that bitcoin could use protocol buffers for networking.

Some people have been suggesting that protocol buffers might be larger than the custom written packet layout. I suspect that actually it would be *smaller* due to some of the clever encoding used in protocol buffers. To resolve this, I think a test is in order: I shall encode a wallet file/network packet using protocol buffers and compare its size with the packets in the current scheme. However, I have no idea what's in a packet: what data is stored in a packet, and in what format?