Bitcoin Protocol Specification | Bitcointalksearch.org

Gyrsur

legendary

Activity: 2856

Merit: 1520

Bitcoin Legal Tender Countries: 2 of 206

7 years in production and no official specification. It's a shame!

Hal

vip

Activity: 314

Merit: 4276

According to Gavin's https://github.com/gavinandresen/bitcointools/blob/master/NOTES.txt, serialization of any vector object gets preceded by a count of the number of elements in the vector, in the variable-length 1/3/5/9 byte format. I added this count field to the new wiki, e.g. to addr messages. Also, block messages contain a vector of their transactions, so that part is also preceded by a variable-length count.
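A minimal sketch of the count prefix described above, assuming the 1/3/5/9-byte format uses the marker bytes 253, 254 and 255 with little-endian integers (as in the getSize snippet posted elsewhere in this thread). The function names are made up for illustration and are not taken from the client.

```python
import struct

def write_compact_size(n: int) -> bytes:
    """Encode a count in the variable-length 1/3/5/9-byte format."""
    if n < 253:
        return struct.pack("<B", n)            # 1 byte: the value itself
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)  # marker 253 + 2 bytes
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)  # marker 254 + 4 bytes
    return b"\xff" + struct.pack("<Q", n)      # marker 255 + 8 bytes

def read_compact_size(buf: bytes, offset: int = 0):
    """Return (value, new_offset) for a count starting at buf[offset]."""
    first = buf[offset]
    if first < 253:
        return first, offset + 1
    fmt, size = {253: ("<H", 2), 254: ("<I", 4), 255: ("<Q", 8)}[first]
    (value,) = struct.unpack_from(fmt, buf, offset + 1)
    return value, offset + 1 + size

def serialize_vector(items) -> bytes:
    """Prefix a list of already-serialized elements with its element count."""
    return write_compact_size(len(items)) + b"".join(items)
```

An addr or block message would then carry `serialize_vector(...)` of its entries, so a reader always knows how many elements follow before parsing them.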

Mike Hearn

legendary

Activity: 1526

Merit: 1134

The best suggestion I've seen for saving bandwidth is that completed blocks should not contain full transaction bodies but only the Merkle tree. Nodes that somehow missed the original broadcast of some transactions in the tree would then just request them from peers with getdata in the usual manner.

This isn't an issue today, but it would make running nodes cheaper if Bitcoin goes to extreme scales, like thousands of transactions a second.
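To make the suggestion concrete, here is a rough sketch of what a receiving node might do: check the announced transaction hashes against the Merkle root, then work out which transactions it still has to fetch. It assumes double SHA-256 pairing with the last hash duplicated on odd-sized levels; the helper names and the idea of passing a set of known hashes are illustrative assumptions, not an existing message or API.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash assumed for tree nodes in this sketch."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(tx_hashes):
    """Fold a list of 32-byte transaction hashes into a single root hash."""
    if not tx_hashes:
        raise ValueError("a block needs at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def missing_transactions(block_tx_hashes, known_hashes):
    """Hashes listed in the block that this node has not seen yet."""
    return [h for h in block_tx_hashes if h not in known_hashes]
```

A node would only need to send getdata for the result of `missing_transactions(...)`; everything else it already received when the transactions were first broadcast.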

RHorning

full member

Activity: 224

Merit: 141

Quote from: Cdecker on December 07, 2010, 07:04:07 AM

What exactly is this used for then: http://www.bitcoin.org/wiki/doku.php?id=bitcoins_draft_spec_0_0_1#variable_sized_data ?

The only place I currently see that being used is in scripts. Thanks to Theymos the ideas behind scripting are less opaque but it still is pretty arcane for those who really want to get into the gritty details of Bitcoin.

In theory it could be put into the protocol eventually as a way to save bandwidth, but so far I haven't seen it used in that way. If that was a goal, it would seem that there would be some other concepts in place that would facilitate data compression more effectively and perhaps even be more extensible too.

Cdecker

hero member

Activity: 489

Merit: 505

What exactly is this used for then: http://www.bitcoin.org/wiki/doku.php?id=bitcoins_draft_spec_0_0_1#variable_sized_data ?

Cdecker

hero member

Activity: 489

Merit: 505

I wonder when they changed that one. It was incredibly hard to get the size out of the message, and we had to switch between size-field lengths according to the first byte.

RHorning

full member

Activity: 224

Merit: 141

Quote from: Cdecker on December 04, 2010, 08:04:55 AM

Sometimes I can't resist questioning Satoshi's choices: a UINT64 size field? It's incredibly hard to implement in Java (well, not really, BigInteger helps), and do we really need messages larger than 4 GB (the limit of a 4-byte size field)? A UINT64 would allow for messages of 18.45 exabytes. That's more than all the world's movies put together.

I think I'll simply drop messages requiring UINT64 sizes.

Are you asking about the message header "size" field, indicating how large the message packet itself is? I thought that was just a simple 4-byte int value followed by a 4-byte checksum. That format information comes from main.cpp and is also implemented in net.h:

Code:

//
// Message format
//  (4) message start
//  (12) command
//  (4) size
//  (4) checksum
//  (x) data
//

On the whole, most messages are quite small, with the exception of the transaction messages themselves, which can grow to sizes on the order of thousands of bytes (10k is the limit for a single script per input or output). In theory some of the other messages could get fairly large, but still on that order of magnitude, peaking at about 50k in extreme situations. I can see that a 16-bit short is perhaps too small, since a complex transaction with dozens of inputs and outputs might need more than 64k bytes, but you are correct that there is no need to get past the gigabyte range for message sizes.

*Edit* I also found this little snippet of code relevant to this discussion:

Code:

static const unsigned int MAX_SIZE = 0x02000000;

(from serialize.h)

This is the current maximum size for any single message on the network at the moment (0x02000000 bytes, i.e. 32 MiB), as anything larger than this is simply going to be rejected.
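Putting the header layout and the MAX_SIZE limit together, a parser could look roughly like the sketch below. Two things here are assumptions rather than something stated in this post: the message start bytes are the magic value quoted elsewhere in the thread, and the checksum is taken to be the first four bytes of a double SHA-256 over the data. The function names are invented for the example.

```python
import hashlib
import struct

MAGIC = b"\xf9\xbe\xb4\xd9"          # message start bytes (assumed: main network)
MAX_SIZE = 0x02000000                # 32 MiB limit from serialize.h
HEADER = struct.Struct("<4s12sI4s")  # start, command, size, checksum = 24 bytes

def checksum(payload: bytes) -> bytes:
    """Assumed checksum: first 4 bytes of double SHA-256 of the data."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

def pack_message(command: str, payload: bytes) -> bytes:
    """Build header + data for one message."""
    name = command.encode("ascii")
    if len(name) > 12:
        raise ValueError("command does not fit in 12 bytes")
    name = name.ljust(12, b"\x00")   # command is padded with zero bytes
    return HEADER.pack(MAGIC, name, len(payload), checksum(payload)) + payload

def parse_message(raw: bytes):
    """Return (command, payload) or raise ValueError on a malformed message."""
    start, name, size, check = HEADER.unpack_from(raw, 0)
    if start != MAGIC:
        raise ValueError("bad message start")
    if size > MAX_SIZE:
        raise ValueError("message larger than MAX_SIZE")
    payload = raw[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError("truncated message")
    if checksum(payload) != check:
        raise ValueError("checksum mismatch")
    return name.rstrip(b"\x00").decode("ascii"), payload
```

Note that the size field is read as a plain little-endian 4-byte unsigned int, which already allows far more than MAX_SIZE, so no 64-bit handling is needed for the header.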

Cdecker

hero member

Activity: 489

Merit: 505

Sometimes I can't resist questioning Satoshi's choices: a UINT64 size field? It's incredibly hard to implement in Java (well, not really, BigInteger helps), and do we really need messages larger than 4 GB (the limit of a 4-byte size field)? A UINT64 would allow for messages of 18.45 exabytes. That's more than all the world's movies put together.

I think I'll simply drop messages requiring UINT64 sizes.

theymos

administrator

Activity: 5222

Merit: 13032

Quote from: RHorning on December 01, 2010, 07:31:19 PM

My question is in a couple parts: Is this in the roadmap for getting implemented in the future or is this simply an idea that hasn't really been completely thought through? What kind of security issues are there in terms of a 3rd party "changing" the transaction information and simply updating to a new transaction version? Or is this a "no later than" type of notification where the transaction expires after a certain block number has been created?

It is an interesting feature for Bitcoins if it could be pulled off. Apparently most miners are not paying attention to this attribute either, and it may be something to reconsider.

A transaction can't be included in a block if its lock time is in the future. Even now blocks breaking this rule will be rejected.
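As a rough illustration of that rule, assuming the lock time is interpreted as a block height and that a value of 0 means "no lock" (the function name and the exact boundary comparison are my assumptions, not copied from the client):

```python
def lock_time_allows(lock_time: int, block_height: int) -> bool:
    """True if a transaction with this lock time may go into a block
    at the given height (sketch: lock time treated as a block height)."""
    # 0 is assumed to mean the transaction is not time-locked at all.
    return lock_time == 0 or lock_time < block_height
```

A block at height 100 containing a transaction locked until block 150 would fail this check, which is the case where the whole block gets rejected.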

The feature is designed to work with in-memory transaction replacement, which is currently disabled (it was enabled in older versions):

Code:

// Disable replacement feature for now
return false;

// Allow replacing with a newer version of the same transaction
if (i != 0)
    return false;
ptxOld = mapNextTx[outpoint].ptx;
if (!IsNewerThan(*ptxOld))
    return false;
for (int i = 0; i < vin.size(); i++)
{
    COutPoint outpoint = vin[i].prevout;
    if (!mapNextTx.count(outpoint) || mapNextTx[outpoint].ptx != ptxOld)
        return false;
}
break;

IsNewerThan() compares the inputs' sequence numbers with those of the other version. Higher sequence = newer.

This disabled feature is not network-enforced in any way, so it could be enabled at any time.

You can't replace a transaction unless you can sign it. So it should be safe. It might be unsafe if you're using inputs that can be redeemed by more than one person: the other person could make your transaction invalid (but not steal your other inputs).

It was probably disabled because it makes accepting transactions with 0 confirmations really unsafe. It could be safely re-enabled if transactions were only replaceable if they actually specify a non-zero lock time, and this was marked in the UI.

Quote from: satoshi on November 15, 2010, 01:37:44 PM

nTimeLock does the reverse. It's an open transaction that can be replaced with new versions until the deadline. It can't be recorded until it locks. The highest version when the deadline hits gets recorded. It could be used, for example, to write an escrow transaction that will automatically permanently lock and go through unless it is revoked before the deadline. The feature isn't enabled or used yet, but the support is there so it could be implemented later.

RHorning

full member

Activity: 224

Merit: 141

Going over the transaction specs, I noticed a "lock time" attribute on each transaction. With this, there is apparently some sort of protocol envisioned for being able to push transactions to various nodes but also require them to be included at some future block instead of being processed immediately. In other words, it is a request to miners to include the transaction "no earlier than" some particular block number. In addition, there is the ability for details about the transaction to be modified subsequent to its inclusion into a block.

My question is in a couple parts: Is this in the roadmap for getting implemented in the future or is this simply an idea that hasn't really been completely thought through? What kind of security issues are there in terms of a 3rd party "changing" the transaction information and simply updating to a new transaction version? Or is this a "no later than" type of notification where the transaction expires after a certain block number has been created?

It is an interesting feature for Bitcoins if it could be pulled off. Apparently most miners are not paying attention to this attribute either, and it may be something to reconsider.

RHorning

full member

Activity: 224

Merit: 141

Quote from: Cdecker on November 27, 2010, 07:20:51 PM

Let's try to keep this thread alive and unbury it with new findings while we go along. One fact that I stumbled over (for several hours today, hurting myself as I went) is that numbers in the protocol are not encoded in network byte order, but in little endian. I guess that would be pretty important if we are to create documentation.

I think there are two ways to look at the protocol, a high level one, where everything is expressed in nice words and comparisons, and another dearly needed one that details the actual information and format on the wire.

I hope you've looked at the "draft spec" that I've been writing where I've put some of this information in, but your input is very much appreciated. I forgot to mention the byte order as it is a huge detail, but something I've come to expect from projects like this. About the only thing that is recorded in "network byte order" that I'm aware of at the moment is timestamp structure, and that is in part because the structure is defined in a library not written by Satoshi. Nothing personal against Satoshi here either, as all that is going on is that he isn't re-ordering the bytes as the vast majority of the clients are using Intel architecture on their computers. It simply makes the software a whole lot easier to write so far as transmitting the data.

This is also a pet peeve of mine as it opens up the whole little endian vs. big endian debate. This is also where Intel going against the grain on this issue has sort of messed things up and a tale of how architecture decisions made decades ago continue to come back and impact everybody in sometimes significant ways. For the most part, other than as a potential bug when you are trying to read/write data on a shared data format used by multiple computer systems (aka on a CD-ROM or via the internet) it rarely is even a problem.
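For anyone bitten by this, a tiny illustration of the difference, using nothing but Python's struct module (the value and the helper name are arbitrary examples):

```python
import struct

value = 0x01020304

# Little endian: least significant byte first, as the client sends numbers.
little = struct.pack("<I", value)

# Network byte order (big endian): most significant byte first.
network = struct.pack(">I", value)

def read_uint32_le(buf: bytes, offset: int = 0) -> int:
    """Read a 4-byte unsigned integer the way the protocol encodes it."""
    return struct.unpack_from("<I", buf, offset)[0]
```

Here `little` is the byte sequence 04 03 02 01 while `network` is 01 02 03 04, so reading a field with the wrong byte order silently gives a different number rather than an error.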

At the moment I'm trying to wrap my head around the transaction and block formats in the network data sharing protocol. A whole bunch is buried in there and isn't very well documented in terms of what it is doing. If you could help in that regard, let me know too!

Cdecker

hero member

Activity: 489

Merit: 505

Let's try to keep this thread alive and unbury it with new findings while we go along. One fact that I stumbled over (for several hours today, hurting myself as I went) is that numbers in the protocol are not encoded in network byte order, but in little endian. I guess that would be pretty important if we are to create documentation.

I think there are two ways to look at the protocol, a high level one, where everything is expressed in nice words and comparisons, and another dearly needed one that details the actual information and format on the wire.

One nice detail to add is, for example, that each message starts with a 4-byte magic:

Code:

_magic = '\xf9\xbe\xb4\xd9'

Also in the original design a lot of attention went into how the size of a message is encoded:

Code:

def getSize(self):
    first = self.getUByte()
    if first == 255: return self.getUInt64()
    elif first == 254: return self.getUInt()
    elif first == 253: return self.getUShort()
    else: return first

But message types are simply encoded as a padded 12-byte string. So I'm starting to wonder about the design choices. Why optimize the size field when the other part of the message is always large? No offense intended, but this kind of thing just makes it hard to implement.

Oh, and when using Java you might pay close attention to how to read unsigned data types (again, something I had to bang my head against before realizing my error).

RHorning

full member

Activity: 224

Merit: 141

Quote from: theymos on November 24, 2010, 06:57:25 PM

Quote from: RHorning on November 24, 2010, 05:09:31 PM

For those familiar with the network level protocols, what is the difference between getblocks and getdata?

Have you seen this?
http://www.bitcoin.org/wiki/doku.php?id=network

As a matter of fact, I missed that page. Thank you so much for putting the effort into writing that explanation. It really does make a difference.

As a side note, we really need to put together some menus or something that links deep into the wiki, or at least put references to it on other pages.

I've been trying to collect content related to the protocol for some time, so every little bit helps. Again, thanks!

theymos

administrator

Activity: 5222

Merit: 13032

Quote from: RHorning on November 24, 2010, 05:09:31 PM

For those familiar with the network level protocols, what is the difference between getblocks and getdata? Both seem to be a list of hashes representing blocks which need to be sent to the requesting node.

One difference I can see is that the "getblocks" command/packet type requests a range of blocks, while getdata requests individual blocks. Is this the only difference or is there something more significant that I'm missing here? I'm trying to figure out when this particular packet type might be used instead, or why there seems to be a duplication of block request methods seemingly doing the same thing.

Have you seen this?
http://www.bitcoin.org/wiki/doku.php?id=network

Getdata requests a specific block or transaction by hash. You generally only send a getdata after you receive an inv listing a block/tx that you don't already have. Getblocks requests an inv containing the hashes of all blocks in a range (max 500 at a time). It's used for initial block download and re-syncing after some downtime.

Getblocks (client) -> inv (server) -> getdata (client) -> block (server)
Send one getblocks, get an inv with 500 entries, send 500 getdata messages, receive 500 block messages. This sounds inefficient, but the download is actually very fast (it's the verification that eats up most of the "download" time).
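The inv to getdata step of that exchange can be sketched as below. The entry layout (a 4-byte little-endian type followed by a 32-byte hash, with type 1 for transactions and 2 for blocks) and the variable-length count prefix are my reading of the wire format rather than something spelled out in this post, and the function names are made up. Following the description above, one getdata payload is produced per missing entry.

```python
import struct

MSG_TX, MSG_BLOCK = 1, 2  # inventory type codes (assumed)

def write_count(n: int) -> bytes:
    """Variable-length count prefix (1/3/5/9 bytes)."""
    if n < 253:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)

def read_count(buf: bytes, pos: int):
    """Return (count, next_position)."""
    first = buf[pos]
    if first < 253:
        return first, pos + 1
    fmt, size = {253: ("<H", 2), 254: ("<I", 4), 255: ("<Q", 8)}[first]
    return struct.unpack_from(fmt, buf, pos + 1)[0], pos + 1 + size

def build_inventory(entries) -> bytes:
    """Serialize (type, hash) pairs; inv and getdata share this payload shape."""
    out = write_count(len(entries))
    for kind, item_hash in entries:
        out += struct.pack("<I", kind) + item_hash
    return out

def parse_inv(payload: bytes):
    """Decode an inv payload into a list of (type, hash) pairs."""
    count, pos = read_count(payload, 0)
    entries = []
    for _ in range(count):
        (kind,) = struct.unpack_from("<I", payload, pos)
        entries.append((kind, payload[pos + 4:pos + 36]))
        pos += 36
    return entries

def getdata_payloads(inv_payload: bytes, have):
    """One getdata payload for every announced item we do not already have."""
    return [build_inventory([(kind, h)])
            for kind, h in parse_inv(inv_payload) if h not in have]
```

During initial block download `have` would be empty, so an inv with 500 block hashes turns into 500 getdata requests, matching the sequence described above.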

RHorning

full member

Activity: 224

Merit: 141

For those familiar with the network level protocols, what is the difference between getblocks and getdata? Both seem to be a list of hashes representing blocks which need to be sent to the requesting node.

One difference I can see is that the "getblocks" command/packet type requests a range of blocks, while getdata requests individual blocks. Is this the only difference or is there something more significant that I'm missing here? I'm trying to figure out when this particular packet type might be used instead, or why there seems to be a duplication of block request methods seemingly doing the same thing.

RHorning

full member

Activity: 224

Merit: 141

Quote from: da2ce7 on November 24, 2010, 03:28:24 AM

Quote from: Gavin Andresen on November 23, 2010, 12:10:57 PM

[... I] think writing informal specifications documenting how bitcoin works right now is a great idea, and will be really helpful when it is time to go through some standardization process.

This is the most important thing to happen, IMHO. Doing so would dramatically lower the barriers to entry for creating 2nd generation bitcoin clients independent of the reference implementation.

So if it would take many man-months of work to develop a formal specification, then how long would it take to develop a 'good enough' informal specification?

I think this is the wrong way to look at it, particularly given the mostly volunteer nature of the operation of Bitcoins at the moment. There have been several attempts to start the documentation process, and the important thing to do now is to build upon those efforts and get what information anybody knows down into some usable form. Documentation of Bitcoins all around is sort of weak, and even if you aren't a programmer it would still be useful to at least try to explain the concepts of Bitcoins in some way that perhaps even non-geeks can understand.

There is also a whole bunch of useful information which is now getting buried in these forum threads, so indexing these discussions would also be helpful in some way, although the specific details of the operation of Bitcoins ultimately rest on the source code of the reference implementation written by Satoshi.

Like trying to eat an elephant, it takes time and patience, and you can only take one bite at a time. If you can read the source code and understand even a portion of it, get that knowledge recorded or simplified if you can. At that point we can debate the merit or lack thereof of specific decisions in the current design. My experience is also that once something is established and not challenged, it tends to become something permanent in nature even on an "open source" project. Right now, most people don't even know what to start challenging because the details are buried in code. I'm hoping that a "good enough" documentation effort can at least bring some of those issues to the front.

da2ce7

legendary

Activity: 1222

Merit: 1016

Live and Let Live

Quote from: Gavin Andresen on November 23, 2010, 12:10:57 PM

[... I] think writing informal specifications documenting how bitcoin works right now is a great idea, and will be really helpful when it is time to go through some standardization process.

This is the most important thing to happen, IMHO. Doing so would dramatically lower the barriers to entry for creating 2nd generation bitcoin clients independent of the reference implementation.

So if it would take many man-months of work to develop a formal specification, then how long would it take to develop a 'good enough' informal specification?

RHorning

full member

Activity: 224

Merit: 141

Quote from: slush on November 23, 2010, 01:38:31 PM

Quote from: Timo Y on November 23, 2010, 12:03:27 PM

it's not always what the user wants or needs

Well, there is no way for many users/programmers (like me) to implement unofficial clients, because we are not skilled enough in C++ and reverse engineering. But I'm capable of writing an alternate client given at least a basic specification of how the whole thing works. Unfortunately, because I'm not capable of writing my own client, I'm also not capable of helping anybody with the specs. At this time, I'm dependent on somebody else starting the specification process.

I'm absolutely not talking about any formal standard, wiki should help a lot in this stage.

I've thrown some additional information onto the wiki already, at least enough to start. I've found at least some of the relevant sections in the source code for Bitcoins and will try to get some more information put on there, as well as some threads to look through as well. More theory certainly should be put together, and perhaps an evaluation of some of the decisions already made... which can certainly be useful.

wumpus

hero member

Activity: 812

Merit: 1022

No Maps for These Territories

Quote from: The Madhatter on November 23, 2010, 02:15:24 PM

I'm still advocating for a few small changes to the protocol now before it becomes too much of a PITA to change later. (I mentioned this on the forum about 10 months ago.)

1. The handshake should be reversed. An open Bitcoin port shouldn't identify what it is. The connecting client should initiate the handshake. This improves privacy a lot. Think nmap. Think spies. Think any tool that can fingerprint (I use telnet) a service by simply connecting to an open port.
On the other hand, this would be a lot of trouble for existing clients. A more breaking protocol change is hard to think of.

2. The connections should be SSL. We should try to emulate FF connecting to Apache or DPI will eventually become our worst enemy. We should take what the Tor developers learned the hard way into account early on.

3. The Bitcoin client should choose a random unused port to listen on when it is first installed. For a ISP or even a nation to block port 8333 is quite easy and is becoming easier all the time.

4. UPnP is a must. The Bitcoin client should automatically open up whatever port it decided on with UPnP. This will relieve a lot of NAT problems and will extend the P2P network a lot better.

Agreed.
1. This would counter simple port scan/identification attacks by script kiddies. Bitcoin (or any protocol) should not announce what it is. Let the connecting side speak first. Just break the connection if it is not what was expected. It will not be impossible to identify the service, just a lot harder.

2. I'm all for this. SSL support is always a good addition. It would at least provide a level of security. Potential issue (specific to SSL) is key/certificate management.

3. Why not. The range in which to randomize should be configurable though, so that firewalls that only let through a certain range can be used (same as with BitTorrent).

4. Yep, that would help with a lot of home routers.

MoonShadow

legendary

Activity: 1708

Merit: 1011

Quote from: The Madhatter on November 23, 2010, 02:24:04 PM

The port isn't unknown. The IP/port are published to the network once the client has seeded successfully. Every other node writes that to their addr.dat.

As far as I can tell the addr.dat contains IP/port already.

I see. Well, I'm not a programmer, but I cannot see the value in obscuring, encrypting or otherwise trying to hide the port. The port can be blocked by those who wish to hide their client, and the client will still work, while the data in transit is only openly coded transactions and blocks. The only risk of the port being open is that it signals to potential crackers that there is a running client (and therefore a wallet.dat) on the machine. Just close the port if that is a concern.
