Using the Bitcoin blockchain for P2P discovery?

Jet Cash

legendary

Activity: 2870

Merit: 2474

https://JetCash.com

The Bitcoin blockchain already has problems with regard to its size. Surely using it as a data storage medium would make this worse, and would be a really bad step.

paulkernfeld

newbie

Activity: 10

Merit: 0

Quote

There is no way for the client to know which blocks contain the transactions with the ip addresses, so it would need to download all of the blocks. I suppose you could say that it would only need to download blocks from a certain point onward, but that would still grow to be very large after some time.

This is exactly the problem with previous proposals: downloading the entire blockchain would just take too long. The way I'm proposing to solve this is to predetermine which blocks to look at as a part of the protocol. E.g. instead of looking at blocks

Code:

0, 1, 2, 3, 4, 5, 6

, a client would look at blocks

Code:

0, 4, 6, 2

. This increases write time, because you have to wait for the right block height in order to write.

Thanks for the critiques! This is definitely helping me to improve the proposal. I have updated the Gist to include this stuff.

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

Quote from: paulkernfeld on January 01, 2016, 03:37:34 PM

Hmm, I was hoping to avoid downloading the entire blockchain. I agree with you that downloading the entire blockchain for discovery is impractical, because a new application would require hours or days to start up. With my design, I believe that the client only needs to download all block headers, not all blocks. The client then requests a subset of the actual blocks. In order to accomplish this, I'm making big sacrifices on write speed (e.g. you might wait months to write data).

If you are encoding the data into a transaction, the block header is not going to help. It doesn't indicate what transactions are there, you would need the actual block for that. There is no way for the client to know which blocks contain the transactions with the ip addresses, so it would need to download all of the blocks. I suppose you could say that it would only need to download blocks from a certain point onward, but that would still grow to be very large after some time.

Another way would be to always have the live peers on the other network to broadcast a transaction with their ip address to confirm their liveness on the network. If it happened every 24 hours then a new node would only need the last 24 hours worth of blocks to get nodes that were live in the past 24 hours.

paulkernfeld

newbie

Activity: 10

Merit: 0

Hmm, I was hoping to avoid downloading the entire blockchain. I agree with you that downloading the entire blockchain for discovery is impractical, because a new application would require hours or days to start up. With my design, I believe that the client only needs to download all block headers, not all blocks. The client then requests a subset of the actual blocks. In order to accomplish this, I'm making big sacrifices on write speed (e.g. you might wait months to write data).

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

Quote from: paulkernfeld on January 01, 2016, 03:23:21 PM

Ah! Sorry, I misunderstood your original question. I am assuming that the new node is able to connect to the Bitcoin network, and that it uses the Bitcoin network to find discovery information for another peer-to-peer network. Does that make any sense?

Oh I see. So it utilizes the bitcoin network and the blockchain in order to connect to a different p2p network. This would only be useful if the user also uses Bitcoin and a full node implementation. It seems like a horribly inefficient way to connect and it would take ages to do so since it needs to download the whole blockchain and search through it for the transactions with the ip addresses encoded in them.

This is a good idea, I just don't think it will work very well in practice.

paulkernfeld

newbie

Activity: 10

Merit: 0

Ah! Sorry, I misunderstood your original question. I am assuming that the new node is able to connect to the Bitcoin network, and that it uses the Bitcoin network to find discovery information for another peer-to-peer network. Does that make any sense?

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

Quote from: paulkernfeld on January 01, 2016, 02:50:13 PM

Good catch! I definitely did not make that clear. Basically, the answer to that is that nodes would have to download all block headers first, in the manner of a Satoshi SPV client (https://en.bitcoin.it/wiki/Thin_Client_Security). Then, they would request full blocks in a specific order. Does that answer your question?

No. Headers wouldn't supply the necessary information for finding peers.

What I am saying is that this scheme wouldn't work for the initial peer discovery and sync. There is no way for a new node to know about peers to connect to since it doesn't have the blockchain. And without peers to connect to it cannot download the blockchain.

paulkernfeld

newbie

Activity: 10

Merit: 0

Good catch! I definitely did not make that clear. Basically, the answer to that is that nodes would have to download all block headers first, in the manner of a Satoshi SPV client (https://en.bitcoin.it/wiki/Thin_Client_Security). Then, they would request full blocks in a specific order. Does that answer your question?

achow101

staff

Activity: 3458

Merit: 6793

Just writing some code

This is a really obvious flaw but I don't think you addressed this:

If I start a new node that doesn't have the blockchain already, then how would I be able to find peers to connect to since I can't look in the blockchain?

paulkernfeld

newbie

Activity: 10

Merit: 0

Hey everyone, I would love to hear feedback on whether this idea is worth pursuing, and, if so, how it might be improved.

Here's a link to a prettier version of this: https://gist.github.com/paulkernfeld/7126c1307fd46561df9c

Preface: Blockchain Storage Politics
---------------------------
The issue of whether the Bitcoin protocol should be used for data storage is [contentious](https://github.com/bitcoin/bitcoin/pull/5286), and I apologize to anyone who thinks this post is in poor taste. If you don't mind, I'd like to limit this particular discussion to the technical merits of Vespucci, rather than the moral merits.

Vespucci
========
Vespucci is a proposed protocol that uses the Bitcoin blockchain for decentralized application discovery.

The Problem
===========
In order for an application to join a P2P network, the application must somehow find the IP address of one or more peers to connect to. This is called bootstrapping, and it can be difficult. BitTorrent and Bitcoin clients often include constant addresses of "bootstrap nodes," long-lived server nodes. This solution works only if these long-lived server nodes remain up, which creates a potential point of failure. So, this protocol is designed to provide discovery for applications with the following requirements:

1. The P2P application needs to download a global list of peers when it's first opened.
2. Anyone should be able to register bootstrapping peers for discovery.
3. Read performance, i.e. the time to read peers and join the network, is very important.
4. Write performance, i.e. the time to register a new peer, is very unimportant.
5. The ability to discover the Bitcoin network is assumed.

The Protocol
============
Addresses of peers will be stored using [OP_RETURN](https://en.bitcoin.it/wiki/OP_RETURN) transaction outputs. In order to discover peers for the application, it will look through the blockchain, returning relevant transactions to the application.

What is stored?
---------------
The data pushed after OP_RETURN will consist of:

1. Two bytes, `V0` in ASCII, identifying the message as belonging to the Vespucci protocol.
2. A few additional bytes identifying the application.
3. A zero byte to mark the end of the application ID.
4. A list of compressed addresses.

Since the Blockchain is a shared resource, we want to be sure to use it wisely. We can use space efficiently by compressing addresses and allowing multiple addresses to be batched into the same transaction.

Addresses
---------
Uncompressed addresses will be zero-terminated generic URIs. This allows us to support IP addresses as well as hostnames.

`scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]`

Information to include or not include:

* We probably don't need to store `scheme` (e.g. `http` or `magnet`), because that information can probably be inferred by the application.
* It doesn't make much sense to include a `user` and `password` in Bitcoin, a publicly-viewable store!
* The `host` field will always be populated
* The `port`, `path`, `query`, and `fragment` fields may be populated

Compression
-----------
In order to compress URLs, we'll want to use a small-string compressing library specifically trained on URL-looking data. [shoco](http://ed-von-schleck.github.io/shoco/) allows users to do [just this](http://ed-von-schleck.github.io/shoco/#generating-compression-models).

Look order
==========
The Bitcoin blockchain is a log-structured data store, optimized for great write performance, but not designed for reading. The blockchain currently grows by about 25 GB/year, and is not indexed only by time. This presents a dilemma: a P2P application that's five years old would have to look through 125 GB of data if searching linearly, even in the unlikely event that the block size limit is not increased. That's a lot of data to look through just to get some addresses! So, how can we turn the write-optimized blockchain into a read-optimized discovery store?

To solve this, we look through blocks in an order that maximizes read performance while greatly sacrificing write performance. The algorithm will be as follows:

* WLOG, label the first block that we care about block 0. Label subsequent blocks in [height](http://bitcoin.stackexchange.com/questions/18561/definition-of-blockchain-height) order: 1, 2, 3, ...
* Define the current maximum block height as `M`.

1. Set integer `K` such that `K := ceiling(log2(M))`.
2. Look at all non-looked-at blocks where block number `B` is a multiple of `2 ^ K`, in descending order.
3. If `K > 0`, set `K := K - 1`. Otherwise (`K = 0`), we have looked at all blocks.

This can also be thought of as writing block heights in binary and counting the number of zeros at the end. An example:

Code:

Blocks: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
Binary: 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101
End zeros: 4 0 1 0 2 0 1 0 3 0 1 0 2 0

Look order: 0 8 12 4 10 6 2 13 11 9 7 5 3 1

Just as when picking port numbers, applications should try to avoid synchronizing initial block heights, in order to avoid crowding at important blocks. To optimize further, applications could stop looking at a certain value of `K`. For example, if we only look until `K = 10`, the minimum write time is a week, and the application is guaranteed to never have to look through more than 1/1024 of the blockchain, which would be 25 MB/year currently.

Spam prevention
---------------

Issues
======
Protocol interference
---------------------
It's possible that non-Vespucci messages might begin with the Vespucci and application prefixes, either by chance or by malice. Given this, Vespucci clients should tolerate:

* Malformed addresses
* Addresses pointing to malicious bootstrap nodes
* [Decompression attacks](https://en.wikipedia.org/wiki/Zip_bomb) taking advantage of the compression protocol

Related Work
============
[Blockname](https://github.com/telehash/blockname) is a similar project, a Bitcoin blockchain DNS cache. Blockname is trying to solve a slightly harder problem, that of creating a 1:1 mapping from domain name to server address. In Vespucci, each application may return a set of addresses.

Topic: Using the Bitcoin blockchain for P2P discovery? (Read 1060 times)