Topic: Potential weakness in block downloading (Read 1542 times)

hero member
Activity: 742
Merit: 500
December 11, 2011, 02:22:46 PM
#18
I stand corrected.  http://code.google.com/p/gittorrent/wiki/MirrorSync could be an interesting way to distribute the chain.  Would each block be a commit?

(I thought I posted this yesterday, but I must not have)
full member
Activity: 385
Merit: 110
December 11, 2011, 01:59:08 AM
#17
It's hard to tell how the protocol works exactly.

The best thing to do is to go read the software.  It's the true specification: it's in a (mostly) precise, meaningful, mathematical language, and should it differ from any prose description, then it's the one that wins.

Skill in code reading varies, but I think the bitcoin 'official client' code is fairly clear.



Yeah, some parts of the protocol are easy to understand; strangely enough, it's the database reading/writing that still mystifies me.

However, when it comes to the checksum, this kind of thing also scares me off a little bit:

From main.cpp:
Code:
        // Checksum
        if (vRecv.GetVersion() >= 209)
        {
            uint256 hash = Hash(vRecv.begin(), vRecv.begin() + nMessageSize);

What the hell does vRecv.begin() do? ;) :)

One has to be a boost "expert" ;) to understand that ;)

Is that some kind of pointer? And why does it need to know the beginning of a message? ;)

It's probably something stream related ;)

Still, from looking at the code it's not immediately apparent that this is the overall message structure, or at least the message structure between clients, it seems.

"ProcessMessages" is its routine.

This could mean anything... is this "process irc messages"? Is this "client messages"? Is this something special like "block messages"?

IRC uses its own protocol and also uses messages; even TCP could be considered messages, though it is usually considered "streams" (or segments).

Then there is http? And perhaps even other stuff I don't know about that could be in there... for all I know the database could be communicating with "messages" ;) :)

Even windows communicates with "messages" ;) :) =D

Strange but true.

Of course, from viewing it, it does seem to have something to do with some kind of communication protocol.

With some help from the docs, the code, and you guys on the forum, it's starting to make some sense ;)

Also, sometimes the filename helps: irc.cpp and irc.h are pretty obvious! ;) =D
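
On the vRecv.begin() question: begin() is the usual C++ container convention for "iterator to the first element", and begin() + n points n elements further on. Passing the pair to a hash function means "hash exactly these n bytes", so only the current message is covered even when more data is already waiting in the buffer. Below is a minimal sketch of that idea. It assumes the receive buffer behaves like a std::vector<char>; toy_checksum and checksum_first_message are hypothetical names, and the byte sum is only a stand-in for the real cryptographic hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a hash: adds up the bytes in the half-open range [first, last).
// This is NOT the function the client uses; it only shows how the range is passed.
template <typename Iter>
std::uint32_t toy_checksum(Iter first, Iter last) {
    std::uint32_t sum = 0;
    for (Iter it = first; it != last; ++it) {
        sum += static_cast<unsigned char>(*it);
    }
    return sum;
}

// Hypothetical helper: the buffer may hold one message followed by the start of
// the next one, so only the first message_size bytes are checksummed.
std::uint32_t checksum_first_message(const std::vector<char>& recv_buffer,
                                     std::size_t message_size) {
    assert(message_size <= recv_buffer.size());
    // begin() is the first byte; begin() + message_size is one past the last
    // byte of this message.
    return toy_checksum(recv_buffer.begin(),
                        recv_buffer.begin() + static_cast<std::ptrdiff_t>(message_size));
}
```

With a buffer holding the bytes 1, 2, 3, 100, 100 and a message size of 3, only the first three bytes are summed; the trailing bytes belong to whatever comes next.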
staff
Activity: 4284
Merit: 8808
December 10, 2011, 11:10:15 PM
#16
It's hard to tell how the protocol works exactly.

The best thing to do is to go read the software.  It's the true specification: it's in a (mostly) precise, meaningful, mathematical language, and should it differ from any prose description, then it's the one that wins.

Skill in code reading varies, but I think the bitcoin 'official client' code is fairly clear.

full member
Activity: 385
Merit: 110
December 10, 2011, 07:18:59 PM
#15
It's hard to tell how the protocol works exactly.

However you seem to hint at this section:

https://en.bitcoin.it/wiki/Protocol_specification

It contains a message structure under the title "Common structures".

Assuming all messages use that structure, the protocol would be protected with a SHA hash, which is good: one problem less to worry about ;) though there could be exceptions?

It would help if the protocol specification contained some diagrams/lines/arrows to visually show how the fields link into other structures.

Now it's slowly starting to make sense, to me at least ;) :)
legendary
Activity: 1904
Merit: 1002
December 10, 2011, 07:14:18 PM
#14
Sure you can.  GIT-supported transfer protocols allow requests for individual objects.  So once you have the list of objects you can request each object from a different repository.  This will add a lot more overhead than grabbing all objects at once though.  This shouldn't be too big of an issue because with most protocols you can easily ask for ranges of objects or a subtree of the object tree.
hero member
Activity: 742
Merit: 500
December 10, 2011, 06:30:04 PM
#13
You mean "might be prevented with periodic checkpoints" don't you?
staff
Activity: 4284
Merit: 8808
December 10, 2011, 06:20:29 PM
#12
This is the reason why bittorrent uses sha hashes to protect the segments against bit errors or malicious bit errors.
So this is a good question:
How does bitcoin protect against bit errors, or malicious bit errors? Is there a hash which is calculated over the entire block? Meaning every bit ???

Why don't you spend some time reading up on it instead of wasting everyone's time with weakly informed theories?

The protocol is protected, and the individual blocks are hashed and just dropped if the hash doesn't have the required form. This works on a block-by-block basis and is complemented for older blocks where malicious changes might be possible with periodic checkpoints.
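
A rough sketch of those two checks, under stated assumptions: the hash is taken as already computed from the block header (which commits to the transactions through a Merkle root), the "required form" is modelled as "not greater than the target when read as a 256-bit number", and the checkpoint table is a plain map from height to hash. Hash256, Checkpoints, meets_target, passes_checkpoint and accept_block are hypothetical names, not the client's.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>

// 256-bit hash stored big-endian: index 0 holds the most significant byte.
using Hash256 = std::array<std::uint8_t, 32>;

// "Required form": read as a 256-bit number, the hash must not exceed the target.
// std::array compares element by element from index 0, which matches big-endian order.
bool meets_target(const Hash256& hash, const Hash256& target) {
    return hash <= target;
}

// Hypothetical checkpoint table: block height -> the only hash accepted at that height.
using Checkpoints = std::map<int, Hash256>;

bool passes_checkpoint(const Checkpoints& checkpoints, int height, const Hash256& hash) {
    assert(height >= 0);
    auto it = checkpoints.find(height);
    // Heights without a checkpoint are not constrained by this check.
    return it == checkpoints.end() || it->second == hash;
}

// A block is kept only if both checks pass; otherwise it is simply dropped.
bool accept_block(const Checkpoints& checkpoints, int height,
                  const Hash256& hash, const Hash256& target) {
    return meets_target(hash, target) && passes_checkpoint(checkpoints, height, hash);
}
```

A flipped bit in the header changes the hash, which then almost certainly fails meets_target; a well-formed but different block at a checkpointed height fails passes_checkpoint.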


hero member
Activity: 742
Merit: 500
December 10, 2011, 06:07:29 PM
#11
You can't download a git repo from multiple peers in parallel though.  Interesting idea though.
full member
Activity: 385
Merit: 110
December 10, 2011, 10:55:48 AM
#10
Round-robin downloading alone is not enough; even the current implementation of bitcoin could be vulnerable to bit errors. I am not sure if bitcoin adds extra integrity checking on top of TCP. TCP alone is not enough to transfer data intact over dialups. I have seen dialups introduce bit errors which were not detected in UDP protocols, and I would expect the same to happen in TCP, since it uses the same error checking, which is very weak: a 16-bit ones' complement sum only.

Ethernet is a different matter since it has CRC32 built in, but even CRC32 is not enough to protect against all bit errors, and this does not even go into the issue of malicious bit errors.
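
To make the "summations only" point concrete, here is a small sketch of a 16-bit ones' complement checksum in the style used by TCP and UDP (RFC 1071). It is simplified to work on whole 16-bit words and leaves out the pseudo-header; internet_checksum and demo_undetected_corruption are hypothetical names. Because addition does not care about order, swapped words and errors that cancel out both leave the checksum unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 16-bit ones' complement checksum over a sequence of 16-bit words.
std::uint16_t internet_checksum(const std::vector<std::uint16_t>& words) {
    std::uint32_t sum = 0;
    for (std::uint16_t w : words) {
        sum += w;
        sum = (sum & 0xFFFFu) + (sum >> 16);  // fold the carry back in (end-around carry)
    }
    return static_cast<std::uint16_t>(~sum & 0xFFFFu);
}

// Demonstration: two different corruptions that this checksum cannot see.
void demo_undetected_corruption() {
    const std::vector<std::uint16_t> original    = {0x1234, 0xABCD};
    const std::vector<std::uint16_t> swapped     = {0xABCD, 0x1234};  // words reordered
    const std::vector<std::uint16_t> compensated = {0x1235, 0xABCC};  // +1 and -1 cancel
    assert(internet_checksum(original) == internet_checksum(swapped));
    assert(internet_checksum(original) == internet_checksum(compensated));
}
```

A cryptographic hash over the same data would change in both cases, which is the property the SHA hashes in bittorrent rely on.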

This is the reason why bittorrent uses sha hashes to protect the segments against bit errors or malicious bit errors.

So this is a good question:

How does bitcoin protect against bit errors, or malicious bit errors? Is there a hash which is calculated over the entire block? Meaning every bit ???

I am not completely sure, but I think bitcoin does this with the block hashes and the block chain... ;)

If so, then it's clear that processing can only be done after all blocks from the genesis block onward have been downloaded and checked, etc...

I think the merkle hash tree idea for downloading which I mentioned was meant to allow the blocks to be verified even if not all of them, or not even the genesis block, have been downloaded yet.

So the merkle hash tree idea could be used for out-of-order block verification, which could then also allow the client to immediately upload to others...

But indeed this idea would be much more complex than simple round robin... however, the more complex idea could have performance advantages...
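
A minimal sketch of how such out-of-order verification could look, under loud assumptions: the client already has a trusted root hash for the whole set of blocks, each peer sends a block together with its sibling hashes (the proof), and toy_hash (FNV-1a) is only a stand-in for a cryptographic hash such as SHA-256. All names here are hypothetical, and this is not how the client currently downloads blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in hash (64-bit FNV-1a). NOT cryptographic; a real design would use SHA-256.
std::uint64_t toy_hash(const std::string& data) {
    std::uint64_t h = 14695981039346656037ull;
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

std::uint64_t hash_pair(std::uint64_t left, std::uint64_t right) {
    return toy_hash(std::to_string(left) + ":" + std::to_string(right));
}

using Level = std::vector<std::uint64_t>;

// levels[0] holds the leaf hashes, levels.back()[0] is the root.
// An unpaired node at the end of a level is paired with itself.
std::vector<Level> build_levels(const std::vector<std::string>& blocks) {
    assert(!blocks.empty());
    std::vector<Level> levels(1);
    for (const std::string& b : blocks) levels[0].push_back(toy_hash(b));
    while (levels.back().size() > 1) {
        const Level& prev = levels.back();
        Level next;
        for (std::size_t i = 0; i < prev.size(); i += 2) {
            const std::uint64_t right = (i + 1 < prev.size()) ? prev[i + 1] : prev[i];
            next.push_back(hash_pair(prev[i], right));
        }
        levels.push_back(next);
    }
    return levels;
}

// Sibling hashes on the path from leaf `index` up to (but not including) the root.
Level make_proof(const std::vector<Level>& levels, std::size_t index) {
    Level proof;
    for (std::size_t lvl = 0; lvl + 1 < levels.size(); ++lvl) {
        std::size_t sibling = index ^ 1u;
        if (sibling >= levels[lvl].size()) sibling = index;  // node was paired with itself
        proof.push_back(levels[lvl][sibling]);
        index /= 2;
    }
    return proof;
}

// Checks one block against the trusted root without needing any other block.
bool verify_block(const std::string& block, std::size_t index,
                  const Level& proof, std::uint64_t root) {
    std::uint64_t h = toy_hash(block);
    for (std::uint64_t sibling : proof) {
        h = (index % 2 == 0) ? hash_pair(h, sibling) : hash_pair(sibling, h);
        index /= 2;
    }
    return h == root;
}
```

The cost is one proof per block (about log2 of the block count in hashes), which is part of the extra complexity compared with plain round robin.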

The idea was to prevent a single connection from slowing things down.

I think your solution of simply dumping the connection and switching to another one might be the easier solution. It does assume that all blocks can ultimately be downloaded starting from some kind of genesis block.

However, in the future perhaps the block chain will switch to something else like a ledger/balance sheet, and perhaps some clients might be in different states... then this round robin idea might start to fail, and perhaps the tree idea could have some advantages here, since it can be more easily pruned... not sure though...



legendary
Activity: 1904
Merit: 1002
December 10, 2011, 01:31:54 AM
#9
The challenge of bitcoin is to evolve it into a scalable, robust, fault-tolerant, decentralized, efficient and secure system ;)

GIT belongs in the "secure" category.
GIT belongs in the "scalable" category.
GIT belongs in the "decentralized" category.
Etc...

So already three reasons to support GIT, ideally if anyone can run a GIT server and the client can pass around the latest HEAD hash.  Some people would just use GIT to pull from a trusted source, others would retrieve the block via the bitcoin network.

^ That idea could probably be better than bittorrent because it immediately allows the blocks which were downloaded to be checked for validity.

FTFY

Seriously though, I'd like to see more variety in block chain storage and transmission.  I'm sure what we have now will prove to be quite primitive compared to how it will be handled in 5 years.

Of course, even a simple round-robin with parallel connections would be a great improvement.
sr. member
Activity: 249
Merit: 251
December 10, 2011, 12:42:50 AM
#8
This problem could be solved much more easily, without BitTorrent or repositories, simply by downloading blocks in a round-robin fashion through different connections.
Suppose Connection 1 is painfully slow.
The client would use connection 1 to transfer block 1, connection 2 to transfer block 2, and so on.
(assume 8 connections)
The first connection to finish downloading a block would download block 9, the next would download block 10, and so on. To process blocks, you need them in order, so you would still be waiting on an earlier block (like block 1 in this case). If 50 blocks are downloaded and stored (blocks 2 through 51 in this case) while still waiting on a prior block to download, further downloads are paused. If the data transfer speed of the connection being waited on is less than 10% of the average of all connections, then the transfer is abandoned and the block is re-requested through another connection.

This solution
* is secure
* is decentralized
* does not require us to implement bittorrent.
* prevents malicious slow connections from having any impact
* is tolerant of the occasional enormous block even if it happens to come through the slow connection

-Atheros
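
The bookkeeping described above can be sketched roughly as follows. This is only an illustration of the policy, not client code: RoundRobinScheduler, is_too_slow and the constants are hypothetical names, speeds are whatever unit the caller measures, and networking is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>
#include <set>
#include <vector>

constexpr std::size_t kMaxBufferedBlocks = 50;  // out-of-order blocks held while waiting
constexpr double kSlowFraction = 0.10;          // abandon below 10% of the average speed

class RoundRobinScheduler {
public:
    explicit RoundRobinScheduler(int first_block)
        : next_new_(first_block), next_needed_(first_block) {}

    // Block an idle connection should fetch next, or -1 while downloads are paused.
    int next_request() {
        if (!retry_.empty()) {  // abandoned blocks go first: they hold up processing
            const int block = retry_.front();
            retry_.pop_front();
            return block;
        }
        if (buffered_.size() >= kMaxBufferedBlocks) return -1;
        return next_new_++;
    }

    // Stores a finished block, then consumes every block that is now in order.
    void on_block_received(int block) {
        buffered_.insert(block);
        while (buffered_.erase(next_needed_) == 1) ++next_needed_;
    }

    // An abandoned transfer puts its block back in line for another connection.
    void on_abandoned(int block) { retry_.push_back(block); }

    int next_needed() const { return next_needed_; }

private:
    int next_new_;            // lowest block number never requested so far
    int next_needed_;         // lowest block number not yet processed
    std::deque<int> retry_;   // blocks waiting to be re-requested
    std::set<int> buffered_;  // downloaded blocks that cannot be processed yet
};

// True when one connection runs at less than 10% of the average of all connections.
bool is_too_slow(double speed, const std::vector<double>& all_speeds) {
    assert(!all_speeds.empty());
    const double average =
        std::accumulate(all_speeds.begin(), all_speeds.end(), 0.0) / all_speeds.size();
    return speed < kSlowFraction * average;
}
```

Playing through the example from the post: blocks 1 to 8 go out first, block 1 stalls, and once blocks 2 through 51 are stored, next_request() returns -1 until block 1 is abandoned and fetched over another connection.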
full member
Activity: 385
Merit: 110
November 30, 2011, 04:05:46 PM
#7
The bitcoin client would do well to just fetch the majority of the blockchain as a signed file via HTTP from a repository, and only rely on the P2P network to bring it up to date.  Go figure, it could even use BitTorrent!

Neh, this goes against the distributed/p2p philosophy of bitcoin ;) :)

Besides, that HTTP server would get overloaded real fast in the future as bitcoin becomes more popular, and it would be a central point of failure/attack ;)

The challenge of bitcoin is to evolve it into a scalable, robust, fault-tolerant, decentralized, efficient and secure system ;)

HTTP does not belong in the "secure" category.
HTTP does not belong in the "scalable" category.
HTTP does not belong in the "decentralized" category.
Etc...

So that's already three reasons to dismiss HTTP. Some of these could be solved if everybody started running a mini HTTP server in bitcoin, but that would be a little bit weird! ;) It still potentially wouldn't solve everything! ;) :) And it would still need some more work... I already wrote/posted an idea for how distributed downloading of blocks could be done with a merkle hash tree! ;)

^ That idea could probably be better than bittorrent because it immediately allows the blocks which were downloaded to be checked for validity.

With bittorrent there might still be a risk of individual data segments being wrong/corrupted. As far as I can remember, bittorrent works with a list of hashes, one for each segment. So the problem here is the "torrent file".

Who will produce this torrent file?

The torrent file is not in the categories above, and thus should be rejected! ;)
legendary
Activity: 2128
Merit: 1073
November 30, 2011, 03:43:24 PM
#6
just fetch the majority of the blockchain as a signed file
Why signed? The block-chain is already signed beyond all recognition. The blkindex.dat is all redundant information and could be safely recreated on the spot as needed.
full member
Activity: 154
Merit: 102
Bitcoin!
November 30, 2011, 03:37:11 PM
#5
Good call Mike.
vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
November 30, 2011, 03:12:52 PM
#4
The bitcoin client would do well to just fetch the majority of the blockchain as a signed file via HTTP from a repository, and only rely on the P2P network to bring it up to date.  Go figure, it could even use BitTorrent!
full member
Activity: 385
Merit: 110
November 30, 2011, 03:07:06 PM
#3
Ok that's really funny.

Let's see how long it would take with the current database size of, let's say, 1 gigabyte, at 8 kilobytes per second:

1024 * 1024 * 1024 / (1024 * 8) = 131072 seconds needed.

131072 / (60 * 60) = 36.40888 hours.

So approximately 37 hours; that's amazingly still pretty doable with just a 56k6 modem! ;)

Perhaps a slightly optimistic calculation, but still, it'll be done in about 2 days of downloading! ;) :)

At least that's my estimation! Your mileage may vary! ;) =D
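
The same estimate as a small helper, so other sizes and speeds can be plugged in. download_seconds and seconds_to_hours are hypothetical names, and the 7000 bytes per second figure below is an assumption: a nominal 56 kbit/s line with no protocol overhead, which a real dialup link would not quite reach.

```cpp
#include <cassert>

// Seconds needed to transfer size_bytes at a steady bytes_per_second.
double download_seconds(double size_bytes, double bytes_per_second) {
    assert(bytes_per_second > 0.0);
    return size_bytes / bytes_per_second;
}

double seconds_to_hours(double seconds) {
    return seconds / 3600.0;
}
```

For 1 GiB (1073741824 bytes) at 8192 bytes per second this gives 131072 seconds, about 36.4 hours, matching the figure above; at 7000 bytes per second it comes to roughly 42.6 hours.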

sr. member
Activity: 406
Merit: 250
November 30, 2011, 02:51:54 PM
#2
As malicious as a dialup user trying to serve you the blockchain.
full member
Activity: 385
Merit: 110
November 30, 2011, 02:48:03 PM
#1
Hello,

From looking at the C/C++ source code for bitcoin, it seems there is a small weakness in the block downloading mechanism, and there is also room for improvement.

1. First the weakness:

The weakness is probably in the way it downloads the block: It seems to connect to clients at random times to prevent always connecting to the same client.

However if there is a malicious client in the group, which happens to be the first to be connected to, then this malicious client can do the following:

The bitcoin client 0.5 always selects the first connection to download blocks from, so the potential weakness is:

If the malicious client is selected then the malicious client can make the download go rrrrreeeaaallll slllloowwww... therefore the connecting client will never download the blocks on time...

So perhaps this can be exploited.

For example, the malicious client could try to make as many of the clients which happen to connect to it go as slow as possible ;) :)

Also there is room for improvement:

2. Instead of downloading just from the first one, the blocks could also be downloaded from multiple connections, one block per connection, spread out, then it repeats. However, this would probably require some bandwidth throttling, otherwise the client would probably end up DoSing itself ;) For example, its download pipe could become overloaded/full. That doesn't necessarily have to be a problem, since TCP is designed to compensate for that and has some basic throttling to prevent overloading the entire network. If the client still wants to do something else at the same time, though, it might be handy to have some throttling. Throttling is not an immediate requirement, so this idea could probably be implemented straight away.

Bye,
  Skybuck.