Topic: Transaction propagation speed - experimental results (Read 1869 times)

jr. member
Activity: 42
Merit: 11
I see your point. Yes, prior observations would help to clear up the picture.
staff
Activity: 4242
Merit: 8672
If you do it with a small number of peers, your results will be overly optimistic (you will constantly underestimate propagation times).
Apparently I wasn't very clear. You should monitor for a long time in order to build a list of old transactions which you will then exclude from your subsequent analysis interval so that your data is less polluted by retransmissions. In making this exclusion list it's not important that you have many peers.

Otherwise you end up with a long tail of broken, invalid, never confirming transactions, double spends, etc. that arrive forever and look like unboundedly long propagation in your analysis. You'll still have many but I believe you can probably exclude most with 24-48 hours of exclusion collection.
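
The exclusion step described above could be sketched roughly as follows. This is only an illustration: the `(timestamp, peer, tx_hash)` tuple layout and the function name `exclude_pre_window` are assumptions, not the format of the actual log.

```python
def exclude_pre_window(warmup_events, run_events):
    """Drop every event in the main run whose tx was already seen during warm-up.

    Both arguments are iterables of (timestamp, peer, tx_hash) tuples;
    this layout is hypothetical, chosen only for the example.
    """
    # Exclusion list: every tx hash observed before the analysis window started.
    old_txs = {tx for _, _, tx in warmup_events}
    return [event for event in run_events if event[2] not in old_txs]


# Tiny made-up example: tx_old was circulating before the window opened.
warmup = [(0.0, "peerA", "tx_old")]
run = [
    (100.0, "peerB", "tx_old"),   # retransmission of an old tx, dropped
    (101.0, "peerA", "tx_new"),
    (101.3, "peerB", "tx_new"),
]
filtered = exclude_pre_window(warmup, run)
```

The warm-up collection does not need many peers, since it only has to learn which hashes already exist.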

jr. member
Activity: 42
Merit: 11
If you do it with a small number of peers, your results will be overly optimistic (you will constantly underestimate propagation times).
staff
Activity: 4242
Merit: 8672
Reprocessed plot (counted each (tx_id, peer) pair only once):
One thing you want to do is get a list of transactions across several days (it doesn't have to be from connecting to a great number of nodes) and then collect your big run. Then exclude all of those 'seen before my window started' transactions. That should get rid of a large number of the initial retransmissions. I'm not sure how you can exclude subsequent retransmissions reliably, however.


jr. member
Activity: 42
Merit: 11
Reprocessed plot (counted each (tx_id, peer) pair only once):
http://img404.imageshack.us/img404/6438/propr.png
As I suspected, there is almost no change (some minor peaks are gone).
Quote
You said you had 1900 connections "at the end of the experiment".  I would expect a connection you made 20 minutes into the experiment to send you the transaction 20 minutes (or more) after the start of the experiment.  So, did you track the time between when you last connected to a peer and when the peer next sent you the Tx packet?
I doubt that anything is queued and pushed after connection. I have never seen this happen: after connecting, you receive invs at a steady rate. An inv broadcast is triggered by successful validation of a tx.
Quote
A useful overlay would be the time between when you first see a packet and the time when it enters the blockchain.
Well, once I have block download parsing done, I can update the graphs.
newbie
Activity: 53
Merit: 0
Please discuss.
Very cool graph!  But I'm not sure if it tells much of a tale.  There's no hint as to how many distinct Tx packets you are seeing.

I don't claim to be an expert on the Bitcoin protocol.  I only know what I've observed.

With respect to the long tail, I think that could happen if a node was offline when the packet originally traversed the network.  Later, when the node rejoins the network, it receives the Tx packet from one of its peers.  I think it retransmits that packet to each peer it subsequently connects with.

I think this may also occur as connections between peers flap up and down.  Doesn't each new "up" connection result in an exchange of some queued Tx data?

You said you had 1900 connections "at the end of the experiment".  I would expect a connection you made 20 minutes into the experiment to send you the transaction 20 minutes (or more) after the start of the experiment.  So, did you track the time between when you last connected to a peer and when the peer next sent you the Tx packet?

A useful overlay would be the time between when you first see a packet and the time when it enters the blockchain.  I think the slope of your long tail is partially dependent on how quickly packets enter the blockchain.
legendary
Activity: 3878
Merit: 1193
What can I say? Certainly, propagation speed is not that great. "Tails" are quite long. There is a fair amount of retransmits (I believe the second mode is the result of retransmits).

Cool graph. Can you add some marks for some interesting percentages? Like where in the graph does 90% land?
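
One way to get such marks from the binned data, as a rough sketch: walk the cumulative counts until the requested fraction is reached. The histogram values below are invented for illustration and the function name `delay_at_fraction` is hypothetical.

```python
def delay_at_fraction(hist, fraction, bin_width=1.0):
    """Return the upper edge (in seconds) of the first bin at which the
    cumulative count reaches `fraction` of all observations.

    hist maps bin index -> count, e.g. 1-second bins as in the original post.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    running = 0
    for bin_index in sorted(hist):
        running += hist[bin_index]
        if running >= fraction * total:
            return (bin_index + 1) * bin_width
    # Only reachable if fraction > 1; fall back to the last bin's upper edge.
    return (max(hist) + 1) * bin_width


# Made-up counts: most invs arrive within 3 s, a few around the 600 s mark.
example_hist = {0: 50, 1: 30, 2: 15, 600: 5}
p90 = delay_at_fraction(example_hist, 0.90)
```

With these invented counts, 95 of 100 observations fall in the first three bins, so the 90% mark lands at the upper edge of bin 2.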
jr. member
Activity: 42
Merit: 11
It's possibly retransmits. I was told the Satoshi client does retransmit at random intervals, so it's possibly some other client/software re-sending them after a 10 min timeout.
legendary
Activity: 2142
Merit: 1010
Newbie
Why is there a peak at 600s? Coincidence?
full member
Activity: 154
Merit: 100
... topologically widespread ...
This doesn't work - you have no idea what the whole network looks like unless you connect to every relaying peer, so you don't know if your nodes are widespread or not. You'd have to repeat the experiment thousands of times to account for sometimes being very close to each other and other times very far apart.
full member
Activity: 154
Merit: 100
As I understand it, peers should not announce the same tx twice (except the original sender, which can retransmit after a set amount of time). I can try to exclude duplicates, but I doubt it will change the results much.
Depends on the size of their memory pool. Default size is 1000, so at several transactions per second, this equates to a few minutes.
jr. member
Activity: 42
Merit: 11
Or you can use only one, as I did. The node in question did not do any relaying itself, because that would skew the results.
legendary
Activity: 2940
Merit: 1090
You could use two nodes: one that listens to as many peers as possible, to try to figure out when a transaction first appeared; another that waits to hear about the transactions directly itself, with few connections.

Or not just two; have many, topologically widespread, few-connection nodes and average how long it takes for them all to hear about a transaction compared to when the node connected to "almost everyone" first heard about it.

-MarkM-
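
A minimal sketch of that comparison, assuming each node has already been reduced to a dict of tx hash to first-seen time and that the nodes' clocks are synchronized (both are assumptions; `mean_lag` is a hypothetical helper name):

```python
from statistics import mean


def mean_lag(reference_first_seen, probe_first_seen):
    """Average delay (seconds) between the well-connected reference node and
    one few-connection probe node, over transactions that both of them saw.

    Each argument maps tx_hash -> first-seen timestamp in seconds.
    Returns None if the two nodes share no transactions.
    """
    lags = [
        probe_first_seen[tx] - reference_first_seen[tx]
        for tx in probe_first_seen
        if tx in reference_first_seen
    ]
    return mean(lags) if lags else None


# Invented timestamps for illustration only.
reference = {"tx1": 0.0, "tx2": 10.0}
probes = [
    {"tx1": 1.5, "tx2": 12.0},
    {"tx1": 4.0},
]
lags = [mean_lag(reference, probe) for probe in probes]
```

Averaging the per-probe values would then give a single figure for the whole set of probe nodes.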
jr. member
Activity: 42
Merit: 11
As I understand it, peers should not announce the same tx twice (except the original sender, which can retransmit after a set amount of time). I can try to exclude duplicates, but I doubt it will change the results much.
legendary
Activity: 1400
Merit: 1005
Interesting analysis.  Rather than creating transactions to test, you simply test the first and last time you hear about any given transaction.

The problem, as Zeilap pointed out, is duplicates.  Not only might node A send the same message to you twice, but it might send the same message to peer B twice.  And peer B might not have chosen to send it to you the first time, for whatever reason, but send it to you the second time.  Then it appears that it took X number of seconds to reach peer B, when it is not, in fact, true.
full member
Activity: 154
Merit: 100
You need to exclude any duplicates from the same peer. If I tell you about a transaction 3 seconds after it's first announced, and then later on I send you an inv message with a whole bunch of transactions from my memory pool, including the one I already told you about, you can't count me telling you twice. Clearly I knew about that transaction less than 3 seconds after it was announced, and that's what you're interested in.
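
In code, that filter amounts to keeping only the first inv for each (tx, peer) pair. A small sketch, assuming events are `(timestamp, peer, tx_hash)` tuples sorted by time (the tuple layout and the function name are made up for the example):

```python
def first_announcement_per_peer(events):
    """Yield only the first inv each peer sent for each transaction.

    events: iterable of (timestamp, peer, tx_hash), assumed sorted by time.
    """
    seen_pairs = set()
    for ts, peer, tx in events:
        pair = (tx, peer)
        if pair in seen_pairs:
            continue  # same peer re-announcing the same tx: ignore
        seen_pairs.add(pair)
        yield ts, peer, tx


# Invented example: peerA announces tx1 at 3 s, then again at 600 s.
raw = [
    (3.0, "peerA", "tx1"),
    (3.4, "peerB", "tx1"),
    (600.0, "peerA", "tx1"),
]
deduped = list(first_announcement_per_peer(raw))
```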
jr. member
Activity: 42
Merit: 11
I was interested in how fast transactions propagate through the network in reality.
I was thinking that if I could connect to every node and listen to inv messages, I could estimate it (a node should send an inv when it has received a tx and verified it, and only once). Well, I did exactly that, except I ended up connected not to all nodes but to a fair share of them (~1900 active connections at the end of the experiment). I logged every inv message with a timestamp. After several hours, I processed the resulting 2.5 GB text file with Python: I remembered the time I first saw each transaction hash, and each subsequent time I got an inv with it, I calculated the time difference, binned it (1 sec bins) and summed over all transactions.
I ended up with the following distribution (x is in seconds):

What can I say? Certainly, propagation speed is not that great. "Tails" are quite long. There is a fair amount of retransmits (I believe the second mode is the result of retransmits).
Please discuss.
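
For anyone wanting to reproduce the processing step, it might look something like the sketch below. This is not the script actually used; the `(timestamp, peer, tx_hash)` event layout and the function name `propagation_histogram` are assumptions made for the example.

```python
from collections import Counter


def propagation_histogram(events, bin_width=1.0):
    """Bin the delay between the first sighting of a tx and every later inv.

    events: iterable of (timestamp_seconds, peer, tx_hash), sorted by time.
    Returns a Counter mapping bin index -> number of later inv sightings.
    """
    first_seen = {}
    hist = Counter()
    for ts, _peer, tx in events:
        if tx not in first_seen:
            first_seen[tx] = ts  # first time anyone announced this tx
            continue
        delay = ts - first_seen[tx]
        hist[int(delay // bin_width)] += 1
    return hist


# Invented events for illustration only.
events = [
    (0.0, "peerA", "tx1"),
    (0.4, "peerB", "tx1"),
    (2.5, "peerC", "tx1"),
    (3.0, "peerA", "tx2"),
    (3.2, "peerB", "tx2"),
]
hist = propagation_histogram(events)
```

Here two sightings fall into the 0 to 1 s bin and one into the 2 to 3 s bin. For a multi-gigabyte log the events would be streamed line by line rather than held in a list.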