great post on Big O notation by awemany on Reddit:
https://www.reddit.com/r/Bitcoin/comments/3a5f1v/mike_hearn_on_those_who_want_all_scaling_to_be/csa5grvOk. Lets say you are going to a party on new years eve. At 0:00, you go and greet and wish the best for next year to everyone who is around you. Because it is customs, everyone else does the same. Lets further assume that you are in a single room with everyone else, and for the greetings to happen orderly, only one person can greet another person at any single time.
Now, doing a greeting takes time, lets say d seconds.
For a party with 10 people, you'd greet 9 others. Assuming each greeting takes a second, and everyone else would do the same, so there would be 10 people each greeting 9 people, 10 x 9 == 90 greetings, taking just 1.5 minutes.
Now assume you are going to a bigger party instead, with 100 guests.
Naively, you might assume something along the lines of: 'Ok, ten greeting each takes 1.5min, so 100 must take 15min.'.
But at the party with 100 guests, one hundred people are greeting 99 other guests each, so a total of 9900 greetings. And that takes 2h 45min.
Whoops!
The reason being, of course, that for n guests, it takes
d * n * (n-1)
seconds to do the greetings. And the term n * (n-1) becomes big rather quickly!
So this process of doing new years greetings is something that clearly doesn't really work for large crowds. If you'd train everyone who goes to the party to talk really, really quickly (reducing d), you might get the greetings done within the assumed 15min. But then, this process would still break if you'd go to a crowd of a thousand.
Computer science people would say it is an algorithm that scales rather badly. They say it is in O ( n ^ 2 ).
Now, clearly, d * n * (n-1) is not the same as n * n for almost all choices of t and n.
But for very large n, n * (n-1) gets very close to just n ^ 2. And the factor t, the time it takes to shout the greeting, isn't really affecting so much the problem that at some large crowd size, it becomes infeasible to use the above greeting algorithm.
And this is captured in O( n ^ 2 ). It only gives you the overall, rough behaviour of a scheme or algorithm, without the unnecessary details. Consider that algorithms on a computer can be very complex, and there can be many such factors and weird numbers in front of the n2 part. This so called big O notation requires some more (but still relatively simple) math of which the specifics are not relevant to you as a 5 year old. But you can read more on Wikipedia, for example.
Ok. So now lets turn to the topic of Bitcoin, or rather the topic of its scalability.
The Bitcoin network consists of a set of nodes that send messages to each other to inform each other about new transactions to process. For Bitcoin to work as proper money, the whole network has to somehow agree on what transactions have been done, and which have not.
There is now of course a lot of complexity and thought into the specifics on how this all works, but what is interesting in the context of this discussion is that somehow, all full nodes need to learn about all transactions from all other full nodes.
And here we can go back to the above picture of the party on new year's eve.
Imagine a Bitcoin full node is a party guest, and greeting means passing a transaction on to others.
If Bitcoin would work like the party, all n nodes would announce a new transaction (the greeting), to everyone else.
As the CS guy says: This is O(n 2 ), this doesn't scale.
But the problem is, that this is the wrong picture of the Bitcoin network.
In the equivalent picture from above, a Bitcoin full node would make a greeting (forward a transaction) to every single other node in the network.
And I think you can see the problem here: There is no party going on among the full nodes (they are not sentient yet), and they do not care whether they receive the SAME transaction from every other node personally. Rather, they just care about somehow receiving it at all.
And so, the Bitcoin network has been designed in an more efficient manner. Basically, a transaction is gossiped across the network. If a node hears a valid transaction, it just forwards it to whatever other neighboring, connected node has not seen it yet.
And if you now think about it, a network of full nodes that are all somehow connected just needs 'n gossipings' to get a message to everyone in the village.
That is O(n) behavior, and that is an algorithm that scales a lot better. That's linear behavior.
Now, and this might not be exactly ELI5 anymore, the situation is of course more complex. Because full nodes are not the same as users in Bitcoin, one actually needs to introduce another variable describing the number of users in the network (let that be u) and the number of transactions a typical user makes in a given period of time, let that be t.
Could it be that somehow, the load on the network, the number of transactions to push around, is something that grows too fast with more users joining, or with the number of transactions going up?
It can be assumed that every one of the u users wants to make a certain average number of transactions per day, t, and so the number of transactions going into the Bitcoin network scales like O(u * t).
The total number of transaction transmissions in the network then are in O(n * u * t), because, as we discussed above, every full node sees every transaction.
If you assume that every user just has to have a node under his desk, it is n == u, and so, the scaling of the whole Bitcoin network would be O(u ^ 2 * t), which is quadratic, CS guy doesn't like it, doesn't scale, forget it!
But wait a bit! First of all, the metric that we used here was for the whole network. In the context of the blocksize debate, some people worry about decentralization, and have some (unstated, vague and ill-defined, one might add) idea on how many full nodes there ought to be in the network. They argue that a full node might become too expensive to run in a non-scaling Bitcoin network. The metric, though, that every user would use to decide whether to run a node or not, is not in O(u ^ 2 * t), the whole network, it is rather the load per node, which is in O(u * t).
And secondly, but equally important, the idea that every user is going to run a full node is not the state of and has never been the vision for Bitcoin. Satoshi himself said and proposed that most users are going to use SPV (simplified payment verification) wallets, which are very thin clients interacting with the network, and not full nodes.
Finally, a thing should be said about the number of transactions t. There is this thing called Metcalfe's law, that states that the value of a network with u users goes like O(u ^ 2), as each user can connect to each other in the network.
If this would accurately describe the total number of transactions in the network, Bitcoin would be not scalable, as full network transactions would go like O(n * u ^ 2) and per node as O(u ^ 2). That would not be scalable.
Now, it is certainly to be expected that with a larger network, a user might do more Bitcoin transactions, as the network has more utility to him/her. He might shift more of the usual cash or banking transactions to Bitcoin, as soon as the Bitcoin network allows him to do so. In any case, he's not going to interact with all u users, just because they are there!
But it can reasonably be expected that there is (on average) a certain number of banking and cash transactions a user wants to make per day, overall, and if Bitcoin would take over all those transactions, it would still level out at some point. Technically, this would be O(1) in the final state.
However we are not there yet and, certainly, some growth is indeed to be expected with a growing Bitcoin user base. In here, Metcalfe's law is discussed in the context of the first dotcom boom. In there, an argument is made that the value of a network with u users rather grows like O(u log u) than O(u ^ 2). Assuming that this growth in value reflects fully in the actual transaction rate, the growth in transactions per node on the Bitcoin network would then also be O(u log u), which is benign, and scalable.
Interestingly, it should be noted that in that paper, they argue against Metcalfe's law as being too optimistic of the growth in value of a network.
Ironically, here we have people now trying to stop Bitcoin because it might grow too much, it might be too successful!
Does this EILYA5 to you sufficiently?
Phew. I think this is actually pretty comprehensive. /u/mike_hearn, feel free to make/mix this into a blog post, if you think it helps.