I actually don't think there's anything wrong with SPV or the current state of the ecosystem (minus some centralization concerns, but that's for another time)! How many transactions could the current ecosystem handle if we utilized every blockchain at once? This is going to be much easier to explain if we start using daily throughput instead of MB per block, which is a terrible metric.
I'm sort of agnostic on this, but the standard argument against is that it loses on network effect, both protocol-wise and in terms of unit of account. If you are a Bitcoin user and you want to pay a Litecoin user, that means doing an exchange, which adds many kinds of overhead. This puts this sort of heterogeneous solution at a competitive disadvantage to a single network that is able to service both customers.
Likewise if you slice the other way and say that every user is both a Bitcoin and Litecoin user (meaning you can just choose any chain to transact), again that adds (slightly different) overhead.
Certainly if you have infinite chains you can have infinite total capacity but that also creates infinite overhead.
As for "Digital Gold", he didn't coin the phrase, but he set the expectation that it works like gold: "The steady addition of a constant of amount of new coins is analogous to gold miners expending resources to add gold to circulation"
Cherry picking his quotes is not really helpful. He also said:
The bandwidth might not be as prohibitive as you think. A typical transaction would be about 400 bytes (ECC is nicely compact). Each transaction has to be broadcast twice, so lets say 1KB per transaction. Visa processed 37 billion transactions in FY2008, or an average of 100 million transactions per day. That many transactions would take 100GB of bandwidth, or the size of 12 DVD or 2 HD quality movies, or about $18 worth of bandwidth at current prices.
If the network were to get that big, it would take several years, and by then, sending 2 HD movies over the Internet would probably not seem like a big deal.
and (pre-SPV)
The current system where every user is a network node is not the intended configuration for large scale. That would be like every Usenet user runs their own NNTP server. The design supports letting users just be users. The more burden it is to run a node, the fewer nodes there will be. Those few nodes will be big server farms. The rest will be client nodes that only do transactions and don't generate.
and
I anticipate there will never be more than 100K nodes, probably less. It will reach an equilibrium where it's not worth it for more nodes to join in. The rest will be lightweight clients, which could be millions.
and
Forgot to add the good part about micropayments. While I don't think Bitcoin is practical for smaller micropayments right now, it will eventually be as storage and bandwidth costs continue to fall. If Bitcoin catches on on a big scale, it may already be the case by that time. Another way they can become more practical is if I implement client-only mode and the number of network nodes consolidates into a smaller number of professional server farms. Whatever size micropayments you need will eventually be practical. I think in 5 or 10 years, the bandwidth and storage will seem trivial.
all of which, among others, make it clear he expected Bitcoin to scale and be used for many routine transactions, including increasingly-small micropayments, by millions of users, not just settlements between gold vaults. He may have been wrong, but that was his expectation.
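For anyone who wants to check the arithmetic in the bandwidth quote, here is a rough sketch. It uses only the figures from the quote (37 billion transactions per year, about 1KB per transaction after double broadcast). The 365-day year, the decimal units (1KB = 1,000 bytes, 1GB = 10^9 bytes) and the function name are my own assumptions, not anything from the quote.

```python
# Figures from the quoted estimate.
VISA_TX_PER_YEAR = 37_000_000_000  # Visa, FY2008
BYTES_PER_TX = 1_000               # ~400 bytes, broadcast twice, rounded up to "1KB"


def daily_load(tx_per_year, bytes_per_tx, days_per_year=365):
    """Return (transactions per day, gigabytes per day).

    Assumes a 365-day year and decimal gigabytes (1 GB = 1e9 bytes).
    """
    tx_per_day = tx_per_year / days_per_year
    gb_per_day = tx_per_day * bytes_per_tx / 1e9
    return tx_per_day, gb_per_day


tx_per_day, gb_per_day = daily_load(VISA_TX_PER_YEAR, BYTES_PER_TX)
print(f"{tx_per_day / 1e6:.0f} million tx/day, {gb_per_day:.0f} GB/day")
# prints "101 million tx/day, 101 GB/day"
```

So the quote's "100 million transactions per day" and "100GB" are just these numbers rounded down slightly.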