I think you've got it right: the reason 1 can't keep this up successfully is that it would require control over future blocks, which he can't actually *control by himself* due to the random nature of crypto hashes.
So the *evil block forger* actually needs *evil friends* to help him. If he has enough friends, he can win, but working out the actual probability of that is why we need a math guy.
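To make the "can't control it by himself" point concrete, here is a toy sketch in Python. It is not NXT's actual forging code. It assumes (hypothetically) that every account derives a pseudo-random number from the hash of a shared seed plus its own id, and that the winner is picked by an "exponential race", so each account wins with probability proportional to its stake. The names `pick_forger` and `coalition_share` and the example stakes are made up for illustration.

```python
import hashlib
import math


def pick_forger(seed, stakes):
    """Toy stake-weighted forger selection (hypothetical, not NXT's code).

    Each account turns sha256(seed + account_id) into a number u in (0, 1)
    and "waits" -ln(u) / stake; the shortest wait wins. With uniform u this
    is an exponential race, so each account wins with probability
    stake / total_stake.
    """
    best_account, best_wait = None, math.inf
    for account, stake in stakes.items():
        digest = hashlib.sha256(seed + account.encode()).digest()
        # Top 53 bits of the hash -> float strictly inside (0, 1).
        u = ((int.from_bytes(digest[:8], "big") >> 11) + 0.5) / 2**53
        wait = -math.log(u) / stake
        if wait < best_wait:
            best_account, best_wait = account, wait
    return best_account


def coalition_share(stakes, coalition, n_blocks, seed=b"genesis"):
    """Fraction of n_blocks won by any account in `coalition`."""
    wins = 0
    for _ in range(n_blocks):
        forger = pick_forger(seed, stakes)
        wins += forger in coalition
        # Toy assumption: the next seed is fixed by the previous seed and the
        # winner's id, so the forger cannot grind it by picking block contents.
        seed = hashlib.sha256(seed + forger.encode()).digest()
    return wins / n_blocks


stakes = {"evil": 10, "friend1": 10, "friend2": 10, "honest_a": 35, "honest_b": 35}
print(coalition_share(stakes, {"evil"}, 20_000))
print(coalition_share(stakes, {"evil", "friend1", "friend2"}, 20_000))
```

In this toy model the evil forger alone should win roughly 10% of the blocks and the coalition roughly 30%: the hash outputs decide each winner, so adding friends (i.e. stake) is the only lever he has.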
OK, it's good that I'm starting to understand something. Sorry for being soooo slow, but that's on purpose: it's better to clear up any misunderstanding at an early stage.
To proceed, I need to understand:
1. How is the "weight" variable actually computed?
2. In particular, why are the weights on a "bad" branch supposed to be much smaller than on the main one?
3. Suppose a node sees at least one neighbor whose version of the blockchain differs from its own. How does it decide which one is "correct"?
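Bitcoin-style systems answer something like question 3 with a "most cumulative work" rule. Here is a minimal sketch of the analogous "heaviest chain" rule, assuming (my assumption, not confirmed for NXT) that every block carries a numeric weight and a node switches to whichever candidate chain has the largest cumulative weight. `Block`, `cumulative_weight` and `choose_chain` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    weight: float  # hypothetical per-block weight; how it is computed is question 1


def cumulative_weight(chain):
    """Total weight of a chain given as a list of blocks from genesis onward."""
    return sum(block.weight for block in chain)


def choose_chain(candidates):
    """Heaviest-chain rule: keep the candidate with the largest cumulative weight.

    max() returns the first maximal element, so listing our own chain first
    means ties are resolved in favor of what we already have.
    """
    return max(candidates, key=cumulative_weight)


ours = [Block("genesis", 1.0), Block("a1", 1.0), Block("a2", 1.0)]
theirs = [Block("genesis", 1.0), Block("b1", 0.4), Block("b2", 0.4), Block("b3", 0.4)]
best = choose_chain([ours, theirs])
print(best is ours)  # True: 3.0 beats 2.2 even though "theirs" is longer
```

Under such a rule a longer branch can still lose if its blocks are light enough, which is why question 2 (why bad-branch weights should be small) matters so much.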
Then, do I understand correctly that the main object of interest is the probability distribution of the length of the "bad" branch? That is, we should be able to make statements like "given that the bad guy has X% of all NXT, he can grow a bad branch of length at least 10 with probability at most 0.00000000003", right?
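Here is the crudest version of such a bound, as a sketch under loud assumptions: each block's forger is drawn independently, the coalition wins each draw with probability p equal to its stake share, and the bad branch only survives while the coalition wins every block in a row. Then P(length >= k) = p^k, a geometric tail. This ignores that in a proof-of-stake system the attacker can keep forging a private branch with his own stake alone, so a real analysis would have to compare branch weights instead. `branch_tail` and `min_safe_length` are hypothetical helpers.

```python
def branch_tail(p, k):
    """P(a coalition with stake share p forges k blocks in a row).

    Crude model: every block's forger is drawn independently, and the
    coalition wins each draw with probability p, so the tail is p**k.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("stake share p must be in [0, 1]")
    return p ** k


def min_safe_length(p, eps):
    """Smallest k with branch_tail(p, k) <= eps, under the same crude model."""
    if not (0.0 <= p < 1.0 and eps > 0.0):
        raise ValueError("need 0 <= p < 1 and eps > 0")
    k = 0
    while branch_tail(p, k) > eps:
        k += 1
    return k


print(min_safe_length(0.5, 1e-3))   # 10, since 0.5**10 ~ 0.00098
print(min_safe_length(0.1, 3e-11))  # 11, since 0.1**10 = 1e-10 is still too large
```

A loop is used instead of a logarithm so the boundary case is decided by the same expression that defines the tail.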
As a side note, there is probably also the question of penalizing "suspicious" accounts. If the rules are too strict, you will inevitably penalize some good guys too, so an attacker might be able to devise a strategy that breaks the network by getting too many nodes penalized. Is this a possible issue?
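To put a number on the false-positive worry, here is a sketch for one made-up rule: flag an account if it forges at least `threshold` of the last `window` blocks. Under the same independent-draw assumption as before, an honest account with stake share s forges Binomial(window, s) blocks, so its chance of being flagged is a binomial upper tail. The rule and the name `false_positive_rate` are hypothetical.

```python
from math import comb


def false_positive_rate(stake_share, window, threshold):
    """P(an honest account is flagged) under a hypothetical rule.

    Rule (made up for illustration): flag any account that forges at least
    `threshold` of the last `window` blocks. If each block goes to the honest
    account independently with probability stake_share, its block count is
    Binomial(window, stake_share), and we return the upper tail.
    """
    return sum(
        comb(window, j) * stake_share**j * (1 - stake_share) ** (window - j)
        for j in range(threshold, window + 1)
    )


# An honest account holding 30% of the stake, judged over 100 blocks:
for threshold in (35, 40, 45):
    print(threshold, false_positive_rate(0.3, 100, threshold))
```

Lowering `threshold` catches attackers sooner but raises the rate at which large honest forgers get flagged, which is the tradeoff in question.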