I imagine a malicious node (or clustered nodes) with majority power has better than not chance of beating the network by definition. If he does not broadcast the first winning block, but waits for two winning blocks, then the network will rightly accept his block chain. If he does not win after two blocks, then he does not broadcast at all. Because of statistical deviation, the network will be rightly concerned, but I don't think there's much the network can do about it in the short term.
But if his "lead" is so tenuous that it's only 51% then he's taking a big gamble by devoting so much hashing power to trying to produce two valid blocks before anyone else produces one.
Anyway 51% isn't a magic number if this is what a cheater is trying to do. You can with 40% hashing power try to do the same thing. You'll have less chance of getting your two block lead than the 51% guy but not a huge amount less.
EDIT:
Allow me to elaborate, and see if my math is correct.
With 51% hash power, a peer has 51% chance of producing a block before anyone else does.
The chance of producing two blocks before anyone else does is .51^2, or 26%.
With 40% hash power, the same calculation (0.4^2) is 16%.
Thus with 51% hash power your odds of being able to produce two blocks before anyone else produces one is only 10% better than with only 40% hash power. Of course you are continually attempting to produce two blocks so your chances of producing two blocks before any one else can be expressed as a function of the number of blocks that you've been trying to do this for.
| 1 | 2 | 3 | 4 | 5 | 6 |
51% | 26% | 45.3% | 59.5% | 70% | 77.9% | 83.6% |
40% | 16% | 29.5% | 40.8% | 50.3% | 58.2% | 64.9% |
I calculated the above table for each hash power H as:
f(n) = 1 - ((1 - H^2)^n)
The table shows that after 6 rounds, or approximately 1 hour at 10 minute average block generation intervals, someone with 51% hashing power has an 83.6% chance of having successfully produced two deviant blocks in a row to immediately add to the top of the block chain. Someone with 40% hashing power will have a 64.9% chance after 6 rounds of succeeding similarly. The difference of success after 6 rounds is only 18.7%, so I don't see why just having 40% hashing power would prohibit someone who wanted to cheat in this way from making the attempt (they'll just have to try a little longer, that's all; they'll need e.g. 14 rounds for 90% chance of having succeded whereas the 51% cheater only needs 8 rounds for 90% chance of having succeeded).
Someone with 75% hashing power would only need 3 rounds for 90% chance of success.
Of course, all of the above is predicated on announcing the deviant blocks as soon as they are discovered so that everyone else accepts them into the chain everyone starts over again with computing the next block.
The chance of computing 2 blocks in the same time that it takes someone else to compute 1 is more difficult to calculate. I think it may require integration. I'll have to think about it a bit ...