Yeah, I don't really think it is either. It's an interesting theoretical attack, but it could probably never be mounted successfully on Bitcoin. On a fledgling altcoin, though...
Right, to create the fork you only need one inconsistency. But imagine you are switching back and forth between two chains with nearly identical data in them except that the hash of a single transaction is changed in the first block of the fork. It probably wouldn't be very disruptive to most users of the system if it just switched back and forth, but all your transactions were still confirmed and all still had the same hash. Adding many inconsistencies would make using the two sides of the fork very different experiences. On one side your transactions are all processed and are fine. On the other, your transactions are removed because they depend on transaction IDs that are no longer in the block chain...
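To make the "depends on transaction IDs that are no longer in the block chain" point concrete, here is a toy sketch. It is not Bitcoin's real serialization or validation: the `txid` and `confirmed_on` helpers and the payload strings are made up for illustration, and the only assumption is that the same payment ends up with a different hash on each side of the fork.

```python
import hashlib

def txid(payload: str) -> str:
    """Hypothetical transaction ID: double SHA-256 of a toy payload string."""
    return hashlib.sha256(hashlib.sha256(payload.encode()).digest()).hexdigest()

def confirmed_on(chain_txids: set, spends: str) -> bool:
    """A follow-up transaction is only usable if the txid it spends is in the chain."""
    return spends in chain_txids

# The same payment, but one field differs between the two sides of the fork,
# so it hashes to a different ID on each side (illustrative payloads only).
parent_a = txid("alice->bob:1.0|encoding-1")
parent_b = txid("alice->bob:1.0|encoding-2")

chain_a = {parent_a}
chain_b = {parent_b}

# Bob's later spend references the parent by the ID he saw on side A.
child_spends = parent_a

valid_on_a = confirmed_on(chain_a, child_spends)  # parent ID exists here
valid_on_b = confirmed_on(chain_b, child_spends)  # parent ID is missing here
```

With one changed hash, only transactions that spend from that one parent are affected. The more hashes differ between the two sides, the more dependent transactions drop out each time the network switches.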
I'm not saying that the attacker should try to cause further forks on one side of the dual fork-chain at all. They would just put inconsistencies between the two main growing chains, not cause further forking. When the power is split between three chains, it takes longer and longer for the second strongest chain to catch up to the main chain (the strongest chain).
Remember, a valid chain contains all valid transactions and no double spends. As long as there is a longest valid chain, even if the 51% attacker is the one that forked and extended it, everything still works. Now an attacker could use 51% to block or filter transactions etc., things already discussed, but that's not the attack you're describing.
That's interesting, but I'm not sure I see how it could work just yet. It sounds like you're saying that the network would introduce a third chain to split mining power between. But the network has to accept the chain with the most work as the correct chain. And I don't see how splitting the network's mining power between two chains would help things; it basically turns the 51% attack into a 67% attack because the network is splitting its time between two chains.
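One way to read the 67% figure, as a rough sketch: assume whoever is maintaining the competing chains has to divide a share `a` of the total hash power evenly across `k` chains, and each of those chains has to outpace the remaining `1 - a` working on a single chain. Then `a / k > 1 - a`, so `a > k / (k + 1)`. The function name and the even-split model are my assumptions, not something stated above.

```python
from fractions import Fraction

def attacker_threshold(chains: int) -> Fraction:
    """Share of total hash power that must be exceeded when power is split
    evenly over `chains` chains and each must outpace everyone else combined.

    Derivation (assumed model): a / k > 1 - a  =>  a > k / (k + 1).
    """
    if chains < 1:
        raise ValueError("need at least one chain")
    return Fraction(chains, chains + 1)

for k in (1, 2, 3):
    t = attacker_threshold(k)
    print(f"{k} chain(s): more than {t} ({float(t):.0%}) of the hash power")
```

Under that model one chain gives the usual 1/2, two chains give 2/3 (the "67% attack"), and three chains give 3/4.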
Maybe you could explain the defense you are proposing a bit more. It seems like if the network did somehow force the attacker to split their time mining on three chains, the attack would still work; it would just take increasingly long before each chain replacement. In general though, I don't see how it could work, because anyone with less than 50% of the hashing power essentially has to choose the chain with the most work to make sure they stay in consensus with the rest of the network. But anyone with more than 50% of the hashing power (i.e. the attacker) does not have to follow this rule, as they can create valid blocks faster than the rest of the network.
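A minimal sketch of the two points above, under simplifying assumptions: every block carries the same amount of work, and block production is proportional to hash-power share. `Chain`, `best_chain`, and `expected_lead` are illustrative names, not any client's actual API.

```python
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass
class Chain:
    name: str
    block_work: list = field(default_factory=list)  # work contributed by each block

    def total_work(self) -> int:
        return sum(self.block_work)

def best_chain(chains):
    """Fork choice for a minority miner: follow the chain with the most total work."""
    return max(chains, key=lambda c: c.total_work())

def expected_lead(attacker_share: Fraction, total_blocks: int) -> Fraction:
    """Expected lead (in blocks) of the attacker's chain over the honest chain
    after `total_blocks` blocks have been found across the whole network."""
    return (2 * attacker_share - 1) * total_blocks

# With 55% of the hash power, the attacker finds about 55 of every 100 blocks.
honest = Chain("honest", [1] * 45)
attacker = Chain("attacker", [1] * 55)

winner = best_chain([honest, attacker]).name
lead = expected_lead(Fraction(55, 100), 100)
print(winner, lead)
```

At exactly 50% the expected lead is zero. Anything above that grows linearly with time, which is why the majority miner can ignore the most-work rule and still end up defining the most-work chain.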