---------------
By definition, in a 51% attack the attacker IS NOT mining the main chain, so a penalty that stops him from mining on a chain he already isn't mining isn't much of a penalty, is it? On the attack chain, on the other hand, it is the legit miners who failed to mint a block, so they are the ones subject to the penalty. Once the attack chain is longer and the attacker broadcasts it, it becomes the longest chain, and some or all of the legit miners will be penalized for up to 1440 blocks.
In Nxt both chains (legit and hidden) will have the same cumulative difficulty, because the forging power of penalized forgers is delegated to the others and total power is bumped back to 100%. But the hidden chain won't contain transactions of the economic cluster, and this is where the extra consensus rule comes in handy.
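To make the "bumped back to 100%" part concrete, here is a rough sketch, not the actual Nxt code. It assumes the power of penalized forgers is spread over the remaining forgers in proportion to their stake; the function name, the account ids and the stake numbers are all made up for illustration.

```python
from fractions import Fraction

def redistribute_power(stakes, penalized):
    """Return each non-penalized forger's share of total forging power.

    stakes:    dict of account id -> stake (hypothetical integer units)
    penalized: set of account ids currently barred from forging
    Assumption: the power is redistributed proportionally to stake.
    """
    active = {acct: s for acct, s in stakes.items() if acct not in penalized}
    total_active = sum(active.values())
    if total_active == 0:
        raise ValueError("no active forgers left")
    # Dividing by the active total rescales the shares so they sum to 1 again.
    return {acct: Fraction(s, total_active) for acct, s in active.items()}

# Hypothetical example: C is penalized, A and B absorb its power.
shares = redistribute_power({"A": 50, "B": 30, "C": 20}, penalized={"C"})
```

Since the shares always sum to 100% on both chains, cumulative difficulty alone can't tell the legit chain from the hidden one, which is why the extra rule is needed.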
The attacker has ~10 minutes (this may be changed after we collect stats) to reveal his chain. After 10 minutes all transactions will be set in stone and 51% attacks won't be possible at all. If the attacker combines a 51% attack with an eclipse attack, the victim will notice that the recent blocks don't contain transactions of the cluster (actually there could be some, because of time differences and corrupt participants of the cluster, but the ratio will be much lower than the threshold).
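The check a victim would run could look roughly like this. It is only a sketch under assumptions: transactions are attributed to the cluster by sender account, the threshold value is invented, and an empty window is treated as suspicious. None of these details are fixed in the design.

```python
def looks_eclipsed(recent_blocks, cluster_accounts, threshold):
    """Return True if the share of cluster transactions is below threshold.

    recent_blocks:    list of blocks, each a list of sender account ids
    cluster_accounts: set of account ids belonging to the economic cluster
    threshold:        minimum expected ratio of cluster transactions (0..1)
    """
    total = sum(len(block) for block in recent_blocks)
    if total == 0:
        # Assumption: a window with no transactions at all is suspicious too.
        return True
    from_cluster = sum(
        1 for block in recent_blocks for sender in block
        if sender in cluster_accounts
    )
    return from_cluster / total < threshold

# Hypothetical data: "x" and "y" are outsiders, "c1" and "c2" are in the cluster.
cluster = {"c1", "c2"}
hidden_chain = [["x", "y"], ["x", "c1"], ["y", "x"]]    # 1 of 6 from the cluster
legit_chain = [["c1", "c2"], ["c1", "x"], ["c2", "c1"]]  # 5 of 6 from the cluster
suspicious = looks_eclipsed(hidden_chain, cluster, threshold=0.5)
normal = looks_eclipsed(legit_chain, cluster, threshold=0.5)
```

The stray cluster transaction in the hidden chain stands for the time-difference and corrupt-participant cases: it is tolerated because only the ratio matters.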
The original design assumes that 2-3 forgers have to compete at each block height, so at least 1 forger will be penalized every block. The number of forks within the 10-minute window is supposed to be high, and users shouldn't rely on a low number of confirmations.
Some forgers will attach their blocks to dead branches, but they shouldn't reforge blocks after blockchain reorgs, because other forgers may see contradicting blocks (signed by the same account but belonging to different branches) and report them by including both blocks in their own chains.
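Spotting contradicting blocks could be sketched like this. As a simplifying assumption, "contradicting" here means two different blocks signed by the same account at the same height; contradictions across different heights are not covered. The Block fields and the function name are hypothetical, and signature verification is left out.

```python
from collections import namedtuple

# Hypothetical minimal block header: id, height and the forging account.
Block = namedtuple("Block", ["block_id", "height", "forger"])

def find_contradictions(seen_blocks):
    """Return pairs of different blocks forged by one account at one height."""
    first_seen = {}
    proofs = []
    for block in seen_blocks:
        key = (block.forger, block.height)
        earlier = first_seen.get(key)
        if earlier is None:
            first_seen[key] = block
        elif earlier.block_id != block.block_id:
            # Same account, same height, different block: reportable pair.
            proofs.append((earlier, block))
    return proofs

# Hypothetical example: alice forged b1, then reforged b3 on another branch.
seen = [
    Block("b1", 10, "alice"),
    Block("b2", 10, "bob"),
    Block("b3", 10, "alice"),
    Block("b1", 10, "alice"),  # the same block relayed twice is not a contradiction
]
proofs = find_contradictions(seen)
```

Each pair in proofs is what a reporting forger would include in his own chain as evidence.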