I see your confusion. You've confused propagation with verification.
Again, this isn't about creating a too-big block, or even an especially large one (though the larger it is the longer the verification time can be).
It is about creating blocks of acceptable size with valid but complex transactions.
Yes, other miners may decide to implement some trap for blocks that take a long time to verify. Doing so runs the risk that not everyone else will do the same, leaving them stuck mining on a chain that is not the longest.
If they discard the block and keep mining on the previous one, they are forking and risk being orphaned. Either way, the other miners carry a risk and the miner of the complicated TXs has an advantage... unless everyone does the same thing (and they won't).
Does your argument above depend on the existence of the relay network or not?
The debate might be helped with a graphic of network topology types. Bitcoin is the bottom-right "irregular" type, except far larger and messier than the simple example here. This makes it far more robust than all the other types. Call that Satoshi genius or simply an emergent property of decentralization or both.
Let's say that a "rogue" block gets mined by a node and sent to its peers. For illustration, this might be too big for most nodes' bandwidth, e.g. 100MB (leaving aside the message size limits), or a deviously constructed, bloated 5MB block of TXs which takes 3 hours to validate. If this rogue block can't be handled efficiently, then it borks the node's immediate peers for a period of time. However, the rest of the network continues in a viable state. When the block is finally processed, the invs sent on to the next peers get ignored because the chain has moved on and the rogue block has been orphaned.
We immediately see that a fully-connected network has a far higher risk of systemic failure because a rogue block could seize up all nodes simultaneously, yet an irregular (mesh or multi-hop) network will be largely unaffected. Each node having just a dozen connections means that the other 6000 nodes keep going, oblivious to the rogue block. This is a reason why "5 nodes worldwide" is useless, and thousands are best.
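To put rough numbers on that comparison, here is a minimal sketch. It assumes nodes validate before relaying, so a rogue block ties up only the direct peers of the node that mined it. The function names, the random wiring and the seed are made up for illustration and are not how real nodes pick peers.

```python
import random

def irregular_peers(n_nodes, min_peers, seed=1):
    """Random mesh: every node ends up with at least `min_peers` two-way links.

    Hypothetical wiring for illustration; requires min_peers < n_nodes.
    """
    rng = random.Random(seed)
    peers = [set() for _ in range(n_nodes)]
    for node in range(n_nodes):
        while len(peers[node]) < min_peers:
            other = rng.randrange(n_nodes)
            if other != node:
                peers[node].add(other)
                peers[other].add(node)
    return peers

def full_mesh_peers(n_nodes):
    """Fully connected: every node is a direct peer of every other node."""
    return [set(range(n_nodes)) - {node} for node in range(n_nodes)]

def stalled_fraction(peers, origin):
    """Share of the other nodes that get the rogue block directly from `origin`.

    Assumes validate-before-relay, so only direct peers are tied up.
    """
    return len(peers[origin]) / (len(peers) - 1)
```

Calling `stalled_fraction(irregular_peers(6000, 12), 0)` should come out well under 1% of the network, while `stalled_fraction(full_mesh_peers(300), 0)` is 1.0 by construction: every other node is hit at once.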
This design feature is compromised if nodes forward blocks without validation, or if a central block transmission method exists which makes the mining topology more like a bus or star. That's one reason why I really like IBLT as a block propagation efficiency method: it retains the inherent robustness of an irregular topology when all nodes participate in being able to encode (mine) and decode (read, verify).
Agree on the resiliency of topology, but consider...
This threat is not a bandwidth attack with a big block. For it to work, the "rogue" block has to be of acceptable size and its transactions valid.
IBLT helps only if the rogue block is not created intentionally by a miner who withholds the long-validation-time transactions from broadcast until they are included in the winning block.
It gets interesting when verification takes longer than propagation. As we've seen, that can be done with a high volume of inputs and outputs, and possibly more so by ordering the inputs and outputs within the TXs and the block to defeat validation and processing shortcuts.
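A back-of-the-envelope model of that crossover, as a sketch only. It assumes legacy-style signature checking in which each input re-hashes roughly the whole transaction, so hashing work grows with inputs times transaction size. The function names and the bandwidth and hashing figures are assumptions picked for illustration, not measurements.

```python
def propagation_seconds(block_bytes, bandwidth_bytes_per_s):
    """Time to push the block over one link at the assumed bandwidth."""
    return block_bytes / bandwidth_bytes_per_s

def validation_seconds(tx_bytes, n_inputs, hash_bytes_per_s):
    """Rough signature-hashing time for one large transaction.

    Assumption: each input hashes about `tx_bytes`, so the total
    work is n_inputs * tx_bytes (quadratic as the TX grows).
    """
    return n_inputs * tx_bytes / hash_bytes_per_s

# Illustrative figures only: a 1 MB TX with 5000 inputs,
# a 1 MB/s link, and a node that hashes 200 MB/s.
prop = propagation_seconds(1_000_000, 1_000_000)
valid = validation_seconds(1_000_000, 5000, 200e6)
print(f"propagation ~{prop:.0f}s, validation ~{valid:.0f}s")
```

Under these assumed figures the block crosses a link in about a second but takes far longer to check, which is the regime described above. Different hardware or transaction shapes would move the numbers a lot.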
Consider further that this is merely a single edge-case threat. When it comes to corner cases (multiple edge cases combined), you can get other failure conditions in rarer circumstances. Any hard fork presents at least one additional edge case.
The relay network topology may matter in picking the degree of complexity needed to have the desired effect (even a 25-second validation gives about a 4% head start on the 10 minutes, which may be enough to increase profit above the competition). The relay network topology can also be exploited by picking your peers to make the method more likely to succeed.
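The head-start arithmetic can be checked with a short sketch. It treats block discovery as a Poisson process with a 600-second average, which is the usual simplification. The function names and the 10% hash share in the usage note are hypothetical.

```python
import math

BLOCK_INTERVAL_S = 600.0  # the 10-minute target interval

def head_start_fraction(validation_s, interval_s=BLOCK_INTERVAL_S):
    """Validation delay expressed as a fraction of the block interval."""
    return validation_s / interval_s

def p_block_during_stall(hash_share, validation_s, interval_s=BLOCK_INTERVAL_S):
    """Chance the rogue miner finds the next block while others still validate.

    Assumes Poisson block arrivals; `hash_share` is the miner's share
    of total hash power (a hypothetical input).
    """
    return 1.0 - math.exp(-hash_share * validation_s / interval_s)
```

`head_start_fraction(25)` is 25/600, a little over 4%. For a miner with an assumed 10% of the hash power, `p_block_during_stall(0.10, 25)` works out to roughly 0.4% per rogue block, so any edge would come from repeating the trick, not from a single block.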
I'm a fan of increasing the transaction rate, but I'm more interested in the Bitcoin Experiment being successful in the long run. I think we'll get there, but don't be so impatient as to stumble over the things to do first.