I'm too tired atm, perhaps I didn't understand what you mean ...
I don't see where there's a serious bottleneck or some conspiracy. Since X11 (and Hefty, Keccak and the others from sph-sgminer) is not memory-hard, you're not stressing the memory controller, the L2 caches or the RAM chips.
You're used to scrypt, treat it as the "reference", and conclude that the others must not be optimized because of their low power use and thermals. However, scrypt is the odd one out in the first place, a complete card fuck like Furmark or some other video card stress test.
But that's the point. Imagine if I optimized a game to push my GPU as hard as Furmark does just to squeeze some extra work out of it. Both of us run the game, and while you get 32 FPS on the exact same hardware as mine, I'm getting 48 FPS thanks to my secret optimizations under the hood. I know this is a terrible analogy, but think about it.
None of this would be a problem if pushing or limiting the GPU were a user choice, as it is with scrypt. Let's say I mine a scrypt coin and get 500 kH/s on my GPU at stock, but if I overclock and overvolt a bit I can push it to 585 kH/s. During the summer months this might be a problem where you live, because of extreme heat, higher electricity prices and increased power consumption, so I underclock by 50%, undervolt to about 60% and now get 255 kH/s. This makes sense to me; what doesn't make sense is claiming 50% less heat and power consumption while still hashing at full capacity.
Well, it could be hashing "at full capacity" with less heat, because the hash code is different. Parts of the chip are left unused: the code simply doesn't demand any memory-related operations. There are no random address jumps like in scrypt. Those parts just sit there because they have nothing to do, while the arithmetic core is already at 100% usage.
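To show what I mean by the difference in shape, here is a rough Python sketch, not real miner code. The function names `chained_hash` and `memory_hard_hash` are made up for illustration, the hashlib functions are only stand-ins for the real X11 set (which hashlib doesn't ship), and the second function is a heavily simplified imitation of scrypt's table-fill-then-random-read pattern, not scrypt itself.

```python
import hashlib

def chained_hash(data: bytes) -> bytes:
    """X11-style shape: the output of one hash feeds the next.
    Stand-in functions only; the working state is a few dozen bytes."""
    for fn in (hashlib.blake2b, hashlib.sha3_512, hashlib.sha512, hashlib.sha256):
        data = fn(data).digest()
    return data

def memory_hard_hash(data: bytes, n: int = 1024) -> bytes:
    """scrypt-like shape (simplified assumption, not real scrypt)."""
    block = hashlib.sha256(data).digest()

    # Phase 1: fill a table of n blocks; all of it must stay in memory.
    table = []
    for _ in range(n):
        table.append(block)
        block = hashlib.sha256(block).digest()

    # Phase 2: each read address depends on the current block, so the
    # lookups jump around the table and cannot be predicted in advance.
    for _ in range(n):
        j = int.from_bytes(block[:4], "little") % n
        mixed = bytes(a ^ b for a, b in zip(block, table[j]))
        block = hashlib.sha256(mixed).digest()
    return block

if __name__ == "__main__":
    print(chained_hash(b"header").hex())
    print(memory_hard_hash(b"header").hex())
```

The first function never touches anything bigger than one digest, so on a GPU it can live in registers and keep only the ALUs busy. The second one has to keep the whole table around and hit it at data-dependent addresses, which is the part that hammers the memory controller and the RAM.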
I'll try to give a counter-example: the so-called "CPU only" Heavycoin, where the devs struggled to destroy parallelism and prevent GPU mining. It took 2 weeks(?) for the first Heavycoin GPU miner to be mentioned. The work-in-progress cgminer-heavy (which had to be compiled from source) could only reach 7(?) Mh/s on a 280x in the very beginning, then it rose to 11, then to 15 Mh/s, and I don't know how much it is now. Christian's ccminer could reach 13 Mh/s on a 750Ti, and for a few hours it beat an R9-290. Reorder had to program, operate the pool, go on with his life, etc... Was someone holding back the R9-290's speed versus a simpler 750Ti due to a conspiracy? No, of course not. It takes massive brain power, time, skill and personal effort to optimize miner code. Now, go and mine Heavycoin at maxed-out performance, and look at the temperatures.
Of course, I'm not trying to distract from the possibility that there are secret X11 miners out there and that a few guys are mining at much higher speeds than the rest of us. It could be! There have been lots of suspicions (or even confirmations) of private miners or optimizations over the last few months. However, that's a different discussion from what I'm trying to get at here.