Are you saying that any algorithm and combination of algorithms can be parallelized and optimized to use 100% of your GPU?
Any? No, of course not. Look at CryptoNight, XMR's PoW, for example. Running that one at 100% on any current GPU would be a really neat trick - seeing as memory accesses are going to just kill your GPU usage per thread. Normally, what you do in that circumstance is just run an asston of threads until you're using a decent amount of the compute capability... but you can't do that with CN. Why? Because each hash requires 2MiB of scratchpad, so you are gonna run outta memory LONG before you run outta compute capability. So, it's a dog on GPUs and is going to be for the foreseeable future, I think.
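To put a rough number on that, here's a quick back-of-the-envelope sketch. The 2MiB per hash is the figure from above; the card sizes are just assumptions I picked for illustration, and the function name is made up. It also ignores everything else that lives in GPU memory, so the real ceiling is lower.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Scratchpad per CryptoNight hash, as stated above.
SCRATCHPAD_BYTES = 2 * MIB


def max_concurrent_hashes(gpu_memory_bytes, scratchpad_bytes=SCRATCHPAD_BYTES):
    """Upper bound on hashes in flight if memory held nothing but scratchpads."""
    return gpu_memory_bytes // scratchpad_bytes


if __name__ == "__main__":
    # Hypothetical card sizes, chosen only for illustration.
    for gib in (1, 2, 4):
        print(f"{gib} GiB card: at most {max_concurrent_hashes(gib * GIB)} hashes in flight")
```

Even on the assumed 4 GiB card that caps out at 2048 hashes at once, which is why "just throw more threads at it" stops working with CN.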
X11? No, you don't have that excuse - not even close. Yes, it uses memory, but the amount is TINY. We're talking MINUSCULE levels compared to CN, and as a matter of fact, it can be reduced by (don't quote me, this is a rather rough estimate since I'm not counting bytes) probably half. This means that you're already going to run out of compute capability a LONG ASS TIME before you even think you might be low on memory. And even if that weren't the case - several parts of X11 can have their memory usage VASTLY reduced if you're willing to trade memory access for computation. In the current situation (compute capability being our bottleneck), that's still sometimes a good idea, and sometimes not. But a LOT of memory usage and accesses can be removed from the current X11 kernel for almost no extra computation, which increases overall speed and makes it even less likely you'd be limited by memory.
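If "trade memory access for computation" sounds abstract, here's the general idea in miniature. To be clear, this is NOT code from any X11 kernel - it's a generic bit-counting example with names I made up, just to show the same answer coming from a stored table (memory) or from a few extra operations (compute).

```python
# Memory-heavy version: a 256-entry table that has to be stored and read.
POPCOUNT_TABLE = [bin(i).count("1") for i in range(256)]


def popcount_lookup(byte):
    """Count set bits with one table read (costs memory and a memory access)."""
    return POPCOUNT_TABLE[byte]


def popcount_compute(byte):
    """Count set bits with arithmetic only (no table, a few more operations)."""
    count = 0
    while byte:
        byte &= byte - 1  # clears the lowest set bit
        count += 1
    return count


if __name__ == "__main__":
    # Both versions agree on every possible byte value.
    print(all(popcount_lookup(b) == popcount_compute(b) for b in range(256)))
```

Which version wins depends on what you're short of. If compute is the bottleneck, the table can be the better deal; if memory is, you recompute.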
So, no. SOME algorithms are always gonna be dogs on GPUs as far as I can tell - using very little of the GPU's power. X11 is not one of them.
By the way, combinations of algorithms are probably gonna be a LOT easier to parallelize. Why? You can do them in any order - so if one hogs memory and the other hogs compute, run them in parallel. Now, before you say I'm an idiot and X11 must be done in order (the latter part would be correct) - consider that you are doing the same thing over and over. If I run one hash, then start another thread later, I could conceivably, by running two completely different X11 hashes at the same time on the same GPU, work it out so that they aren't synced. What I mean by that is, while thread 0 is running the first stage, I could have thread 1 running the fifth. By doing this, it could be possible to use more compute, if memory access were a bottleneck. (Again, it isn't.)
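Here's a toy sketch of that staggering idea. It's only a scheduling simulation in Python, not GPU code, and the function names and the "lead" parameter are mine. The one hard fact in it is that X11 chains 11 hash functions; everything else is an assumption for illustration.

```python
# X11 chains 11 hash functions; stages are numbered 0..10 here.
NUM_STAGES = 11


def stage_at(step, lead, num_stages=NUM_STAGES):
    """Stage a stream is running at `step`, given it is `lead` stages ahead."""
    return (step + lead) % num_stages


def schedule(leads, steps):
    """One row per time step, one column per stream."""
    return [[stage_at(t, lead) for lead in leads] for t in range(steps)]


if __name__ == "__main__":
    # Stream 0 starts on the first stage, stream 1 on the fifth (index 4).
    for t, row in enumerate(schedule([0, 4], NUM_STAGES)):
        print(f"step {t}: stream 0 on stage {row[0]}, stream 1 on stage {row[1]}")
```

Since the two streams stay four stages apart, they are never on the same stage at the same step, so a memory-hungry stage in one can overlap with a compute-hungry stage in the other.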
You're missing the whole point of distributed mining.
IT DOESN'T FUCKING MATTER whether you get 2MH or 6MH or 657MH, as long as everyone else is getting about the same.
Total hash is irrelevant, DISTRIBUTION of it matters.