I could make a coin, then code a GPU miner around its new algo *mod*,
then post only a CPU miner and mine the hell out of it. It's been done, lots, and YOU KNOW IT TOO!
The situation is worse than you think. For years, almost every CPU coin has actually been cripplemined (to different degrees). The reason is that all the "sse, avx" etc. enhanced miners do not use packed (SIMD) instructions.
This means that if a hash has, say, 3 steps (real ones usually have a lot more than 10), a sequential, non-packed version will go like
1. 5-10 cpu cycles: first step of hashing
2. 5-10 cpu cycles: second step
3. 5-10 cpu cycles: last step of hashing
If you load 4 different hash candidates simultaneously, in the same thread, you can "pack" them with AVX/SSE and go like
1. 5-10 cpu cycles: first step of hashing for ALL 4 hashes
2. 5-10 cpu cycles: second step of hashing for ALL 4 hashes
3. 5-10 cpu cycles: last step of hashing for ALL 4 hashes
Again, that's in the context of the same cpu thread btw. It's a kind of parallelism within the same core/thread.
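To make the packing idea concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the 3-step `toy_mix_*` functions are made up and are NOT a real hash, and the 4-lane vector type relies on the GCC/Clang `vector_size` extension rather than hand-written SSE/AVX intrinsics. It only shows that each step is written once and applied to 4 candidates in the same thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Toy 3-step mix (hypothetical, NOT a real hash): one candidate at a time. */
uint32_t toy_mix_scalar(uint32_t x) {
    x ^= x >> 16;       /* step 1 */
    x *= 0x85ebca6bu;   /* step 2 */
    x ^= x >> 13;       /* step 3 */
    return x;
}

/* The same 3 steps, each one applied to all 4 lanes at once. */
v4u32 toy_mix_packed(v4u32 x) {
    const v4u32 s16 = {16, 16, 16, 16};
    const v4u32 mul = {0x85ebca6bu, 0x85ebca6bu, 0x85ebca6bu, 0x85ebca6bu};
    const v4u32 s13 = {13, 13, 13, 13};
    x ^= x >> s16;      /* step 1 for ALL 4 candidates */
    x *= mul;           /* step 2 for ALL 4 candidates */
    x ^= x >> s13;      /* step 3 for ALL 4 candidates */
    return x;
}

/* Process n candidates, 4 per iteration, inside a single thread. */
void toy_mix_batch(const uint32_t *in, uint32_t *out, size_t n) {
    assert(n % 4 == 0); /* sketch only handles whole groups of 4 */
    for (size_t i = 0; i < n; i += 4) {
        v4u32 x;
        memcpy(&x, in + i, sizeof x);   /* load 4 candidates */
        x = toy_mix_packed(x);
        memcpy(out + i, &x, sizeof x);  /* store 4 results */
    }
}

/* Returns 1 if the packed path agrees with the scalar path
   for 8 consecutive candidates starting at base. */
int toy_batch_matches_scalar(uint32_t base) {
    uint32_t in[8], out[8];
    for (int i = 0; i < 8; i++) in[i] = base + (uint32_t)i;
    toy_mix_batch(in, out, 8);
    for (int i = 0; i < 8; i++)
        if (out[i] != toy_mix_scalar(in[i])) return 0;
    return 1;
}
```

The packed version gives the same results as the scalar one, it just gets 4 of them per pass. Whether the compiler really emits one SIMD instruction per step depends on the target and flags, so treat this as a sketch of the layout and not as a tuned miner.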
In this way, SHA256 on Haswell can go from something like 8 cycles per byte to ~2.5 with AVX2, and down to <1 cycle/byte on CPUs with AVX-512 (which Haswell itself does not have).
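As a rough back-of-the-envelope for what those cycles-per-byte figures mean, here is a small sketch. The function names are made up for illustration, and the 3 GHz clock and 64 bytes per hash used in the note after the code are hypothetical round numbers, not measurements.

```c
#include <assert.h>

/* How many times faster the packed version is, given cycles per byte. */
double simd_speedup(double scalar_cpb, double packed_cpb) {
    assert(scalar_cpb > 0.0 && packed_cpb > 0.0);
    return scalar_cpb / packed_cpb;
}

/* Hashes per second for one core/thread, given its clock,
   the cycles spent per byte, and the bytes hashed per attempt. */
double hashes_per_second(double clock_hz, double cycles_per_byte,
                         double bytes_per_hash) {
    assert(cycles_per_byte > 0.0 && bytes_per_hash > 0.0);
    return clock_hz / (cycles_per_byte * bytes_per_hash);
}
```

With the figures above, `simd_speedup(8.0, 2.5)` is 3.2x, and anything under 1 cycle/byte is 8x or more. On the assumed 3 GHz core with 64 bytes per hash, that is roughly 5.9 million hashes per second scalar versus about 18.75 million packed, per thread.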
I initially thought coins that use memory-hard algorithms were more immune, and to some extent they are. But memory use can be traded for more processing work (the reduced-scratchpad shortcut), and the assumption behind memory hardness is that the shortcut costs waaaaay more CPU work than it saves in memory. If the available processing power is multiplied by, say, 8x, that assumption could be invalidated (to some degree), because the assumed CPU power levels are based on scalar, not SIMD, use.
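A toy model of that argument, with made-up names and numbers: if the reduced-scratchpad shortcut costs some multiple of the normal processing work, and SIMD multiplies the available processing power, the penalty that is actually felt is just the ratio of the two. This assumes the extra work is pure computation that packs perfectly, which real algorithms will only approximate.

```c
#include <assert.h>

/* Effective cost of the shortcut once SIMD is taken into account.
   recompute_factor: extra processing work the shortcut needs (e.g. 8x).
   simd_gain:        how much SIMD multiplies processing power (e.g. 8x).
   Both parameters are hypothetical inputs for this toy model. */
double effective_shortcut_penalty(double recompute_factor, double simd_gain) {
    assert(recompute_factor > 0.0 && simd_gain > 0.0);
    return recompute_factor / simd_gain;
}
```

For example, a shortcut designed around an 8x penalty on scalar code comes out at about 1x if the attacker's code is 8x faster through SIMD, and a 32x penalty shrinks to 4x. The model ignores memory latency and bandwidth, which SIMD does not speed up, so it only applies "to some degree".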