1. Got the Scrypt wrong and can still be supermined by GPUs
Thanks for your comments and links on this. I haven't had time to fully digest them - there's a lot of information there and there are some competing demands on my time. From what I've seen, adding cores does not produce linear improvements, even with the small numbers of cores in a CPU. But if MC is vulnerable to GPU mining, I'd like to see it fail fast, so I'd encourage you to post any ideas you have for how one might go about creating a GPU miner.
It doesn't need to be linear, because the cost per FLOPS on a GPU is so much lower than in a CPU system.
It appears to me that what happens in a GPU (and this is also why Intel's hyperthreading is faster than just 4 hardware cores) is that when there are many logical threads, a thread stalling on main memory latency is not a factor, because some other thread whose main memory access has already been loaded into cache can run instead. Thus the GPU can always achieve its 200+ GB/s main memory throughput, because with so many threads it is highly probable that one of them is ready to run, which masks the latency.
My idea for defeating this is to require so much memory for the Scrypt that the GPU cannot run enough threads to get that probability to work in its favor. (Or even better, to run in a larger main memory footprint than any known GPU can handle, though this is not an absolute barrier, since demand for GPUs from miners can in theory drive larger main memories.)
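To put rough numbers on the two paragraphs above, here is a back-of-envelope sketch. Only the 200 GB/s figure comes from this thread; the 400 ns latency, 64-byte request size, 4 GiB of GPU memory and 1 GiB per-hash footprint are assumptions picked purely for illustration, and the function names are made up.

```python
def requests_in_flight(bandwidth_bytes_per_s, latency_s, bytes_per_request):
    """Little's law: how many memory requests must be outstanding at once
    to keep the memory bus saturated despite the latency."""
    return bandwidth_bytes_per_s * latency_s / bytes_per_request


def max_concurrent_hashes(device_memory_bytes, footprint_bytes):
    """How many independent hashes fit in device memory at the same time."""
    return device_memory_bytes // footprint_bytes


if __name__ == "__main__":
    # ASSUMED figures, for illustration only (except the 200 GB/s).
    needed = requests_in_flight(200e9, 400e-9, 64)
    fit = max_concurrent_hashes(4 * 2**30, 1 * 2**30)
    print(f"requests needed in flight to hide latency: {needed:.0f}")
    print(f"hashes that fit in memory concurrently:   {fit}")
```

If the second number is far below the first, the GPU does not have enough independent threads to mask the latency, which is the effect the large footprint is meant to produce.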
But hashing over such a large main memory footprint makes the hash slow, as MemoryCoin demonstrates at a claimed 1 second. Also, CPU main memory bandwidth is an order of magnitude lower than for top-of-the-line GPUs. So my idea is to run an inner Scrypt (think of a for loop inside of a for loop) and an outer Scrypt, linking them together in cryptographic sequentiality (I will explain this later in pseudo-code), such that the entire algorithm becomes memory-bound at the speed of L1 cache, i.e. we force the GPU to compete with us on L1 cache speed, not on FLOPS and main memory bandwidth. The computation of the hash is then very fast (relative to MemoryCoin's thrashing of main memory latency), and the GPU has to compete on L1 cache speed and on having enough multiples of the main memory footprint to run that many extra threads.
So then the GPU's ROI becomes tied to GDDR memory cost versus CPU system cost. I worked some numbers, and the GPU can't gain an order-of-magnitude advantage in any theoretical case (not to mention the pragmatics of market dynamics).
Also, we need to make the Salsa hash in the BlockMix run much more slowly than the Salsa hash in the ROMix, so that GPUs can't use their superior FLOPS (and the compute units left idle by our large main memory footprint) to lower the memory requirements by storing, say, only every 4th element and recomputing the rest.
I am willing to rewrite your Scrypt if you think my idea has merit.
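To make the loop-inside-a-loop shape concrete, here is one possible reading of it as a toy sketch. It is not the promised pseudo-code and not a worked-out design: SHA-256 stands in for the Salsa-based mixing, the table sizes are tiny placeholders (in the real idea the inner table would be sized to fit L1 cache and the outer one to fill main memory), and `romix` / `nested_romix` are names invented for this example.

```python
import hashlib


def H(data: bytes) -> bytes:
    # Stand-in mixing function (ASSUMPTION: real Scrypt uses Salsa20/8 BlockMix).
    return hashlib.sha256(data).digest()


def romix(seed: bytes, n: int, mix) -> bytes:
    """Scrypt-style ROMix skeleton: fill a table sequentially, then make
    n data-dependent reads from it."""
    table = []
    x = seed
    for _ in range(n):
        table.append(x)
        x = mix(x)
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n
        x = mix(bytes(a ^ b for a, b in zip(x, table[j])))
    return x


def nested_romix(seed: bytes, outer_n: int = 16, inner_n: int = 8) -> bytes:
    """Outer ROMix over a large table whose mixing step is itself a small
    inner ROMix, so every outer step waits on a full inner pass."""
    inner = lambda x: romix(x, inner_n, H)
    return romix(H(seed), outer_n, inner)


if __name__ == "__main__":
    print(nested_romix(b"example block header").hex())
```

Each outer step depends on the result of a complete inner pass, which is the sequential linking described above; whether this particular chaining is strong enough is exactly what would need to be reviewed.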
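The memory-for-compute trade mentioned above can be shown with a small sketch. SHA-256 again stands in for the Salsa-based mix (an assumption, as are the function names and toy sizes). The strided version keeps only every `stride`-th table element and recomputes the others on demand, producing the same hash with a fraction of the memory.

```python
import hashlib


def mix(x: bytes) -> bytes:
    # Stand-in for the Salsa-based mix (ASSUMPTION for illustration).
    return hashlib.sha256(x).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def romix_full(seed: bytes, n: int):
    """Store all n elements. Returns (hash, mix calls, elements stored)."""
    calls, table, x = 0, [], seed
    for _ in range(n):
        table.append(x)
        x = mix(x); calls += 1
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n
        x = mix(xor(x, table[j])); calls += 1
    return x, calls, len(table)


def romix_strided(seed: bytes, n: int, stride: int):
    """Store only every stride-th element; recompute the rest when read."""
    calls, kept, x = 0, {}, seed
    for i in range(n):
        if i % stride == 0:
            kept[i] = x
        x = mix(x); calls += 1
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n
        base = j - j % stride
        vj = kept[base]
        for _ in range(j - base):      # walk forward from the stored element
            vj = mix(vj); calls += 1
        x = mix(xor(x, vj)); calls += 1
    return x, calls, len(kept)


if __name__ == "__main__":
    seed = hashlib.sha256(b"example").digest()
    print(romix_full(seed, 64)[1:], romix_strided(seed, 64, 4)[1:])
```

Each lookup costs at most `stride - 1` extra mix calls, so the cheaper the mix is relative to the memory it replaces, the more attractive this trade becomes for hardware with spare FLOPS.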
4. Afaik, has made no advances on improved anonymity
Actually, MC has gone backwards on this a little by always sending change to first address in a wallet to support the voting system, and because this is more intuitive for new users. Generally with MC where there is a choice between ease of use and anonymity, I'll choose ease of use. I care about privacy too, but I care about adoption more. The privacy options of Bitcoin are obviously still there in the protocol, but users will need to take more care to ensure they are utilized.
I am going to avoid talking about your holistic design because we have some disagreement, but it does not preclude me from helping you on the Scrypt refinements, which will help me prove them for the holistic coin design I want.
Open source (sharing what is mutual, separating orthogonal concepts so we can) at its best!