Coblee's thread about GPU-mining and Litecoin:
https://bitcointalksearch.org/topic/thread-about-gpu-mining-and-litecoin-64239
Well, the N factor increases the memory required to compute a single hash (so it uses more memory and more memory bandwidth). Current GPUs quickly run out of memory (or there's some other GPU-specific constraint that prevents the code from running at higher N, dunno). However, it also hits CPUs really hard (around a 40% hashrate decrease, if I remember correctly).
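To put rough numbers on the memory claim: scrypt's scratchpad (the V array) holds N blocks of 128*r bytes each, so it grows linearly with N. A minimal sketch, assuming r=1 as in Litecoin's parameters; the function name is made up for illustration and is not from any miner's code:

```python
def scrypt_scratchpad_bytes(n: int, r: int = 1) -> int:
    """Size of scrypt's V array in bytes: N blocks of 128*r bytes each."""
    return 128 * r * n


if __name__ == "__main__":
    # Litecoin uses N=1024, r=1; the larger N values are just for comparison.
    for n in (1024, 4096, 16384):
        kib = scrypt_scratchpad_bytes(n) // 1024
        print(f"N={n:6d}  scratchpad per hash = {kib} KiB")
```

This is per hash in flight, so a GPU running thousands of hashes in parallel multiplies that figure by the number of concurrent threads.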
Nah, all you have to do is increase the lookup gap (via the previously published TMTO solution for scrypt from cgminer/reaper) and then you can compute the same hashes with less memory.
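For anyone who hasn't seen the lookup gap trick, here's a toy sketch of the idea: keep only every gap-th entry of V and recompute the skipped ones on demand. Big assumptions: SHA-256 stands in for scrypt's real Salsa20/8 BlockMix, the integerify step is simplified, and all names are hypothetical. This is not the cgminer/reaper code, just the same trade-off in miniature.

```python
import hashlib


def mix(block: bytes) -> bytes:
    # Stand-in for scrypt's BlockMix (assumption: real scrypt uses Salsa20/8).
    return hashlib.sha256(block).digest()


def integerify(block: bytes, n: int) -> int:
    # Simplified: derive a pseudo-random index in [0, n) from the block.
    return int.from_bytes(block[:8], "little") % n


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def romix_full(seed: bytes, n: int) -> bytes:
    # Plain ROMix: store all n entries of V.
    v, x = [], seed
    for _ in range(n):
        v.append(x)
        x = mix(x)
    for _ in range(n):
        j = integerify(x, n)
        x = mix(xor(x, v[j]))
    return x


def romix_gap(seed: bytes, n: int, gap: int) -> bytes:
    # TMTO variant: store only V[0], V[gap], V[2*gap], ...
    v, x = [], seed
    for i in range(n):
        if i % gap == 0:
            v.append(x)
        x = mix(x)
    for _ in range(n):
        j = integerify(x, n)
        vj = v[j // gap]           # nearest stored entry at or below j
        for _ in range(j % gap):   # recompute the skipped steps
            vj = mix(vj)
        x = mix(xor(x, vj))
    return x


if __name__ == "__main__":
    seed = hashlib.sha256(b"toy block header").digest()
    for gap in (1, 2, 4, 8):
        same = romix_gap(seed, 1024, gap) == romix_full(seed, 1024)
        print(f"gap={gap}: same hash as full table: {same}")
```

Memory drops by roughly a factor of gap, while each lookup costs on average (gap-1)/2 extra mix calls, which is why this favors hardware with lots of spare compute.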
There's probably a bug in mtrlt's current code that doesn't allow calculation above N=4096, but it's also possible that this particular TMTO implementation just isn't well optimized for the GPU, and that in the future with some hacking we'll see the gap widen further.
The higher the N value, the greater the dependence on memory access speed you typically observe (or at least, that's what I observed using scrypt-jane on a CPU). I wouldn't be surprised if eventually an optimized GPU implementation came along that destroyed CPUs on both efficiency and speed.
BLAKE is used as an entropy source to randomize memory access too. I wouldn't be surprised if you looked at accesses to the lookup table and found that they end up being less than random as well, due to the consistent ordering of some types of data in the block header (thus also diminishing the amount of memory required). I think pooler observed this when he was writing his CPU miner.