The main surprise was that a lower thread concurrency and a modest worksize gave me better performance than the prescribed numbers. I'm underclocking the GPU and memory to keep heat down. I can reach 3 Mh/s with "--gpu-engine 1000 --gpu-memclock 1550", but then I get heat problems (>95C).
This got me to 2.66 Mh/s (scrypt), stable but hot at ~90C. Pretty exciting, since I was getting 900 kh/s to 1.2 Mh/s before and constantly overheating. I'm planning on re-applying thermal grease and undervolting soon.
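
To put the improvement in perspective, here is a quick back-of-the-envelope calculation using only the hash rates quoted above. It's a rough sketch: the function and variable names are made up for illustration, and the rates are self-reported figures, not benchmarks.

```python
def speedup(new_khs: float, old_khs: float) -> float:
    """Return how many times faster the new hash rate is than the old one."""
    return new_khs / old_khs

# Hash rates from the post, converted to kh/s.
new_rate = 2660.0                    # 2.66 Mh/s with the tuned settings
old_low, old_high = 900.0, 1200.0    # earlier range: 900 kh/s to 1.2 Mh/s

worst = speedup(new_rate, old_high)  # compared with the best earlier rate
best = speedup(new_rate, old_low)    # compared with the worst earlier rate

print(f"{worst:.2f}x to {best:.2f}x")  # prints "2.22x to 2.96x"
```

So the tuned settings come out at roughly 2.2 to 3 times the earlier rate, and at a lower temperature than the 3 Mh/s overclock.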