Other than that, I'm not seeing the 1 khash/s improvement on YAC yet. The best improvement so far is 0.6 khash/s, and GPU usage went from 80% to 99%.
Going to autotune with -L 7 now... results soon!
You can run autotune with the -D flag (debug output) and abort as soon as you see there's not much more to be gained by waiting longer. It's also a good idea to log the output to a file for later review.
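The 2>&1 syntax merges stderr into stdout so both streams end up in the same file. It has to come after the >file redirection, because redirections are applied left to right: with 2>&1 first, stderr is tied to the console before stdout is moved to the file, so stderr never reaches the log.
cudaminer.exe -D -d 0 -l T >autotune_logfile.txt 2>&1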
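To see why the order matters, here is a minimal POSIX sh sketch rather than a cmd.exe one, though as far as I know cmd.exe applies redirections left to right in the same way. `fake_miner` is a hypothetical stand-in for cudaminer that writes one line to stdout and one to stderr:

```shell
#!/bin/sh
# Hypothetical stand-in for the miner: one line on stdout, one on stderr.
fake_miner() {
    echo "stdout: hash rate line"
    echo "stderr: autotune progress" >&2
}

# Wrong order: 2>&1 copies stderr onto the *current* stdout (the terminal),
# and only afterwards is stdout pointed at the file.
fake_miner 2>&1 >wrong_order.txt

# Right order: stdout goes to the file first, then stderr is copied onto it.
fake_miner >right_order.txt 2>&1

echo "--- wrong_order.txt ---"
cat wrong_order.txt
echo "--- right_order.txt ---"
cat right_order.txt
```

The stderr line shows up on the terminal in the first case and only lands in the file in the second case.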