I thought I'd take a stab at integrating some of the recent changes from both phatk kernel threads (here and here).
I've also changed how the various #defines for the kernels are specified: they are now passed as build options to clBuildProgram() instead of being patched into the .cl text on the fly. (So much simpler!)
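For anyone curious what that looks like, here's a rough sketch of the idea, not the actual code from my commit. The helper name build_kernel_options() and the define names (WORKSIZE, VECTORS, BITALIGN) are made up for illustration. The only real API here is clBuildProgram(), whose fourth argument takes an options string that accepts -D just like a C compiler does. The OpenCL call itself is left in a comment so the snippet compiles without the OpenCL headers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from cgminer): fill buf with the -D options
 * that would otherwise be patched into the .cl source text.
 * Returns 0 on success, -1 if buf is too small.
 */
static int build_kernel_options(char *buf, size_t len,
                                int vectors, int worksize, int has_bitalign)
{
    /* Define names are assumptions made for this example. */
    int n = snprintf(buf, len, "-D WORKSIZE=%d -D VECTORS=%d",
                     worksize, vectors);
    if (n < 0 || (size_t)n >= len)
        return -1;

    if (has_bitalign) {
        const char *opt = " -D BITALIGN";
        if ((size_t)n + strlen(opt) >= len)
            return -1;
        strcat(buf, opt);
    }
    return 0;
}

/*
 * Intended use, assuming program and device were already created:
 *
 *     char options[256];
 *     if (build_kernel_options(options, sizeof(options), 2, 128, 1) == 0)
 *         status = clBuildProgram(program, 1, &device, options, NULL, NULL);
 */
```

The kernel source then stays untouched on disk and just tests the macros with #ifdef or uses them directly, which is what makes this approach simpler than rewriting the text before compiling it.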
Compared to the current kernel in cgminer 1.5.3, this reduces the ALU op count slightly (5 fewer on VLIW4, 8 fewer on VLIW5), although not by as much as either of the other phatk kernels:
1.5.3:
VLIW4: 1699 ALU OPs
VLIW5: 1376 ALU OPs
Mine:
VLIW4: 1694 ALU OPs
VLIW5: 1368 ALU OPs
My changes are here at github, but it's all in one big ugly commit rather than broken out by individual change, so I don't feel comfortable in the slightest offering a pull request to CK for it.