I ran this new kernel against stock poclbm using my 5970. Although the modified kernel reported about 10 MHash/s more, it ended up getting fewer accepted shares in the long run (several hours). That may just be terrible luck, but I tried it twice: once under Windows, and then under Ubuntu. Both runs lasted several hours, and both gave the same result (stock poclbm with more accepted shares).
I have not tried swapping which core the respective kernels were running on, but it's been enough downtime for me today.
I'm in a position to speak objectively about this as I log all my found shares.
More data would be helpful, but it's only run for a few hours. Ideally I would have collected data from two cards in parallel over the same time to isolate network effects; instead I'll just exclude the extreme outliers (>90s).
Using the 1814 shares before the change and the 1814 since the change on a single node (the 5870), I found that the mean time between shares was 11.127 seconds before and 10.8 seconds after. This difference is not large enough to separate the 95% confidence intervals (assuming an exponential distribution), and a permutation test finds only p=0.369, so with this amount of data I can't say it made it better for _sure_, but it's certainly more likely than not, and it's also very unlikely to have made it worse.
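For anyone who wants to run the same kind of check on their own share log, here is a rough sketch of the comparison described above. The function names, the normal approximation used for the interval, and the one-sided form of the permutation test are my assumptions for illustration; the exact method behind the p=0.369 figure isn't spelled out here.

```python
import math
import random

def drop_outliers(gaps, limit=90.0):
    # Exclude extreme gaps (>90s), as described in the post.
    return [g for g in gaps if g <= limit]

def mean(xs):
    return sum(xs) / len(xs)

def exp_mean_ci(mean_gap, n, z=1.96):
    # Approximate 95% interval for the mean of n exponential gaps.
    # For an exponential distribution the standard deviation equals the
    # mean, so the standard error of the sample mean is mean / sqrt(n).
    half = z * mean_gap / math.sqrt(n)
    return mean_gap - half, mean_gap + half

def permutation_p(before, after, rounds=2000, seed=0):
    # One-sided permutation test (assumed form): shuffle the pooled gaps
    # and count how often a random split shows an improvement
    # (mean before minus mean after) at least as large as the observed one.
    rng = random.Random(seed)
    observed = mean(before) - mean(after)
    pooled = list(before) + list(after)
    n = len(before)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
            hits += 1
    return hits / rounds

# Intervals for the two means quoted above, 1814 shares each.
print(exp_mean_ci(11.127, 1814))
print(exp_mean_ci(10.8, 1814))
```

With the means quoted above, the two intervals come out at roughly 10.6 to 11.6 seconds and 10.3 to 11.3 seconds, so they overlap heavily. To use permutation_p you would pass in the two lists of share gaps after running each through drop_outliers.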
10.8 seconds at difficulty 1 implies 397,688,225 h/s and 11.127 implies 386,000,973 h/s, which is basically what the tool shows... well, a little less; it looks like the performance was overstated a bit before and it's less so now?
(The formula for hashrate from share gaps is 281474976710656/(65535*seconds)=h/s)
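As a quick sanity check, that formula can be written as a few lines of Python. The function name is just something I picked for illustration; the constants are the ones from the formula.

```python
# Hashrate implied by the mean gap between difficulty-1 shares,
# using the formula 281474976710656 / (65535 * seconds).
NUMERATOR = 281474976710656  # equals 2**48
DIFF1_FACTOR = 65535

def hashrate_from_gap(seconds):
    # seconds: mean time between accepted difficulty-1 shares
    return NUMERATOR / (DIFF1_FACTOR * seconds)

print(int(hashrate_from_gap(10.8)))    # 397688225
print(int(hashrate_from_gap(11.127)))  # 386000973
```

Both values match the figures quoted above for the 10.8 and 11.127 second gaps.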