Any idea why this is?
This is exactly what happens to mine, maybe even lower: 470, 460... No idea why.
Higher temps can help, true, but it may simply be that the boards are too far gone and the cores are being turned off due to excessive errors.
One thing I did during my experimentation was to use BFGMiner instead of cgminer. The first release had no mechanism to turn off cores, so it just bulled its way through. I got lots of errors, but the overall hash rate stayed high (measured at the pool). Later releases implemented the core-disable mechanism, but I modified the source code of the knc driver to effectively turn that functionality off (I changed the number of consecutive errors needed to disable a core to 10000, and the time a core stays disabled to 1 second). That worked pretty well too (on v.94 with the higher voltage), but the VRMs couldn't handle the extra current, so over time I'd lose whole dies at once. I haven't tried the "locked" BFGMiner trick with .98 because I haven't needed to, but it might prove useful for someone whose boards continue to misbehave with .98. Though, honestly, if it's still not working well with .98, it might be time for an RMA.
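For anyone wondering what those two numbers control, here is a rough sketch of a consecutive-error core-disable policy in C. It is a hypothetical illustration, not the actual knc driver code: the struct and function names (`disable_policy`, `core_record_result`, etc.) are made up, and the stock thresholds aren't known here. Only the patched values (10000 errors in a row, 1 second disabled) come from the post above.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical tuning knobs; the real knc driver's names and stock
   values are NOT reproduced here. */
struct disable_policy {
    int errors_to_disable; /* consecutive errors before a core is switched off */
    int disable_seconds;   /* how long a disabled core stays off */
};

/* Hypothetical per-core bookkeeping. */
struct core_state {
    int consecutive_errors;
    time_t disabled_until; /* 0 means the core has never been disabled */
};

static void core_init(struct core_state *core, const struct disable_policy *policy)
{
    assert(policy->errors_to_disable > 0 && policy->disable_seconds > 0);
    core->consecutive_errors = 0;
    core->disabled_until = 0;
}

/* A core is usable if it was never disabled or its disable window has expired. */
static bool core_is_active(const struct core_state *core, time_t now)
{
    return now >= core->disabled_until;
}

/* Record one work result: a good result resets the error streak; a bad one
   extends it, and hitting the threshold switches the core off for a while. */
static void core_record_result(struct core_state *core,
                               const struct disable_policy *policy,
                               bool ok, time_t now)
{
    if (ok) {
        core->consecutive_errors = 0;
        return;
    }
    if (++core->consecutive_errors >= policy->errors_to_disable) {
        core->disabled_until = now + policy->disable_seconds;
        core->consecutive_errors = 0;
    }
}
```

With the patched policy `{ 10000, 1 }`, a core has to fail 10000 times in a row before it is switched off, and then it is back after a second. That is why the miner effectively keeps hashing through the errors, at the cost of the extra current draw described above.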
Your loss if you don't try. An RMA would cost you more.
Just sayin'
I've tried high temps, low temps, multiple miner programs, and every firmware released. As I said, I've got a nice working rig now with .98. I was just commenting that higher temps don't always fix the problem (I have hard data to prove it) depending on what is wrong. They can help, but are not a fix-all solution.
This sounds like a problem that previous firmwares had for most people. I wonder if you have some old files that 0.98 is not overwriting. Did you try downgrading to one that didn't have this problem, like 0.93, then doing a reset, rebooting, and upgrading from 0.93 back to 0.98? Edit: rereading the list of things you tried, you've probably done this already, but just in case...