I am working on some firmware updates that should fix most (or all?) of the problems the boards are having. For example, I think I have figured out one failure where a board occasionally jumps to more than 100 GH/s but shows 100% hardware errors. I have one board that does this about once per day and needs to be rebooted. After chasing this for a while, I believe I finally understand what is going on and will have a fix in the next release.
I have a semi-dead board I recently bought from Keefe that does exactly this. It has always been prone to freezing. I received it from him last Friday, then went out of town overnight. On Saturday it was hung hard, and no amount of restarting or cooling would help. It has been thoroughly inspected and has no visible flaws. There was an apparent short from 3V3 to ground, and I found some TIM under the edge of an ASIC, so I cleaned it overnight in an ultrasonic bath, which should have removed all debris. (The TIM was electrically conductive enough that I could measure it with my ohmmeter.)
Now it starts up seemingly normally, then reports a ridiculously high speed with a 100% error rate for 10 proofs, and stops. BFGminer reports it as SICK at 27 °C. I don't know how to tell what firmware it has, etc.
Any suggestions welcome.