For the second one, try flashing it again with 1V1. It shouldn't go (much) above 1.1 V, though 1.11 V wouldn't be unexpected. Does it go up to 1.15 V? You don't need to keep the bottom heatsinks warm during operation; the issue with boards needing to be warmed up with a hair dryer prior to startup is exclusive to the boards made by Lucko. It shouldn't have an effect on your board.
Thanks, I was wondering about the hairdryer thing. The problem board does sometimes get above 1.15 V in cgminer, and that seems to be when it craps out most of the time (bfgminer doesn't show it getting that high when it craps out). It also now sometimes reports 200+ GH/s with practically all errors, if that info helps.
Hmm, tried flashing it again (several times, actually): no joy. In a cold room it still craps out and often reboots itself (it seemed to behave a bit better during the day, when the room was warmer while I was at work, but it wasn't totally fixed). Unfortunately, when it comes back up the miner doesn't automatically find it (bfgminer; I think cgminer does the same thing), so I have to manually restart the miner. Not optimal, for obvious reasons. Occasionally I have to power cycle it to get it to come back, which I can do remotely since I have a USB-controlled power strip (thanks to tiebing's suggestions on an old blog, works great! http://tiebing.blogspot.com/2011/01/use-linux-to-control-outlet.html )
I've also invoked it with --cmd-sick to try to get it to restart the miner, but that doesn't seem to do the trick, since the miner is happy to just let the board send error messages and misread the temperature ("Received unexpected queue result response:", "Error: Get temp returned empty string/timed out" and "Failed to send queue"). It seems to do this at 1.102 V and higher, whereas the other one runs relatively rock solid at 1.107 V. Sometimes the bad one will creep up to 1.15 V or so before crapping out, sometimes it craps out a bit below 1.1 V. When it throws these errors, hashing just slowly drops to 0 GH/s, with no increased errors, no increased rejects and no accepts (of course), until I SBY restart or kill and restart the miner. After that, only the good chili comes up and is recognized, unless I wait some number of minutes so the bad one can reboot.
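The manual routine described above (wait for the board to reboot, restart the miner, power cycle as a last resort) could in principle be scripted. This is only a rough Python sketch of that escalation, not something tested on this hardware. The three callables device_present, restart_miner and power_cycle are hypothetical hooks you would have to write yourself (for example around your own USB check, your miner start script and the outlet control from the blog post), and the 5 minute wait is a guess based on "some number of minutes".

```python
import time
from typing import Callable


def recover_board(
    device_present: Callable[[], bool],   # hypothetical: True once the board is visible again
    restart_miner: Callable[[], None],    # hypothetical: kills and restarts bfgminer/cgminer
    power_cycle: Callable[[], None],      # hypothetical: toggles the USB-controlled outlet
    wait_s: float = 300.0,                # assumed: how long to wait for a reboot
    poll_s: float = 15.0,                 # assumed: how often to check for the board
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the cheap fix first, then escalate. Returns which step worked."""

    def wait_for_device() -> bool:
        waited = 0.0
        while waited < wait_s:
            if device_present():
                return True
            sleep(poll_s)
            waited += poll_s
        return device_present()

    # Step 1: give the board time to reboot on its own, then restart the
    # miner so it rescans and picks the board up again.
    if wait_for_device():
        restart_miner()
        return "restarted"

    # Step 2: the board never came back, so cut and restore power,
    # wait again, and only then restart the miner.
    power_cycle()
    if wait_for_device():
        restart_miner()
        return "power-cycled"

    return "failed"
```

Passing the hooks in as arguments keeps the escalation order separate from the machine-specific commands, so the same logic can be dry-run with dummy functions before it is allowed to touch the power strip.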
Anything else I can try?
Here are some of the errors:
[2014-01-16 22:57:37] BFL 1: Error: Get temp returned empty string/timed out
[2014-01-16 22:57:37] BFL 1: Received unexpected queue result response:
[2014-01-16 22:57:38] BFL 1: Received unexpected queue result response:
[2014-01-16 22:57:39] BFL 1: Error: Get temp returned empty string/timed out
[2014-01-16 22:57:39] BFL 1: Received unexpected queue result response:
[2014-01-16 22:57:40] BFL 1: Received unexpected queue result response:
[2014-01-16 22:57:41] BFL 1: Error: Get temp returned empty string/timed out
[2014-01-16 22:57:41] BFL 1: Received unexpected queue result response:
[2014-01-16 22:57:42] BFL 1: Received unexpected queue result response:
[2014-01-16 22:57:42] BFL 1: Received unexpected queue result response:
[2014-01-16 22:57:42] BFL 1: Failed to send queue
[2014-01-16 22:57:42] BFL 1: Failed to send queue
[2014-01-16 22:57:42] BFL 1: Failed to send queue
[2014-01-16 22:57:43] BFL 1: Error: Get temp returned empty string/timed out
[2014-01-16 22:57:43] BFL 1: Received unexpected queue result response:
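Since the miner itself never marks the board as sick, an external script could watch the log for the three messages above and flag the device when they pile up. A minimal Python sketch, assuming the log lines keep exactly the "[timestamp] BFL n: message" shape shown here; the 10 second window and the threshold of 5 lines are arbitrary guesses, and sick_devices is a made-up helper name, not part of bfgminer or cgminer.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Matches lines such as:
# [2014-01-16 22:57:37] BFL 1: Error: Get temp returned empty string/timed out
LINE_RE = re.compile(
    r"^\s*\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (BFL \d+): (.*)$"
)

# The three messages seen in the log above.
SICK_MARKERS = (
    "Get temp returned empty string/timed out",
    "Received unexpected queue result response",
    "Failed to send queue",
)


def parse_line(line):
    """Return (timestamp, device, message), or None if the line doesn't match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return ts, m.group(2), m.group(3)


def sick_devices(lines, window=timedelta(seconds=10), threshold=5):
    """Map device -> number of error lines within `window` of the newest
    parsed log line, keeping only devices at or above `threshold`."""
    newest = None
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, device, message = parsed
        if newest is None or ts > newest:
            newest = ts
        if any(marker in message for marker in SICK_MARKERS):
            events.append((ts, device))
    if newest is None:
        return {}
    counts = Counter(dev for ts, dev in events if newest - ts <= window)
    return {dev: n for dev, n in counts.items() if n >= threshold}
```

Fed the log excerpt above, this should flag only "BFL 1"; whatever it returns could then be used to decide when to kill and restart the miner.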