I'm still trying to get my second Chili back up and reliable. Right now, it stops talking to cgminer/bfgminer after 2-8 minutes and needs to be powered off/on and the miner restarted to resume mining. Rinse and repeat. I've posted the errors earlier (https://bitcointalksearch.org/topic/m.4557778). Anyone have any ideas/suggestions?
So far, I've tried reflashing the 1.1v-limited firmware, reflowing the FTDI chip, and reseating a nearby small SMD capacitor that might have gotten dislodged, but it's still unstable. Before I had the pin 5-6 shorting mishap (hit an adjacent pin at the same time during a reset, trashing my FTDI chip), it was pretty solid, though I did have to reset it about once a day. I replaced the FTDI chip with a new one, programmed it (thanks Mr. Teal!), and was back hashing, but it has never been stable or even usable since.
Help?
So, it doesn't reset or anything, it just stops talking?
Do you have a multimeter? If so, the next time it stops talking, can you measure the voltage across C19 (one of the unpopulated ceramic capacitors between the PSU and ASICs)?
When that happens, the indicator lights go dark and the miner can't talk to it anymore. It will actually reset some/most of the time, but of course the miner program doesn't notice that and has to be restarted before it'll catch again. Once it's back up, it repeats the crash with these three errors randomly streaming/repeating (mostly the timed out and unexpected queue result ones):
[2014-01-16 22:57:42] BFL 1: Failed to send queue
[2014-01-16 22:57:43] BFL 1: Error: Get temp returned empty string/timed out
[2014-01-16 22:57:43] BFL 1: Received unexpected queue result response:
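To check which of these actually dominates before a crash, something like the following rough sketch could tally them from a saved copy of the miner output. It assumes every line has the same "[timestamp] BFL n: message" shape as the three above; the function name tally_errors and the sample list are made up for illustration and are not part of cgminer or bfgminer.

```python
import re
from collections import Counter

# Assumed line shape, based only on the three log lines quoted above:
#   [2014-01-16 22:57:42] BFL 1: Failed to send queue
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<dev>BFL \d+):\s+(?P<msg>.*)$")


def tally_errors(lines):
    """Count messages per (device, message); skip lines that don't match."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        # Drop a trailing colon so "response:" and "response" count together.
        msg = m.group("msg").rstrip(":").strip()
        counts[(m.group("dev"), msg)] += 1
    return counts


# Hypothetical input: the three lines from this post.
sample = [
    "[2014-01-16 22:57:42] BFL 1: Failed to send queue",
    "[2014-01-16 22:57:43] BFL 1: Error: Get temp returned empty string/timed out",
    "[2014-01-16 22:57:43] BFL 1: Received unexpected queue result response:",
]

for (dev, msg), n in tally_errors(sample).most_common():
    print(f"{n:4d}  {dev}  {msg}")
```

To run it on a real log, pass the open file instead of sample, e.g. tally_errors(open("miner.log")), where miner.log is whatever file the output was redirected to.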
I'll hook it back up again once I have a free PSU and look at the voltage across C19. What should it be, and what are you hoping to see from that?
Somewhere between 0.85V and 1.15V.
Most crashes are caused by the power supply turning off for some reason, but this sounds like it might be different. Do you know which other pin you shorted out? Some of the other ones around there are the communication pins between the VRM and the microcontroller. The micro will shut the power down and reset if it reads funky values from the power supply, so that could be a source of the error.
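As a rough way to read the C19 measurement, here is a small sketch. The 0.85V to 1.15V window is the one given above; the 0.10V "supply is off" cutoff and the name classify_c19 are assumptions picked for illustration, not values from the board's documentation.

```python
V_MIN = 0.85  # lower bound given above
V_MAX = 1.15  # upper bound given above
V_OFF = 0.10  # assumption: anything below this is treated as "supply turned off"


def classify_c19(volts):
    """Sort a multimeter reading across C19 into one of three buckets."""
    if volts < V_OFF:
        return "supply off"      # the usual crash cause: the VRM shut down
    if V_MIN <= volts <= V_MAX:
        return "in range"        # power looks fine, so suspect communication
    return "out of range"        # powered, but outside the expected window


for reading in (0.0, 1.02, 1.30):
    print(f"{reading:.2f} V -> {classify_c19(reading)}")
```

A reading that stays in range while the unit has stopped talking would point away from the power supply and toward the communication side.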
Thanks, but supposedly the non-Lucko boards don't have the hairdryer problem. Still, I might try that for good luck. Maybe the initial problem somehow duplicated the strangeness of the Lucko boards...
Lucko and another user who has both types of boards reported they could get similar results on one board if they cooled it down substantially (I believe Lucko did it at -10C). I tried a couple of mine at -15C and couldn't reproduce it, but I can't rule out that there is some kind of temperature-dependent thing happening. I've only heard of the two units displaying those symptoms though, and none with the ambient above freezing.