Looks like the "bug" - if we can call it that - started with the 3.6.x build:
https://bitcointalk.org/index.php?topic=28402.msg3443712#msg3443712
Yikes, you and I may have separate issues after all and/or there may be more than one issue to be found. The plot thickens. I sure hope my problem isn't right under my nose.
In that case let me try
http://ck.kolivas.org/apps/cgminer/temp/cgminer-to.exe
After 10.75 hours the "to" version has one zombie, AMU5. I'm not sure how long it was there before I checked. I think the "to" version is the best of the recent candidates on my machine. I'll swap the zombie and keep the run going if I can.
Edit: the swap of AMU5 worked fine, but I just did a double-take looking at the allocations. The AMUs usually start out as AMU 0-12, with BAL0 somewhere in the middle or at the end. Now I have AMU0-4, BAL0, AMU7-8, 11, 14-17 - yikes it just changed again. It's saying AMU5 has gone zombie but there was no AMU5 a moment ago because it had been reallocated when the first zombie showed up. There is an LED on solid, but it's not the erupter that I just swapped. Bottom line - apparent allocation anomalies, and I get the feeling that it might have recovered from errors on its own and done some reallocations without intervention. I'll swap out the new zombie AMU5 and keep going if possible. Total run time now is 11 hours.
Edit: Getting weird. The "second AMU5" was reallocated to AMU18 when it was restarted - as expected. A few seconds later AMU3 went zombie. I think this is the same physical erupter as the original AMU5, which I restarted without moving it to a different location on the hub. Maybe I haven't had enough coffee but I thought it would be known as AMU17. Bottom line - the run seems to be getting flaky now, but my observations may include human error. I'll try to keep the run going though.
Edit: the AMU3 zombie was restarted as AMU19, as expected. Now showing AMU 0 1 2 4 7 8 11 14-19, with BAL 0 after AMU 4.
Edit: about an hour later, the display now says that AMU3 is a zombie - but there was no AMU3 (see list above). The display also shows AMU15 has gone to hashrate zero. Two LEDs are on. I'll keep it all going if possible.
Edit: I unplugged an erupter with its LED on. The display said AMU5 had gone zombie. Five? There was no five. AMU15 had morphed to five, it seems, as it was no longer showing at zero hashrate. Plugged it back in and it got reallocated to AMU20. The list is now AMU 0 1 2 4 BAL0 AMU 7 8 11 14 17-20 then AMU3 showing as a zombie. I imagine 3 will get reallocated to 21 when I restart it.
Edit: Yes, 3 became 21 when plugged back in, and it is slowly climbing back to full hash rate.
Edit: Another hour or so later I find five zombies, listed as AMU 3, 5, 6, 9 and 10 - none of those numbers were in use (see above). Still reported running are AMU 0, 1, 2, 4, 7, 8, 11, 17 and the BAL. I think I'll start the run from scratch, sticking with the "to" candidate.
Edit: Since the restart, the "to" candidate has run for seven hours with no errors or anomalies, still going strong.