Damn, you're as persistent as I am. Can't sleep with a problem. >.<
Yeah, with BGA as fucking insanely sensitive as it is, it's a goddamn miracle you got the new chips working. I was just looking at my Jally a few minutes ago as I was putting it back together (getting it back in the case, which may be a bad idea), and I noticed my (all-aluminum) heatsink had room to completely cover exactly 2 more chips. The heatsink design seems very "not fully thought-out" though, allowing WAY too much room for error in over-cranking the screws and cracking the fuck out of the board. I see a *lot* of people messing that up (not assuming you have, either). Mechanically, over-tightening the screws wouldn't feel like over-tightening at all: go just barely beyond the point where the screw head meets the board and the board will begin to bow, causing the chips to bend away from the heatsink instead of pulling them tighter. Be really careful about that.
Yep. My method here is to start the screws, then flip the Jally over and press on the bottom plate to take up slack evenly on the chips. Then I hand-tighten the screws down with a Torx bit, stopping when they are *just* snug. That way I am not cranking into the board and I know the chips are getting finger pressure.
With temps in the 60s and 70s though, you've got issues for sure. Mine maxes out around 55-60 right now with the case on and running the 8.5gh firmware, with freshly applied Ceramique II and screws lightly tightened just to remove the rotating wobble from the board/heatsink, and a thin (credit-card-thickness) slice of cardboard between the aluminum plate and PCB for insulation and padding (WTF was BFL thinking, metal-on-PCB?!).
Remember, I am now running three chips (was 4, soon to be six), so my temps are going to be high. Yes, this will lower the life expectancy of the unit, but after Feb it's not going to be able to mine enough coin to matter, so it doesn't matter. Might as well go for broke :-)
Error checking shouldn't affect startup though. Maybe the code you're turning on is set to check all (non-existent) chips and craps out trying to test the ones that aren't there. /shrug... random shotgun guess without sifting through the code.
It's possible that is what RUN HEAVY DIAGNOSTICS is. It's also possible that last night I was flashing it without the fan, which caused the board to heat up enough to trip the "I'm too hot" sensor, because it started working again after I left it unplugged while I went to hit the can. I'll try more tonight, but I really should just hard-wire the damn JTAG port so I'm not always pulling the sink.
Running it with bad cores disabled cuts power usage (I think bad cores run 100% *on*, suck full power for zero benefit, and take work, thus slowing the overall chip down), so I'll re-flash again tonight. I think 1.2.6 might work with the newer chips, and ck has a compiled ELF for that, so I might give it a try. Or try Tarkin's load. Or just go back to a stock 1.2.9 with speed 7 and little_single. That worked well.
Anyway back to work.