No challenge for me, just fun.
I did not claim that there is any stuck-at fault which cgminer will not find. That is of course the final end-to-end test. But on its own it does not help you compensate for dead hashing cores by adjusting the chip frequency. That feature must be implemented in the miner firmware to guarantee 100 GH/s per chip. I also don’t doubt that KnC can handle this. The only question is how long it takes.
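To put rough numbers on the frequency compensation, here is a minimal sketch. It assumes each working core delivers one hash per clock cycle (a fully pipelined core), and the core counts are made up for illustration; neither figure comes from KnC.

```python
def required_clock_hz(target_hashrate, working_cores, hashes_per_clock=1.0):
    """Clock needed so the remaining cores still reach the target hash rate.

    Assumes hash rate = working_cores * hashes_per_clock * clock,
    which is a simplification, not a statement about the real chip.
    """
    if working_cores <= 0:
        raise ValueError("no working cores left")
    return target_hashrate / (working_cores * hashes_per_clock)


TARGET = 100e9        # 100 GH/s per chip, the figure quoted above
TOTAL_CORES = 200     # hypothetical core count
DEAD_CORES = 10       # hypothetical number of cores that failed self-test

nominal = required_clock_hz(TARGET, TOTAL_CORES)
degraded = required_clock_hz(TARGET, TOTAL_CORES - DEAD_CORES)

print(f"nominal clock : {nominal / 1e6:.1f} MHz")
print(f"degraded clock: {degraded / 1e6:.1f} MHz")
print(f"scale factor  : {degraded / nominal:.4f}")
```

Under these assumptions the clock has to rise by the ratio of total to working cores, so whether the compensation is feasible depends on how much timing and thermal margin the chip has left.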
The Bitfury ASIC is a full-custom digital design, so maybe they did the things you mention.
The KnC ASIC is a standard semi-custom digital design, which means automatic RTL synthesis and place&route of the standard cells, hopefully done by an experienced design enablement partner of the foundry (not ORSoC). So it is almost certain that the supporting logic is implemented with the same 28nm standard cell lib as the hash cores. This is no problem, because the standard cell libs are most likely already silicon-proven and have shown good yield. But 100% yield is simply impossible.
I’m not sure you are the right person to discuss 28nm yield issues with. How many 28nm ASICs have you characterized for yield over temperature and voltage in the lab so far?
1) Fun is fine.
2) They have a whole "Embedded Linux SO-DIMM module" to handle initialization and self-test. Why waste even a single gate or a single trace on dedicated test logic? Hopefully they included some sort of "clock disable" register to avoid wasting dynamic power on the engines that keep delivering erroneous nonces. This is such a trivial design that any undergrad could do it. I'm willing to give KnC the benefit of the doubt.
3) All standard-cell libraries that I've seen have some "high fanout", "high load", "low-skew clock" cells that are drawn much wider than the nominal feature size of the process. What are you trying to imply?
4) He, he. My 28nm e-peen has measure-zero. How many circuits have you implemented/characterized that had no JTAG chain (for a valid reason, not due to a fault) and were completely multi-way redundant, including power?
5) Extra-credit question for people who aren't engineers but have experience mining: how many mining devices/rigs have you had that booted and started mining correctly, but kept failing after several hours or only in specific circumstances? Were you willing to consider them completely faulty and throw them away, or were you willing to delve in and debug the problem? How many hours of "testing" was your cutoff time before you decided to throw that device away?
6) Extra-extra-credit question for engineers: What do you think about other engineers who have an obsessive-compulsive disorder about some testing methodology but have no practical experience running an actual bitmine?
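For point 2), here is a rough sketch of what a host-side self-test with a "clock disable" mask could look like. Everything in it is hypothetical: `SimulatedEngine` stands in for a hash engine, `flip_bit` models a fault that always corrupts one output bit, and the returned mask plays the role of the assumed clock-enable register. It is not KnC's firmware or register layout.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Reference result computed on the host (Bitcoin uses SHA-256 twice)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


class SimulatedEngine:
    """Stand-in for one hash engine; flip_bit simulates a hard fault."""

    def __init__(self, flip_bit=None):
        self.flip_bit = flip_bit
        self.clock_enabled = True

    def hash(self, data: bytes) -> bytes:
        digest = bytearray(double_sha256(data))
        if self.flip_bit is not None:
            # Corrupt one output bit so the engine is always wrong.
            digest[self.flip_bit // 8] ^= 1 << (self.flip_bit % 8)
        return bytes(digest)


def self_test(engines, vectors):
    """Run known-answer vectors on every engine and gate the failing ones.

    Returns a bit mask with bit i set when engine i passed all vectors.
    """
    enable_mask = 0
    for index, engine in enumerate(engines):
        passed = all(engine.hash(v) == double_sha256(v) for v in vectors)
        engine.clock_enabled = passed  # stop clocking engines that fail
        if passed:
            enable_mask |= 1 << index
    return enable_mask


engines = [SimulatedEngine(), SimulatedEngine(flip_bit=5),
           SimulatedEngine(), SimulatedEngine()]
vectors = [b"test vector 0", b"test vector 1"]

mask = self_test(engines, vectors)
print(f"clock-enable mask: {mask:04b}")
```

A real stuck-at fault only shows up on inputs that drive the affected node the other way, which is why a self-test like this would want many vectors, or simply ongoing checks of returned nonces during normal mining.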